Generating interacting protein sequences using domain-to-domain translation.

Barthelemy Meynard-Piganeau, Caterina Fabbri, Martin Weigt, Andrea Pagnani, Christoph Feinauer

Bioinformatics (Oxford, England) 2023 Jul 01

Being able to artificially design novel proteins of desired function is pivotal in many biological and biomedical applications. Generative statistical modeling has recently emerged as a new paradigm for designing amino acid sequences, including in particular models and embedding methods borrowed from natural language processing (NLP). However, most approaches target single proteins or protein domains and do not take into account any functional specificity or interaction with the context. To extend beyond current computational strategies, we develop a method for generating protein domain sequences intended to interact with another protein domain. Using data from natural multidomain proteins, we cast the task as a translation problem from a given interactor domain to the new domain to be generated, i.e. we generate artificial partner sequences conditional on an input sequence. We also show in an example that the same procedure can be applied to interactions between distinct proteins. Evaluating our model's quality using diverse metrics, in part related to distinct biological questions, we show that our method outperforms state-of-the-art shallow autoregressive strategies. We also explore the possibility of fine-tuning pretrained large language models for the same task and of using AlphaFold 2 for assessing the quality of sampled sequences. Data and code are available at https://github.com/barthelemymp/Domain2DomainProteinTranslation. © The Author(s) 2023. Published by Oxford University Press.
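
The translation framing in the abstract can be read as a conditional sequence model. The following is an illustrative sketch only, not a formula taken from the paper: it assumes an autoregressive decoder, and the symbols a (input interactor domain of length M), b (generated partner domain of length N), and theta (model parameters) are notation introduced here for illustration.

```latex
% Assumed notation (not from the source):
%   a = (a_1, ..., a_M)  input interactor domain
%   b = (b_1, ..., b_N)  partner domain to be generated
%   \theta               model parameters
% Conditional autoregressive factorization of the partner sequence:
P_\theta(b \mid a) \;=\; \prod_{i=1}^{N} P_\theta\!\left(b_i \mid b_1, \dots, b_{i-1},\, a\right)

% A standard training objective under this assumption: the negative
% log-likelihood over a set D of paired natural domain sequences (a, b):
\mathcal{L}(\theta) \;=\; -\sum_{(a,\,b) \in D} \; \sum_{i=1}^{N} \log P_\theta\!\left(b_i \mid b_1, \dots, b_{i-1},\, a\right)
```

Under these assumptions, generating an artificial partner amounts to sampling b one residue at a time from the conditional distributions, given a fixed input a.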

Citation

Barthelemy Meynard-Piganeau, Caterina Fabbri, Martin Weigt, Andrea Pagnani, Christoph Feinauer. Generating interacting protein sequences using domain-to-domain translation. Bioinformatics (Oxford, England). 2023 Jul 01;39(7)

PMID: 37399105
