Yuta Nagano, Andrew G T Pyo, Martina Milighetti, James Henderson, John Shawe-Taylor, Benny Chain, Andreas Tiffeau-Mayer
Cell Systems. 2025 Jan 15.

Computational prediction of the interaction of T cell receptors (TCRs) and their ligands is a grand challenge in immunology. Despite advances in high-throughput assays, specificity-labeled TCR data remain sparse. In other domains, pre-training of language models on unlabeled data has been used successfully to address data bottlenecks. However, it is unclear how best to pre-train protein language models for TCR specificity prediction. Here, we introduce a TCR language model called SCEPTR (simple contrastive embedding of the primary sequence of T cell receptors), which is capable of data-efficient transfer learning. With SCEPTR, we introduce a pre-training strategy combining autocontrastive learning and masked-language modeling, which enables the model to achieve state-of-the-art performance. In contrast, existing protein language models and a variant of SCEPTR pre-trained without autocontrastive learning are outperformed by sequence-alignment-based methods. We anticipate that contrastive learning will be a useful paradigm for decoding the rules of TCR specificity. A record of this paper's transparent peer review process is included in the supplemental information. Copyright © 2024 The Author(s). Published by Elsevier Inc. All rights reserved.
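In autocontrastive learning of the kind the abstract describes, two embeddings of the same sequence (e.g. from two stochastic encoder passes) form a positive pair, while other sequences in the batch act as negatives. A minimal NumPy sketch of an InfoNCE-style autocontrastive loss follows; the function name, temperature value, and the use of two noisy views are illustrative assumptions, not the exact SCEPTR objective:

```python
import numpy as np

def autocontrastive_loss(z1, z2, temperature=0.05):
    """InfoNCE-style loss over two embedding views of the same batch.

    z1, z2: (batch, dim) L2-normalised embeddings of the same sequences
    under two stochastic encoder passes. Each pair (z1[i], z2[i]) is a
    positive; all other rows in the batch serve as in-batch negatives.
    """
    # temperature-scaled cosine similarity between all view pairs
    sim = z1 @ z2.T / temperature                          # (batch, batch)
    # row-wise log-softmax; positive pairs sit on the diagonal
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # minimising this pulls positives together, pushes negatives apart
    return -np.mean(np.diag(log_prob))

# Toy example: a batch of 8 embeddings and a slightly perturbed second view
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
z1 = z / np.linalg.norm(z, axis=1, keepdims=True)
z2 = z + 0.1 * rng.normal(size=z.shape)
z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)

loss = autocontrastive_loss(z1, z2)
```

Because the positive pairs are generated from the model's own stochastic passes rather than from specificity labels, this objective can be applied to the large pools of unlabeled TCR sequences the abstract refers to.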
Yuta Nagano, Andrew G T Pyo, Martina Milighetti, James Henderson, John Shawe-Taylor, Benny Chain, Andreas Tiffeau-Mayer. Contrastive learning of T cell receptor representations. Cell systems. 2025 Jan 15;16(1):101165
PMID: 39778580