Abstract
The adaptive immune response relies on the ability of T-cell receptors (TCRs) to recognize specific antigens. The vast diversity of TCRs allows T-cells to recognize a broad spectrum of antigens, but this diversity also makes TCR-antigen binding specificity hard to understand and predict. Although various machine learning and deep learning methods have been developed for prediction and clustering, there remains a need for a versatile and effective TCR language framework that can be applied flexibly to downstream tasks, including sequence generation. Here we present TDLM, a TCR diffusion language model designed to decode complex patterns within TCR sequences and apply them across downstream tasks. First, TDLM can be trained on unlabeled TCR sequence data, which lets it draw on large datasets to produce comprehensive embeddings. Compared with other embedding methods, TDLM embeddings improve TCR-antigen binding prediction accuracy and enable effective TCR sequence clustering and similarity analysis, helping to identify TCRs with shared antigen specificity. Furthermore, as a diffusion-based generative model, TDLM can generate highly diverse and specific TCR sequences. This ability supports the rapid screening and optimization of TCRs with target antigen specificities, offering significant potential in disease diagnosis, personalized immunotherapy, and vaccine research. The code is available at: https://github.com/skybluewhy/TDLM
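The embedding-based similarity analysis mentioned in the abstract can be illustrated with a minimal sketch. It does not use TDLM: the `kmer_embedding` function is a stand-in (overlapping 2-mer counts) chosen only so the example runs without a trained model, and the helper names and CDR3-like strings are made up for illustration. With a real model, the stand-in would be swapped for the learned embeddings while the cosine-similarity ranking stays the same.

```python
import math
from collections import Counter


def kmer_embedding(seq: str, k: int = 2) -> Counter:
    """Stand-in embedding: counts of overlapping k-mers (NOT the TDLM embedding)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: str, candidates: list[str], k: int = 2) -> list[tuple[str, float]]:
    """Rank candidate sequences by similarity to the query, most similar first."""
    q = kmer_embedding(query, k)
    scored = [(c, cosine_similarity(q, kmer_embedding(c, k))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical CDR3-like strings, invented for this example (not data from the paper).
cdr3s = ["CASSLGQAYEQYF", "CASSLGQGYEQYF", "CAWSVGNTIYF"]
ranking = most_similar(cdr3s[0], cdr3s[1:])
for seq, score in ranking:
    print(f"{seq}\t{score:.3f}")
```

Sequences that differ by a single residue share most of their 2-mers and rank above unrelated ones; clustering would group sequences whose pairwise similarity exceeds a chosen threshold.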
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024 |
| Editors | Mario Cannataro, Huiru Zheng, Lin Gao, Jianlin Cheng, Joao Luis de Miranda, Ester Zumpano, Xiaohua Hu, Young-Rae Cho, Taesung Park |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 113-120 |
| Number of pages | 8 |
| ISBN (Electronic) | 9798350386226 |
| State | Published - 2024 |
| Externally published | Yes |
| Event | 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024 - Lisbon, Portugal. Duration: 3 Dec 2024 → 6 Dec 2024 |
Publication series
| Name | Proceedings - 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024 |
|---|---|
Conference
| Conference | 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024 |
|---|---|
| Country/Territory | Portugal |
| City | Lisbon |
| Period | 3/12/24 → 6/12/24 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3: Good Health and Well-being
Keywords
- TCR clustering
- TCR sequence generation
- TCR-antigen binding prediction
- diffusion generative model
- immunotherapy
Title
TDLM: A Diffusion Language Model for TCR Sequence Exploration and Generation