
TDLM: A Diffusion Language Model for TCR Sequence Exploration and Generation

  • Faculty of Computing, Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The adaptive immune response relies on the ability of T-cell receptors (TCRs) to recognize specific antigens. The vast diversity of TCRs allows T-cells to recognize a broad spectrum of antigens, but this complexity also makes TCR-antigen binding specificity difficult to understand and predict. Although various machine learning and deep learning methods have been developed for prediction and clustering, there remains a need for a versatile and effective TCR language framework that can be flexibly applied to a range of downstream tasks, including sequence generation. Here we present TDLM, a TCR diffusion language model designed to decode complex patterns within TCR sequences and apply them across such tasks. First, TDLM can be trained on unlabeled TCR sequence data, which lets it draw on large datasets to produce comprehensive embeddings. Compared with other embedding methods, TDLM embeddings improve TCR-antigen binding prediction accuracy and enable effective TCR sequence clustering and similarity analysis, helping to identify TCRs with shared antigen specificity. Second, as a diffusion-based generative model, TDLM can generate highly diverse and specific TCR sequences. This capability is valuable for the rapid screening and optimization of TCRs with target antigen specificities, offering significant potential in disease diagnosis, personalized immunotherapy, and vaccine research. The code is available at https://github.com/skybluewhy/TDLM

Original language: English
Title of host publication: Proceedings - 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024
Editors: Mario Cannataro, Huiru Zheng, Lin Gao, Jianlin Cheng, Joao Luis de Miranda, Ester Zumpano, Xiaohua Hu, Young-Rae Cho, Taesung Park
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 113-120
Number of pages: 8
ISBN (Electronic): 9798350386226
State: Published - 2024
Externally published: Yes
Event: 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024 - Lisbon, Portugal
Duration: 3 Dec 2024 → 6 Dec 2024

Publication series

Name: Proceedings - 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024

Conference

Conference: 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024
Country/Territory: Portugal
City: Lisbon
Period: 3/12/24 → 6/12/24

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • TCR clustering
  • TCR sequence generation
  • TCR-antigen binding prediction
  • diffusion generative model
  • immunotherapy
