
ILMCNet: A Deep Neural Network Model That Uses PLM to Process Features and Employs CRF to Predict Protein Secondary Structure

  • Benzhi Dong
  • Hui Su
  • Dali Xu
  • Chang Hou
  • Zheng Liu
  • Na Niu
  • Guohua Wang*

*Corresponding author for this work

Northeast Forestry University

Research output: Contribution to journal, Article, peer-review

Abstract

Background: Protein secondary structure prediction (PSSP) is a critical task in computational biology, pivotal for understanding protein function and advancing medical diagnostics. Recently, approaches that integrate multiple amino acid sequence features have gained significant attention in PSSP research.

Objectives: We aim to automatically extract additional features represented by evolutionary information from a large number of sequences while simultaneously incorporating positional information for more comprehensive sequence features. Additionally, we consider the interdependence between secondary structures during the prediction stage.

Methods: To this end, we propose a deep neural network model, ILMCNet, which utilizes a language model and Conditional Random Field (CRF). Protein language models (PLMs) pre-trained on sequences from multiple large databases can provide sequence features that incorporate evolutionary information. ILMCNet uses positional encoding to ensure that the input features include positional information. To better utilize these features, we propose a hybrid network architecture that employs a Transformer Encoder to enhance features and integrates a feature extraction module combining a Convolutional Neural Network (CNN) with a Bidirectional Long Short-Term Memory Network (BiLSTM). This design enables deep extraction of localized features while capturing global bidirectional information. In the prediction stage, ILMCNet employs CRF to capture the interdependencies between secondary structures.

Results: Experimental results on benchmark datasets such as CB513, TS115, NEW364, CASP11, and CASP12 demonstrate that the prediction performance of our method surpasses that of comparable approaches.

Conclusions: This study proposes a new approach to PSSP research and is expected to play an important role in other protein-related research fields, such as protein tertiary structure prediction.
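To illustrate why a CRF output layer can capture interdependencies between neighbouring labels, the following is a minimal sketch of Viterbi decoding for a linear-chain CRF. It is not ILMCNet's implementation: the three-state label set (H, E, C), the function name `viterbi_decode`, and every score below are assumptions made up for this example. In a real model, the emission scores would come from the upstream network and the transition scores would be learned.

```python
from typing import List, Sequence

# Hypothetical 3-state label set: helix, strand, coil.
LABELS = ["H", "E", "C"]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring label index path for a linear-chain CRF.

    emissions[t][j]   : score of label j at sequence position t
    transitions[i][j] : score of moving from label i to label j
    """
    if not emissions:
        return []
    n_labels = len(transitions)
    score = list(emissions[0])      # best path score ending in each label
    backptr: List[List[int]] = []   # best previous label per step and label

    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_labels):
            # Best previous label i for arriving at label j.
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)

    # Trace back from the best final label.
    path = [max(range(n_labels), key=lambda j: score[j])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Toy scores (made up): the middle residue weakly prefers E in isolation.
emissions = [[2.0, 0.0, 0.0],
             [0.0, 0.5, 0.0],
             [2.0, 0.0, 0.0]]
# Toy transitions (made up): staying in a state is rewarded, switching is penalised.
transitions = [[1.0, -1.0, -1.0],
               [-1.0, 1.0, -1.0],
               [-1.0, -1.0, 1.0]]

independent = [max(range(3), key=lambda j: row[j]) for row in emissions]
print("".join(LABELS[j] for j in independent))                              # prints HEH
print("".join(LABELS[j] for j in viterbi_decode(emissions, transitions)))   # prints HHH
```

With these toy scores, choosing each position independently gives HEH, whereas joint decoding gives HHH, because the transition scores penalise an isolated one-residue strand between two helix residues. This is the kind of label interdependence that a CRF layer is meant to model.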

Original language: English
Article number: 1350
Journal: Genes
Volume: 15
Issue number: 10
State: Published - Oct 2024
Externally published: Yes

Keywords

  • conditional random field
  • hybrid neural network architecture
  • protein secondary structure
  • unsupervised protein language model
