
Mal-Prec: computational prediction of protein Malonylation sites via machine learning based feature integration: Malonylation site prediction

  • Xin Liu*
  • Liang Wang
  • Jian Li
  • Junfeng Hu
  • Xiao Zhang*
  • *Corresponding author for this work
  • Xuzhou Medical University
  • Tulane University

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Malonylation is a recently discovered post-translational modification that is associated with a variety of diseases, such as type 2 diabetes mellitus and different types of cancer. Compared with experimental identification of malonylation sites, computational methods are time-efficient and comparatively low-cost.

Results: In this study, we propose a novel computational model called Mal-Prec (Malonylation Prediction) for malonylation site prediction, which combines Principal Component Analysis (PCA) and Support Vector Machine (SVM). One-hot encoding, physicochemical properties, and the composition of k-spaced amino acid pairs were first used to extract sequence features. PCA was then applied to select optimal feature subsets, and SVM was adopted to predict malonylation sites. Five-fold cross-validation results showed that Mal-Prec achieves better prediction performance than other approaches. The AUC (area under the receiver operating characteristic curve) reached 96.47% on 5-fold cross-validation and 90.72% on independent data sets.

Conclusion: Mal-Prec is a computationally reliable method for identifying malonylation sites in protein sequences. It outperforms existing prediction tools and can serve as a useful tool for identifying and discovering novel malonylation sites in human proteins. Mal-Prec is coded in MATLAB and is publicly available at https://github.com/flyinsky6/Mal-Prec, together with the data sets used in this study.
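Two of the sequence features named in the abstract, one-hot encoding and the composition of k-spaced amino acid pairs, can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation: the function names, the 11-residue example window centered on a lysine, the `k_max` value, and the all-zero encoding of non-standard residues are assumptions made only for illustration. The physicochemical features, PCA, and SVM steps are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(window):
    """Encode each residue as a 20-dimensional indicator vector.

    Residues outside the standard alphabet (e.g. 'X' padding) become
    all-zero vectors; this convention is an assumption.
    """
    vec = []
    for res in window:
        vec.extend(1 if res == aa else 0 for aa in AMINO_ACIDS)
    return vec


def cksaap(window, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    For each gap k, count every pair (window[i], window[i + k + 1]) and
    divide by the number of such positions, giving 400 values per k.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        n_pairs = len(window) - k - 1
        for i in range(n_pairs):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard residues
                counts[pair] += 1
        features.extend(counts[p] / n_pairs if n_pairs > 0 else 0.0 for p in pairs)
    return features


# Hypothetical 11-residue window with the candidate lysine (K) at the center.
window = "ACDEFKGHILM"
feature_vector = one_hot(window) + cksaap(window, k_max=2)
print(len(feature_vector))
```

Concatenating the two encodings yields one fixed-length vector per candidate site, which is the kind of input a dimensionality-reduction step and a classifier would then consume.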

Original language: English
Article number: 812
Journal: BMC Genomics
Volume: 21
Issue number: 1
State: Published - Dec 2020
Externally published: Yes

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Machine learning
  • Malonylation
  • Post-translational modification
  • Principal component analysis
  • Support vector machine
