
Identification of Type VI Effector Proteins Using a Novel Ensemble Classifier

  • Chunyu Wang
  • Jialin Li
  • Ying Zhang*
  • Maozu Guo

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The type VI secretion system (T6SS) delivers effector proteins (type VI secretion system effectors, termed T6SEs) into neighboring target cells. Many human pathogens express T6SEs, including Vibrio cholerae, Burkholderia spp., and Pseudomonas aeruginosa. T6SEs play vital roles in the competitive survival and pathogenesis of bacterial populations. Several machine-learning methods can already distinguish T6SEs from non-T6SEs, but we believe their predictive performance can be improved further. We therefore propose a more powerful ensemble predictor for identifying T6SEs. First, we construct a benchmark dataset from existing studies and databases. Then we use k-separated-bigrams-PSSM, a feature encoding derived from each protein's position-specific scoring matrix (PSSM), to convert the protein sequences into numerical feature vectors. Next, the synthetic minority oversampling technique (SMOTE) is applied to address class imbalance in the training data. Finally, we use a soft voting strategy to build an ensemble model that combines six fine-tuned base classifiers. In independent testing, the proposed model achieves an accuracy (ACC) of 99.0%, a Matthews correlation coefficient (MCC) of 97.8%, a sensitivity (SN) of 97.1%, and a specificity (SP) of 100%.
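To illustrate the feature-encoding step, the sketch below implements the commonly cited definition of k-separated-bigrams-PSSM. For an L × 20 PSSM P, entry (m, n) sums P[i][m] · P[i + k][n] over all positions i, giving a fixed 400-dimensional vector for any sequence length. This is a minimal sketch under assumptions. The PSSM is taken as already computed (typically with PSI-BLAST). Any PSSM normalization and the value of k used in the paper are not stated in the abstract. The function name `k_separated_bigrams_pssm` is hypothetical.

```python
from typing import List, Sequence

NUM_AMINO_ACIDS = 20  # standard PSSM column count


def k_separated_bigrams_pssm(pssm: Sequence[Sequence[float]], k: int = 1) -> List[float]:
    """Flatten the 20 x 20 k-separated-bigrams matrix of an L x 20 PSSM.

    T[m][n] = sum over i in 0..L-k-1 of pssm[i][m] * pssm[i + k][n],
    i.e. the co-occurrence of amino acid m at position i and n at position i + k.
    (Hypothetical helper; PSSM normalization and k are assumptions.)
    """
    length = len(pssm)
    if length == 0 or any(len(row) != NUM_AMINO_ACIDS for row in pssm):
        raise ValueError("expected an L x 20 PSSM")
    if not 1 <= k < length:
        raise ValueError("k must satisfy 1 <= k < L")

    bigrams = [[0.0] * NUM_AMINO_ACIDS for _ in range(NUM_AMINO_ACIDS)]
    for i in range(length - k):
        left, right = pssm[i], pssm[i + k]
        for m in range(NUM_AMINO_ACIDS):
            for n in range(NUM_AMINO_ACIDS):
                bigrams[m][n] += left[m] * right[n]
    # Row-major flattening gives a fixed 400-dimensional vector for any L > k.
    return [value for row in bigrams for value in row]


# Toy 3-residue PSSM: one-hot rows for amino-acid columns 0, 1 and 2.
toy = [[1.0 if col == row else 0.0 for col in range(20)] for row in range(3)]
features = k_separated_bigrams_pssm(toy, k=1)
print(len(features), features[1], features[22])  # 400 1.0 1.0
```

Because the output length does not depend on L, proteins of different lengths map to vectors that standard classifiers can consume directly.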
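The oversampling step can be sketched as follows, assuming the standard SMOTE procedure. Each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbors. This is an illustrative pure-Python version, not the paper's code. The helper name `smote` and its parameters are hypothetical, and k = 5 only mirrors the common default. In practice the imbalanced-learn library provides `imblearn.over_sampling.SMOTE`. As the abstract specifies, oversampling applies to the training data only, not to the independent test set.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Create n_synthetic minority-class samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # k nearest minority-class neighbours (Euclidean distance) of every sample.
    neighbours = []
    for i, x in enumerate(minority):
        ranked = sorted((math.dist(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in ranked[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x, y = minority[i], minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> y
        synthetic.append([a + gap * (b - a) for a, b in zip(x, y)])
    return synthetic


# Toy minority class lying on the line y = x; synthetic points stay on it.
minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
new_points = smote(minority, n_synthetic=4, k=2)
print(len(new_points))  # 4
```

Interpolating between real neighbors, rather than duplicating samples, adds new minority points inside the region the minority class already occupies.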
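Soft voting, as described above, combines the base classifiers by averaging their predicted class probabilities and selecting the class with the highest average. Hard voting, by contrast, counts each classifier's discrete label. The sketch below shows this for a single sample. The abstract does not name the six base classifiers or say whether their votes are weighted, so the optional `weights` argument and the helper name `soft_vote` are assumptions. scikit-learn offers the same idea as `VotingClassifier(voting="soft")`.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(probabilities: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> Tuple[int, List[float]]:
    """Combine per-classifier class probabilities for ONE sample by (weighted) averaging.

    probabilities[c][j] is classifier c's predicted probability of class j.
    Returns the winning class index and the averaged probability vector.
    """
    if not probabilities:
        raise ValueError("need at least one classifier")
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("one weight per classifier is required")
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[j] for w, p in zip(weights, probabilities)) / total
        for j in range(n_classes)
    ]
    # The predicted class is the one with the highest averaged probability.
    winner = max(range(n_classes), key=lambda j: averaged[j])
    return winner, averaged


# Three hypothetical base classifiers scoring one protein as [non-T6SE, T6SE].
label, avg = soft_vote([[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]])
print(label)  # 0
```

In this toy case, a hard majority vote would choose class 1 (two of three classifiers). Soft voting instead chooses class 0, because the first classifier's high confidence outweighs the two lukewarm votes.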
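The four reported evaluation measures all follow from a binary confusion matrix, with T6SEs as the positive class. The sketch below uses the standard definitions. The counts in the example are hypothetical and chosen only to show the arithmetic; they are not the paper's test-set results. MCC ranges from -1 to 1, so an MCC reported as 97.8% corresponds to 0.978. The helper name `binary_metrics` is hypothetical.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """ACC, SN, SP and MCC from a binary confusion matrix (T6SE = positive class)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)   # sensitivity: fraction of true T6SEs recovered
    sp = tn / (tn + fp)   # specificity: fraction of non-T6SEs correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report MCC = 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


# Hypothetical counts, chosen only to illustrate the arithmetic.
print(binary_metrics(tp=8, fn=2, tn=9, fp=1))
```

With these counts, ACC = 17/20 = 0.85, SN = 0.8, SP = 0.9, and MCC = 70/√9900 ≈ 0.70. MCC stays informative on imbalanced data, where ACC alone can look optimistic.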

Original language: English
Article number: 9055359
Pages (from-to): 75085-75093
Number of pages: 9
Journal: IEEE Access
Volume: 8
DOIs
State: Published - 2020
Externally published: Yes

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • SMOTE
  • T6SE
  • classification
  • ensemble predictor
  • k-separated-bigrams-PSSM

