TY - GEN
T1 - GONET
T2 - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
AU - Li, Junyi
AU - Wang, Lixin
AU - Zhang, Xiaoshuai
AU - Liu, Bo
AU - Wang, Yadong
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/12/16
Y1 - 2020/12/16
N2 - Precisely determining the functions of proteins in life activities is a nontrivial task at the core of current proteomics research. Gene Ontology standardizes protein function into a series of GO terms, each of which belongs to exactly one of three subontologies: Biological Process (BP), Cellular Component (CC), and Molecular Function (MF). Protein function prediction can therefore be treated as a multi-label classification problem. Traditional methods are often costly, because they rely on handcrafted feature extraction and require considerable domain knowledge, whereas deep learning can overcome these shortcomings. Here, we propose GONET, a deep model based on recurrent convolutional neural networks that annotates proteins in an end-to-end manner. Our model combines protein sequences with protein-protein interaction (PPI) network data, and uses representation learning to learn distributed representations of proteins, which addresses the sparsity and semantic independence problems. Moreover, we adopt a very deep CNNRNN-Attention model that effectively extracts high-order features from protein sequences. In experiments on several datasets, the model achieves state-of-the-art results on some metrics compared with existing competitive methods.
KW - Gene Ontology
KW - protein function prediction
KW - recurrent convolutional neural networks
KW - representation learning
UR - https://www.scopus.com/pages/publications/85100354980
U2 - 10.1109/BIBM49941.2020.9313235
DO - 10.1109/BIBM49941.2020.9313235
M3 - Conference contribution
AN - SCOPUS:85100354980
T3 - Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
SP - 29
EP - 34
BT - Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
A2 - Park, Taesung
A2 - Cho, Young-Rae
A2 - Hu, Xiaohua Tony
A2 - Yoo, Illhoi
A2 - Woo, Hyun Goo
A2 - Wang, Jianxin
A2 - Facelli, Julio
A2 - Nam, Seungyoon
A2 - Kang, Mingon
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 16 December 2020 through 19 December 2020
ER -