Protein Function Prediction Based on Active Semi-supervised Learning

WANG Xuesong; CHENG Yuhu; LI Lijing

doi:10.1049/cje.2016.07.005

WANG Xuesong, CHENG Yuhu, LI Lijing. Protein Function Prediction Based on Active Semi-supervised Learning[J]. Chinese Journal of Electronics, 2016, 25(4): 595-600. DOI: 10.1049/cje.2016.07.005

Abstract

This study combines active learning and semi-supervised learning to propagate the labels of proteins with known functions through a protein-protein interaction (PPI) network and thereby predict the functions of unannotated proteins. Real PPI networks generally contain overlapping protein nodes that have multiple functions, and mislabeling an overlapping protein can cause prediction errors to accumulate. For this reason, the adjacency matrix is used to detect overlapping proteins before the label propagation step of semi-supervised learning is executed. A PPI network describes the topology of the interactions between proteins, and its party hub proteins, which are co-expressed with their neighbors, play an important role in it. Therefore, to reduce the manual labeling cost, the party hub proteins most beneficial for improving prediction accuracy are selected for class labeling, and the labeled party hub proteins are added to the labeled sample set for subsequent semi-supervised learning. Experimental results on a real yeast PPI network show that the proposed algorithm achieves high prediction accuracy with few labeled samples.
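The interplay between hub querying and label propagation can be illustrated with a minimal sketch. It is not the authors' algorithm. The abstract does not give the selection criterion or the propagation rule, so the sketch assumes that the unlabeled node of highest degree stands in for a party hub and that unlabeled nodes take the majority label of their labeled neighbors. The overlapping-protein detection step is omitted. All names (`build_adjacency`, `pick_hub_to_query`, `propagate_labels`, `active_semi_supervised`) and the toy network are hypothetical.

```python
from collections import Counter


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pick_hub_to_query(adj, labeled):
    """Assumed stand-in for party hub selection: the unlabeled node with the
    highest degree (ties broken by smallest node name)."""
    candidates = [n for n in adj if n not in labeled]
    if not candidates:
        return None
    return min(candidates, key=lambda n: (-len(adj[n]), n))


def propagate_labels(adj, labeled, max_iter=100):
    """Simplified label propagation: each unlabeled node takes the majority
    label of its already-labeled neighbors; assigned labels are then fixed."""
    labels = dict(labeled)
    for _ in range(max_iter):
        updates = {}
        for node in sorted(adj):
            if node in labels:
                continue
            votes = Counter(labels[n] for n in adj[node] if n in labels)
            if votes:
                # Deterministic tie-break: highest count, then smallest label.
                updates[node] = min(votes, key=lambda lab: (-votes[lab], lab))
        if not updates:
            break
        labels.update(updates)
    return labels


def active_semi_supervised(adj, labeled, oracle, budget):
    """Query the oracle for up to `budget` hub nodes, then propagate labels."""
    labeled = dict(labeled)
    for _ in range(budget):
        node = pick_hub_to_query(adj, labeled)
        if node is None:
            break
        labeled[node] = oracle(node)  # manual annotation step
    return propagate_labels(adj, labeled)


# Toy network: two hubs (H1, H2) with three neighbors each, bridged by c-d.
edges = [("H1", "a"), ("H1", "b"), ("H1", "c"),
         ("H2", "d"), ("H2", "e"), ("H2", "f"),
         ("c", "d")]
truth = {"H1": "F1", "a": "F1", "b": "F1", "c": "F1",
         "H2": "F2", "d": "F2", "e": "F2", "f": "F2"}

adj = build_adjacency(edges)
predicted = active_semi_supervised(adj, {"a": "F1"}, truth.get, budget=2)
baseline = active_semi_supervised(adj, {"a": "F1"}, truth.get, budget=0)
```

In this toy network, querying the two hubs lets `predicted` recover every label in `truth`. With `budget=0`, the single seed label spreads across the bridge and the second cluster is mislabeled, which is the kind of error accumulation the abstract aims to avoid.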