Diagnostic Classification of Lung Cancer Using Deep Transfer Learning Technology and Multi-Omics Data
-
Abstract
In recent years, the growing use of high-throughput sequencing technology has allowed researchers to accumulate a large amount of multi-omics data, making it possible to diagnose cancer at the gene expression level. The proliferation of omics data provides a wealth of biological information, which brings both new opportunities and great challenges to cancer classification and diagnosis. Machine learning algorithms for early lung cancer diagnosis have emerged that use genomic features to distinguish early-stage from late-stage cancers. However, omics data are generally characterized by small sample sizes, high dimensionality, and high noise, so directly applying common classification methods does not yield good performance, and the methods must be adapted to these characteristics. This paper proposes an approach that combines a convolutional neural network with convolutional autoencoders to construct a deep transfer learning classification model for early lung cancer diagnosis. First, a convolutional autoencoder is used to reduce the dimensionality of the dataset so that it better meets the requirements of transfer learning. Second, a neural network model is constructed with the original dataset and the existing labeled dataset, and the model transfer rules are set. Finally, a small amount of labeled target data is used in training to complete the construction of the classification model. The proposed convolutional neural network method based on model transfer and five other popular machine learning models are used to classify three lung cancer gene datasets and the integrated dataset. The experimental results show that the proposed method achieves better prediction performance on four evaluation metrics (accuracy, precision, recall, and F1-score), and its average area under the curve (AUC) is also the highest among the compared methods.
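For readers unfamiliar with the four evaluation metrics named above, the following is a minimal sketch of how accuracy, precision, recall, and F1-score can be computed for a binary early-stage versus late-stage classification. It is an illustration only, not the evaluation code used in this work: the function name `binary_metrics`, the choice of label 1 as the positive class, and the toy labels are all assumptions made here, and any averaging across classes or datasets is not shown.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumption: label 1 is the positive class (for example, early stage)
    and label 0 is the negative class. This is a hypothetical helper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    n = len(y_true)
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / n if n else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented purely for illustration (not data from this study).
example_true = [1, 1, 1, 0, 0, 0, 1, 0]
example_pred = [1, 0, 1, 0, 1, 0, 1, 0]
example_metrics = binary_metrics(example_true, example_pred)
```

Precision and recall are reported alongside accuracy because, on small and possibly imbalanced omics datasets, accuracy alone can hide poor performance on the minority class; F1 summarizes the two as their harmonic mean.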