Abstract
Cancer diagnosis is undergoing a paradigm shift toward diagnostic panels based on molecular biomarkers. MicroRNA (miRNA) data are among the most important types of genomic data. Because several studies have shown a relationship between miRNAs and cancers, data mining and machine learning methods can be used to extract knowledge from cancer genomic datasets. However, although previous work has made it possible to diagnose cancer from miRNAs, the accuracy for some classes remains unsatisfactory. This research therefore proposes a three-phase method that combines a super-class (meta-label) approach with deep learning to diagnose cancers from miRNAs. The first phase, named Representation learning, consists of partitioning the data into super-classes, creating meta-data, and classifying the super-classes. It splits the data into subsets to improve classification accuracy: labels are grouped into meta-labels based on the separability of the classes, and a multi-label learner is then built to predict these meta-labels. In the second phase, feature selection is applied to each super-class to reduce the dimensionality of the problem and to focus the induction algorithm on the features that matter most for predicting the target concept. In the third phase, an evolutionary deep neural network classifies the labels within each super-class. The last two phases are carried out separately for each subset, so that with five super-classes, five deep neural networks are trained. The experimental results show that the proposed method achieves better results than 19 recent machine learning methods. Although the dataset, which covers 29 types of cancer, makes learning more difficult for the convolutional neural network, the method performs noticeably better than existing methods. It also runs in significantly less time than the other methods.
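The three-phase structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration only: the label-to-super-class mapping `SUPER_OF` is written by hand (the paper derives it from class separability), a toy nearest-centroid classifier stands in for both the super-class learner and the evolutionary deep neural networks, a simple between-class/within-class variance ratio stands in for the feature selection step, and the data are synthetic. All names (`ThreePhaseClassifier`, `select_features`, `NearestCentroid`) are hypothetical.

```python
import random
from statistics import mean, pvariance

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class NearestCentroid:
    """Toy stand-in classifier (assumption; the paper trains deep networks)."""
    def fit(self, X, y):
        self.centroids = {}
        for c in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == c]
            self.centroids[c] = [mean(col) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c]))

def select_features(X, y, k):
    """Keep the k features with the largest between/within-class variance ratio."""
    scores = []
    for j in range(len(X[0])):
        groups = {}
        for x, t in zip(X, y):
            groups.setdefault(t, []).append(x[j])
        between = pvariance([mean(v) for v in groups.values()]) if len(groups) > 1 else 0.0
        within = mean(pvariance(v) for v in groups.values()) + 1e-9
        scores.append((between / within, j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])

class ThreePhaseClassifier:
    def __init__(self, super_of, k):
        self.super_of = super_of  # hypothetical mapping: label -> super-class id
        self.k = k                # number of features kept per super-class

    def fit(self, X, y):
        # Phase 1: relabel samples with their super-class and learn to predict it.
        meta = [self.super_of[t] for t in y]
        self.top = NearestCentroid().fit(X, meta)
        self.sub = {}
        for s in set(meta):
            Xs = [x for x, m in zip(X, meta) if m == s]
            ys = [t for t, m in zip(y, meta) if m == s]
            # Phase 2: feature selection restricted to this super-class.
            cols = select_features(Xs, ys, self.k)
            # Phase 3: one classifier per super-class on the reduced features.
            reduced = [[x[j] for j in cols] for x in Xs]
            self.sub[s] = (cols, NearestCentroid().fit(reduced, ys))
        return self

    def predict_one(self, x):
        cols, model = self.sub[self.top.predict_one(x)]
        return model.predict_one([x[j] for j in cols])

# Synthetic data (assumption): 4 labels, 2 super-classes, 3 signal + 3 noise features.
rng = random.Random(0)
CENTERS = {"A": [0, 0, 0], "B": [0, 5, 0], "C": [10, 0, 0], "D": [10, 0, 5]}
SUPER_OF = {"A": 0, "B": 0, "C": 1, "D": 1}
X, y = [], []
for label, center in CENTERS.items():
    for _ in range(20):
        signal = [c + rng.gauss(0, 0.5) for c in center]
        noise = [rng.gauss(0, 0.5) for _ in range(3)]
        X.append(signal + noise)
        y.append(label)

model = ThreePhaseClassifier(SUPER_OF, k=2).fit(X, y)
accuracy = sum(model.predict_one(x) == t for x, t in zip(X, y)) / len(y)
print("training accuracy:", accuracy)
```

In this sketch each super-class keeps its own feature subset and its own classifier, which mirrors the statement that the last two phases are carried out separately for each subset. In the method described in the abstract, the per-super-class classifier would be an evolutionary deep neural network rather than the nearest-centroid stand-in used here.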
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding this manuscript.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Bagheri Khoulenjani, N., Saniee Abadeh, M., Sarbazi-Azad, S. et al. Cancer miRNA biomarkers classification using a new representation algorithm and evolutionary deep learning. Soft Comput 25, 3113–3129 (2021). https://doi.org/10.1007/s00500-020-05366-w
DOI: https://doi.org/10.1007/s00500-020-05366-w