Abstract
The diagnosis and prognosis of cancer are among the more critical challenges that modern medicine confronts. In this sense, personalized medicine aims to use data from heterogeneous sources to estimate the evolution of the disease for each specific patient in order to fit the more appropriate treatments. In recent years, DNA sequencing data have boosted cancer prediction and treatment by supplying genetic information that has been used to design genetic signatures or biomarkers that led to a better classification of the different subtypes of cancer as well as to a better estimation of the evolution of the disease and the response to diverse treatments. Several machine learning models have been proposed in the literature for cancer prediction. However, the efficacy of these models can be seriously affected by the existing imbalance between the high dimensionality of the gene expression feature sets and the number of samples available, what is known as the curse of dimensionality. Although linear predictive models could give worse performance rates when compared to more sophisticated non-linear models, they have the main advantage of being interpretable. However, the use of domain-specific information has been proved useful to boost the performance of multivariate linear predictors in high dimensional settings. In this work, we design a set of linear predictive models that incorporate domain-specific information from genetic pathways for effective feature selection. By combining these linear model with other classical machine learning models, we get state-of-art performance rates in the prediction of vital status on a public cancer dataset.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Aronson, S.J., Rehm, H.L.: Building the foundation for genomics in precision medicine. Nature 526(7573), 336–342 (2015)
Kourou, K., Exarchos, T.P., Exarchos, K.P., Karamouzis, M.V., Fotiadis, D.I.: Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 13, 8–17 (2015)
Bashiri, A., Ghazisaeedi, M., Safdari, R., Shahmoradi, L., Ehtesham, H.: Improving the prediction of survival in cancer patients by using machine learning techniques: experience of gene expression data: a narrative review. Iran. J. Public Health 46(2), 165–172 (2017)
Johnstone, I.M., Titterington, D.M.: Statistical challenges of high-dimensional data. Philos. Trans. A Math. Phys. Eng. Sci. 367(1906), 4237–4253 (2009)
Saeys, Y., Inza, I., Larrañaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23(19), 2507–2517 (2007)
van’t Veer, L.J., et al.: Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871), 530–536 (2002)
Simon, N., Friedman, J., Hastie, T., Tibshirani, R.: A sparse-group lasso. J. Comput. Graph. Stat. 22(2), 231–245 (2013)
Urda, D., Jerez, J.M., Turias, I.J.: Data dimension and structure effects in predictive performance of deep neural networks. In: New Trends in Intelligent Software Methodologies, Tools and Techniques, pp. 361–372 (2018)
Kanehisa, M., Goto, S.: KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28(1), 27–30 (2000)
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M., Tanabe, M.: KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44(D1), D457–D462 (2016)
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y., Morishima, K.: KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45(D1), D353–D361 (2017)
Tibshirani, R.: Regression shrinkage and selection via the lasso: a retrospective. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 58(1), 267–288 (1996)
Meier, L., Van De Geer, S., Bühlmann, P.: The group lasso for logistic regression. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 70(1), 53–71 (2008)
Jacob, L., Obozinski, G., Vert, J.P.: Group lasso with overlap and graph lasso. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 433–440 (2009)
Zeng, Y., Breheny, P.: Overlapping group logistic regression with applications to genetic pathway selection. Cancer Inf. 15(1), 179–187 (2016)
Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theor. 13(1), 21–27 (2006)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Rossi, F., Villa, N.: Support vector machine for functional data classification. Neurocomputing 69(7), 730–742 (2006)
Bischl, B., et al.: mlr: machine learning in R. J. Mach. Learn. Res. 17(170), 1–5 (2016)
Li, B., Dewey, C.N.: RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinf. 12(1), 323 (2011)
Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence, IJCAI 1995, vol. 2, pp. 1137–1143 (1995)
Bischl, B., Richter, J., Bossek, J., Horn, D., Thomas, J., Lang, M.: mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions (2017)
Dees, N.D., et al.: MuSiC: identifying mutational significance in cancer genomes. Genome Res. 8, 1589–98 (2012)
Shimomura, A., et al.: Novel combination of serum microrna for detecting breast cancer in the early stage. Cancer Sci. 107(3), 326–334 (2016)
Zhao, H., Shen, J., Medico, L., Wang, D., Ambrosone, C.B., Liu, S.: A pilot study of circulating miRNAs as potential biomarkers of early stage breast cancer. PLoS ONE 5(10), 1–12 (2010)
Chen, G.Q., Zhao, Z.W., Zhou, H.Y., Liu, Y.J., Yang, H.J.: Systematic analysis of microRNA involved in resistance of the MCF-7 human breast cancer cell to doxorubicin. Med. Oncol. 27(2), 406–415 (2010)
Acknowledgments
This work is part of the coordinated research projects TIN2014-58516-C2-1-R, TIN2014-58516-C2-2-R and TIN2017-88728-C2 from MINECO-SPAIN which include FEDER funds.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Urda, D., Veredas, F.J., Turias, I., Franco, L. (2019). Addition of Pathway-Based Information to Improve Predictions in Transcriptomics. In: Rojas, I., Valenzuela, O., Rojas, F., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2019. Lecture Notes in Computer Science(), vol 11466. Springer, Cham. https://doi.org/10.1007/978-3-030-17935-9_19
Download citation
DOI: https://doi.org/10.1007/978-3-030-17935-9_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17934-2
Online ISBN: 978-3-030-17935-9
eBook Packages: Computer ScienceComputer Science (R0)