Addition of Pathway-Based Information to Improve Predictions in Transcriptomics

Daniel Urda¹⁸,
Francisco J. Veredas¹⁹,
Ignacio Turias¹⁸ &
…
Leonardo Franco¹⁹

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 11466)

Included in the following conference series:

International Work-Conference on Bioinformatics and Biomedical Engineering

Abstract

The diagnosis and prognosis of cancer are among the most critical challenges that modern medicine confronts. Personalized medicine aims to use data from heterogeneous sources to estimate how the disease will evolve in each specific patient, so that the most appropriate treatment can be selected. In recent years, DNA sequencing data have boosted cancer prediction and treatment by supplying genetic information used to design genetic signatures or biomarkers, which have led to a better classification of the different cancer subtypes as well as to better estimates of the evolution of the disease and of the response to diverse treatments. Several machine learning models have been proposed in the literature for cancer prediction. However, the efficacy of these models can be seriously affected by the imbalance between the high dimensionality of gene expression feature sets and the number of samples available, a problem known as the curse of dimensionality. Although linear predictive models may perform worse than more sophisticated non-linear models, they have the main advantage of being interpretable. Moreover, the use of domain-specific information has proved useful for boosting the performance of multivariate linear predictors in high-dimensional settings. In this work, we design a set of linear predictive models that incorporate domain-specific information from genetic pathways for effective feature selection. By combining these linear models with other classical machine learning models, we obtain state-of-the-art performance rates in the prediction of vital status on a public cancer dataset.
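
The abstract does not spell out how pathway information enters the linear models. One common way to let pathways drive feature selection is a group-lasso penalty, which keeps or discards all genes of a pathway together. The sketch below is only an illustration of that general idea, not necessarily the authors' method: the function name, gene names, pathway names, coefficients and penalty value are all hypothetical, and pathways are assumed not to overlap.

```python
import math


def group_soft_threshold(coefs, groups, lam):
    """Block soft-thresholding: the proximal step of a group-lasso penalty
    lam * sum_g sqrt(|g|) * ||beta_g||_2 (non-overlapping groups assumed).

    coefs  : dict mapping gene -> coefficient of a linear model
    groups : dict mapping pathway -> list of genes in that pathway
    lam    : penalty strength (illustrative value, not from the paper)
    """
    shrunk = dict(coefs)  # genes outside every pathway are left unchanged
    for genes in groups.values():
        norm = math.sqrt(sum(coefs[g] ** 2 for g in genes))
        threshold = lam * math.sqrt(len(genes))
        # A pathway whose joint coefficient norm is below the threshold is
        # zeroed out as a whole; otherwise all its genes shrink by one factor.
        scale = 0.0 if norm <= threshold else 1.0 - threshold / norm
        for g in genes:
            shrunk[g] = scale * coefs[g]
    return shrunk


# Hypothetical toy data: two pathways with two genes each.
coefs = {"g1": 0.9, "g2": -1.2, "g3": 0.05, "g4": -0.02}
pathways = {"pathway_A": ["g1", "g2"], "pathway_B": ["g3", "g4"]}

shrunk = group_soft_threshold(coefs, pathways, lam=0.1)
selected = [
    name
    for name, genes in pathways.items()
    if any(shrunk[g] != 0.0 for g in genes)
]
print(selected)
```

In this toy case the weakly weighted pathway is removed entirely while the strongly weighted one survives with shrunken coefficients, which is the sense in which a grouped penalty performs feature selection at the pathway level while the model stays linear and interpretable.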

Acknowledgments

This work is part of the coordinated research projects TIN2014-58516-C2-1-R, TIN2014-58516-C2-2-R and TIN2017-88728-C2 from MINECO-SPAIN, which include FEDER funds.

Author information

Authors and Affiliations

Departamento de Ingeniería Informática, EPS de Algeciras, Universidad de Cádiz, Cádiz, Spain
Daniel Urda & Ignacio Turias
Departamento de Lenguajes y Ciencias de la Computación, ETSI de Informática, Universidad de Málaga, Málaga, Spain
Francisco J. Veredas & Leonardo Franco

Corresponding author

Correspondence to Daniel Urda.

Editor information

Editors and Affiliations

Department of Computer Architecture and Computer Technology, Higher Technical School of Information Technology and Telecommunications Engineering, CITIC-UGR, Granada, Spain
Ignacio Rojas
ETSIIT, University of Granada, Granada, Spain
Olga Valenzuela
CITIC-UGR, University of Granada, Granada, Spain
Fernando Rojas
Fundacion Progreso y Salud, Granada, Spain
Francisco Ortuño

About this paper

Cite this paper

Urda, D., Veredas, F.J., Turias, I., Franco, L. (2019). Addition of Pathway-Based Information to Improve Predictions in Transcriptomics. In: Rojas, I., Valenzuela, O., Rojas, F., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2019. Lecture Notes in Computer Science, vol 11466. Springer, Cham. https://doi.org/10.1007/978-3-030-17935-9_19

DOI: https://doi.org/10.1007/978-3-030-17935-9_19
Published: 13 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17934-2
Online ISBN: 978-3-030-17935-9
eBook Packages: Computer Science, Computer Science (R0)
