Abstract
Characterizing disease at the molecular level is a major challenge in bioinformatics, and identifying the genes that contribute to cancer remains difficult. Cancer classification based on molecular-level investigation has attracted the interest of researchers because it provides a systematic, accurate and objective diagnosis for different cancer types. This paper presents and compares classification methods for gene expression data. We compared the performance of three classification methods: support vector machines, k-nearest neighbor and random forest. Two publicly available gene expression datasets were used: the Freije and Phillips datasets. The support vector machine classifier achieved the best performance on both datasets compared with the other two classifiers.
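The kind of comparison described in the abstract can be sketched as follows. This is only an illustrative outline, not the authors' pipeline: it assumes scikit-learn is available, uses a synthetic matrix from `make_classification` as a stand-in for a real expression dataset, and the hyperparameters (linear kernel, `n_neighbors=5`, 100 trees), the 5-fold cross-validation and the helper name `compare_classifiers` are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(X, y, n_splits=5, seed=0):
    """Return mean cross-validated accuracy for SVM, k-NN and random forest.

    Hypothetical helper: the settings below are illustrative assumptions,
    not values taken from the paper.
    """
    models = {
        # Scaling is applied inside the pipeline so it is fitted per fold.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0)),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    # Stratified folds keep the class proportions similar in every split.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


# Synthetic stand-in for an expression matrix: 100 samples x 500 "genes".
X, y = make_classification(
    n_samples=100, n_features=500, n_informative=20, random_state=0
)
scores = compare_classifiers(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

With a real dataset, `X` would be the samples-by-genes expression matrix and `y` the class labels; the ranking obtained on synthetic data says nothing about the ranking reported in the paper.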
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zakaria, L., Ebeid, H.M., Dahshan, S., Tolba, M.F. (2020). Analysis of Classification Methods for Gene Expression Data. In: Hassanien, A., Azar, A., Gaber, T., Bhatnagar, R., Tolba, M.F. (eds) The International Conference on Advanced Machine Learning Technologies and Applications (AMLTA2019). AMLTA 2019. Advances in Intelligent Systems and Computing, vol 921. Springer, Cham. https://doi.org/10.1007/978-3-030-14118-9_19
DOI: https://doi.org/10.1007/978-3-030-14118-9_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14117-2
Online ISBN: 978-3-030-14118-9
eBook Packages: Intelligent Technologies and Robotics (R0)