Abstract
F-score is a widely used filter criterion for gene selection in multiclass cancer classification. This ranking criterion may become biased towards classes that have a surplus of between-class sum of squares, resulting in inferior classification performance. To alleviate this problem, we propose computing the between-class sum of squares for each class individually and ranking genes by Pareto front analysis. We tested our approach on four multiclass cancer gene expression datasets, and the results show an improvement in classification performance.
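The idea in the abstract can be illustrated with a small sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the per-class score is taken as each class's between-class sum of squares divided by the pooled within-class sum of squares, and genes are ranked by repeatedly peeling off non-dominated fronts. The function names `classwise_scores` and `pareto_front_ranks` are hypothetical and do not come from the paper.

```python
import numpy as np


def classwise_scores(X, y):
    """Assumed per-class score: class-wise between-class sum of squares
    divided by the pooled within-class sum of squares, per gene.

    X: (n_samples, n_genes) expression matrix; y: (n_samples,) class labels.
    Returns an array of shape (n_genes, n_classes).
    """
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.empty((X.shape[1], classes.size))
    within = np.zeros(X.shape[1])
    for k, c in enumerate(classes):
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        # Contribution of class c to the between-class sum of squares.
        between[:, k] = Xc.shape[0] * (class_mean - overall_mean) ** 2
        # Accumulate the pooled within-class sum of squares.
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    # Small constant avoids division by zero for constant genes.
    return between / (within[:, None] + 1e-12)


def pareto_front_ranks(scores):
    """Assign each row a front index (1 = non-dominated), maximising every column.

    Uses a simple O(n^2) pairwise comparison per front, which is fine for a
    demonstration but memory-hungry for thousands of genes.
    """
    n = scores.shape[0]
    ranks = np.zeros(n, dtype=int)
    remaining = np.arange(n)
    rank = 1
    while remaining.size:
        S = scores[remaining]
        # ge[j, i]: row j is at least as good as row i in every class.
        ge = (S[:, None, :] >= S[None, :, :]).all(axis=2)
        # gt[j, i]: row j is strictly better than row i in some class.
        gt = (S[:, None, :] > S[None, :, :]).any(axis=2)
        dominated = (ge & gt).any(axis=0)
        ranks[remaining[~dominated]] = rank
        remaining = remaining[dominated]
        rank += 1
    return ranks


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 8))      # synthetic data, not from the paper
    y = np.repeat([0, 1, 2], 10)
    X[y == 2, 0] += 3.0               # make gene 0 informative for class 2
    print(pareto_front_ranks(classwise_scores(X, y)))
```

Under these assumptions, summing a gene's class-wise scores gives an F-score-like ratio (up to normalising constants), so a single class with a very large term can dominate that sum. Comparing genes on the whole vector of class-wise scores, as the front ranking does, keeps a gene that separates only one less dominant class from being outranked on the total alone.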
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mundra, P.A., Rajapakse, J.C. (2009). F-score with Pareto Front Analysis for Multiclass Gene Selection. In: Pizzuti, C., Ritchie, M.D., Giacobini, M. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2009. Lecture Notes in Computer Science, vol 5483. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01184-9_6
DOI: https://doi.org/10.1007/978-3-642-01184-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01183-2
Online ISBN: 978-3-642-01184-9
eBook Packages: Computer Science (R0)