F-score with Pareto Front Analysis for Multiclass Gene Selection

  • Conference paper
Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EvoBIO 2009)

Abstract

F-score is a widely used filter criterion for gene selection in multiclass cancer classification. This ranking criterion may become biased towards classes that have a surplus of between-class sum of squares, resulting in inferior classification performance. To alleviate this problem, we propose to compute the between-class sum of squares individually for each class and to rank genes with Pareto front analysis. We tested our approach on four multiclass cancer gene expression datasets, and the results show an improvement in classification performance.
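The idea in the abstract can be illustrated with a minimal sketch. The chapter's exact formulation is not reproduced on this page, so everything below is an assumption: each gene gets one score per class (that class's contribution to the between-class sum of squares, scaled by the pooled within-class variance), and genes are then ranked by plain non-dominated sorting over those scores. The function names `classwise_scores` and `pareto_fronts` are hypothetical and chosen only for illustration.

```python
import numpy as np


def classwise_scores(X, y):
    """Per-class between-class sum of squares for every gene (assumed form).

    X: (n_samples, n_genes) expression matrix; y: (n_samples,) class labels.
    Returns an (n_genes, n_classes) array; larger means the class is better
    separated from the grand mean by that gene.
    """
    classes = np.unique(y)
    n_samples, n_genes = X.shape
    grand_mean = X.mean(axis=0)
    between = np.empty((n_genes, classes.size))
    within = np.zeros(n_genes)
    for c, label in enumerate(classes):
        Xc = X[y == label]
        class_mean = Xc.mean(axis=0)
        # Contribution of this class alone to the between-class sum of squares.
        between[:, c] = Xc.shape[0] * (class_mean - grand_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    # Pooled within-class variance; the small constant avoids division by zero.
    within = within / (n_samples - classes.size) + 1e-12
    return between / within[:, None]


def pareto_fronts(S):
    """Non-dominated sorting of the rows of S, maximising every column.

    Returns an integer array: 0 for rows on the first Pareto front,
    1 for the front found after removing those rows, and so on.
    Uses O(n^2) memory, so it is only a sketch for small gene sets.
    """
    front = np.full(S.shape[0], -1)
    remaining = np.arange(S.shape[0])
    level = 0
    while remaining.size:
        R = S[remaining]
        # ge[j, i]: row j is at least as good as row i on every objective.
        ge = (R[:, None, :] >= R[None, :, :]).all(axis=2)
        # gt[j, i]: row j is strictly better than row i on some objective.
        gt = (R[:, None, :] > R[None, :, :]).any(axis=2)
        dominated = (ge & gt).any(axis=0)
        front[remaining[~dominated]] = level
        remaining = remaining[dominated]
        level += 1
    return front


# Toy usage on synthetic data (not one of the datasets used in the paper).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 50))
y_demo = np.repeat([0, 1, 2], 10)
fronts_demo = pareto_fronts(classwise_scores(X_demo, y_demo))
ranking_demo = np.argsort(fronts_demo, kind="stable")  # earlier fronts first
print(ranking_demo[:10])
```

Summing the columns of `classwise_scores` gives the numerator of the usual F-score up to a constant factor, which is where a single class with a large between-class term can dominate the ranking. Keeping the columns separate and sorting by Pareto front means a gene is only ranked below another gene that is at least as good for every class.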

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mundra, P.A., Rajapakse, J.C. (2009). F-score with Pareto Front Analysis for Multiclass Gene Selection. In: Pizzuti, C., Ritchie, M.D., Giacobini, M. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2009. Lecture Notes in Computer Science, vol 5483. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01184-9_6

  • DOI: https://doi.org/10.1007/978-3-642-01184-9_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01183-2

  • Online ISBN: 978-3-642-01184-9

  • eBook Packages: Computer Science (R0)