Improving the Computational Efficiency of Recursive Cluster Elimination for Gene Selection

Published: 01 January 2011

Abstract

Gene expression data typically contain a large number of genes and a relatively small number of samples, which poses many new challenges. Selecting the informative genes has become the main issue in microarray data analysis. Recursive cluster elimination based on support vector machines (SVM-RCE) has shown better classification accuracy than recursive feature elimination based on support vector machines (SVM-RFE) on some microarray data sets. However, SVM-RCE is extremely time-consuming. In this paper, we propose an improved version of SVM-RCE called ISVM-RCE. ISVM-RCE first trains an SVM model with all clusters, then scores each cluster by the infinity norm of the weight coefficient vector within that cluster, and finally eliminates the gene clusters with the lowest scores. In addition, when the number of clusters is small, ISVM-RCE eliminates individual genes within the clusters instead of removing whole clusters. We tested ISVM-RCE on six gene expression data sets and compared its performance with that of SVM-RCE and linear-discriminant-analysis-based RFE (LDA-RFE). The experimental results on these data sets show that ISVM-RCE greatly reduces the time cost of SVM-RCE while achieving classification performance comparable to that of SVM-RCE, whereas LDA-RFE is not stable.
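The cluster-scoring step described in the abstract can be illustrated with a minimal sketch. It assumes the weights of a trained linear SVM are already available, one per gene, and that each gene has already been assigned to a cluster. The function names, gene labels, weight values, and the `n_drop` parameter are hypothetical and are not taken from the paper. The within-cluster gene elimination used when few clusters remain is not shown.

```python
from collections import defaultdict


def score_clusters(weights, cluster_of):
    """Score each cluster by the infinity norm (largest |w_j|) of its genes' weights."""
    scores = defaultdict(float)
    for gene, w in weights.items():
        cluster = cluster_of[gene]
        scores[cluster] = max(scores[cluster], abs(w))
    return dict(scores)


def eliminate_lowest(weights, cluster_of, n_drop=1):
    """Return the genes that remain after dropping the n_drop lowest-scoring clusters."""
    scores = score_clusters(weights, cluster_of)
    # Rank clusters by score; ties are broken by cluster id so the result is deterministic.
    ranked = sorted(scores, key=lambda c: (scores[c], c))
    dropped = set(ranked[:n_drop])
    return [gene for gene in weights if cluster_of[gene] not in dropped]


# Hypothetical per-gene weights from a linear SVM (illustrative values only).
weights = {"g1": 0.80, "g2": -0.10, "g3": 0.05, "g4": -0.02, "g5": -0.60}
# Hypothetical cluster assignment, e.g. produced beforehand by a clustering step.
cluster_of = {"g1": 0, "g2": 0, "g3": 1, "g4": 1, "g5": 2}

scores = score_clusters(weights, cluster_of)
survivors = eliminate_lowest(weights, cluster_of, n_drop=1)
```

In this toy input, cluster 1 has the smallest maximum absolute weight, so its genes are the ones removed. Because a single SVM fit yields scores for every cluster at once, a scheme of this kind avoids evaluating each cluster separately.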

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 8, Issue 1
January 2011
282 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Author Tags

  1. Recursive cluster elimination
  2. feature selection
  3. gene expression data
