Abstract
The advent of high-throughput techniques, such as gene microarrays and protein chips, has had a major impact on contemporary biology and medicine. Because of the high dimensionality and complexity of the resulting data, manual analysis is infeasible, so machine learning techniques play an important role in handling such data. In this paper, we propose a one-class approach to microarray classification. Unlike canonical classifiers, one-class models are trained only on objects drawn from a single class distribution. They distinguish observations belonging to that class from any other possible states of the object, which were unseen during training. Although they have less information with which to separate classes, one-class models can readily learn the specific properties of a given dataset and are robust to difficulties inherent in the nature of the data. We show that one-class ensembles can achieve results as good as those of canonical multi-class classifiers, while also handling imbalanced class distributions and unexpected noise in the data. To cope with the high dimensionality of the feature space, we propose a novel hybrid one-class ensemble that combines weighted Bagging and Random Subspaces. Experimental investigations carried out on public datasets confirm the usefulness of the proposed approach.
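The abstract describes the hybrid ensemble only at a high level, so the following is a minimal sketch under stated assumptions, not the author's implementation. It substitutes a simple hypersphere ("ball") around the class mean for the one-class base learner, draws each member's bootstrap sample with probability proportional to hypothetical per-object weights (uniform by default) to mimic weighted Bagging, trains each member on a random subset of features (Random Subspaces), and fuses members by averaging their accept votes. The names `BallDescription`, `HybridOneClassEnsemble`, `subspace_size`, `coverage` and `threshold` are all hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


class BallDescription:
    """Hypothetical one-class base learner (a simple stand-in, not SVDD):
    a hypersphere centred on the training mean, with a radius chosen so that
    a `coverage` fraction of the training objects falls inside it."""

    def __init__(self, coverage: float = 0.95) -> None:
        self.coverage = coverage
        self.center: List[float] = []
        self.radius = 0.0

    def _dist(self, x: Sequence[float]) -> float:
        return math.sqrt(sum((a - c) ** 2 for a, c in zip(x, self.center)))

    def fit(self, X: List[List[float]]) -> "BallDescription":
        n, d = len(X), len(X[0])
        self.center = [sum(row[j] for row in X) / n for j in range(d)]
        dists = sorted(self._dist(row) for row in X)
        # Radius = empirical `coverage` quantile of the training distances.
        k = max(0, min(n - 1, math.ceil(self.coverage * n) - 1))
        self.radius = dists[k]
        return self

    def accepts(self, x: Sequence[float]) -> bool:
        return self._dist(x) <= self.radius


class HybridOneClassEnsemble:
    """Hypothetical sketch: weighted bagging + random subspaces over
    one-class base learners, fused by averaging member votes."""

    def __init__(self, n_members: int = 20, subspace_size: int = 5,
                 coverage: float = 0.95, seed: Optional[int] = None) -> None:
        self.n_members = n_members
        self.subspace_size = subspace_size
        self.coverage = coverage
        self.rng = random.Random(seed)
        self.members: List[Tuple[List[int], BallDescription]] = []

    def fit(self, X: List[List[float]],
            weights: Optional[List[float]] = None) -> "HybridOneClassEnsemble":
        # X contains target-class objects only (one-class training).
        n, d = len(X), len(X[0])
        w = weights if weights is not None else [1.0] * n
        size = min(self.subspace_size, d)
        self.members = []
        for _ in range(self.n_members):
            # Weighted bagging (assumed form): bootstrap sample drawn with
            # probability proportional to each object's weight.
            idx = self.rng.choices(range(n), weights=w, k=n)
            # Random subspace: each member sees a random subset of features.
            feats = sorted(self.rng.sample(range(d), size))
            sample = [[X[i][j] for j in feats] for i in idx]
            model = BallDescription(self.coverage).fit(sample)
            self.members.append((feats, model))
        return self

    def support(self, x: Sequence[float]) -> float:
        # Fraction of members that accept x within their own subspace.
        votes = sum(m.accepts([x[j] for j in feats]) for feats, m in self.members)
        return votes / len(self.members)

    def predict(self, x: Sequence[float], threshold: float = 0.5) -> bool:
        # True = assigned to the target class, False = rejected as an outlier.
        return self.support(x) >= threshold


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic target-class data: 200 objects, 10 features.
    X = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(200)]
    ens = HybridOneClassEnsemble(n_members=15, subspace_size=3, seed=1).fit(X)
    print(ens.predict([0.0] * 10))   # object near the class centre
    print(ens.predict([10.0] * 10))  # object far from the class
```

Only target-class objects are used for training, which is what makes the ensemble one-class. Restricting each member to a few random features keeps every individual description low-dimensional, which reflects the abstract's motivation for pairing Bagging with Random Subspaces on high-dimensional data.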
Acknowledgements
This work was partially supported by the Polish National Science Centre under PRELUDIUM grant number DEC-2013/09/N/ST6/03504.
Copyright information
© 2016 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Krawczyk, B. (2016). Hybrid One-Class Ensemble for High-Dimensional Data Classification. In: Nguyen, N.T., Trawiński, B., Fujita, H., Hong, T.-P. (eds.) Intelligent Information and Database Systems. ACIIDS 2016. Lecture Notes in Computer Science, vol. 9622. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49390-8_13
DOI: https://doi.org/10.1007/978-3-662-49390-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49389-2
Online ISBN: 978-3-662-49390-8
eBook Packages: Computer Science, Computer Science (R0)