Abstract
This paper compares the behavior of three linear classifiers trained in both the feature space and the dissimilarity space when class imbalance in the data sets interweaves with small disjuncts and noise. To this end, experiments are carried out on three synthetic databases with different imbalance ratios, noise levels and small-disjunct complexities. Results suggest that small disjuncts can be overcome much better in the dissimilarity space than in the feature space, which means that the learning models will only be affected by imbalance and noise if the samples have first been mapped into the dissimilarity space.
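The central idea of the abstract, mapping samples into the dissimilarity space before training a linear classifier, can be illustrated with a minimal sketch. It assumes Euclidean distance as the dissimilarity measure and uses every training object as a prototype of the representation set. It also substitutes a nearest-mean classifier (whose two-class decision boundary is a hyperplane, so it is linear) for the three linear classifiers studied in the paper, which the abstract does not name. The toy data, in which the minority class forms two small disjuncts on either side of the majority class, are invented for illustration and are not one of the paper's synthetic databases; noise is not modeled.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, used here as an assumed dissimilarity measure."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def to_dissimilarity_space(X: Sequence[Sequence[float]],
                           prototypes: Sequence[Sequence[float]]) -> List[Vector]:
    """Represent each object by its distances to every prototype."""
    return [[euclidean(x, p) for p in prototypes] for x in X]


def fit_nearest_mean(X: Sequence[Sequence[float]], y: Sequence[int]) -> Dict[int, Vector]:
    """Compute one mean vector per class (stand-in linear classifier)."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + xi for s, xi in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict_nearest_mean(means: Dict[int, Vector],
                         X: Sequence[Sequence[float]]) -> List[int]:
    """Assign each object to the class whose mean vector is closest."""
    return [min(means, key=lambda label: euclidean(x, means[label])) for x in X]


# Hypothetical 1-D toy data: minority class 1 forms two small disjuncts
# (at -5 and 6) on either side of majority class 0 (around 0).
X_train = [[-5.0], [6.0], [-0.5], [0.0], [0.5]]
y_train = [1, 1, 0, 0, 0]
X_test = [[-4.8], [0.2], [5.8]]

# Feature space: the minority mean (0.5) lies inside the majority region.
feature_model = fit_nearest_mean(X_train, y_train)
print(predict_nearest_mean(feature_model, X_test))  # [0, 0, 1]

# Dissimilarity space: all training objects serve as prototypes.
D_train = to_dissimilarity_space(X_train, X_train)
D_test = to_dissimilarity_space(X_test, X_train)
dissimilarity_model = fit_nearest_mean(D_train, y_train)
print(predict_nearest_mean(dissimilarity_model, D_test))  # [1, 0, 1]
```

In this toy case the feature-space classifier misses the test point near the left disjunct, while the same classifier in the dissimilarity space labels all three test points correctly. Because each dissimilarity coordinate measures closeness to one training object, a hyperplane in that space can correspond to a nonlinear boundary in the original features, which is one plausible reading of why small disjuncts become easier to handle after the mapping.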
Acknowledgment
This work has been partially supported by the Mexican Science and Technology Council (CONACYT-Mexico) through the Postdoctoral Fellowship Program [223351 and 232167], the Spanish Ministry of Economy [TIN2013-46522-P] and the Generalitat Valenciana [PROMETEOII/2014/062].
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
García, V., Sánchez, J.S., Ochoa Domínguez, H.J., Cleofas-Sánchez, L. (2015). Dissimilarity-Based Learning from Imbalanced Data with Small Disjuncts and Noise. In: Paredes, R., Cardoso, J., Pardo, X. (eds) Pattern Recognition and Image Analysis. IbPRIA 2015. Lecture Notes in Computer Science, vol 9117. Springer, Cham. https://doi.org/10.1007/978-3-319-19390-8_42
DOI: https://doi.org/10.1007/978-3-319-19390-8_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19389-2
Online ISBN: 978-3-319-19390-8
eBook Packages: Computer Science, Computer Science (R0)