Abstract
Instance selection (IS) serves as a vital preprocessing step, particularly in addressing the complexities associated with high-dimensional problems. Its primary goal is the reduction of data instances, a process that involves eliminating irrelevant and superfluous data while maintaining a high level of classification accuracy. IS, as a strategic filtering mechanism, addresses these challenges by retaining essential instances and discarding hindering elements. This refinement process optimizes classification algorithms, enabling them to excel in handling extensive datasets. In this research, IS offers a promising avenue to strengthen the effectiveness of classification in various real-world applications.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13(1), 21–27 (1967)
Etikan, I., Bala, K.: Sampling and sampling methods. Biometrics Biostatistics Int. J. 5(6), 00149 (2017)
García, S., Luengo, J., Herrera, F.: Data Preprocessing in Data Mining, vol. 72. Springer, Cham (2015)
García, S., Luengo, J., Herrera, F.: Tutorial on practical tips of the most influential data preprocessing algorithms in data mining. Knowl.-Based Syst. 98, 1–29 (2016)
Garcia, S., Luengo, J., Sáez, A., Lopez, V., Herrera, F.: A survey of discretization techniques: taxonomy and empirical analysis in supervised learning. IEEE Trans. Knowl. Data Eng. 25(4), 734–750 (2012)
Kohavi, R., et al.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Ijcai, vol. 14, pp. 1137–1145. Montreal, Canada (1995)
Leyva, E., González, A., Pérez, R.: Three new instance selection methods based on local sets: A comparative study with several approaches from a bi-objective perspective. Pattern Recogn. 48(4), 1523–1537 (2015)
Li, T., Fong, S., Wu, Y., Tallón-Ballesteros, A.J.: Kennard-stone balance algorithm for time-series big data stream mining. In: 2020 International Conference on Data Mining Workshops (ICDMW), pp. 851–858. IEEE (2020)
Li, Y., Li, T., Liu, H.: Recent advances in feature selection and its applications. Knowl. Inf. Syst. 53(3), 551–577 (2017). https://doi.org/10.1007/s10115-017-1059-8
Nanni, L., Lumini, A.: Prototype reduction techniques: a comparison among different approaches. Expert Syst. Appl. 38(9), 11820–11828 (2011)
Rendon, E., Alejo, R., Castorena, C., Isidro-Ortega, F.J., Granda-Gutierrez, E.E.: Data sampling methods to deal with the big data multi-class imbalance problem. Appl. Sci. 10(4), 1276 (2020)
Schaffer, C.: A conservation law for generalization performance. In: Machine Learning Proceedings 1994, pp. 259–265. Elsevier (1994)
Triguero, I., Sáez, J.A., Luengo, J., García, S., Herrera, F.: On the characterization of noise filters for self-training semi-supervised in nearest neighbor classification. Neurocomputing 132, 30–41 (2014)
Yıldırım, A.A., Özdoğan, C., Watson, D.: Parallel data reduction techniques for big datasets. In: Big Data: Concepts, Methodologies, Tools, and Applications, pp. 734–756. IGI Global (2016)
Zhang, S., Zhang, C., Yang, Q.: Data preparation for data mining. Appl. Artif. Intell. 17(5–6), 375–381 (2003)
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cubillos, M.A.P., Ballesteros, A.J.T. (2023). Instance Selection Techniques for Large Volumes of Data. In: Quaresma, P., Camacho, D., Yin, H., Gonçalves, T., Julian, V., Tallón-Ballesteros, A.J. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2023. IDEAL 2023. Lecture Notes in Computer Science, vol 14404. Springer, Cham. https://doi.org/10.1007/978-3-031-48232-8_49
Download citation
DOI: https://doi.org/10.1007/978-3-031-48232-8_49
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48231-1
Online ISBN: 978-3-031-48232-8
eBook Packages: Computer ScienceComputer Science (R0)