Abstract
Class overlap in imbalanced datasets is among the most common challenges facing researchers in deep learning (DL), machine learning (ML), and big data (BD) applications. Class overlap and class imbalance are intrinsic data characteristics that degrade the performance of classification models. Data-level, algorithm-level, ensemble, and hybrid methods are the most commonly used solutions for reducing the bias of standard classification models towards the majority class. Data-level methods change the distribution of class instances, thereby increasing the risk of information loss and overfitting. Algorithm-level methods modify the structure of the learning algorithm so that misclassified minority-class instances receive more weight during the learning phase. However, such modifications make these methods harder for users to adopt. Overcoming these issues requires an in-depth discussion of the state-of-the-art methods, which is presented here. In this survey, we present a detailed discussion of existing methods for handling class overlap in imbalanced datasets, together with their advantages, disadvantages, limitations, and the key performance metrics on which each method outperformed others. A detailed comparative analysis, drawn mainly from recent papers, summarizes the research gaps and future directions for researchers in ML, DL, and BD applications.
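To make the distinction between the data-level and algorithm-level families concrete, the following is a minimal sketch in plain Python. It is not taken from any of the surveyed methods: the function names `random_undersample` and `inverse_frequency_weights` and the toy data are hypothetical, chosen only for illustration. The first function rebalances the data by discarding majority instances, which is where the information loss comes from. The second leaves the data untouched and instead derives per-class weights that a learner could apply to its loss.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Data-level sketch: keep only as many instances per class as the smallest class has."""
    rng = random.Random(seed)
    target = min(Counter(y).values())
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Instances not sampled here are discarded (the source of information loss).
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


def inverse_frequency_weights(y):
    """Algorithm-level sketch: weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Hypothetical toy data: 8 majority instances (label 0) and 2 minority instances (label 1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_undersample(X, y)
weights = inverse_frequency_weights(y)
```

In this toy case the undersampled set keeps two instances per class, while the weighting approach keeps all ten instances and assigns the minority class a larger weight than the majority class. Neither sketch addresses class overlap itself; both only target the imbalance.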
Data availability statement
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
References
Kumar A, Singh D, Yadav RS (2023) Entropy and improved k-nearest neighbor search based under-sampling (ENU) method to handle class overlap in imbalanced datasets. Concurr Comput Pract Exp e7894
Vuttipittayamongkol P, Elyan E, Petrovski A (2021) On the class overlap problem in imbalanced data classification. Knowl-Based Syst 212:106631
Vuttipittayamongkol P, Elyan E (2020) Neighbourhood-based undersampling approach for handling imbalanced and overlapped data. Inf Sci 509:47–70
Bilal M, Maqsood M, Yasmin S, Ul Hasan N, Rho S (2022) A transfer learning-based efficient spatiotemporal human action recognition framework for long and overlapping action classes. J Supercomput 78(2):2873–2908
Ghosh K, Bellinger C, Corizzo R, Krawczyk B, Japkowicz N (2021) On the combined effect of class imbalance and concept complexity in deep learning. In: 2021 IEEE international conference on big data (big data), pp 4859–4868
Zhai J, Wang M, Zhang S (2022) Binary imbalanced big data classification based on fuzzy data reduction and classifier fusion. Soft Comput 26(6):2781–2792
Yin X, Liu Q, Huang X, Pan Y (2022) Perception model of surrounding rock geological conditions based on TBM operational big data and combined unsupervised-supervised learning. Tunn Undergr Space Technol 120:104285
Javaid N, Jan N, Umar Javed M (2021) An adaptive synthesis to handle imbalanced big data with deep siamese network for electricity theft detection in smart grids. J Parallel Distrib Comput 153:44–52
William C, Sleeman IV, Krawczyk B (2021) Multi-class imbalanced big data classification on spark. Knowl Based Syst 212:106598
Maurya CK, Toshniwal D, Venkoparao GV (2016) Online sparse class imbalance learning on big data. Neurocomputing 216:250–260
Wang Z, Xin J, Yang H, Tian S, Yu G, Xu C, Yao Y (2017) Distributed and weighted extreme learning machine for imbalanced big data learning. Tsinghua Sci Technol 22(2):160–173
Johnson JM, Khoshgoftaar TM (2019) Deep learning and data sampling with imbalanced big data. In: 2019 IEEE 20th international conference on information reuse and integration for data science (IRI), pp 175–183
Chatrati SP, Hossain G, Goyal A, Bhan A, Bhattacharya S, Gaurav D, Tiwari SM (2020) Smart home health monitoring system for predicting type 2 diabetes and hypertension. J King Saud Univ-Comput Inf Sci
Liu Y, Luo J, Ding P (2018) Inferring microrna targets based on restricted Boltzmann machines. IEEE J Biomed Health Inform 23(1):427–436
Jayashree R (2022) Enhanced classification using restricted boltzmann machine method in deep learning for covid-19. In: Understanding COVID-19: the role of computational intelligence. Springer, pp 425–446
Mohd Hasri NN, Wen NH, Howe CW, Mohamad MS, Deris S, Kasim S (2017) Improved support vector machine using multiple SVM-RFE for cancer classification. Int J Adv Sci Eng Inf Technol 7(4–2):1589–1594
Yuan X, Xie L, Abouelenien M (2018) A regularized ensemble framework of deep learning for cancer detection from multi-class, imbalanced training data. Pattern Recognit 77:160–172
Gupta S, Kumar M (2021) Prostate cancer prognosis using multi-layer perceptron and class balancing techniques. In: 2021 13th international conference on contemporary computing (IC3-2021), pp 1–6
Ding H, Chen L, Dong L, Fu Z, Cui X (2022) Imbalanced data classification: A KNN and generative adversarial networks-based hybrid approach for intrusion detection. Future Gener Comput Syst 131:240–254
Qu X, Yang L, Guo K, Ma L, Sun M, Ke M, Li M (2021) A survey on the development of self-organizing maps for unsupervised intrusion detection. Mobile Netw Appl 26(2):808–829
Aldwairi T, Perera D, Novotny MA (2018) An evaluation of the performance of restricted Boltzmann machines as a model for anomaly network intrusion detection. Comput Netw 144:111–119
Gupta N, Jindal V, Bedi P (2021) LIO IDS: handling class imbalance using LSTM and improved one-vs-one technique in intrusion detection system. Comput Netw 192:1080–76
Pal A, Kumar M (2019) DLME: distributed log mining using ensemble learning for fault prediction. IEEE Syst J 13(4):3639–3650
Liu S, Jiang H, Wu Z, Li X (2022) Data synthesis using deep feature enhanced generative adversarial networks for rolling bearing imbalanced fault diagnosis. Mechan Syst Signal Process 163:108139
Peng Y, Wang Y, Shao Y (2022) A novel bearing imbalance fault-diagnosis method based on a wasserstein conditional generative adversarial network. Measurement 192:110924
Zhang W, Li X, Jia XD, Ma H, Luo Z, Li X (2020) Machinery fault diagnosis with imbalanced data using deep generative adversarial networks. Measurement 152:107377
Jang J, Kim CO (2022) Unstructured borderline self-organizing map: learning highly imbalanced, high-dimensional datasets for fault detection. Expert Syst Appl 188:116028
Kim JK, Lee JS, Han YS (2019) Fault detection prediction using a deep belief network-based multi-classifier in the semiconductor manufacturing process. Int J Softw Eng Knowl Eng 29:1125–1139
Peng P, Zhang W, Zhang Y, Wang H, Zhang H (2022) Non-revisiting genetic cost-sensitive sparse autoencoder for imbalanced fault diagnosis. Appl Soft Comput 114:108138
Zhao B, Zhang X, Li H, Yang Z (2020) Intelligent fault diagnosis of rolling bearings based on normalized CNN considering data imbalance and variable working conditions. Knowl Based Syst 199:105971
Zhu J, Jiang Q, Shen Y, Qian C, Xu F, Zhu Q (2022) Application of recurrent neural network to mechanical fault diagnosis: a review. J Mechan Sci Technol 36(2):1–16
Liu J, Zhang C, Jiang X (2022) Imbalanced fault diagnosis of rolling bearing using improved MsR-GAN and feature enhancement-driven CapsNet. Mechan Syst Signal Process 168
Dangut MD, Skaf Z, Jennions IK (2022) Handling imbalanced data for aircraft predictive maintenance using the BACHE algorithm. Appl Soft Comput 123:108924
De S, Prabu P (2022) A sampling-based stack framework for imbalanced learning in churn prediction. IEEE Access 10:68017–68028
Toor AA, Usman M (2022) Adaptive telecom churn prediction for concept-sensitive imbalance data streams. J Supercomput 78(3):3746–3774
Kimura T (2022) Customer churn prediction with hybrid resampling and ensemble learning. J Manag Inf Decis Sci 25(1)
Edwine N, Wang W, Song W, Ssebuggwawo D (2022) Detecting the risk of customer churn in telecom sector: a comparative study. Math Probl Eng 2022
Johnson JM, Khoshgoftaar TM (2019) Survey on deep learning with class imbalance. J Big Data 6(1):1–54
Moghar A, Hamiche M (2020) Stock market prediction using LSTM recurrent neural network. Procedia Comput Sci 170:1168–1173
Akşehir ZD, Kiliç E (2022) How to handle data imbalance and feature selection problems in CNN-based stock price forecasting. IEEE Access 10:31297–31305
Wang X, Zhang R, Zhang Z (2022) A novel hybrid sampling method esmote+ sslm for handling the problem of class imbalance with overlap in financial distress detection. Neural Process Lett, pp 1–25
Wu JM-T, Li Z, Srivastava G, Tasi MH, Lin JCW (2021) A graph-based convolutional neural network stock price prediction with leading indicators. Softw Pract Exp 51(3):628–644
Kawintiranon K, Singh L, Budak C (2022) Traditional and context-specific spam detection in low resource settings. Mach Learn 111(7):1–22
Wang G, Wang J, He K (2022) Majority-to-minority resampling for boosting-based classification under imbalanced data. Appl Intell 53(4):1–22
Lingam G, Yasaswini B, Jagadamba PVSL, Kolliboyana N (2022) An improved bot identification with imbalanced data using GG-XGBoost. In: 2022 2nd International conference on intelligent technologies (CONIT), pp 1–6
Hazarika BB, Gupta D (2022) Density weighted twin support vector machines for binary class imbalance learning. Neural Process Lett 54(2):1091–1130
Hossain T, Mauni HZ, Rab R (2022) Reducing the effect of imbalance in text classification using SVD and glove with ensemble and deep learning. Comput Inform 41(1):98–115
Rashid MRU, Mahbub M, Adnan MA (2022) Breaking the curse of class imbalance: bangla text classification. Trans Asian Low-Resour Lang Inf Process 21(5):1–21
Khurana A, Verma OP (2022) Optimal feature selection for imbalanced text classification. IEEE Trans Artif Intell
Wang Z, Wang H (2021) Global data distribution weighted synthetic oversampling technique for imbalanced learning. IEEE Access 9:44770–44783
Epasto A, Lattanzi S, Leme RP (2017) Ego-splitting framework: from non-overlapping to overlapping clusters. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 145–154
Wang S, Yao X (2009) Diversity analysis on imbalanced data sets by using ensemble models. In: 2009 IEEE symposium on computational intelligence and data mining, pp 324–331
Lu Y, Cheung Y-M, Tang YY (2016) Hybrid sampling with bagging for class imbalance learning. In: Pacific-Asia conference on knowledge discovery and data mining, pp 14–26
Wang S, Yao X (2009) Diversity analysis on imbalanced data sets by using ensemble models. In: 2009 IEEE symposium on computational intelligence and data mining, pp 324–331
Zhao Y, Liu S, Hu Z (2022) Focal learning on stranger for imbalanced image segmentation. IET Image Process 16(5):1305–1323
Ruwani K, Fernando M, Tsokos CP (2021) Dynamically weighted balanced loss: class imbalanced learning and confidence calibration of deep neural networks. IEEE Trans Neural Netw Learn Syst
Jeong JJ, Tariq A, Adejumo T, Trivedi H, Gichoya JW, Banerjee I (2022) Systematic review of generative adversarial networks (GANs) for medical image classification and segmentation. J Digit Imag 35:1–16
Stoyanov D, Taylor Z, Carneiro G, Syeda-Mahmood T, Martel A, Maier-Hein L, Tavares JMRS, Bradley A, Papa JP, Belagiannis V et al (2018) Deep learning in medical image analysis and multimodal learning for clinical decision support. In: 4th International workshop, DLMIA 2018, and 8th international workshop, ML-CDS 2018, Held in conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings, vol 11045. Springer
Akil M, Saouli R, Kachouri R et al (2020) Fully automatic brain tumor segmentation with deep learning-based selective attention using overlapping patches and multi-class weighted cross-entropy. Med Image Anal 63:101692
Nyo MT, Mebarek-Oudina F, Hlaing SS, Khan NA (2022) Otsu’s thresholding technique for mri image brain tumor segmentation. Multimedia Tools Appl 81(30):43837–43849
Sampath V, Maurtua I, Aguilar Martín JJ, Gutierrez A (2021) A survey on generative adversarial networks for imbalance problems in computer vision tasks. J Big Data 8:1–59
Fendri E, Hammami M (2022) Imbalanced learning for robust moving object classification in video surveillance applications. In: Intelligent systems design and applications: 21st international conference on intelligent systems design and applications (ISDA 2021) held during december 13–15, 2021. Springer, vol 418, pp 199
Zhang Y, Lin M, Yang Y, Ding C (2022) A hybrid ensemble and evolutionary algorithm for imbalanced classification and its application on bioinformatics. Comput Biol Chem 98:107646
Dou L, Yang F, Xu L, Zou Q (2021) A comprehensive review of the imbalance classification of protein post-translational modifications. Brief Bioinform 22(5):bbab089
Thavappiragasam M, Kale V, Hernandez O, Sedova A (2021) Addressing load imbalance in bioinformatics and biomedical applications: efficient scheduling across multiple GPUs. In: 2021 IEEE international conference on bioinformatics and biomedicine (BIBM), pp 1992–1999
Chen J, Yang R, Zhang C, Zhang L, Zhang Q (2019) DeepGly: a deep learning framework with recurrent and convolutional neural networks to identify protein glycation sites from imbalanced data. IEEE Access 7:142368–142378
Greene CS, Himmelstein DS, Kiralis J, Moore JH (2010) The informative extremes: using both nearest and farthest individuals can improve relief algorithms in the domain of human genetics. In: European conference on evolutionary computation, machine learning and data mining in bioinformatics, pp 182–193
Greene CS, Penrod NM, Kiralis J, Moore JH (2009) Spatially uniform relieff (SURF) for computationally-efficient filtering of gene-gene interactions. BioData Min 2(1):1–9
Djenouri Y, Belhadi A, Srivastava G, Lin JCW (2021) Secure collaborative augmented reality framework for biomedical informatics. IEEE J Biomed Health Inform 26(6):2417–2424
Chen L, Fang B, Shang Z, Tang Y (2018) Tackling class overlap and imbalance problems in software defect prediction. Softw Qual J 26(1):97–125
Goyal S (2022) Handling class-imbalance with KNN (neighbourhood) under-sampling for software defect prediction. Artif Intell Rev 55(3):2023–2064
Manchala P, Bisi M (2022) Diversity based imbalance learning approach for software fault prediction using machine learning models. Appl Soft Comput 124:109069
Yin J, Tang MJ, Cao J, Wang H, You M, Lin Y (2022) Vulnerability exploitation time prediction: an integrated framework for dynamic imbalanced learning. World Wide Web 25(1):401–423
Lu S, Gao Z, Xu Q, Jiang C, Zhang A, Wang X (2022) Class-imbalance privacy-preserving federated learning for decentralized fault diagnosis with biometric authentication. IEEE Trans Ind Inform
Sun M, Yang R, Liu M (2022) Privacy-preserving minority oversampling protocols with fully homomorphic encryption. Secur Commun Netw 2022
Singh K, Mahajan A, Mansotra V (2022) Deep learning approach based on ADASYN for detection of web attacks in the CICIDS2017 dataset. In: Rising threats in expert applications and solutions. Springer, pp 53–62
Le TTH, Oktian YE, Kim H (2022) Xgboost for imbalanced multiclass classification-based industrial internet of things intrusion detection systems. Sustainability 14(14):8707
Zhang S, Yin J, Li Z, Yang R, Du M, Li R (2022) Node-imbalance learning on heterogeneous graph for pirated video website detection. In: 2022 IEEE 25th international conference on computer supported cooperative work in design (CSCWD). IEEE, pp 834–840
Santos MS, Abreu PH, Japkowicz N, Fernández A, Soares C, Wilk S, Santos J (2022) On the joint-effect of class imbalance and overlap: a critical review. Artif Intell Rev 55(8):1–69
Santos MS, Abreu PH, Japkowicz N, Fernández A, Santos J (2022) A unifying view of class overlap and imbalance: key concepts, multi-view panorama, and open avenues for research. Inf Fusion
Branco P, Torgo L, Ribeiro RP (2016) A survey of predictive modeling on imbalanced domains. ACM Comput Surv (CSUR) 49(2):1–50
Rout N, Mishra D, Mallick MK (2018) Handling imbalanced data: a survey. In: International proceedings on advances in soft computing, intelligent systems and applications. Springer, pp 431–443
Kaur H, Pannu HS, Malhi AK (2019) A systematic review on imbalanced data challenges in machine learning: applications and solutions. ACM Comput Surv (CSUR) 52(4):1–36
Xiong H, Wu J, Liu L (2010) Classification with class overlapping: a systematic study. In: 2010 International conference on E-business intelligence, pp 491–497
Liu X, Fu L, Lin JCW, Liu S (2022) SRAS-net: low-resolution chromosome image classification based on deep learning. IET Syst Biol 16(3–4):85–97
Tian C, Zhang X, Lin JCW, Zuo W, Zhang Y, Lin CW (2022) Generative adversarial networks for image super-resolution: a survey. arXiv:2204.13620
Mezair T, Djenouri Y, Belhadi A, Srivastava G, Lin JCW (2022) A sustainable deep learning framework for fault detection in 6G industry 4.0 heterogeneous data environments. Comput Commun 187:164–171
Akondi VS, Menon V, Baudry J, Whittle J (2022) Novel big data-driven machine learning models for drug discovery application. Molecules 27(3):594
Khattak A, Bukhsh R, Aslam S, Yafoz A, Alghushairy O, Alsini R (2022) A hybrid deep learning-based model for detection of electricity losses using big data in power systems. Sustainability 14(20):13627
Hewamalage H, Bergmeir C, Bandara K (2021) Recurrent neural networks for time series forecasting: current status and future directions. Int J Forecast 37:388–427
Das S, Datta S, Chaudhuri BB (2018) Handling data irregularities in classification: foundations, trends, and future challenges. Pattern Recognit 81:674–693
Napierała K, Stefanowski J, Wilk S (2010) Learning from imbalanced data in presence of noisy and borderline examples. In: International conference on rough sets and current trends in computing. Springer, pp 158–167
López V, Fernández A, García S, Palade V, Herrera F (2013) An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inf Sci 250:113–141
Stefanowski J (2016) Dealing with data difficulty factors while learning from imbalanced data. In: Challenges in computational statistics and data mining. Springer, pp 333–363
Wojciechowski S, Wilk S (2017) Difficulty factors and preprocessing in imbalanced data sets: an experimental study on artificial data. Found Comput Decis Sci 42(2):149–176
García V, Mollineda RA, Sánchez JS (2008) On the k-NN performance in a challenging scenario of imbalance and overlapping. Pattern Anal Appl 11(3):269–280
Lee HK, Kim SB (2018) An overlap-sensitive margin classifier for imbalanced and overlapping data. Expert Syst Appl 98:72–83
Das B, Krishnan NC, Cook DJ (2014) Handling imbalanced and overlapping classes in smart environments prompting dataset. In: Data Min Serv. Springer, pp 199–219
Pascual-Triana JD, Charte D, Arroyo MA, Fernández A, Herrera F (2021) Revisiting data complexity metrics based on morphology for overlap and imbalance: snapshot, new overlap number of balls metrics and singular problems prospect. Knowl Inf Syst 63:1–29
Vuttipittayamongkol P, Elyan E (2020) Improved overlap-based undersampling for imbalanced dataset classification with application to Epilepsy and Parkinson’s disease. Int J Neural Syst 30(08):2050043
Dkhar RA, Nath K, Roy S, Bhattacharyya DK, Nandi S (2016) Evaluating the effectiveness of soft k-means in detecting overlapping clusters. In: Proceedings of the 2nd international conference on information and communication technology for competitive strategies, pp 1–6
Tao X, Chen W, Zhang X, Guo W, Qi L, Fan Z (2021) SVDD boundary and DPC clustering technique-based oversampling approach for handling imbalanced and overlapped data. Knowl Based Syst 234:107588
Xiong H, Li M, Jiang T, Zhao S (2013) Classification algorithm based on nb for class overlapping problem. Appl Math 7(2L):409–415
Tung NT, Dieu VH, Than K, Linh NV (2018) Reducing class overlapping in supervised dimension reduction. In: Proceedings of the 9th international symposium on information and communication technology, pp 8–15
Fernandes ERQ, De Carvalho AC (2019) Evolutionary inversion of class distribution in overlapping areas for multi-class imbalanced learning. Inf Sci 494:141–154
Li Z, Huang M, Liu G, Jiang C (2021)A hybrid method with dynamic weighted entropy for handling the problem of class imbalance with overlap in credit card fraud detection. Expert Syst Appl 175:114750
Wong ML, Seng K, Wong PK (2020) Cost-sensitive ensemble of stacked denoising autoencoders for class imbalance problems in business domain. Expert Syst Appl 141:112918
Rogić S, Kašćelan L, Bach MP (2022) Customer response model in direct marketing: solving the problem of unbalanced dataset with a balanced support vector machine. J Theor Appl Electron Commer Res 17(3):1003–1018
Zhu B, Pan X, Vanden Broucke S, Xiao J (2022) A GAN-based hybrid sampling method for imbalanced customer classification. Inf Sci 609:1397–1411
Ntomaris AV, Marneris IG, Biskas PN, Bakirtzis AG (2022) Optimal participation of RES aggregators in electricity markets under main imbalance pricing schemes: price taker and price maker approach. Electr Power Syst Res 206:107786
Lee D, Kim K (2022) Business transaction recommendation for discovering potential business partners using deep learning. Expert Syst Appl 201:117222
Garcia J (2022) Bankruptcy prediction using synthetic sampling. Mach Learn Appl 9:100343
Rodić LD, Perković T, Škiljo M, Šolić P (2022) Privacy leakage of lorawan smart parking occupancy sensors. Future Gener Comput Syst
Vuttipittayamongkol P, Elyan E (2020) Overlap-based undersampling method for classification of imbalanced medical datasets. In: Maglogiannis I, Iliadis L, Pimenidis E (eds) Artificial intelligence applications and innovations. Springer, Cham, pp 358–369
Zhang R, Zhang Z, Wang D (2021) RFCL: a new under-sampling method of reducing the degree of imbalance and overlap. Pattern Anal Appl 24(2):641–654
Devi D, Biswas SK, Purkayastha B (2019) Learning in presence of class imbalance and class overlapping by using one-class SVM and undersampling technique. Connect Sci 31(2):105–142
Vuttipittayamongkol P, Elyan E, Petrovski A, Jayne C (2018) Overlap-based undersampling for improving imbalanced data classification. In: International conference on intelligent data engineering and automated learning. Springer, pp 689–697
Ibrahim MH (2021) ODBOT: outlier detection-based oversampling technique for imbalanced datasets learning. Neural Comput Appl 33:15781–15806
Tao X, Zheng Y, Chen W, Zhang X, Qi L, Fan Z, Huang S (2022) SVDD-based weighted oversampling technique for imbalanced and overlapped dataset learning. Inf Sci 588:13–51
Zhu Y, Yan Y, Zhang Y, Zhang Y (2020) EHSO: evolutionary hybrid sampling in overlapping scenarios for imbalanced learning. Neurocomputing 417:333–346
Maldonado S, Vairetti C, Fernandez A, Herrera F (2022) FW-SMOTE: a feature-weighted oversampling approach for imbalanced classification. Pattern Recognit 124:108511
Tax DMJ, Duin RPW (2004) Support vector data description. Mach Learn 54(1):45–66
Mayabadi S, Saadatfar H (2022) Two density-based sampling approaches for imbalanced and overlapping data. Knowl Based Syst 241:108217
Zian S, Kareem SA, Varathan KD (2021) An empirical evaluation of stacked ensembles with different meta-learners in imbalanced classification. IEEE Access
Sumana BV, Punithavalli M (2020) Optimising prediction in overlapping and non-overlapping regions. Int J Nat Comput Res (IJNCR) 9(1):45–63
Gupta S, Gupta A (2018) Handling class overlapping to detect noisy instances in classification. Knowl Eng Rev 33
Chujai P, Chomboon K, Chaiyakhan K, Kerdprasop K, Kerdprasop N (2017) A cluster based classification of imbalanced data with overlapping regions between classes. Proceedings of the international multiconference of engineers and computer scientists 1:353–358
Liu C, Ren Y, Liang M, Gu Z, Wang J, Pan L, Wang Z (2020) Detecting overlapping data in system logs based on ensemble learning method. Wireless Commun Mobile Comput 2020:1–8
De Miguel L, Gómez D, Rodríguez JT, Montero J, Bustince H, Dimuro GP, Sanz JA (2019) General overlap functions. Fuzzy Sets Syst 372:81–96
Elkan C (2001) The foundations of cost-sensitive learning. International joint conference on artificial intelligence, vol 17. Lawrence Erlbaum Associates Ltd, Mahwah, pp 973–978
Xia Y, Liu C, Liu N (2017) Cost-sensitive boosted tree for loan evaluation in peer-to-peer lending. Electron Commer Res Appl 24:30–49
Yang S, Korayem M, AlJadda K, Grainger T, Natarajan S (2017) Combining content-based and collaborative filtering for job recommendation system: a cost-sensitive statistical relational learning approach. Knowl Based Syst 136:37–45
Yuan BW, Luo XG, Zhang ZL, Yu Y, Huo HW, Johannes T, Zou XD (2021) A novel density-based adaptive k nearest neighbor method for dealing with overlapping problem in imbalanced datasets. Neural Comput Appl 33(9):4457–4481
Rubbo M, Silv LA (2021) Filtering-based instance selection method for overlapping problem in imbalanced datasets. J 4(3):308–327
Zhang N, Karimoune W, Thompson L, Dang H (2017) A between-class overlapping coherence-based algorithm in KNN classification. In: 2017 IEEE international conference on systems, man, and cybernetics (SMC). IEEE, pp 572–577
Gu Y, Cheng L (2017) Classification of class overlapping datasets by kernel-MTS method. Int J Innovat Comput Inf Control 13(5):1759–1767
Afridi MK, Azam N, Yao J (2020) Variance based three-way clustering approaches for handling overlapping clustering. Int J Approx Reason 118:47–63
Li H, Zhang L, Zhou X, Huang B (2017) Cost-sensitive sequential three-way decision modeling using a deep neural network. Int J Approx Reason 85:68–78
Lee HK, Kim SB (2018) An overlap-sensitive margin classifier for imbalanced and overlapping data. Expert Syst Appl 98:72–83
Mienye ID, Sun Y (2021) Performance analysis of cost-sensitive learning methods with application to imbalanced medical data. Inf Med Unlocked 25:100690
Lin X, Li C, Zhang Y, Su B, Fan M, Wei H (2018) Selecting feature subsets based on svm-rfe and the overlapping ratio with applications in bioinformatics. Molecules 23(1):52
Akhter S, Sharmin S, Ahmed S, Sajib AA, Shoyaib M (2021) mRelief: a reward penalty based feature subset selection considering data overlapping problem. In: International conference on computational science. Springer, pp 278–292
Omar B, Rustam F, Mehmood A, Choi GS (2021) Minimizing the overlapping degree to improve class-imbalanced learning under sparse feature selection: application to fraud detection. IEEE Access 9:28101–28110
Alshomrani S, Bawakid A, Shim Seong-O, Fernández A, Herrera F (2015) A proposal for evolutionary fuzzy systems using feature weighting: dealing with overlapping in imbalanced datasets. Knowl Based Syst 73:1–17
Zhang Y, Cheng S, Shi Y, Gong DW, Zhao X (2019) Cost-sensitive feature selection using two-archive multi-objective artificial bee colony algorithm. Expert Syst Appl 137:46–58
Sáez JA, Galar M, Krawczyk B (2019) Addressing the overlapping data problem in classification using the one-vs-one decomposition strategy. IEEE Access 7:83396–83411
Shahee SA, Ananthakumar U (2021) An overlap sensitive neural network for class imbalanced data. Data Min Knowl Discov 35(4):1–34
Yuan BW, Zhang ZL, Luo XG, Yu Y, Zou XH, Zou XD (2021) OIS-RF: a novel overlap and imbalance sensitive random forest. Eng Appl Artif Intell 104:104355
Nwe MM, Lynn KT (2019) kNN-based overlapping samples filter approach for classification of imbalanced data. In: International conference on software engineering research, management and applications. Springer, pp 55–73
Yan Y, Jiang Y, Zheng Z, Yu C, Zhang Y, Zhang Y (2022) LDAS: local density-based adaptive sampling for imbalanced data classification. Expert Syst Appl 191:116213
Roy A, Cruz RM, Sabourin R, Cavalcanti GD (2018) A study on combining dynamic selection and data preprocessing for imbalance learning. Neurocomputing 286:179–192
Pouyanfar S, Sadiq S, Yan Y, Tian H, Tao Y, Reyes MP, Shyu ML, Chen SC, Iyengar SS (2018) A survey on deep learning: algorithms, techniques, and applications. ACM Comput Surv (CSUR) 51:1–36
Tong K, Wu Y (2022) Deep learning-based detection from the perspective of small or tiny objects: a survey. Image Vis Comput 123:104471
Liu Z, Tong L, Jiang Z, Chen L, Zhou F, Zhang Q, Zhang X, Jin Y, Zhou H (2020) Deep learning based brain tumor segmentation: a survey. Preprint at https://arxiv.org/abs/2007.09479
Wong LJ, Headley WC, Michaels AJ (2019) Specific emitter identification using convolutional neural network-based IQ imbalance estimators. IEEE Access 7:33544–33555
Chen Z, Duan J, Kang L, Qiu G (2021) Class-imbalanced deep learning via a class-balanced ensemble. IEEE Trans Neural Netw Learn Syst
Yan Y, Chen M, Shyu ML, Chen SC (2015) Deep learning for imbalanced multimedia data classification. In: 2015 IEEE international symposium on multimedia (ISM). IEEE, pp 483–488
Böhm A, Ücker A, Jäger T, Ronneberger O, Falk T (2018) ISOO_DL: Instance segmentation of overlapping biological objects using deep learning. In: 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018). IEEE, pp 1225–1229
Banerjee I, Ling Y, Chen MC, Hasan SA, Langlotz CP, Moradzadeh N, Chapman B, Amrhein T, Mong D, Rubin DL (2019) Comparative effectiveness of convolutional neural network (CNN) and recurrent neural network (RNN) architectures for radiology text report classification. Artif Intell Med 97:79–88
Gao L, Lu P, Ren Y (2021) A deep learning approach for imbalanced crash data in predicting highway-rail grade crossings accidents. Reliab Eng Syst Saf 216:108019
Rai HM, Chatterjee K (2022) Hybrid CNN LSTM deep learning model and ensemble technique for automatic detection of myocardial infarction using big ECG data. Appl Intell 52(5):5366–5384
Gao J, Zhang H, Lu P, Wang Z (2019) An effective LSTM recurrent network to detect arrhythmia on imbalanced ecg dataset. J Healthc Eng
Tran D, Mac H, Tong V, Tran HA, Nguyen LG (2018) A LSTM based framework for handling multiclass imbalance in DGA botnet detection. Neurocomputing 275:2401–2413
Dong Q, Gong S, Zhu X (2018) Imbalanced deep learning by minority class incremental rectification. IEEE Trans Pattern Anal Mach Intell 41(6):1367–1381
Lin E, Chen Q, Qi X (2020) Deep reinforcement learning for imbalanced classification. Appl Intell 50(8):2488–2502
Wang S, Liu W, Wu J, Cao L, Meng Q, Kennedy PJ (2016) Training deep neural networks on imbalanced data sets. In: 2016 International joint conference on neural networks (IJCNN). IEEE, pp 4368–4374
Zhang C, Tan KC, Li H, Hong GS (2018) A cost-sensitive deep belief network for imbalanced classification. IEEE Trans Neural Netw Learn Syst 30(1):109–122
Dablain D, Krawczyk B, Chawla NV (2022) DeepSMOTE: fusing deep learning and smote for imbalanced data. IEEE Trans Neural Netw Learn Syst
Andrei V, Cucu H, Burileanu C (2019) Overlapped speech detection and competing speaker counting–humans versus deep learning. IEEE J Sel Topics Signal Process 13(4):850–862
Alia A, Maree M, Chraibi M (2022) A hybrid deep learning and visualization framework for pushing behavior detection in pedestrian dynamics. Sensors 22(11):4040
Wang X, Jing L, Lyu Y, Guo M, Wang J, Liu H, Yu J, Zeng T (2022) Deep generative mixture model for robust imbalance classification. IEEE Trans Pattern Anal Mach Intell
Yue X, Li H, Fujikawa Y, Meng L (2022) Dynamic dataset augmentation for deep learning-based oracle bone inscriptions recognition. J Comput Cult Herit (JOCCH)
Liu T, Bao J, Wang J, Wang J (2021) Deep learning for industrial image: challenges, methods for enriching the sample space and restricting the hypothesis space, and possible issue. Int J Comput Integr Manuf 35:1–30
ArunKumar KE, Kalaga DV, Kumar CMS, Kawaji M, Brenza TM (2021) Forecasting of COVID-19 using deep layer recurrent neural networks (RNNs) with gated recurrent units (GRUs) and long short term memory (LSTM) cells. Chaos, Solitons Fractals 146:110861
Zhang Q, Wang W, Zhu SC (2018) Examining CNN representations with respect to dataset bias. In: Proceedings of the AAAI conference on artificial intelligence, vol 32
Doshi-Velez F, Kim B (2017) Towards a rigorous science of interpretable machine learning. arXiv:1702.08608
Ibrahim M, Louie M, Modarres C, Paisley J (2019) Global explanations of neural networks: mapping the landscape of predictions. In: Proceedings of the 2019 AAAI/ACM conference on AI, ethics, and society, pp 279–287
Wu JMT, Li Z, Herencsar N, Vo B, Lin JCW (2021) A graph-based CNN-LSTM stock price prediction algorithm with leading indicators. Multimedia Syst 29:1–20
Sherstinsky A (2020) Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys D Nonlin Phenom 404:132306
Chen MY, Chiang HS, Huang WK (2022) Efficient generative adversarial networks for imbalanced traffic collision datasets. IEEE Trans Intell Transp Syst
Lee HK, Lee J, Kim SB (2022) Boundary-focused generative adversarial networks for imbalanced and multimodal time series. IEEE Trans Knowl Data Eng
Li W, Chen J, Cao J, Ma C, Wang J, Cui X, Chen P (2022) EID-GAN: generative adversarial nets for extremely imbalanced data augmentation. IEEE Trans Ind Inform
Gao S, Dai Y, Li Y, Liu K, Chen K, Liu Y (2022) Multiview Wasserstein generative adversarial network for imbalanced pearl classification. Meas Sci Technol 33(8):085406
Suh S, Lee H, Lukowicz P, Lee YO (2021) CEGAN: classification enhancement generative adversarial networks for unraveling data imbalance problems. Neural Netw 133:69–86
De Oliveira Nogueira T, Palacio GBA, Braga FD, Maia PPN, De Moura EP, De Andrade CF, Rocha PAC (2022) Imbalance classification in a scaled-down wind turbine using radial basis function kernel and support vector machines. Energy 238:122064
Satapathy SK, Mishra S, Mallick PK, Chae GS (2021) ADASYN and ABC-optimized RBF convergence network for classification of electroencephalograph signal. Pers Ubiquitous Comput 27:1–17
Zhang D, Zhang N, Ye N, Fang J, Han X (2020) Hybrid learning algorithm of radial basis function networks for reliability analysis. IEEE Trans Reliab 70(3):887–900
Kamaruddin SK, Ravi V (2019) A parallel and distributed radial basis function network for big data analytics. In: TENCON 2019-2019 IEEE Region 10 Conference (TENCON). IEEE, pp 395–399
Akter S, Das D, Haque RU, Tonmoy MIQ, Hasan MR, Mahjabeen S, Ahmed M (2022) AD-covNet: an exploratory analysis using a hybrid deep learning model to handle data imbalance, predict fatality, and risk factors in Alzheimer’s patients with COVID-19. Comput Biol Med 146:105657
Ram PK, Kuila P (2022) GAAE: a novel genetic algorithm based on autoencoder with ensemble classifiers for imbalanced healthcare data. J Supercomput 79:1–32
Hassib EM, El-Desouky AI, Labib LM, El-Kenawy ESM (2020) WOA+BRNN: an imbalanced big data classification framework using whale optimization and deep neural network. Soft Comput 24(8):5573–5592
Dumas J, Boukas I, De Villena MM, Mathieu S, Cornélusse B (2019) Probabilistic forecasting of imbalance prices in the Belgian context. In: 2019 16th International conference on the European energy market (EEM). IEEE, pp 1–7
Ghanem WA, Jantan A (2018) A cognitively inspired hybridization of artificial bee colony and dragonfly algorithms for training multi-layer perceptrons. Cogn Comput 10(6):1096–1134
Zhu G, Wu X, Ge J, Liu F, Zhao W, Wu C (2020) Influence of mining activities on groundwater hydrochemistry and heavy metal migration using a self-organizing map (SOM). J Clean Prod 257:120664
Hameed AA, Karlik B, Salman MS, Eleyan G (2019) Robust adaptive learning approach to self-organizing maps. Knowl Based Syst 171:25–36
Huysmans D, Smets E, De Raedt W, Van Hoof C, Bogaerts K, Van Diest I, Helic D (2018) Unsupervised learning for mental stress detection-exploration of self-organizing maps. In: Proceedings of the 11th international joint conference on biomedical engineering systems and technologies, vol 4, pp 26–35
Xie H, Wu L, Xie W, Lin Q, Liu M, Lin Y (2021) Improving ECMWF short-term intensive rainfall forecasts using generative adversarial nets and deep belief networks. Atmos Res 249:105281
Vinayakumar R, Alazab M, Srinivasan S, Pham QV, Padannayil SK, Simran K (2020) A visualized botnet detection system based deep learning for the internet of things networks of smart cities. IEEE Trans Ind Appl 56:4436–4456
Leonelli FE, Agliari E, Albanese L, Barra A (2021) On the effective initialisation for restricted Boltzmann machines via duality with Hopfield model. Neural Netw 143:314–326
Savitha R, Ambikapathi A, Rajaraman K (2020) Online RBM: growing restricted Boltzmann machine on the fly for unsupervised representation. Appl Soft Comput 92:106278
Huang K, Wang X (2022) ADA-INCVAE: improved data generation using variational autoencoder for imbalanced classification. Appl Intell 52(3):2838–2853
Chen J, Wu Z, Zhang J (2019) Driving safety risk prediction using cost-sensitive with nonnegativity-constrained autoencoders based on imbalanced naturalistic driving data. IEEE Trans Intell Transp Syst 20(12):4450–4465
Alhassan Z, Budgen D, Alshammari R, Daghstani T, McGough AS, Al Moubayed N (2018) Stacked denoising autoencoders for mortality risk prediction using imbalanced clinical data. In: 2018 17th IEEE international conference on machine learning and applications (ICMLA). IEEE, pp 541–546
Johnson JM, Khoshgoftaar TM (2020) The effects of data sampling with deep learning and highly imbalanced big data. Inf Syst Front 22(5):1113–1131
Yan M, Li N (2022) Borderline-margin loss based deep metric learning framework for imbalanced data. Appl Intell 53:1–18
Lin WC, Tsai CF, Hu YH, Jhang JS (2017) Clustering-based undersampling in class-imbalanced data. Inf Sci 409:17–26
Vannucci M, Colla V (2018) Self-organizing-maps based undersampling for the classification of unbalanced datasets. In: 2018 International joint conference on neural networks (IJCNN). IEEE, pp 1–6
Tsai CF, Lin WC, Hu YH, Yao GT (2019) Under-sampling class imbalanced datasets by combining clustering analysis and instance selection. Inf Sci 477:47–54
More A (2016) Survey of resampling techniques for improving classification performance in unbalanced datasets. arXiv:1608.06048
Yang Z, Gao D (2013) Classification for imbalanced and overlapping classes using outlier detection and sampling techniques. Appl Math Inf Sci 7(1):375–381
Douzas G, Bacao F, Last F (2018) Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE. Inf Sci 465:1–20
Barua S, Islam MM, Yao X, Murase K (2012) MWMOTE-majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans Knowl Data Eng 26(2):405–425
He H, Bai Y et al (2008) ADASYN: adaptive synthetic sampling for imbalanced data. In: IEEE international joint conference on neural networks (IEEE world congress on computational intelligence), vol 69. https://doi.org/10.1109/ijcnn
Ren R, Yang Y, Sun L (2020) Oversampling technique based on fuzzy representativeness difference for classifying imbalanced data. Appl Intell 50(8):2465–2487
Elyan E, Moreno-Garcia CF, Jayne C (2021) CDSMOTE: class decomposition and synthetic minority class oversampling technique for imbalanced-data classification. Neural Comput Appl 33(7):2839–2851
Liu G, Yang Y, Li B (2018) Fuzzy rule-based oversampling technique for imbalanced and incomplete data learning. Knowl Based Syst 158:154–174
Koziarski M, Krawczyk B, Wozniak M (2019) Radial-based oversampling for noisy imbalanced data classification. Neurocomputing 343:19–33
Yan Y, Liu R, Ding Z, Du X, Chen J, Zhang Y (2019) A parameter-free cleaning method for SMOTE in imbalanced classification. IEEE Access 7:23537–23548
Patel H, Thakur GS (2016) A hybrid weighted nearest neighbor approach to mine imbalanced data. In: Proceedings of the international conference on data science (ICDATA). The Steering Committee of the World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), p 106
Tang B, He H (2015) ENN: extended nearest neighbor method for pattern recognition [research frontier]. IEEE Comput Intell Mag 10(3):52–60
Wang P, Yao Y (2018) CE3: a three-way clustering method based on mathematical morphology. Knowl Based Syst 155:54–65
Masson MH, Denoeux T (2009) RECM: relational evidential c-means algorithm. Pattern Recognit Lett 30(11):1015–1026
Batista GE, Prati RC, Monard MC (2004) A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor Newsl 6(1):20–29
Fan Q, Wang Z, Li D, Gao D, Zha H (2017) Entropy-based fuzzy support vector machine for imbalanced datasets. Knowl Based Syst 115:87–99
Zhu C, Wang Z (2017) Entropy-based matrix learning machine for imbalanced data sets. Pattern Recognit Lett 88:72–80
Wang BX, Japkowicz N (2010) Boosting support vector machines for imbalanced data sets. Knowl Inf Syst 25(1):1–20
Ju H, Li H, Yang X, Zhou X, Huang B (2017) Cost-sensitive rough set: a multi-granulation approach. Knowl Based Syst 123:137–153
Ju H, Yang X, Yu H, Li T, Yu DJ, Yang J (2016) Cost-sensitive rough set approach. Inf Sci 355:282–298
Cabitza F, Ciucci D, Locoro A (2017) Exploiting collective knowledge with three-way decision theory: cases from the questionnaire-based research. Int J Approx Reason 83:356–370
Maulidevi NU, Surendro K (2021) SMOTE-LOF for noise identification in imbalanced data classification. J King Saud Univ Comput Inf Sci
Armano G, Tamponi E (2018) Building forests of local trees. Pattern Recognit 76:380–390
Galar M, Fernández A, Barrenechea E, Herrera F (2013) EUSboost: enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling. Pattern Recognit 46(12):3460–3471
Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F (2011) A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans Syst, Man, Cybern, Part C (Appl Rev) 42(4):463–484
Sesmero MP, Ledezma AI, Sanchis A (2015) Generating ensembles of heterogeneous classifiers using stacked generalization. Wiley Interdiscip Rev Data Min Knowl Discov 5(1):21–34
Kim S, Zhang H, Wu R, Gong L (2011) Dealing with noise in defect prediction. In: 2011 33rd International conference on software engineering (ICSE). IEEE, pp 481–490
Tang W, Khoshgoftaar TM (2004) Noise identification with the k-means algorithm. In: 16th IEEE international conference on tools with artificial intelligence. IEEE, pp 373–378
Sundqvist T, Bhuyan MH, Forsman J, Elmroth E (2020) Boosted ensemble learning for anomaly detection in 5G RAN. In: IFIP international conference on artificial intelligence applications and innovations. Springer, pp 15–30
Seiffert C, Khoshgoftaar TM, Van Hulse J, Napolitano A (2009) RUSBoost: a hybrid approach to alleviating class imbalance. IEEE Trans Syst Man Cybern Part A Syst Hum 40(1):185–197
Tosin MC, Majolo M, Chedid R, Cene VH, Balbinot A (2017) sEMG feature selection and classification using SVM-RFE. In: 2017 39th Annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE, pp 390–393
Alcala-Fdez J, Alcala R, Herrera F (2011) A fuzzy association rule-based classification model for high-dimensional problems with genetic rule selection and lateral tuning. IEEE Trans Fuzzy Syst 19(5):857–872
Akhter S, Sharmin S, Ahmed S, Sajib AA, Shoyaib M (2021) mRelief: a reward penalty based feature subset selection considering data overlapping problem. In: International conference on computational science. Springer, pp 278–292
Min F, Hu Q, Zhu W (2014) Feature selection with test cost constraint. Int J Approx Reason 55(1):167–179
Zhao H, Wang P, Hu Q (2016) Cost-sensitive feature selection based on adaptive neighborhood granularity with multi-level confidence. Inf Sci 366:134–149
Emekter R, Tu Y, Jirasakuldech B, Lu M (2015) Evaluating credit risk and loan performance in online peer-to-peer (P2P) lending. Appl Econ 47(1):54–70
Vorraboot P, Rasmequan S, Chinnasarn K, Lursinsap C (2015) Improving classification rate constrained to imbalanced data between overlapped and non-overlapped regions by hybrid algorithms. Neurocomputing 152:429–443
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Ethics declarations
Conflicts of interests/Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Kumar, A., Singh, D. & Shankar Yadav, R. Class overlap handling methods in imbalanced domain: A comprehensive survey. Multimed Tools Appl 83, 63243–63290 (2024). https://doi.org/10.1007/s11042-023-17864-8
DOI: https://doi.org/10.1007/s11042-023-17864-8