Abstract
Neural networks that handle imbalanced data rely heavily on resampling or reweighting strategies. However, existing resampling and reweighting approaches mainly focus on rebalancing the known data. In doing so, they ignore the essence of the data imbalance problem, namely, the insufficient empirical representation of the minority class caused by its small number of samples. We therefore propose a new solution for neural networks classifying imbalanced data: sampling absent minority class samples. Specifically, an improved Metropolis-Hastings (IMH) algorithm is developed that samples absent minority class samples by collecting the samples rejected during the approximation of the majority class. The sampled absent minority class samples are then provided to neural networks to address the data imbalance problem. For IMH, a line segment transition kernel and a class probability constraint are proposed to accelerate the sampling process and to reduce the ambiguity in the class definition of the sampled minority class samples. For neural networks, two boundary-shifting strategies are provided to support different modes of applying the sampled absent minority class samples. In experiments, the proposed method is validated on 34 imbalanced datasets and achieves comparable AUC, G-MEAN, and MACC results. These results demonstrate the effectiveness of sampling absent minority class samples for neural networks solving the imbalanced data problem.
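The core sampling idea can be illustrated with a minimal, pure-Python sketch. It is not the authors' IMH implementation, and several pieces are assumptions made only for illustration. First, the majority class is approximated by an isotropic Gaussian kernel density estimate. Second, the "line segment transition kernel" is stood in for by a symmetric move along the direction of a randomly chosen segment between two training points. Third, the "class probability constraint" is stood in for by a threshold `tau` on an equal-prior kernel-density posterior. The function names `kernel_density`, `line_segment_proposal`, and `sample_absent_minority` are hypothetical.

```python
import math
import random


def kernel_density(points, bandwidth):
    """Hypothetical helper: unnormalised isotropic Gaussian KDE over `points`."""
    def density(x):
        total = 0.0
        for p in points:
            sq = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
            total += math.exp(-sq / (2.0 * bandwidth ** 2))
        return total / len(points)
    return density


def line_segment_proposal(x, pool, scale, rng):
    """Assumed stand-in kernel: step along the direction of a random segment a-b.

    The pool is fixed and s is symmetric about 0, so q(y|x) == q(x|y).
    """
    a, b = rng.sample(pool, 2)
    s = rng.uniform(-scale, scale)
    return [xi + s * (ai - bi) for xi, ai, bi in zip(x, a, b)]


def sample_absent_minority(majority, minority, n_steps,
                           bandwidth=0.5, scale=1.0, tau=0.5, seed=0):
    """Run Metropolis sampling on the majority KDE and keep filtered rejections."""
    rng = random.Random(seed)
    p_maj = kernel_density(majority, bandwidth)
    p_min = kernel_density(minority, bandwidth)
    pool = majority + minority          # fixed set of points defining move directions
    x = list(rng.choice(majority))      # start on a majority sample, so p_maj(x) > 0
    px = p_maj(x)
    absent = []
    for _ in range(n_steps):
        y = line_segment_proposal(x, pool, scale, rng)
        py = p_maj(y)
        # Metropolis acceptance for a symmetric proposal targeting p_maj
        if rng.random() < min(1.0, py / px):
            x, px = y, py
        else:
            # Rejected by the majority approximation: candidate absent minority sample.
            # Assumed constraint: keep it only if the equal-prior minority posterior >= tau.
            denom = p_min(y) + py
            if denom > 0.0 and p_min(y) / denom >= tau:
                absent.append(y)
    return absent
```

Because the move direction is drawn from a fixed point set and the step size is symmetric about zero, the proposal is symmetric, so the plain Metropolis ratio is a valid acceptance rule for the majority target. Rejected proposals tend to lie where the majority density is low, which is why, after filtering, they serve as candidate absent minority samples in this sketch.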
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work is supported by the Key Program of National Natural Science Fund of China (Grant No. 61836006), the National Key Research and Development Program of China (Grant No. 2019YFC1510705), and the Science and Technology Major Project of Sichuan province (Grant No. 2019ZDZX0006).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Huang, Z.a., Sang, Y., Sun, Y. et al. Neural network with absent minority class samples and boundary shifting for imbalanced data classification. Neural Comput & Applic 35, 8937–8953 (2023). https://doi.org/10.1007/s00521-022-08135-y
DOI: https://doi.org/10.1007/s00521-022-08135-y