Abstract
This paper addresses the imbalanced data problem in an online fashion using multi-threshold learning. Most existing work on processing large-scale imbalanced data streams assumes a prior distribution of the data estimated from a training dataset, whereas we consider a more challenging setting that makes no assumption about the prior. We propose an online multi-threshold learning (OMTL) method that simultaneously learns multiple classifiers with different thresholds, based on incremental updating of the F-measure. On recent benchmark datasets, the proposed approach shows promise compared with previous cost-sensitive and threshold fine-tuning based online learning algorithms.
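The abstract does not give OMTL's update rules, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. A single online logistic-regression scorer (our assumption; the paper describes multiple classifiers and its base learner may differ) is shared by several candidate thresholds. Each threshold keeps running TP/FP/FN counts, so its F-measure, F1 = 2TP / (2TP + FP + FN), can be updated incrementally after every example, and predictions use the threshold with the best running F1. The names `MultiThresholdOnline`, `ThresholdStats`, `learn_one`, the threshold grid, and the synthetic stream are all hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class ThresholdStats:
    """Running confusion counts for one candidate threshold (hypothetical helper)."""
    threshold: float
    tp: int = 0
    fp: int = 0
    fn: int = 0

    def update(self, score: float, label: int) -> None:
        # label 1 = minority (positive) class, 0 = majority class
        pred = 1 if score >= self.threshold else 0
        if pred == 1 and label == 1:
            self.tp += 1
        elif pred == 1:
            self.fp += 1
        elif label == 1:
            self.fn += 1

    def f1(self) -> float:
        # F1 = 2TP / (2TP + FP + FN); taken as 0 before any relevant counts exist
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


class MultiThresholdOnline:
    """Sketch: one online logistic scorer shared by K candidate thresholds."""

    def __init__(self, dim: int, thresholds: List[float], lr: float = 0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr
        self.stats = [ThresholdStats(t) for t in thresholds]

    def score(self, x: List[float]) -> float:
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return sigmoid(z)

    def best(self) -> ThresholdStats:
        # Threshold with the highest running F1 (ties go to the earliest one)
        return max(self.stats, key=lambda st: st.f1())

    def predict(self, x: List[float]) -> int:
        return 1 if self.score(x) >= self.best().threshold else 0

    def learn_one(self, x: List[float], y: int) -> int:
        s = self.score(x)
        pred = 1 if s >= self.best().threshold else 0  # predict before using y
        for st in self.stats:  # incremental F-measure bookkeeping per threshold
            st.update(s, y)
        g = s - y  # gradient of the logistic loss w.r.t. the linear score z
        self.w = [wi - self.lr * g * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * g
        return pred


# Usage on a synthetic stream with roughly 5% positives (illustrative data only).
random.seed(0)
model = MultiThresholdOnline(dim=1, thresholds=[0.1, 0.3, 0.5, 0.7, 0.9])
for _ in range(2000):
    y = 1 if random.random() < 0.05 else 0
    x = [random.gauss(2.0 if y == 1 else -1.0, 1.0)]
    model.learn_one(x, y)
print("best threshold:", model.best().threshold, "running F1:", round(model.best().f1(), 3))
```

Sharing one scorer keeps each step at O(d + K) for d features and K thresholds, and no prior class distribution is assumed because the counts are accumulated from the stream itself. A closer reproduction of the paper's description might keep a separate weight vector per threshold.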
Acknowledgments
The authors acknowledge funding from the State Key Laboratory of Software Engineering, Computer School, Wuhan University (research project number SKLSE-2015-A-06). This work was also partially supported by the National Natural Science Foundation of China under Project 61371191.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Cai, X., Yang, M., Zhu, R., Li, X., Ye, L., Zhang, Q. (2017). Online Multi-threshold Learning with Imbalanced Data Stream. In: Cong, F., Leung, A., Wei, Q. (eds) Advances in Neural Networks - ISNN 2017. ISNN 2017. Lecture Notes in Computer Science(), vol 10261. Springer, Cham. https://doi.org/10.1007/978-3-319-59072-1_1
DOI: https://doi.org/10.1007/978-3-319-59072-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59071-4
Online ISBN: 978-3-319-59072-1
eBook Packages: Computer Science, Computer Science (R0)