Abstract
Identifying anomalies in a data stream is a challenge for real-time decision-making. Massive amounts of streaming data arrive continuously and their characteristics change over time, which makes efficient real-time anomaly detection difficult and calls for a memory-constrained online detection system that can quickly detect concept drift. In this study, a novel anomaly detection model based on a dynamic micro-cluster scheme is developed. Macro-clusters are generated from a network of connected micro-clusters. As new data items arrive, the normal patterns formed in the macro-clusters are updated incrementally in tandem with the dynamic micro-clusters. Outliers are characterized from both a global and a local perspective by examining the global and local densities, respectively. The proposed approach was evaluated on three datasets. The experimental results show that the proposed method outperforms earlier algorithms in both detection accuracy and computational complexity.
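The general idea described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract gives no formulas, so the fixed `radius`, the linking rule (micro-cluster centers within `2 * radius`), the density thresholds `min_local` and `min_global`, and all class and method names are assumptions made only for illustration. Aging or removal of stale micro-clusters, which handling concept drift would presumably need, is not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    center: tuple   # running mean of the absorbed points
    count: int = 1  # number of absorbed points (used as a density proxy)

    def absorb(self, x):
        # Incremental mean update: no raw points are stored.
        self.count += 1
        self.center = tuple(c + (xi - c) / self.count
                            for c, xi in zip(self.center, x))


class MicroClusterStream:
    """Hypothetical sketch; all parameters are illustrative assumptions."""

    def __init__(self, radius=1.0, min_local=3, min_global=10):
        self.radius = radius
        self.min_local = min_local
        self.min_global = min_global
        self.clusters = []

    def _nearest(self, x):
        # Index of the closest micro-cluster and its distance to x.
        best, best_d = None, math.inf
        for i, mc in enumerate(self.clusters):
            d = math.dist(mc.center, x)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def insert(self, x):
        # Absorb x into the nearest micro-cluster, or open a new one.
        x = tuple(x)
        i, d = self._nearest(x)
        if i is not None and d <= self.radius:
            self.clusters[i].absorb(x)
        else:
            self.clusters.append(MicroCluster(center=x))

    def macro_clusters(self):
        # Connected components of the graph that links micro-clusters
        # whose centers lie within 2 * radius of each other (assumed rule).
        seen, groups = set(), []
        for start in range(len(self.clusters)):
            if start in seen:
                continue
            seen.add(start)
            stack, comp = [start], []
            while stack:
                j = stack.pop()
                comp.append(j)
                for k, other in enumerate(self.clusters):
                    near = math.dist(self.clusters[j].center,
                                     other.center) <= 2 * self.radius
                    if k not in seen and near:
                        seen.add(k)
                        stack.append(k)
            groups.append(sorted(comp))
        return groups

    def is_anomaly(self, x):
        i, d = self._nearest(tuple(x))
        if i is None or d > self.radius:
            return True  # covered by no micro-cluster
        local_density = self.clusters[i].count
        group = next(g for g in self.macro_clusters() if i in g)
        global_density = sum(self.clusters[j].count for j in group)
        # Assumed rule: flag only if both densities are low.
        return (local_density < self.min_local
                and global_density < self.min_global)


model = MicroClusterStream(radius=1.0)
for i in range(20):
    model.insert((0.05 * (i % 5), 0.05 * (i // 5)))        # group near (0, 0)
    model.insert((1.5 + 0.05 * (i % 5), 0.05 * (i // 5)))  # group near (1.5, 0)
```

In this sketch the two dense groups form separate micro-clusters that are linked into a single macro-cluster, so a query point falling inside either one is treated as normal, while a point far from every micro-cluster is flagged.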
Funding
This work is supported by the Youth Science and Technology New Star Plan of Shaanxi Province (2021KJXX-50) and Technology New Star Plan of Shaanxi Province (No. 20JS09).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, X., Ahmed, M.M., Husen, M.N., Tao, H., Zhao, Q. (2023). Dynamic Micro-cluster-Based Streaming Data Clustering Method for Anomaly Detection. In: Yusoff, M., Hai, T., Kassim, M., Mohamed, A., Kita, E. (eds) Soft Computing in Data Science. SCDS 2023. Communications in Computer and Information Science, vol 1771. Springer, Singapore. https://doi.org/10.1007/978-981-99-0405-1_5
DOI: https://doi.org/10.1007/978-981-99-0405-1_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-0404-4
Online ISBN: 978-981-99-0405-1
eBook Packages: Computer Science, Computer Science (R0)