Improved Anomaly Detection by Using the Attention-Based Isolation Forest
Figure 1. A general scheme of ABIForest, illustrating how iForest is modified by incorporating an attention mechanism.
Figure 2. Points from the Circle dataset.
Figure 3. Points from the Normal dataset.
Figure 4. F1-scores as functions of the softmax hyperparameter ω for different contamination parameters ε for the Circle dataset.
Figure 5. F1-scores as functions of the contamination parameter ε for different numbers of trees T in iForest for the Circle dataset.
Figure 6. Comparison of the test set generated for the Circle dataset (left panel), predictions obtained by iForest (central panel), and predictions obtained by ABIForest (right panel).
Figure 7. F1-scores as functions of the softmax hyperparameter ω for different contamination parameters ε for the Normal dataset.
Figure 8. F1-scores as functions of the contamination parameter ε for different numbers of trees T in iForest for the Normal dataset.
Figure 9. Comparison of the test set generated for the Normal dataset (left panel), predictions obtained by iForest (central panel), and predictions obtained by ABIForest (right panel).
Figure 10. Dependence of the F1-scores of iForest and ABIForest on the amount of training data for the Circle dataset (left panel) and the Normal dataset (right panel).
Figure 11. Comparison of iForest and ABIForest with different thresholds τ and different contamination parameters ε for the Credit (left panel) and Ionosphere (right panel) datasets.
Figure 12. Comparison of iForest and ABIForest with different thresholds τ and different contamination parameters ε for the Arrhythmia (left panel) and Mulcross (right panel) datasets.
Figure 13. Comparison of iForest and ABIForest with different thresholds τ and different contamination parameters ε for the Http (left panel) and Pima (right panel) datasets.
Abstract
1. Introduction
- A new modification of iForest, called the attention-based isolation forest (ABIForest), is proposed. It incorporates an attention mechanism in the form of Nadaraya–Watson regression to improve the solution of the anomaly detection problem (a minimal code sketch follows this list).
- The computation of the attention weights reduces to solving linear or quadratic programming problems due to the application of Huber's ε-contamination model. Moreover, we propose the use of the hinge-loss function to simplify the optimization problem. The contamination parameter ε is regarded as a tuning hyperparameter.
- Numerical experiments with synthetic and real datasets were performed to study ABIForest. They demonstrated outstanding results for most datasets. The code of the proposed algorithms can be found at https://github.com/AndreyAgeev/Attention-based-isolation-forest (accessed on 1 November 2022).
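As a minimal illustration of the first point above, the sketch below shows how Nadaraya–Watson attention can replace the plain averaging of path lengths in iForest. This is a hypothetical sketch rather than the authors' released code: the per-tree keys (leaf means), the Gaussian kernel driven by the softmax hyperparameter ω, the mixing weight eps, and the normalization constant c_n are illustrative assumptions based on the description above.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

def abiforest_score(x, keys, values, w, omega, eps, c_n):
    """Attention-weighted iForest anomaly score (illustrative sketch).

    keys[t]   -- mean of the training instances in the leaf of tree t reached by x
    values[t] -- path length of x in tree t
    w         -- trainable weights on the unit simplex (learned separately)
    """
    # Nadaraya-Watson attention: Gaussian kernel whose width is set by omega
    attn = softmax(-omega * np.sum((keys - x) ** 2, axis=1))
    # Huber's eps-contamination model: mix kernel attention with trainable weights
    mixed = (1.0 - eps) * attn + eps * w
    expected_path = float(mixed @ values)
    return 2.0 ** (-expected_path / c_n)  # standard iForest score normalization

# Toy usage: three trees, two-dimensional data (all numbers are made up)
x = np.array([0.1, 0.2])
keys = np.array([[0.0, 0.1], [0.5, 0.4], [0.2, 0.2]])  # per-tree leaf means
values = np.array([6.0, 3.0, 5.0])                     # per-tree path lengths
w = np.ones(3) / 3
print(abiforest_score(x, keys, values, w, omega=10.0, eps=0.5, c_n=5.0))
```

As in the original iForest, scores close to 1 indicate anomalies; the attention merely reweights the per-tree path lengths before the score is computed.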
2. Related Work
3. Preliminaries
3.1. Attention Mechanism as Nadaraya–Watson Regression
3.2. Isolation Forest
4. Attention-Based Isolation Forest
4.1. Keys, Values, and Queries in iForests
4.2. Loss Function and Attention Weights
4.3. Huber’s Contamination Model
4.4. Loss Function with the Contamination Model
5. Numerical Experiments
5.1. Synthetic Datasets
5.2. Real Datasets
6. Concluding Remarks
- ABIForest is computationally very simple because, in contrast to an attention-based neural network, the attention weights in ABIForest are trained by solving a standard quadratic optimization problem (see the sketch after this list). This modification avoids the use of gradient-based algorithms to compute the optimal learnable attention parameters.
- ABIForest is a flexible model that can be easily modified. Several of its components can be changed to improve performance. First, different kernels can be used instead of the Gaussian kernel considered above. Second, statistical models [58] other than Huber's ε-contamination model can also be used in ABIForest. Third, the attention weights can be associated with subsets of trees, including intersecting subsets; in this case, the number of trainable parameters can be reduced to avoid overfitting. Fourth, the paths in trees can also be attended, for example, by assigning attention weights to each branch in every path. Fifth, multi-head attention can be applied to iForest to improve the model, for example, with heads that differ in the softmax hyperparameter ω. Sixth, the distance between an instance and the training instances that fall in the same leaf can be defined differently. These improvements can be regarded as directions for further research.
- The attention model is trained after the forest has been built. This implies that we do not need to rebuild iForest to achieve higher accuracy: the hyperparameters are tuned without rebuilding it. Moreover, various modifications and extensions of iForest can be used, with the attention mechanism incorporated in the same way as in the original iForest.
- ABIForest provides an interpretation that answers the question of why an instance is anomalous: one can analyze the isolation trees with the largest attention weights.
- ABIForest is well suited to tabular data.
- It follows from the numerical experiments that ABIForest improves the performance of iForest for many datasets.
- The main disadvantage is that ABIForest has three additional hyperparameters: the contamination parameter ε, the softmax hyperparameter ω, and the regularization hyperparameter. We do not count the threshold τ, which is also used in iForest. The additional hyperparameters lead to significant increases in the validation time.
- Some additional time is required to solve the optimization problem (14).
- In contrast to iForest, ABIForest is a supervised model: it requires data labels (normal or anomalous) in order to define the optimization criterion, that is, to construct the optimization problem (14).
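To make the last two points concrete, here is a hedged sketch of one plausible way the hinge-loss training of the trainable weights could be posed as a convex (quadratic) program in the spirit of problem (14); the exact formulation is given in the paper. The matrix V of per-tree path lengths, the fixed softmax attention A, the labels y in {−1, +1} (+1 for anomalous), and the hyperparameters τ, ε, and the regularization coefficient lam are assumptions used only for illustration.

```python
import cvxpy as cp
import numpy as np

def train_attention_weights(V, A, y, tau, eps, lam):
    """Sketch of a hinge-loss QP for the trainable attention weights.

    V[i, t] -- path length of labeled instance i in tree t
    A[i, t] -- fixed Nadaraya-Watson (softmax) attention of tree t for instance i
    y[i]    -- +1 if instance i is anomalous, -1 if it is normal
    """
    n, T = V.shape
    w = cp.Variable(T)
    # expected path length under the eps-contamination attention weights;
    # the softmax part is fixed, so the expression is linear in w
    paths = (1 - eps) * np.sum(A * V, axis=1) + eps * (V @ w)
    # short paths indicate anomalies, so margins compare the threshold tau
    # with the expected path length
    margins = cp.multiply(y, tau - paths)
    loss = cp.sum(cp.pos(1 - margins)) / n + lam * cp.sum_squares(w)
    problem = cp.Problem(cp.Minimize(loss), [w >= 0, cp.sum(w) == 1])
    problem.solve()
    return w.value

# Toy usage with random data (purely illustrative)
rng = np.random.default_rng(0)
V = rng.uniform(2.0, 8.0, size=(20, 5))   # 20 labeled instances, 5 trees
A = np.full((20, 5), 0.2)                 # uniform softmax part, rows sum to 1
y = np.where(rng.random(20) < 0.3, 1.0, -1.0)
print(train_attention_weights(V, A, y, tau=5.0, eps=0.5, lam=0.1))
```

With the quadratic regularizer the problem is a QP; dropping it or replacing it with a linear term yields an LP, which matches the linear/quadratic programming reduction mentioned in the introduction.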
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Chalapathy, R.; Chawla, S. Deep learning for anomaly detection: A survey. arXiv 2019, arXiv:1901.03407.
- Boukerche, A.; Zheng, L.; Alfandi, O. Outlier Detection: Methods, Models, and Classification. ACM Comput. Surv. 2020, 53, 1–37.
- Braei, M.; Wagner, S. Anomaly Detection in Univariate Time-series: A Survey on the State-of-the-Art. arXiv 2020, arXiv:2004.00433.
- Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. 2009, 41, 1–58.
- Farizi, W.A.; Hidayah, I.; Rizal, M. Isolation Forest Based Anomaly Detection: A Systematic Literature Review. In Proceedings of the 8th International Conference on Information Technology, Computer and Electrical Engineering (ICITACEE), Semarang, Indonesia, 23–24 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 118–122.
- Fauss, M.; Zoubir, A.; Poor, H. Minimax Robust Detection: Classic Results and Recent Advances. IEEE Trans. Signal Process. 2021, 69, 2252–2283.
- Pang, G.; Shen, C.; Cao, L.; Hengel, A.V.D. Deep Learning for Anomaly Detection: A Review. ACM Comput. Surv. 2022, 54, 1–38.
- Yang, J.; Zhou, K.; Li, Y.; Liu, Z. Generalized Out-of-Distribution Detection: A Survey. arXiv 2021, arXiv:2110.11334v2.
- Pang, G.; Cao, L.; Aggarwal, C. Deep Learning for Anomaly Detection: Challenges, Methods, and Opportunities. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, Virtual, 8–12 March 2021; pp. 1127–1130.
- Ruff, L.; Kauffmann, J.; Vandermeulen, R.; Montavon, G.; Samek, W.; Kloft, M.; Dietterich, T.; Müller, K. A Unifying Review of Deep and Shallow Anomaly Detection. Proc. IEEE 2021, 109, 756–795.
- Wang, H.; Bah, M.H. Progress in Outlier Detection Techniques: A Survey. IEEE Access 2019, 7, 107964–108000.
- Aggarwal, C. An Introduction to Outlier Analysis. In Outlier Analysis; Springer: Berlin/Heidelberg, Germany, 2013; pp. 1–40.
- Campbell, C.; Bennett, K. A linear programming approach to novelty detection. In Advances in Neural Information Processing Systems; Leen, T., Dietterich, T., Tresp, V., Eds.; MIT Press: Cambridge, MA, USA, 2001; Volume 13, pp. 395–401.
- Scholkopf, B.; Platt, J.; Shawe-Taylor, J.; Smola, A.; Williamson, R. Estimating the support of a high-dimensional distribution. Neural Comput. 2001, 13, 1443–1471.
- Tax, D.; Duin, R. Support vector data description. Mach. Learn. 2004, 54, 45–66.
- Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 413–422.
- Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation-based anomaly detection. ACM Trans. Knowl. Discov. Data (TKDD) 2012, 6, 1–39.
- Chaudhari, S.; Mithal, V.; Polatkan, G.; Ramanath, R. An attentive survey of attention models. arXiv 2019, arXiv:1904.02874.
- Correia, A.; Colombini, E. Attention, please! A survey of neural attention models in deep learning. arXiv 2021, arXiv:2103.16775.
- Correia, A.; Colombini, E. Neural Attention Models in Deep Learning: Survey and Taxonomy. arXiv 2021, arXiv:2112.05909.
- Lin, T.; Wang, Y.; Liu, X.; Qiu, X. A Survey of Transformers. arXiv 2021, arXiv:2106.04554.
- Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62.
- Utkin, L.; Konstantinov, A. Attention-based Random Forest and Contamination Model. Neural Netw. 2022, 154, 346–359.
- Nadaraya, E. On estimating regression. Theory Probab. Its Appl. 1964, 9, 141–142.
- Watson, G. Smooth regression analysis. Sankhya Indian J. Stat. Ser. A 1964, 26, 359–372.
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
- Huber, P. Robust Statistics; Wiley: New York, NY, USA, 1981.
- Sawant, S.; Singh, S. Understanding Attention: In Minds and Machines. arXiv 2020, arXiv:2012.02659.
- Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhudinov, R.; Zemel, R.; Bengio, Y. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 2048–2057.
- Luong, T.; Pham, H.; Manning, C. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1412–1421.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
- Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 1480–1489.
- Liu, F.; Huang, X.; Chen, Y.; Suykens, J. Random Features for Kernel Approximation: A Survey on Algorithms, Theory, and Beyond. arXiv 2021, arXiv:2004.11154v5.
- Utkin, L.; Konstantinov, A. Attention and Self-Attention in Random Forests. arXiv 2022, arXiv:2207.04293.
- Utkin, L.; Konstantinov, A. Random Survival Forests Incorporated by the Nadaraya-Watson Regression. Inform. Autom. 2022, 21, 851–880.
- Konstantinov, A.; Utkin, L.; Kirpichenko, S. AGBoost: Attention-based Modification of Gradient Boosting Machine. In Proceedings of the 31st Conference of Open Innovations Association (FRUCT), Helsinki, Finland, 27–29 April 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 96–101.
- Kundu, A.; Sahu, A.; Serpedin, E.; Davis, K. A3d: Attention-based auto-encoder anomaly detector for false data injection attacks. Electr. Power Syst. Res. 2020, 189, 106795.
- Takimoto, H.; Seki, J.; Situju, S.; Kanagawa, A. Anomaly Detection Using Siamese Network with Attention Mechanism for Few-Shot Learning. Appl. Artif. Intell. 2022, 36, 2930–2946.
- Lei, X.; Xia, Y.; Wang, A.; Jian, X.; Zhong, H.; Sun, L. Mutual information based anomaly detection of monitoring data with attention mechanism and residual learning. Mech. Syst. Signal Process. 2023, 182, 109607.
- Yu, Y.; Zha, Z.; Jin, B.; Wu, G.; Dong, C. Graph-Based Anomaly Detection via Attention Mechanism. In Proceedings of the International Conference on Intelligent Computing, Xi’an, China, 7–11 August 2022; Springer: Cham, Switzerland, 2022; pp. 401–411.
- Madan, N.; Ristea, N.C.; Ionescu, R.; Nasrollahi, K.; Khan, F.; Moeslund, T.; Shah, M. Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection. arXiv 2022, arXiv:2209.12148.
- Ristea, N.C.; Madan, N.; Ionescu, R.; Nasrollahi, K.; Khan, F.; Moeslund, T.; Shah, M. Self-supervised predictive convolutional attentive block for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13576–13586.
- Huang, C.; Xu, Q.; Wang, Y.; Wang, Y.; Zhang, Y. Self-Supervised Masking for Unsupervised Anomaly Detection and Localization. arXiv 2022, arXiv:2205.06568.
- Zhao, H.; Wang, Y.; Duan, J.; Huang, C.; Cao, D.; Tong, Y.; Xu, B.; Bai, J.; Tong, J.; Zhang, Q. Multivariate time-series anomaly detection via graph attention network. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 841–850.
- Wang, J.; Jia, Y.; Wang, D.; Xiao, W.; Wang, Z. Weighted IForest and siamese GRU on small sample anomaly detection in healthcare. Comput. Methods Programs Biomed. 2022, 218, 106706.
- Hariri, S.; Kind, M.; Brunner, R. Extended Isolation Forest. IEEE Trans. Knowl. Data Eng. 2021, 33, 1479–1489.
- Buschjager, S.; Honysz, P.J.; Morik, K. Generalized Isolation Forest: Some Theory and More Applications Extended Abstract. In Proceedings of the IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), Sydney, Australia, 6–9 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 793–794.
- Lesouple, J.; Baudoin, C.; Spigai, M.; Tourneret, J.Y. Generalized isolation forest for anomaly detection. Pattern Recognit. Lett. 2021, 149, 109–119.
- Karczmarek, P.; Kiersztyn, A.; Pedrycz, W.; Al, E. K-Means-based isolation forest. Knowl. Based Syst. 2020, 195, 1–15.
- Karczmarek, P.; Kiersztyn, A.; Pedrycz, W. Fuzzy Set-Based Isolation Forest. In Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Glasgow, UK, 19–24 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
- Tokovarov, M.; Karczmarek, P. A probabilistic generalization of isolation forest. Inf. Sci. 2022, 584, 433–449.
- Li, C.; Guo, L.; Gao, H.; Li, Y. Similarity-measured isolation forest: Anomaly detection method for machine monitoring data. IEEE Trans. Instrum. Meas. 2021, 70, 1–12.
- Li, S.; Zhang, K.; Duan, P.; Kang, X. Hyperspectral Anomaly Detection with Kernel Isolation Forest. IEEE Trans. Geosci. Remote Sens. 2020, 58, 319–329.
- Liu, Z.; Liu, X.; Ma, J.; Gao, H. An Optimized Computational Framework for Isolation Forest. Math. Probl. Eng. 2018, 2018, 1–14.
- Staerman, G.; Mozharovskyi, P.; Clemencon, S.; d’Alche Buc, F. Functional Isolation Forest. In Proceedings of the Eleventh Asian Conference on Machine Learning, Nagoya, Japan, 17–19 November 2019; pp. 332–347.
- Xu, H.; Pang, G.; Wang, Y.; Wang, Y. Deep Isolation Forest for Anomaly Detection. arXiv 2022, arXiv:2206.06602.
- Zhang, A.; Lipton, Z.; Li, M.; Smola, A. Dive into Deep Learning. arXiv 2021, arXiv:2106.11342.
- Walley, P. Statistical Reasoning with Imprecise Probabilities; Chapman and Hall: London, UK, 1991.
Dataset | Normal Instances | Anomalous Instances | d
---|---|---|---
Circle (synthetic) | 1000 | 200 | 2
Normal dataset (synthetic) | 1000 | 50 | 2
Credit | 1500 | 400 | 30
Ionosphere | 225 | 126 | 33
Arrhythmia | 386 | 66 | 18
Mulcross | 1800 | 400 | 4
Http | 500 | 50 | 3
Pima | 500 | 268 | 8
The Circle Dataset
T | 5 | 15 | 25 | 50 | 150
---|---|---|---|---|---

The Normal Dataset
T | 5 | 15 | 25 | 50 | 150
---|---|---|---|---|---
The Circle Dataset
n | 50 | 200 | 800 | 1200
---|---|---|---|---
iForest | | | |
ABIForest | | | |

The Normal Dataset
n | 50 | 150 | 350 | 550
---|---|---|---|---
iForest | | | |
ABIForest | | | |
Dataset | ABIForest F1 | iForest F1
---|---|---
Credit | |
Ionosphere | |
Arrhythmia | |
Mulcross | |
Http | |
Pima | |
The Ionosphere Dataset
n | 80 | 100 | 200 | 300
---|---|---|---|---
iForest | | | |
ABIForest | | | |
The Ionosphere Dataset
 | 10 | 20 | 40 | 50 | 60
---|---|---|---|---|---
iForest | | | | |
ABIForest | | | | |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Utkin, L.; Ageev, A.; Konstantinov, A.; Muliukha, V. Improved Anomaly Detection by Using the Attention-Based Isolation Forest. Algorithms 2023, 16, 19. https://doi.org/10.3390/a16010019