Evaluating Vision Transformer Models for Visual Quality Control in Industrial Manufacturing

Miriam Alber,
Christoph Hönes &
Patrick Baier

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14950)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

One of the most promising use cases for machine learning in industrial manufacturing is the early detection of defective products using a quality control system. Such a system can save costs and reduce the human errors that arise from the monotonous nature of visual inspection. Today, a rich body of research employs machine learning methods to identify rare defective products in unbalanced visual quality control datasets. These methods typically rely on two components: a visual backbone that captures the features of the input image, and an anomaly detection algorithm that decides whether these features lie within an expected distribution. With the rise of transformer architectures as the visual backbones of choice, there now exists a great variety of combinations of these two components, ranging all along the trade-off between detection quality and inference time. Facing this variety, practitioners in the field often have to spend considerable time researching the right combination for their use case. Our contribution is to help practitioners with this choice by reviewing and evaluating current vision transformer models together with anomaly detection methods. To this end, we choose state-of-the-art (SotA) models from both disciplines, then combine and evaluate them with the goal of obtaining small, fast, and efficient anomaly detection models suitable for industrial manufacturing. We evaluate the results on the well-known MVTecAD and BTAD datasets and propose considerations for using a quality control system in practice.
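
The two-component structure described in the abstract (a backbone that produces features, followed by a detector that checks whether those features fall within the distribution seen on defect-free products) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `extract_features` is a hypothetical stand-in for a pretrained vision transformer, the "images" are synthetic lists of pixel intensities, and the detector is a simple per-dimension Gaussian rather than any of the anomaly detection methods the authors evaluate.

```python
import math
import random
import statistics


def extract_features(image):
    """Hypothetical stand-in for a visual backbone.

    A real system would run a pretrained vision transformer here; this stub
    only summarises a flat list of pixel intensities by mean, spread and range.
    """
    return [statistics.fmean(image), statistics.pstdev(image), max(image) - min(image)]


def fit_normal_model(feature_vectors):
    """Fit an independent Gaussian per feature dimension on defect-free samples."""
    columns = list(zip(*feature_vectors))
    means = [statistics.fmean(col) for col in columns]
    # A small floor keeps the score finite if a feature happens to be constant.
    stds = [max(statistics.pstdev(col), 1e-6) for col in columns]
    return means, stds


def anomaly_score(features, model):
    """Distance from the fitted distribution (root of the summed squared z-scores)."""
    means, stds = model
    return math.sqrt(sum(((f - m) / s) ** 2 for f, m, s in zip(features, means, stds)))


def make_image(rng, defect=False, size=64):
    """Synthetic 'image': grey noise, optionally with a bright defect patch."""
    pixels = [rng.gauss(0.5, 0.02) for _ in range(size)]
    if defect:
        for i in range(8):
            pixels[i] += 0.5
    return pixels


rng = random.Random(0)

# Train on defect-free samples only, mirroring the unbalanced setting.
train = [extract_features(make_image(rng)) for _ in range(200)]
model = fit_normal_model(train)

# Illustrative threshold: the largest score observed on the training data.
threshold = max(anomaly_score(f, model) for f in train)

good_score = anomaly_score(extract_features(make_image(rng)), model)
bad_score = anomaly_score(extract_features(make_image(rng, defect=True)), model)
is_defective = bad_score > threshold
```

In this sketch the detector never sees a defective sample during fitting, which is why the threshold is derived from defect-free scores alone. Swapping in a heavier backbone or a more expressive density model changes the detection quality and the inference time, which is the trade-off the abstract refers to.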

Acknowledgements

Christoph Hönes has received funding from SAP SE. Christoph Hönes and Miriam Alber were employed by esentri AG, which also provided computational resources.

Author information

Authors and Affiliations

iteratec GmbH, Karlsruhe, Germany
Miriam Alber
Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
Christoph Hönes
University of Applied Sciences, Karlsruhe, Germany
Miriam Alber & Patrick Baier

Corresponding author

Correspondence to Miriam Alber.

Editor information

Editors and Affiliations

LTCI, Télécom Paris, Palaiseau Cedex, France
Albert Bifet
Faculty of Informatics, Vytautas Magnus University, Akademija, Lithuania
Tomas Krilavičius
Stockholm University, Kista, Sweden
Ioanna Miliou
School of Information Technology, Halmstad University, Halmstad, Sweden
Slawomir Nowaczyk

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 5794 KB)

About this paper

Cite this paper

Alber, M., Hönes, C., Baier, P. (2024). Evaluating Vision Transformer Models for Visual Quality Control in Industrial Manufacturing. In: Bifet, A., Krilavičius, T., Miliou, I., Nowaczyk, S. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol 14950. Springer, Cham. https://doi.org/10.1007/978-3-031-70381-2_8

DOI: https://doi.org/10.1007/978-3-031-70381-2_8
Published: 22 August 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70380-5
Online ISBN: 978-3-031-70381-2
eBook Packages: Computer Science, Computer Science (R0)
