Abstract
The use of neural networks in the perception pipelines of autonomous systems, such as automated driving, is indispensable due to their outstanding performance. At the same time, however, their complexity poses a challenge with respect to safety. An important question in this regard is how to substantiate test sufficiency for such a function. One approach from the software testing literature is that of coverage metrics. Similar notions of coverage, called neuron coverage, have been proposed for deep neural networks; they attempt to assess to what extent test inputs activate the neurons of a network. Still, the correspondence between high neuron coverage and safety-related network qualities remains elusive. Potentially, high coverage could imply sufficiency of the test data. In this paper, we argue that the coverage metrics discussed in the current literature do not live up to these high expectations, and we present a line of experiments from the field of computer vision to support this claim.
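As a rough illustration of the kind of metric under discussion, the following sketch computes a simple threshold-based neuron coverage over a test set. The threshold value and the flattened-activation interface are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def neuron_coverage(activations_per_input, threshold=0.0):
    """Fraction of neurons activated above `threshold` by at least one test input.

    activations_per_input: list of 1-D arrays, one per test input,
    each holding the concatenated activations of all neurons.
    """
    acts = np.stack(activations_per_input)    # shape: (num_inputs, num_neurons)
    covered = (acts > threshold).any(axis=0)  # neuron is covered if any input exceeds the threshold
    return covered.mean()

# Toy example: 3 test inputs, 4 neurons; neuron 2 is never activated.
acts = [np.array([0.5, 0.0, 0.0, 0.2]),
        np.array([0.0, 0.7, 0.0, 0.0]),
        np.array([0.1, 0.0, 0.0, 0.0])]
print(neuron_coverage(acts))  # 3 of 4 neurons activated -> 0.75
```

Adding further test inputs can only keep the metric constant or increase it, which is why high coverage alone says little about where a test set is still lacking.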
Notes
- 1.
In a software architecture inspired view, both activation and non-activation should be included for coverage (cf. branch coverage). Since non-activation is the standard case and typically achieved with few tests, we focus on the activation part.
- 2.
Obviously, the performance of the weakly trained model does not generalize to strong augmentation.
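The branch-coverage analogy from note 1 can be sketched as follows: each neuron contributes two coverage targets, activated and non-activated, and a test set covers a target once some input hits it. This two-target bookkeeping is an illustrative assumption, not a metric defined in the paper.

```python
import numpy as np

def branch_style_coverage(activations_per_input, threshold=0.0):
    """Coverage over two targets per neuron: activated and non-activated."""
    acts = np.stack(activations_per_input)      # shape: (num_inputs, num_neurons)
    active = (acts > threshold).any(axis=0)     # target 1: some input activates the neuron
    inactive = (acts <= threshold).any(axis=0)  # target 2: some input leaves it inactive
    return (active.sum() + inactive.sum()) / (2 * acts.shape[1])

# Neuron 0 hits both targets; neuron 1 only the non-activation target -> 3/4.
acts = [np.array([0.5, 0.0]), np.array([0.0, 0.0])]
print(branch_style_coverage(acts))
```

As the note observes, the non-activation targets are typically covered by almost any test set, so in practice the activation half dominates the metric.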
Acknowledgment
The research leading to the results presented above is funded by the German Federal Ministry for Economic Affairs and Energy within the project KI Absicherung—Safe AI for automated driving.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Abrecht, S. et al. (2020). Revisiting Neuron Coverage and Its Application to Test Generation. In: Casimiro, A., Ortmeier, F., Schoitsch, E., Bitsch, F., Ferreira, P. (eds) Computer Safety, Reliability, and Security. SAFECOMP 2020 Workshops. SAFECOMP 2020. Lecture Notes in Computer Science(), vol 12235. Springer, Cham. https://doi.org/10.1007/978-3-030-55583-2_21