Deep Neural Network Confidence Calibration from Stochastic Weight Averaging
Figure 1. Illustration of the 3D loss landscape with SGD optimization of the DNN during the training phase.
Figure 2. The data flow of fusion during the testing phase of our method, where θ1, θ2, …, θm denote the set of m base estimators, and ŷ represents the output of base estimator θm on sample x.
Figure 3. Illustration of the learning rate schedule. During the initial 75% of training, a standard decaying schedule is employed, followed by a high constant value for the remaining 25%. Dots of different colors represent the model weights at different training epochs.
Figure 4. Comparison of the reliability diagram and confidence histogram on the Jester test set. The upper charts visualize average confidence and accuracy; below them are reliability diagrams from (left) the SGD optimization method and (right) our method.
Figure 5. Comparison of the reliability diagram and confidence histogram from VGG-16 trained on the CINIC-10 test set. The upper charts visualize average confidence and accuracy; below them are reliability diagrams from (left) the SGD optimization method and (right) our method.
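The learning-rate schedule described above (a standard decaying schedule for the first 75% of training, then a high constant rate while weight snapshots are averaged) can be sketched as follows. The concrete rate values and the linear decay shape are illustrative assumptions, not the paper's exact hyperparameters:

```python
def swa_lr(epoch, total_epochs, base_lr=0.1, const_lr=0.05, cutoff=0.75):
    """Decay the rate over the first `cutoff` fraction of training,
    then hold a relatively high constant rate for the averaging phase."""
    t = epoch / total_epochs
    if t < cutoff:
        # linear decay from base_lr down to const_lr over the first phase
        return base_lr - (base_lr - const_lr) * (t / cutoff)
    return const_lr  # constant high rate: weights keep exploring the loss basin

def update_swa(swa_weights, new_weights, n_averaged):
    """Running average of weight snapshots collected in the constant-rate phase."""
    return [(s * n_averaged + w) / (n_averaged + 1)
            for s, w in zip(swa_weights, new_weights)]
```

During the constant-rate phase, each epoch's weights (the colored dots in Figure 3) would be folded into the running average via `update_swa`.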
Abstract
1. Introduction
- (1)
- We propose an alternative ensemble learning approach that improves the quality of neural network uncertainty estimates and overcomes overconfidence without incurring additional computational cost.
- (2)
- We evaluate our approach on two benchmarks with different modalities: static images and dynamic videos. The experimental results demonstrate that our approach successfully reduces calibration error and enhances the model’s accuracy.
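The fusion data flow described in the figure captions above, averaging the outputs of the m base estimators θ1, …, θm on a sample x, can be sketched as averaging softmax outputs; function names and shapes here are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_predictions(logits_per_estimator):
    """Average the softmax outputs of m base estimators (theta_1..theta_m)
    to obtain the fused predictive distribution for each sample."""
    probs = np.stack([softmax(l) for l in logits_per_estimator])  # (m, n, classes)
    return probs.mean(axis=0)
```

Averaging probabilities rather than logits keeps the fused output a valid distribution, which is what the calibration metrics below are computed from.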
2. Related Work
2.1. Confidence Calibration of a Deep Neural Network
2.2. Deep Ensemble Learning
3. Confidence Calibration
3.1. Approximate Deep Ensemble Learning
3.2. Stochastic Weighted Averaging
4. Experiment Results
4.1. Evaluation Calibration Quality
4.2. Application 1: Gesture Recognition Task
4.3. Application 2: Image Classification Task
5. Discussion and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Jiang, X.; Deng, X. Knowledge reverse distillation based confidence calibration for deep neural networks. Neural Process. Lett. 2023, 55, 345–360. [Google Scholar] [CrossRef]
- Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K.Q. On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning, ICML’17, Sydney, NSW, Australia, 6–11 August 2017; pp. 1321–1330. [Google Scholar]
- Gawlikowski, J.; Tassi, C.R.N.; Ali, M.; Lee, J.; Humt, M.; Feng, J.; Kruspe, A.; Triebel, R.; Jung, P.; Roscher, R.; et al. A survey of uncertainty in deep neural networks. Artif. Intell. Rev. 2023, 56, 1513–1589. [Google Scholar] [CrossRef]
- Abdar, M.; Pourpanah, F.; Hussain, S.; Rezazadegan, D.; Liu, L.; Ghavamzadeh, M.; Fieguth, P.; Cao, X.; Khosravi, A.; Acharya, U.R.; et al. A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Inf. Fusion. 2021, 76, 243–297. [Google Scholar] [CrossRef]
- Jospin, L.V.; Laga, H.; Boussaid, F.; Buntine, W.; Bennamoun, M. Hands-on Bayesian neural networks—A tutorial for deep learning users. IEEE Comput. Intell. Mag. 2022, 17, 29–48. [Google Scholar] [CrossRef]
- Wang, H.; Yeung, D.Y. A survey on Bayesian deep learning. ACM Comput. Surv. 2020, 53, 1–37. [Google Scholar] [CrossRef]
- Munir, M.A.; Khan, M.H.; Khan, S.; Khan, F.S. Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Los Alamitos, CA, USA, 28 June 2023; pp. 11474–11483. [Google Scholar]
- Lee, J.; Park, S. A Study on the Calibrated Confidence of Text Classification Using a Variational Bayes. Appl. Sci. 2022, 12, 9007. [Google Scholar] [CrossRef]
- Psaros, A.F.; Meng, X.; Zou, Z.; Guo, L.; Karniadakis, G.E. Uncertainty quantification in scientific machine learning: Methods, metrics, and comparisons. J. Comput. Phys. 2023, 477, 111902. [Google Scholar] [CrossRef]
- Ganaie, M.A.; Hu, M.; Malik, A.; Tanveer, M.; Suganthan, P. Ensemble deep learning: A review. Eng. Appl. Artif. Intell. 2022, 115, 105151. [Google Scholar] [CrossRef]
- Lakshminarayanan, B.; Pritzel, A.; Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. Adv. Neural Inf. Process. Syst. 2017, 30, 1–12. [Google Scholar]
- Ovadia, Y.; Fertig, E.; Ren, J.; Nado, Z.; Sculley, D.; Nowozin, S.; Dillon, J.; Lakshminarayanan, B.; Snoek, J. Can you trust your model’s uncertainty? evaluating predictive uncertainty under dataset shift. Adv. Neural Inf. Process. Syst. 2019, 32, 1–12. [Google Scholar]
- Huang, G.; Li, Y.; Pleiss, G.; Liu, Z.; Hopcroft, J.E.; Weinberger, K.Q. Snapshot Ensembles: Train 1, get M for free. arXiv 2017, arXiv:1704.00109. [Google Scholar]
- Garipov, T.; Izmailov, P.; Podoprikhin, D.; Vetrov, D.P.; Wilson, A.G. Loss surfaces, mode connectivity, and fast ensembling of DNNs. Adv. Neural Inf. Process. Syst. 2018, 31, 1–10. [Google Scholar]
- Izmailov, P.; Podoprikhin, D.; Garipov, T.; Vetrov, D.P.; Wilson, A.G. Averaging Weights Leads to Wider Optima and Better Generalization. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, Monterey, CA, USA, 6–10 August 2018. [Google Scholar]
- Gal, Y.; Ghahramani, Z. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 1050–1059. [Google Scholar]
- Müller, R.; Kornblith, S.; Hinton, G.E. When does label smoothing help? Adv. Neural Inf. Process. Syst. 2019, 32, 1–10. [Google Scholar]
- Rahaman, R. Uncertainty quantification and deep ensembles. Adv. Neural Inf. Process. Syst. 2021, 34, 20063–20075. [Google Scholar]
- Patel, K.; Beluch, W.; Zhang, D.; Pfeiffer, M.; Yang, B. On-manifold adversarial data augmentation improves uncertainty calibration. In Proceedings of the 25th International Conference on Pattern Recognition, Milan, Italy, 10–15 January 2021; pp. 8029–8036. [Google Scholar]
- Yang, Y.; Lv, H.; Chen, N. A survey on ensemble learning under the era of deep learning. Artif. Intell. Rev. 2023, 56, 5545–5589. [Google Scholar] [CrossRef]
- Mienye, I.D.; Sun, Y. A survey of ensemble learning: Concepts, algorithms, applications, and prospects. IEEE Access 2022, 10, 99129–99149. [Google Scholar] [CrossRef]
- Mahajan, P.; Uddin, S.; Hajati, F.; Moni, M.A. Ensemble Learning for Disease Prediction: A Review. Healthcare 2023, 11, 1808. [Google Scholar] [CrossRef]
- Guo, H.; Liu, H.; Li, R.; Wu, C.; Guo, Y.; Xu, M. Margin diversity based ordering ensemble pruning. Neurocomputing 2018, 275, 237–246. [Google Scholar] [CrossRef]
- Fernando, K.R.M.; Tsokos, C.P. Dynamically weighted balanced loss: Class imbalanced learning and confidence calibration of deep neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 2940–2951. [Google Scholar] [CrossRef]
- Materzynska, J.; Berger, G.; Bax, I.; Memisevic, R. The Jester Dataset: A Large-Scale Video Dataset of Human Gestures. In Proceedings of the 2019 International Conference on Computer Vision Workshop, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2874–2882. [Google Scholar]
- Wei, H.; Xie, R.; Cheng, H.; Feng, L.; An, B.; Li, Y. Mitigating neural network overconfidence with logit normalization. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 23631–23644. [Google Scholar]
- Darlow, L.N.; Crowley, E.J.; Antoniou, A.; Storkey, A.J. CINIC-10 is not ImageNet or CIFAR-10. arXiv 2018, arXiv:1810.03505. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Zagoruyko, S.; Komodakis, N. Wide residual networks. arXiv 2016, arXiv:1605.07146. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
Results on the Jester test set (values in %; results for the remaining "Ours" cells were not recoverable from the source).

| Method | Acc. | Conf. | RMSCE | MCE | ECE |
|---|---|---|---|---|---|
| Baseline (SGD) | 83.89 | 87.65 | 8.73 | 4.53 | 3.76 |
| MC-Dropout | 84.21 | 89.91 | 3.70 | 8.34 | 4.83 |
| Logits-Scaling | 83.32 | 87.21 | 5.19 | 4.66 | 4.24 |
| Ours | | | 3.71 | | |
Results on the CINIC-10 test set (values in %; results for the "Ours" rows were not recoverable from the source).

| Architecture | Method | Acc. | Conf. | RMSCE | MCE | ECE |
|---|---|---|---|---|---|---|
| ResNet-50 | Baseline (SGD) | 72.92 | 75.93 | 5.22 | 3.39 | 3.01 |
| | MC-Dropout | 73.03 | 76.28 | 5.70 | 3.62 | 3.25 |
| | Logits-Scaling | 72.46 | 75.66 | 6.02 | 3.59 | 3.21 |
| | Ours | | | | | |
| Wide-ResNet-50 | Baseline (SGD) | 73.06 | 77.01 | 7.48 | 4.43 | 3.97 |
| | MC-Dropout | 72.82 | 77.05 | 7.85 | 4.83 | 4.31 |
| | Logits-Scaling | 69.01 | 72.04 | 5.67 | 3.57 | 3.07 |
| | Ours | | | | | |
| VGG-16 | Baseline (SGD) | 78.07 | 82.69 | 8.95 | 5.23 | 4.62 |
| | MC-Dropout | 78.97 | 85.23 | 20.01 | 7.04 | 6.29 |
| | Logits-Scaling | 76.08 | 78.42 | 5.79 | 2.56 | 2.37 |
| | Ours | | | | | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cao, Z.; Li, Y.; Kim, D.-H.; Shin, B.-S. Deep Neural Network Confidence Calibration from Stochastic Weight Averaging. Electronics 2024, 13, 503. https://doi.org/10.3390/electronics13030503