Simplified Routing Mechanism for Capsule Networks
Figure 1. Typical structure of a neuron (green: inputs, blue: operations, yellow: output, purple: neuron).
Figure 2. Typical structure of a capsule (green: inputs, red: prediction vectors, blue: operations, yellow: output, purple: capsule).
Figure 3. Our squash activation function with different ε values.
Figure 4. Block diagram of the dynamic routing algorithm by Sabour et al. [11] (green: inputs, yellow: operations, blue: activations, purple: internal tensors).
Figure 5. Block diagram of our proposed routing algorithm (green: inputs, yellow: operations, blue: activations, purple: internal tensors).
Figure 6. The capsule network architecture used in the research, based on the work of Sabour et al. [11] (green: input, purple: convolutional layer, yellow: primary capsule layer, red: secondary capsule layer, gray: prediction).
Figure 7. Sample data from the MNIST dataset.
Figure 8. Sample data from the Fashion-MNIST dataset.
Figure 9. Sample data from the SmallNORB dataset.
Figure 10. Sample data from the CIFAR10 dataset.
Figure 11. Sample data from the SVHN dataset.
Figure 12. Sample data from the GTSRB dataset.
Figure 13. Classification test accuracy on the MNIST dataset.
Figure 14. Classification test accuracy on the Fashion-MNIST dataset.
Figure 15. Classification test accuracy on the SmallNORB dataset.
Figure 16. Classification test accuracy on the CIFAR10 dataset.
Figure 17. Classification test accuracy on the SVHN dataset.
Figure 18. Classification test accuracy on the GTSRB dataset.
Figure 19. Classification test loss on the MNIST dataset.
Figure 20. Classification test loss on the Fashion-MNIST dataset.
Figure 21. Classification test loss on the SmallNORB dataset.
Figure 22. Classification test loss on the CIFAR10 dataset.
Figure 23. Classification test loss on the SVHN dataset.
Figure 24. Classification test loss on the GTSRB dataset.
Figure 25. Comparison of training time on the same hardware (Nvidia Quadro RTX 4000).
Figure 26. Confusion matrices for the capsule-based networks.
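The squash activation and the routing loop pictured in the block diagrams above can be made concrete in code. The sketch below follows the dynamic routing of Sabour et al. [11], not the simplified routing proposed in this paper; the tensor shapes, the placement of the ε term inside the square root, and the three-iteration default are illustrative assumptions rather than the authors' exact implementation.

```python
import torch


def squash(s, dim=-1, eps=1e-8):
    # Squash non-linearity from Sabour et al. [11]: short vectors shrink
    # toward zero, long vectors approach (but never reach) unit length.
    # The eps term keeps the division stable for near-zero vectors
    # (placement of eps is an assumption, cf. the squash figure above).
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / torch.sqrt(sq_norm + eps)


def dynamic_routing(u_hat, num_iterations=3):
    # u_hat: prediction vectors, shape (batch, in_caps, out_caps, out_dim).
    # b holds the routing logits; c = softmax(b) are coupling coefficients.
    b = torch.zeros(u_hat.shape[:-1], device=u_hat.device)
    for _ in range(num_iterations):
        c = torch.softmax(b, dim=2)                # route over output capsules
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)   # weighted sum of predictions
        v = squash(s)                              # (batch, out_caps, out_dim)
        # agreement between predictions and outputs reinforces the logits
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)
    return v
```

Because of the squash scaling, every output capsule's norm lies strictly below 1, so it can be read as the probability that the entity the capsule represents is present.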
Abstract
1. Introduction
2. Theory of Capsule Network
3. Improved Routing Algorithm
4. Network Architecture
5. Datasets
5.1. MNIST
5.2. Fashion-MNIST
5.3. SmallNORB
5.4. CIFAR10
5.5. SVHN
5.6. GTSRB
6. Results
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Chen, X.; Liang, C.; Huang, D.; Real, E.; Wang, K.; Liu, Y.; Pham, H.; Dong, X.; Luong, T.; Hsieh, C.; et al. Symbolic Discovery of Optimization Algorithms. arXiv 2023, arXiv:2302.06675. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 4 May 2021. [Google Scholar]
- Wang, W.; Dai, J.; Chen, Z.; Huang, Z.; Li, Z.; Zhu, X.; Hu, X.; Lu, T.; Lu, L.; Li, H.; et al. InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions. arXiv 2023, arXiv:2211.05778. [Google Scholar]
- Ghiasi, G.; Cui, Y.; Srinivas, A.; Qian, R.; Lin, T.; Cubuk, E.D.; Le, Q.V.; Zoph, B. Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), online, 19–25 June 2021. [Google Scholar]
- Su, W.; Zhu, X.; Tao, C.; Lu, L.; Li, B.; Huang, G.; Qiao, Y.; Wang, X.; Zhou, J.; Dai, J. Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information. arXiv 2022, arXiv:2211.09807. [Google Scholar]
- Yuan, Y.; Chen, X.; Chen, X.; Wang, J. Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Online, 23–28 August 2020. [Google Scholar]
- Fang, Y.; Wang, W.; Xie, B.; Sun, Q.; Wu, L.; Wang, X.; Huang, T.; Wang, X.; Cao, Y. EVA: Exploring the Limits of Masked Visual Representation Learning at Scale. arXiv 2022, arXiv:2211.07636. [Google Scholar]
- Zhang, H.; Li, F.; Zou, X.; Liu, S.; Li, C.; Gao, J.; Yang, J.; Zhang, L. A Simple Framework for Open-Vocabulary Segmentation and Detection. arXiv 2023, arXiv:2303.08131. [Google Scholar]
- Zafar, A.; Aamir, M.; Mohd Nawi, N.; Arshad, A.; Riaz, S.; Alruban, A.; Dutta, A.K.; Almotairi, S. A Comparison of Pooling Methods for Convolutional Neural Networks. Appl. Sci. 2022, 12, 8643. [Google Scholar] [CrossRef]
- Hinton, G.E.; Krizhevsky, A.; Wang, S.D. Transforming Auto-Encoders. In Proceedings of the International Conference on Artificial Neural Networks (ICANN); Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2011; Volume 6791, pp. 44–51. [Google Scholar]
- Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic Routing Between Capsules. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Hinton, G.E.; Sabour, S.; Frosst, N. Matrix capsules with EM routing. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- LeCun, Y.; Cortes, C.; Burges, C.J.C. The MNIST Database of Handwritten Digits. 2012. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 9 April 2023).
- Fukushima, K. Visual Feature Extraction by a Multilayered Network of Analog Threshold Elements. IEEE Trans. Syst. Sci. Cybern. 1969, 5, 322–333. [Google Scholar]
- Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv 2017, arXiv:1708.07747. [Google Scholar]
- LeCun, Y.; Huang, F.J.; Bottou, L. Learning methods for generic object recognition with invariance to pose and lighting. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Washington, DC, USA, 27 June–2 July 2004; pp. 97–104. [Google Scholar]
- Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
- Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; Ng, A.Y. Reading Digits in Natural Images with Unsupervised Feature Learning. In Proceedings of the 25th Conference on Neural Information Processing Systems, Granada, Spain, 12–17 December 2011. [Google Scholar]
- Stallkamp, J.; Schlipsing, M.; Salmen, J.; Igel, C. The German traffic sign recognition benchmark: A multi-class classification competition. In Proceedings of the International Joint Conference on Neural Networks, San Jose, CA, USA, 31 July–5 August 2011; pp. 1453–1460. [Google Scholar]
- Heinsen, F.A. An Algorithm for Routing Vectors in Sequences. arXiv 2022, arXiv:2211.11754. [Google Scholar]
- Van Rossum, G.; Drake, F.L. Python 3 Reference Manual; CreateSpace: Scotts Valley, CA, USA, 2009. [Google Scholar]
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adv. Neural Inf. Process. Syst. 2019, 32, 8024–8035. [Google Scholar]
- Paperspace. Available online: https://www.paperspace.com/ (accessed on 9 April 2023).
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980. [Google Scholar]
- Goyal, P.; Duval, Q.; Seessel, I.; Caron, M.; Misra, I.; Sagun, L.; Joulin, A.; Bojanowski, P. Vision Models Are More Robust and Fair When Pretrained On Uncurated Images without Supervision. arXiv 2022, arXiv:2202.08360. [Google Scholar] [CrossRef]
- Taylor, L.; King, A.; Harper, N. Robust and Accelerated Single-Spike Spiking Neural Network Training with Applicability to Challenging Temporal Tasks. arXiv 2022, arXiv:2205.15286. [Google Scholar] [CrossRef]
- Phaye, S.S.R.; Sikka, A.; Dhall, A.; Bathula, D. Dense and Diverse Capsule Networks: Making the Capsules Learn Better. arXiv 2018, arXiv:1805.04001. [Google Scholar] [CrossRef]
- Remerscheid, N.W.; Ziller, A.; Rueckert, D.; Kaissis, G. SmoothNets: Optimizing CNN Architecture Design for Differentially Private Deep Learning. arXiv 2022, arXiv:2205.04095. [Google Scholar] [CrossRef]
- Dupont, E.; Doucet, A.; Teh, Y.W. Augmented Neural ODEs. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32. [Google Scholar]
- Abad, G.; Ersoy, O.; Picek, S.; Urbieta, A. Sneaky Spikes: Uncovering Stealthy Backdoor Attacks in Spiking Neural Networks with Neuromorphic Data. arXiv 2023, arXiv:2302.06279. [Google Scholar] [CrossRef]
Dataset | Image Size | Channels | Classes | Train Set | Test Set | Background |
---|---|---|---|---|---|---|
MNIST [13] | (28, 28) | 1 | 10 | 60,000 | 10,000 | false |
F-MNIST [15] | (28, 28) | 1 | 10 | 60,000 | 10,000 | false |
SmallNORB [16] | (48, 48) | 1 | 5 | 48,600 | 48,600 | true |
CIFAR10 [17] | (32, 32) | 3 | 10 | 50,000 | 10,000 | true |
SVHN [18] | (32, 32) | 3 | 10 | 73,257 | 26,032 | true |
GTSRB [19] | (32, 32) | 3 | 43 | 26,640 | 12,630 | true |
Method | MNIST | F-MNIST | SmallNORB | CIFAR10 | SVHN | GTSRB
---|---|---|---|---|---|---
Goyal et al. [25] | 0.58% | - | - | 10% | 13.6% | 9.29% |
Taylor et al. [26] | 2.09% | 10.95% | - | - | - | - |
Phaye et al. [27] | - | - | 5.57% | - | - | - |
Remerscheid et al. [28] | - | - | - | 26.5% | - | - |
Dupont et al. [29] | 1.8% | - | - | 39.4% | 16.5% | - |
Abad et al. [30] | 0.6% | - | - | 31.7% | - | - |
Sabour et al. [11] | 0.45% | 8.35% | 9.15% | 29.36% | 8.05% | 2.67% |
Heinsen [20] | 0.7% | 9.25% | 10.18% | 41.18% | 11.36% | 9.79% |
Ours | 0.41% | 8.35% | 8.54% | 28.26% | 6.90% | 2.22% |
Class | Sabour et al. [11] | Ours | Heinsen [20]
---|---|---|---
0 | 0.9969 | 0.9980 | 0.9980 |
1 | 0.9974 | 0.9974 | 0.9965 |
2 | 0.9932 | 0.9961 | 0.9932 |
3 | 0.9941 | 0.9950 | 0.9941 |
4 | 0.9908 | 0.9929 | 0.9847 |
5 | 0.9933 | 0.9933 | 0.9944 |
6 | 0.9916 | 0.9948 | 0.9896 |
7 | 0.9971 | 0.9961 | 0.9893 |
8 | 0.9959 | 0.9969 | 0.9918 |
9 | 0.9891 | 0.9921 | 0.9851 |
Class | Sabour et al. [11] | Ours | Heinsen [20]
---|---|---|---
T-shirt/top | 0.8920 | 0.9060 | 0.8410 |
Trouser | 0.9780 | 0.9800 | 0.9780 |
Pullover | 0.8560 | 0.8800 | 0.8710 |
Dress | 0.9230 | 0.9190 | 0.8900 |
Coat | 0.8370 | 0.8390 | 0.8400 |
Sandal | 0.9810 | 0.9810 | 0.9740 |
Shirt | 0.7320 | 0.6770 | 0.6340 |
Sneaker | 0.9790 | 0.9780 | 0.9820 |
Bag | 0.9840 | 0.9820 | 0.9750 |
Ankle boot | 0.9600 | 0.9620 | 0.9530 |
Class | Sabour et al. [11] | Ours | Heinsen [20]
---|---|---|---
Airplane | 0.6280 | 0.7100 | 0.6110 |
Automobile | 0.7780 | 0.6810 | 0.6260 |
Bird | 0.4630 | 0.5180 | 0.4230 |
Cat | 0.4660 | 0.4440 | 0.3670 |
Deer | 0.6930 | 0.5970 | 0.4340 |
Dog | 0.5480 | 0.6450 | 0.5110 |
Frog | 0.8110 | 0.7530 | 0.5380 |
Horse | 0.6710 | 0.7300 | 0.5830 |
Ship | 0.8480 | 0.7590 | 0.6260 |
Truck | 0.7810 | 0.8600 | 0.5920 |
Class | Sabour et al. [11] | Ours | Heinsen [20]
---|---|---|---
Animal | 0.8496 | 0.8645 | 0.8681 |
Human | 0.9295 | 0.9513 | 0.7709 |
Plane | 0.9206 | 0.9210 | 0.8175 |
Truck | 0.9958 | 0.9932 | 0.9610 |
Car | 0.7075 | 0.7973 | 0.8192 |
Class | Sabour et al. [11] | Ours | Heinsen [20]
---|---|---|---
Speed limit 20 km/h | 0.8667 | 0.8167 | 0.6500 |
Speed limit 30 km/h | 0.9889 | 0.9917 | 0.9458 |
Speed limit 50 km/h | 0.9920 | 0.9893 | 0.9707 |
Speed limit 60 km/h | 0.9889 | 0.9756 | 0.9333 |
Speed limit 70 km/h | 0.9758 | 0.9758 | 0.8727 |
Speed limit 80 km/h | 0.9619 | 0.9810 | 0.8000 |
Speed limit 100 km/h | 0.8533 | 0.8533 | 0.8067 |
Speed limit 120 km/h | 0.9133 | 0.8822 | 0.8289 |
End of speed limit (80 km/h) | 0.9444 | 0.9511 | 0.9000 |
No passing | 0.9958 | 1.0000 | 0.9458 |
No passing for vehicles over 3.5 metric t | 0.9924 | 0.9955 | 0.9697 |
Right-of-way at the next intersection | 0.9619 | 0.9714 | 0.8929 |
Priority road | 0.9812 | 0.9884 | 0.9087 |
Yield | 0.9972 | 0.9972 | 0.9861 |
Stop | 1.0000 | 1.0000 | 0.9630 |
No vehicles | 1.0000 | 1.0000 | 0.9762 |
Vehicles over 3.5 metric tons prohibited | 0.9933 | 0.9933 | 0.9533 |
No entry | 0.9972 | 0.9972 | 0.8806 |
General caution | 0.9385 | 0.9410 | 0.7231 |
Dangerous curve to the left | 1.0000 | 1.0000 | 0.4833 |
Dangerous curve to the right | 0.9889 | 0.9889 | 0.8889 |
Double curve | 0.7667 | 0.7000 | 0.6778 |
Bumpy road | 0.9833 | 0.9917 | 0.9083 |
Slippery road | 0.9467 | 0.9267 | 0.6667 |
Road narrows on the right | 0.9444 | 0.9444 | 0.5333 |
Road work | 0.9542 | 0.9500 | 0.9438 |
Traffic signals | 0.8278 | 0.8167 | 0.7833 |
Pedestrians | 0.5000 | 0.5667 | 0.5000 |
Children crossing | 0.9933 | 0.9933 | 0.9200 |
Bicycles crossing | 1.0000 | 1.0000 | 0.9222 |
Beware of ice/snow | 0.7667 | 0.7733 | 0.5000 |
Wild animals crossing | 0.9815 | 0.9741 | 0.9074 |
End of all speed and passing limits | 1.0000 | 1.0000 | 1.0000 |
Turn right ahead | 0.9952 | 0.9952 | 0.9476 |
Turn left ahead | 0.9917 | 0.9917 | 0.9833 |
Ahead only | 0.9923 | 0.9974 | 0.9538 |
Go straight or right | 0.9750 | 0.9667 | 0.9417 |
Go straight or left | 1.0000 | 1.0000 | 0.9000 |
Keep right | 0.9739 | 0.9870 | 0.9087 |
Keep left | 1.0000 | 1.0000 | 0.8000 |
Roundabout mandatory | 0.9667 | 0.9778 | 0.8000 |
End of no passing | 0.8000 | 0.7667 | 0.5500 |
End of no passing by vehicles over 3.5 t | 0.9778 | 0.9778 | 0.9778 |
Class | Sabour et al. [11] | Ours | Heinsen [20]
---|---|---|---
0 | 0.9151 | 0.8928 | 0.9002 |
1 | 0.9686 | 0.9547 | 0.9280 |
2 | 0.9393 | 0.9549 | 0.9152 |
3 | 0.8636 | 0.8550 | 0.7696 |
4 | 0.9080 | 0.9164 | 0.9045 |
5 | 0.8624 | 0.8968 | 0.8335 |
6 | 0.8402 | 0.8720 | 0.8255 |
7 | 0.8579 | 0.8742 | 0.8757 |
8 | 0.7861 | 0.8193 | 0.7657 |
9 | 0.8245 | 0.8621 | 0.8301 |
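Per-class scores of the kind tabulated above are conventionally read off a confusion matrix as per-class recall: the diagonal entry divided by the row sum. The helper below is a minimal sketch of that computation (the function name and the recall interpretation are this sketch's assumptions, not necessarily the authors' exact metric).

```python
import numpy as np


def per_class_accuracy(cm):
    # cm[i, j] = number of samples with true class i predicted as class j.
    # The diagonal holds correct predictions; each row sums to the number
    # of samples that truly belong to that class.
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)
```

For example, a 2-class matrix `[[8, 2], [1, 9]]` gives per-class scores of 0.8 and 0.9: class 0 was recognized in 8 of its 10 samples, class 1 in 9 of its 10.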
Method | MNIST | F-MNIST | SmallNORB | CIFAR10 | SVHN | GTSRB
---|---|---|---|---|---|---
Sabour et al. [11] | 15% | 35% | 50% | 20% | 30% | 51.94% |
Heinsen [20] | 15% | 20% | 0% | 40% | 10% | 0.77% |
Ours | 70% | 45% | 50% | 40% | 60% | 47.29% |
Method | MNIST | F-MNIST | SmallNORB | CIFAR10 | SVHN | GTSRB
---|---|---|---|---|---|---
Sabour et al. [11] | 0.9939 | 0.9159 | 0.9056 | 0.6790 | 0.8958 | 0.9513 |
Heinsen [20] | 0.9916 | 0.8937 | 0.8419 | 0.5311 | 0.8532 | 0.8304 |
Ours | 0.9948 | 0.9124 | 0.9138 | 0.6925 | 0.9012 | 0.9519 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hollósi, J.; Ballagi, Á.; Pozna, C.R. Simplified Routing Mechanism for Capsule Networks. Algorithms 2023, 16, 336. https://doi.org/10.3390/a16070336