Image Classification Using Multiple Convolutional Neural Networks on the Fashion-MNIST Dataset
Figure 1. The architecture of the proposed multiple convolutional neural network models with different convolutional layers and optimized hyperparameters.
Figure 2. (a) The Fashion Product Images Dataset (https://www.kaggle.com/paramaggarwal/fashion-product-images-small, accessed on 26 October 2022); (b) the Fashion household dataset.
Figure 3. (a) The progress of losses for our model; (b) the progress of accuracies.
Figure 4. The confusion matrix of our MCNN15 model on the Fashion-MNIST dataset.
Figure 5. The receiver operating characteristic (ROC) curve for MCNN15.
Abstract
1. Introduction
- We proposed new MCNN models to increase classification performance on the Fashion-MNIST dataset. Moreover, hyperparameter search and data augmentation techniques were applied to improve the generalization of the models.
- We created the Fashion-Product dataset and our own customized dataset to validate the networks' performance.
- We compared our models' performance with the state of the art and with different model structures from the literature trained on the Fashion-MNIST dataset. In addition, performance on the Fashion-Product dataset and our customized dataset is compared between state-of-the-art models and MCNN15.
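The data augmentation mentioned above can be illustrated with a minimal, dependency-free sketch. The specific transforms shown here (horizontal flip and small pixel shifts) are common choices for Fashion-MNIST-style images and are assumptions for illustration; the exact augmentation pipeline is detailed in the Methods section, not here.

```python
import random

def horizontal_flip(img):
    """Mirror a 2-D grayscale image (list of rows) left-to-right."""
    return [row[::-1] for row in img]

def shift(img, dx, fill=0):
    """Shift each row right by dx pixels (left if dx < 0), padding with `fill`."""
    w = len(img[0])
    out = []
    for row in img:
        if dx >= 0:
            out.append([fill] * dx + row[: w - dx])
        else:
            out.append(row[-dx:] + [fill] * (-dx))
    return out

def augment(img, p_flip=0.5, max_shift=2, rng=random):
    """Randomly flip and translate one training image."""
    if rng.random() < p_flip:
        img = horizontal_flip(img)
    return shift(img, rng.randint(-max_shift, max_shift))
```

In a PyTorch pipeline such as the one the paper builds on, the equivalent transforms would typically come from `torchvision.transforms` (e.g., `RandomHorizontalFlip`); the sketch above only makes the underlying operations explicit.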
2. Related Works
3. Methods
3.1. Multiple Convolutional Neural Networks Models Design
3.2. Hyperparameter Optimization
4. Experimental Set-Up
4.1. Datasets
4.1.1. Fashion-MNIST Dataset
4.1.2. Fashion-Product Dataset
4.1.3. Our Customized Dataset
5. Results
Quantitative Results
6. Discussion
7. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
MCNN | Multiple Convolutional Neural Networks |
HOG | Histogram of Oriented Gradients |
SURF | Speeded-Up Robust Features |
SIFT | Scale-Invariant Feature Transform |
SVM | Support Vector Machine |
CNN | Convolutional Neural Network |
ANN | Artificial Neural Network |
LSTM | Long Short-Term Memory |
H-CNN | Hierarchical Convolutional Neural Networks |
VGG | Visual Geometry Group |
WRN | Wide Residual Network |
SCNNB | Shallow Convolutional Neural Network |
PCA | Principal Component Analysis |
GAN | Generative Adversarial Network |
References
- Turchetti, G.; Micera, S.; Cavallo, F.; Odetti, L.; Dario, P. Technology and innovative services. IEEE Pulse 2011, 2, 27–35.
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893.
- Bay, H.; Tuytelaars, T.; Van Gool, L. Surf: Speeded up robust features. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 404–417.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Viswanathan, D.G. Features from accelerated segment test (fast). In Proceedings of the 10th Workshop on Image Analysis for Multimedia Interactive Services, London, UK, 6–8 May 2009; pp. 6–8.
- Boser, B.E.; Guyon, I.M.; Vapnik, V.N. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA, 27–29 July 1992; pp. 144–152.
- Rish, I. An empirical study of the naive Bayes classifier. In Proceedings of the IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, Seattle, WA, USA, 4–10 August 2001; Volume 3, pp. 41–46.
- Bshouty, N.H.; Haddad-Zaknoon, C.A. On Learning and Testing Decision Tree. arXiv 2021, arXiv:2108.04587.
- Taunk, K.; De, S.; Verma, S.; Swetapadma, A. A brief review of nearest neighbor algorithm for learning and classification. In Proceedings of the 2019 International Conference on Intelligent Computing and Control Systems (ICCS), Madurai, India, 15–17 May 2019; pp. 1255–1260.
- Tharwat, A.; Gaber, T.; Ibrahim, A.; Hassanien, A.E. Linear discriminant analysis: A detailed tutorial. AI Commun. 2017, 30, 169–190.
- Uhrig, R.E. Introduction to artificial neural networks. In Proceedings of the IECON’95-21st Annual Conference on IEEE Industrial Electronics, Orlando, FL, USA, 6–10 November 1995; Volume 1, pp. 33–37. Available online: https://ieeexplore.ieee.org/document/483329 (accessed on 26 October 2022).
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Xu, P.; Wang, L.; Guan, Z.; Zheng, X.; Chen, X.; Tang, Z.; Fang, D.; Gong, X.; Wang, Z. Evaluating brush movements for Chinese calligraphy: A computer vision based approach. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018, Stockholm, Sweden, 13–19 July 2018; pp. 1050–1056.
- Su, H.; Wang, P.; Liu, L.; Li, H.; Li, Z.; Zhang, Y. Where to look and how to describe: Fashion image retrieval with an attentional heterogeneous bilinear network. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 3254–3265.
- Shajini, M.; Ramanan, A. An improved landmark-driven and spatial–channel attentive convolutional neural network for fashion clothes classification. Vis. Comput. 2021, 37, 1517–1526.
- Shajini, M.; Ramanan, A. A knowledge-sharing semi-supervised approach for fashion clothes classification and attribute prediction. Vis. Comput. 2022, 38, 3551–3561.
- Donati, L.; Iotti, E.; Mordonini, G.; Prati, A. Fashion product classification through deep learning and computer vision. Appl. Sci. 2019, 9, 1385.
- Rajput, P.S.; Aneja, S. IndoFashion: Apparel classification for Indian ethnic clothes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 3935–3939.
- Eshwar, S.G.; Gautham Ganesh Prabhu, J.; Rishikesh, A.V.; Charan, N.A.; Umadevi, V. Apparel classification using convolutional neural networks. In Proceedings of the 2016 International Conference on ICT in Business Industry & Government (ICTBIG), Indore, India, 18–19 November 2016; pp. 1–5.
- Agarap, A.F. An architecture combining convolutional neural network (CNN) and support vector machine (SVM) for image classification. arXiv 2017, arXiv:1712.03541.
- Bhatnagar, S.; Ghosal, D.; Kolekar, M.H. Classification of fashion article images using convolutional neural networks. In Proceedings of the 2017 Fourth International Conference on Image Information Processing (ICIIP), Shimla, India, 21–23 December 2017; pp. 1–6. Available online: https://ieeexplore.ieee.org/document/8313740 (accessed on 26 October 2022).
- Leithardt, V. Classifying Garments from Fashion-MNIST Dataset Through CNNs. Adv. Sci. Technol. Eng. Syst. J. 2021, 6, 989–994.
- Seo, Y.; Shin, K.S. Hierarchical convolutional neural networks for fashion image classification. Expert Syst. Appl. 2019, 116, 328–339.
- Tang, Y.; Cui, H.; Liu, S. Optimal Design of Deep Residual Network Based on Image Classification of Fashion-MNIST Dataset. J. Phys. Conf. Ser. 2020, 1624, 052011.
- Duan, C.; Yin, P.; Zhi, Y.; Li, X. Image classification of fashion-MNIST data set based on VGG network. In Proceedings of the 2019 2nd International Conference on Information Science and Electronic Technology (ISET 2019), Taiyuan, China, 21–22 September 2019; Volume 19.
- Lei, F.; Liu, X.; Dai, Q.; Ling, B.W.K. Shallow convolutional neural network for image classification. SN Appl. Sci. 2020, 2, 1–8.
- Saiharsha, B.; Abel Lesle, A.; Diwakar, B.; Karthika, R.; Ganesan, M. Evaluating performance of deep learning architectures for image classification. In Proceedings of the 2020 5th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 21–22 October 2020; pp. 917–922.
- Greeshma, K.; Sreekumar, K. Fashion-MNIST classification based on HOG feature descriptor using SVM. Int. J. Innov. Technol. Explor. Eng. 2019, 8, 960–962.
- Hoang, K. Image Classification with Fashion-MNIST and CIFAR10; California State University: Sacramento, CA, USA, 2018.
- Shen, S. Image Classification of Fashion-MNIST Dataset Using Long Short-Term Memory Networks. Available online: http://users.cecs.anu.edu.au/~Tom.Gedeon/conf/ABCs2018/paper/ABCs2018_paper_38.pdf (accessed on 26 October 2022).
- Zhang, K. LSTM: An Image Classification Model Based on Fashion-MNIST Dataset. Available online: http://users.cecs.anu.edu.au/~Tom.Gedeon/conf/ABCs2018/paper/ABCs2018_paper_92.pdf (accessed on 26 October 2022).
- Greeshma, K.; Sreekumar, K. Hyperparameter optimization and regularization on Fashion-MNIST classification. Int. J. Recent Technol. Eng. 2019, 8, 3713–3719.
- LeCun, Y. LeNet-5, Convolutional Neural Networks. 2015; Volume 20, p. 14. Available online: http://yann.lecun.com/exdb/lenet (accessed on 26 October 2022).
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
- Yang, G.W.; Jing, H.F. Multiple convolutional neural network for feature extraction. In Proceedings of the International Conference on Intelligent Computing, Fuzhou, China, 20–23 August 2015; Springer: Cham, Switzerland, 2015; pp. 104–114.
- Liaw, R.; Liang, E.; Nishihara, R.; Moritz, P.; Gonzalez, J.E.; Stoica, I. Tune: A research platform for distributed model selection and training. arXiv 2018, arXiv:1807.05118.
- Bergstra, J.; Yamins, D.; Cox, D.D. Hyperopt: A python library for optimizing the hyperparameters of machine learning algorithms. In Proceedings of the 12th Python in Science Conference, Austin, TX, USA, 24–29 June 2013; Volume 13, p. 20.
- Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631.
- Jaderberg, M.; Dalibard, V.; Osindero, S.; Czarnecki, W.M.; Donahue, J.; Razavi, A.; Vinyals, O.; Green, T.; Dunning, I.; Simonyan, K.; et al. Population based training of neural networks. arXiv 2017, arXiv:1711.09846.
- Kingma, D.P.; Ba, J.A. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Soydaner, D. A comparison of optimization algorithms for deep learning. Int. J. Pattern Recognit. Artif. Intell. 2020, 34, 2052013.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. In Proceedings of the 2019 Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Cao, D.; Chen, Z.; Gao, L. An improved object detection algorithm based on multi-scaled and deformable convolutional neural networks. Hum.-Centric Comput. Inf. Sci. 2020, 10, 1–22.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the 2014 Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014.
- LeCun, Y.; Misra, I. Self-Supervised Learning: The Dark Matter of Intelligence. 2021. Available online: https://ai.facebook.com/blog/self-supervised-learning-the-dark-matter-of-intelligence/ (accessed on 26 October 2022).
Parameter | Values |
---|---|
Batch size | (2, 4, 8, 16, 32, 64, 128, 256) |
Kernel size | (1, 2, 3) |
Number of filters | (32, 64, 128, 192, 256) |
Fully connected layer | (32, 64, 128, 192, 256, 512, 1024) |
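The search space in the table above can be explored with a simple random search, sketched below in plain Python. The sampling strategy here is an illustrative assumption; the paper's search was run with a dedicated hyperparameter-optimization library (the references cite Tune, Hyperopt, and Optuna), but the space being sampled matches the table.

```python
import random

# Search space copied from the table above. Uniform random sampling is an
# illustrative assumption, not necessarily the scheduler used in the paper.
SEARCH_SPACE = {
    "batch_size": [2, 4, 8, 16, 32, 64, 128, 256],
    "kernel_size": [1, 2, 3],
    "num_filters": [32, 64, 128, 192, 256],
    "fc_units": [32, 64, 128, 192, 256, 512, 1024],
}

def sample_config(rng=random):
    """Draw one hyperparameter configuration uniformly from the grid."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def random_search(objective, n_trials=20, seed=0):
    """Evaluate `objective(config) -> score` on random configs; keep the best.

    In practice, `objective` would train a candidate MCNN and return its
    validation accuracy; here it is any callable scoring a config dict.
    """
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

With a library such as Optuna, each `rng.choice` line would become a `trial.suggest_categorical` call and the loop would be replaced by `study.optimize`, but the search space itself is unchanged.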
Labels | Description |
---|---|
0 | T-shirt/Top |
1 | Trouser |
2 | Pullover |
3 | Dress |
4 | Coat |
5 | Sandal |
6 | Shirt |
7 | Sneaker |
8 | Bag |
9 | Ankle Boot |
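The ten class labels above follow the standard Fashion-MNIST convention, and the dataset's raw label files use the same IDX format as MNIST. A minimal, standard-library sketch of decoding them (the IDX parsing follows the published MNIST file-format convention; it is not code from the paper):

```python
import struct

# Class names in label order, matching the table above.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def read_idx_labels(data: bytes):
    """Parse an IDX1 label file: 4-byte magic, 4-byte count, then one byte per label."""
    magic, count = struct.unpack(">II", data[:8])
    if magic != 0x00000801:  # 0x0801 marks a 1-D unsigned-byte tensor
        raise ValueError(f"not an IDX1 label file (magic={magic:#x})")
    return list(data[8:8 + count])

def label_name(label: int) -> str:
    """Map a numeric Fashion-MNIST label (0-9) to its class name."""
    return CLASS_NAMES[label]
```

In a PyTorch workflow, `torchvision.datasets.FashionMNIST` performs this download and decoding automatically; the sketch only makes the on-disk format explicit.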
Model | Accuracy |
---|---|
Lenet [34] | 90.16% |
Alexnet [35] | 92.74% |
Resnet18 [12] | 93.20% |
Mobilenet [36] | 93.96% |
Efficientnet [37] | 93.64% |
VIT [46] | 90.98% |
MCNN9 | 93.88% |
MCNN12 | 93.90% |
MCNN15 | 94.04% |
MCNN18 | 93.74% |
Model | Accuracy |
---|---|
H-CNN model with Vgg16 [24] | 94% |
CNN-Softmax [21] | 91.86% |
LSTMs [31] | 88.26% |
LSTM [32] | 89% |
CNNs [23] | 90.25% (99.1%) |
CNN+HPO+Reg [33] | 93.99% |
CNNs [22] | 92.54% |
CNN LeNet-5 [23] | 90.64% (98.8%) |
SVM+HOG [29] | 86.53% |
CNN [30] | 89.54% |
Vgg [28] | 92.3% |
Shallow convolutional neural network [27] | 93.69% |
VGG Network [26] | 91.5% |
MCNN15 | 94.04% |
Class | Lenet | Alexnet | Resnet18 | Mobilenet | Efficientnet | VIT | MCNN15 |
---|---|---|---|---|---|---|---|
T-shirt/top | 9/10 | 10/10 | 7/10 | 9/10 | 9/10 | 10/10 | 10/10 |
Trouser | 8/10 | 8/10 | 10/10 | 10/10 | 10/10 | 10/10 | 9/10 |
Pullover | 2/10 | 3/10 | 4/10 | 7/10 | 3/10 | 1/10 | 4/10 |
Dress | 3/10 | 4/10 | 3/10 | 3/10 | 1/10 | 1/10 | 7/10 |
Coat | 8/10 | 7/10 | 8/10 | 8/10 | 7/10 | 8/10 | 8/10 |
Sandal | 4/10 | 6/10 | 5/10 | 4/10 | 6/10 | 6/10 | 7/10 |
Shirt | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 |
Sneaker | 1/10 | 3/10 | 5/10 | 4/10 | 0/10 | 0/10 | 3/10 |
Bag | 8/10 | 8/10 | 7/10 | 9/10 | 9/10 | 8/10 | 4/10 |
Ankle boot | 0/10 | 1/10 | 7/10 | 3/10 | 9/10 | 1/10 | 8/10 |
Total | 43/100 | 50/100 | 56/100 | 57/100 | 54/100 | 45/100 | 60/100 |
Class | Lenet | Alexnet | Resnet18 | Mobilenet | Efficientnet | VIT | MCNN15 |
---|---|---|---|---|---|---|---|
T-shirt/top | 0/5 | 1/5 | 0/5 | 0/5 | 0/5 | 0/5 | 3/5 |
Trouser | 0/5 | 4/5 | 0/5 | 1/5 | 2/5 | 5/5 | 4/5 |
Pullover | 2/5 | 1/5 | 4/5 | 1/5 | 0/5 | 3/5 | 0/5 |
Dress | 1/5 | 0/5 | 1/5 | 0/5 | 0/5 | 2/5 | 3/5 |
Coat | 0/5 | 1/5 | 0/5 | 1/5 | 0/5 | 0/5 | 0/5 |
Sandal | 0/5 | 0/5 | 0/5 | 0/5 | 0/5 | 0/5 | 0/5 |
Shirt | 1/5 | 2/5 | 1/5 | 3/5 | 0/5 | 1/5 | 2/5 |
Sneaker | 0/5 | 0/5 | 0/5 | 0/5 | 0/5 | 0/5 | 0/5 |
Bag | 5/5 | 1/5 | 5/5 | 5/5 | 5/5 | 5/5 | 3/5 |
Ankle boot | 0/5 | 0/5 | 4/5 | 0/5 | 3/5 | 0/5 | 5/5 |
Total | 9/50 | 10/50 | 15/50 | 11/50 | 10/50 | 16/50 | 20/50 |
Model | Accuracy | Number of Parameters |
---|---|---|
Lenet [34] | 90.16% | 47,540 |
Alexnet [35] | 92.74% | 58,302,196 |
Resnet18 [12] | 93.20% | 11,175,370 |
Mobilenet [36] | 93.96% | 2,236,106 |
Efficientnet [37] | 93.64% | 4,019,782 |
VIT [46] | 90.98% | 212,490 |
MCNN9 | 93.88% | 655,434 |
MCNN12 | 93.90% | 971,882 |
MCNN15 | 94.04% | 2,595,914 |
MCNN18 | 93.74% | 3,004,362 |
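Parameter counts like those in the table can be derived directly from layer shapes. The formulas below are the standard ones for convolutional and fully connected layers; the example layer dimensions in the test are the classic LeNet-5 shapes and are illustrative, not the exact architecture of any model in the table.

```python
def conv2d_params(in_ch: int, out_ch: int, k: int, bias: bool = True) -> int:
    """Parameters of a k x k convolution: one k*k*in_ch kernel per output
    channel, plus one bias per output channel if enabled."""
    return out_ch * in_ch * k * k + (out_ch if bias else 0)

def linear_params(in_f: int, out_f: int, bias: bool = True) -> int:
    """Parameters of a fully connected layer: a weight per input-output pair,
    plus one bias per output unit if enabled."""
    return out_f * in_f + (out_f if bias else 0)
```

For a built model, PyTorch gives the same totals with `sum(p.numel() for p in model.parameters())`, which is the usual way counts like those above are reported.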
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Nocentini, O.; Kim, J.; Bashir, M.Z.; Cavallo, F. Image Classification Using Multiple Convolutional Neural Networks on the Fashion-MNIST Dataset. Sensors 2022, 22, 9544. https://doi.org/10.3390/s22239544