Micro-Expression-Based Emotion Recognition Using Waterfall Atrous Spatial Pyramid Pooling Networks
<p>Figure 1. Differences in facial muscle movement for happy emotion among the test subjects: (<b>a</b>) subject 1; (<b>b</b>) subject 2; (<b>c</b>) subject 3.</p>
<p>Figure 2. Basic SPP module architecture.</p>
<p>Figure 3. Two placement strategies of the SPP module in the base CNN model.</p>
<p>Figure 4. Basic ASPP module architecture.</p>
<p>Figure 5. Two placement strategies of the ASPP module in the base CNN model.</p>
<p>Figure 6. Direct network flow of the SPP and ASPP modules: (<b>a</b>) DSPP-Net architecture; (<b>b</b>) DASPP-Net architecture.</p>
<p>Figure 7. Waterfall network flow of the SPP and ASPP modules: (<b>a</b>) WSPP-Net architecture; (<b>b</b>) WASPP-Net architecture.</p>
<p>Figure 8. Training graph performance: (<b>a</b>) DSPP-Net architecture; (<b>b</b>) WSPP-Net architecture.</p>
<p>Figure 9. Training graph performance: (<b>a</b>) DASPP-Net architecture; (<b>b</b>) WASPP-Net architecture.</p>
Abstract
1. Introduction
2. Recent Works
3. Methodology
3.1. Dataset
3.2. CNN Architecture Model
3.3. Emotion Classification Based on the SPP Module
3.4. Emotion Classification Based on the ASPP Module
3.5. Direct and Waterfall for SPP and ASPP Module
4. Results and Discussions
4.1. Training Setup
- Accuracy (Ac): the ratio of correctly predicted results to the total number of samples. The formula for calculating the accuracy is shown in Equation (2), where T(+ve) is the true positives, T(−ve) is the true negatives, and Ts is the total number of samples.
- F1 score: the harmonic mean of recall, Re, and precision, Pr. It captures a balanced metric between recall and precision, with an output range between 0 and 1. If the model has perfect recall and precision, its F1 score is 1, whereas if either recall or precision is 0, its F1 score is 0. The F1 score formulas are shown in Equations (3)–(5), where F(+ve) denotes the false positives and F(−ve) denotes the false negatives.
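A minimal sketch of these two metrics in code (the function names and sample counts are illustrative, not values from the paper):

```python
def accuracy(tp, tn, total):
    """Ac = (T(+ve) + T(-ve)) / Ts: correct predictions over all samples."""
    return (tp + tn) / total

def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g. 80 true positives, 10 true negatives out of 100 samples -> Ac = 0.9
print(accuracy(80, 10, 100))  # 0.9
# 80 TP, 10 FP, 10 FN -> precision = recall = 8/9, so F1 = 8/9
print(f1_score(80, 10, 10))
```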
4.2. SPP Module Results Based on the Position and Number of Parallel Branches
4.3. ASPP Module Results Based on the Position and Number of Parallel Branches
4.4. SPP and ASPP Module Using Direct and Waterfall Network Flows
4.5. Benchmark to the State-of-the-Art Algorithms
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–26 June 2005; pp. 886–893. [Google Scholar]
- Liu, Y.-J.; Zhang, J.-K.; Yan, W.-J.; Wang, S.-J.; Zhao, G.; Fu, X. A Main Directional Mean Optical Flow Feature for Spontaneous Micro-Expression Recognition. IEEE Trans. Affect. Comput. 2016, 7, 299–310. [Google Scholar] [CrossRef]
- Liong, S.-T.; See, J.; Wong, K.; Phan, R.C.-W. Less is more: Micro-expression recognition from video using apex frame. Signal Process. Image Commun. 2018, 62, 82–92. [Google Scholar] [CrossRef] [Green Version]
- Zhao, G.; Pietikainen, M. Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 915–928. [Google Scholar] [CrossRef] [Green Version]
- Umar, K.I.; Gokcekus, H. Modeling severity of road traffic accident in Nigeria using artificial neural network. J. Kejuruter. 2019, 31, 221–227. [Google Scholar]
- Sian, L.J.; Stofa, M.M.; Min, K.S.; Zulkifley, M.A. Micro Expression Recognition: Multi-scale Approach to Automatic Emotion Recognition by using Spatial Pyramid Pooling Module. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 583–596. [Google Scholar] [CrossRef]
- Davison, A.K.; Lansley, C.; Ng, C.C.; Tan, K.; Yap, M.H. Objective Micro-Facial Movement Detection Using FACS-Based Regions and Baseline Evaluation. In Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; pp. 642–649. [Google Scholar]
- Liong, S.T.; See, J.; Phan, R.C.-W.; Oh, Y.H.; le Ngo, A.C.; Wong, K.; Tan, S.W. Spontaneous Subtle Expression Detection and Recognition based on Facial Strain. Signal Process. Image Commun. 2016, 47, 170–182. [Google Scholar] [CrossRef] [Green Version]
- Liong, S.-T.; Gan, Y.S.; Yau, W.-C.; Huang, Y.-C.; Ken, T.L. OFF-ApexNet on Micro-expression Recognition System. Signal Process. Image Commun. 2019, 74, 129–139. [Google Scholar]
- Olaf, R.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
- Hyeonseob, N.; Han, B. Learning multi-domain convolutional neural networks for visual tracking. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4293–4302. [Google Scholar]
- Gao, H.; Liu, Z.; van der Maaten, L.; Weinberger, Q.K. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
- Kim, D.H.; Baddar, W.J.; Jang, J.; Ro, Y.M. Multi-Objective Based Spatio-Temporal Feature Representation Learning Robust to Expression Intensity Variations for Facial Expression Recognition. IEEE Trans. Affect. Comput. 2017, 10, 223–236. [Google Scholar] [CrossRef]
- Khor, H.-Q.; See, J.; Phan, R.C.W.; Lin, W. Enriched Long-term Recurrent Convolutional Network for Facial Micro-Expression Recognition. In Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; pp. 667–674. [Google Scholar]
- Jianfeng, Z.; Mao, X.; Chen, L. Speech emotion recognition using deep 1D & 2D CNN LSTM networks. Biomed. Signal Process. Control 2019, 47, 312–323. [Google Scholar]
- Hailun, X.; Zhang, L.; Lim, C.P. Evolving CNN-LSTM models for time series prediction using enhanced grey wolf optimizer. IEEE Access 2020, 8, 161519–161541. [Google Scholar]
- Ayuni, M.N.; Zulkifley, M.A.; Ibrahim, A.A.; Aouache, M. Optimal training configurations of a CNN-LSTM-based tracker for a fall frame detection system. Sensors 2021, 21, 6485. [Google Scholar]
- Shaheen, S.; El-Hajj, W.; Hajj, H.; Elbassuoni, S. Emotion Recognition from Text Based on Automatically Generated Rules. In Proceedings of the 2014 IEEE International Conference on Data Mining Workshop, Shenzhen, China, 14 December 2014; pp. 383–392. [Google Scholar]
- Erenel, Z.; Adegboye, O.R.; Kusetogullari, H. A New Feature Selection Scheme for Emotion Recognition from Text. Appl. Sci. 2020, 10, 5351. [Google Scholar] [CrossRef]
- Peng, M.; Wang, C.; Chen, T.; Liu, G.; Fu, X. Dual Temporal Scale Convolutional Neural Network for Micro-Expression Recognition. Front. Psychol. 2017, 8, 1745. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Fischer, P.; Ilg, E.; Hausser, P.; Hazirbas, C.; Golkov, V.; van der Smagt, P.; Cremers, D.; Brox, T. Flownet: Learning Optical Flow with Convolutional Networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 2758–2766. [Google Scholar]
- Li, J.; Wang, Y.; See, J.; Liu, W. Micro-expression recognition based on 3D flow convolutional neural network. Pattern Anal. Appl. 2019, 22, 1331–1339. [Google Scholar] [CrossRef]
- Li, X.; Hong, X.; Moilanen, A.; Huang, X.; Pfister, T.; Zhao, G.; Pietikäinen, M. Towards Reading Hidden Emotions: A Comparative Study of Spontaneous Micro-Expression Spotting and Recognition Methods. IEEE Trans. Affect. Comput. 2018, 9, 563–577. [Google Scholar] [CrossRef] [Green Version]
- Kumar, A.J.R.; Theagarajan, R.; Peraza, O.; Bhanu, B. Classification of facial micro-expressions using motion magnified emotion avatar images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Long Beach, CA, USA, 16–20 June 2019; pp. 12–20. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. In Computer Vision—ECCV 2014. ECCV 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2014; Volume 8691. [Google Scholar]
- Abdani, S.R.; Zulkifley, M.A.; Zulkifley, N.H. Analysis of Spatial Pyramid Pooling Variations in Semantic Segmentation for Satellite Image Applications. In Proceedings of the 2021 International Conference on Decision Aid Sciences and Application (DASA), Sakheer, Bahrain, 7–8 December 2021; pp. 397–401. [Google Scholar]
- Shi, L.; Zhou, Z.; Guo, Z. Face Anti-Spoofing Using Spatial Pyramid Pooling. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 2126–2133. [Google Scholar]
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Qiu, Y.; Liu, Y.; Chen, Y.; Zhang, J.; Zhu, J.; Xu, J. A2SPPNet: Attentive Atrous Spatial Pyramid Pooling Network for Salient Object Detection. IEEE Trans. Multimed. 2022, 1. [Google Scholar] [CrossRef]
- Stofa, M.M.; Zulkifley, M.A.; Zainuri, M.A.A.M.; Ibrahim, A.A. U-Net with Atrous Spatial Pyramid Pooling for Skin Lesion Segmentation. In Proceedings of the 6th International Conference on Electrical, Control and Computer Engineering, Pahang, Malaysia, 23 August 2021; Md. Zain, Z., Sulaiman, M.H., Mohamed, A.I., Bakar, M.S., Ramli, M.S., Eds.; Lecture Notes in Electrical Engineering; Springer: Singapore, 2022; Volume 842. [Google Scholar]
- Artacho, B.; Savakis, A. Waterfall atrous spatial pooling architecture for efficient semantic segmentation. Sensors 2019, 19, 5361. [Google Scholar] [CrossRef] [Green Version]
- Stofa, M.; Zulkifley, M.A.; Zainuri, M.A.A.M.; Moubark, A.M. DenseNet with Atrous Spatial Pyramid Pooling for Skin Lesion Classification. In Proceedings of the 11th International Conference on Robotics, Vision, Signal Processing and Power Applications, Penang, Malaysia, 5–6 April 2021; Mahyuddin, N.M., Mat Noor, N.R., Mat Sakim, H.A., Eds.; Lecture Notes in Electrical Engineering. Springer: Singapore, 2022; Volume 829. [Google Scholar]
- Yan, W.J.; Li, X.; Wang, S.J.; Zhao, G.; Liu, Y.J.; Chen, Y.H.; Fu, X. CASME II: An Improved Spontaneous Micro-Expression Database and the Baseline Evaluation. PLoS ONE 2014, 9, e86041. [Google Scholar] [CrossRef]
- Davison, A.K.; Lansley, C.; Costen, N.; Tan, K.; Yap, M.H. SAMM: A Spontaneous Micro-Facial Movement Dataset. IEEE Trans. Affect. Comput. 2018, 9, 116–129. [Google Scholar] [CrossRef] [Green Version]
- Li, X.; Pfister, T.; Huang, X.; Zhao, G.; Pietikainen, M. A Spontaneous Micro-Expression Database: Inducement, collection and baseline. In Proceedings of the 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Shanghai, China, 22–26 April 2013. [Google Scholar]
- Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
Types of Emotion | Combined | CASME II | SAMM | SMIC |
---|---|---|---|---|
Positive | 109 | 32 | 26 | 51 |
Negative | 250 | 88 | 92 | 70 |
Surprise | 83 | 25 | 14 | 43 |
Total | 441 | 145 | 132 | 164 |
Layer | Size of Kernel | Stride | Padding | Size of Output | Activation Function |
---|---|---|---|---|---|
Conv1 | 7 × 7 | 1 | 1 | 96 × 69 × 69 | ReLU |
Conv2 | 5 × 5 | 1 | 1 | 256 × 65 × 65 | ReLU |
Conv3 | 3 × 3 | 1 | 0 | 512 × 65 × 65 | ReLU |
Pool3 | 3 × 3 | 2 | 1 | 512 × 32 × 32 | - |
Conv4 | 3 × 3 | 1 | 0 | 512 × 32 × 32 | ReLU |
Pool4 | 3 × 3 | 2 | 1 | 512 × 16 × 16 | - |
Conv5 | 3 × 3 | 1 | 0 | 512 × 16 × 16 | ReLU |
Pool5 | 3 × 3 | 2 | 1 | 512 × 8 × 8 | - |
FC1 | - | - | - | 128 | ReLU |
FC2 | - | - | - | 128 | ReLU |
FC3 | - | - | - | 3 | Softmax |
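The spatial output sizes in the table follow the standard convolution/pooling shape arithmetic. A small helper shows the calculation (the function name and sample values are illustrative, not taken from the paper):

```python
import math

def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer applied to an n x n input:
    floor((n + 2p - k) / s) + 1."""
    return math.floor((n + 2 * padding - kernel) / stride) + 1

# A 3 x 3 pooling with stride 2 and padding 1 halves the feature map,
# as in the Pool4 and Pool5 rows above:
print(out_size(32, 3, stride=2, padding=1))  # 16
print(out_size(16, 3, stride=2, padding=1))  # 8
```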
SPP Model | Number of Parallel Paths | Maximum Kernel Size | Position |
---|---|---|---|
I | 2 SPP | 4 × 4 | After Conv1 |
II | 3 SPP | 6 × 6 | After Conv1 |
III | 4 SPP | 8 × 8 | After Conv1 |
IV | 5 SPP | 10 × 10 | After Conv1 |
V | 2 SPP | 4 × 4 | After Conv2 |
VI | 3 SPP | 6 × 6 | After Conv2 |
VII | 4 SPP | 8 × 8 | After Conv2 |
VIII | 5 SPP | 10 × 10 | After Conv2 |
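Conceptually, each SPP branch max-pools the same feature map at a different scale, and the branch outputs are concatenated into one fixed-length vector. A minimal NumPy sketch of this idea (the pyramid levels and tensor shapes here are illustrative, not the paper's exact configuration):

```python
import numpy as np

def spp(fmap, levels=(1, 2, 4)):
    """Spatial pyramid pooling sketch: max-pool a C x H x W feature map
    over an n x n grid for each pyramid level, then concatenate the
    results into a vector whose length is independent of H and W."""
    c, h, w = fmap.shape
    pooled = []
    for n in levels:
        # integer bin boundaries for an n x n grid over the feature map
        ys = np.linspace(0, h, n + 1, dtype=int)
        xs = np.linspace(0, w, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                bin_ = fmap[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                pooled.append(bin_.max(axis=(1, 2)))  # one value per channel
    return np.concatenate(pooled)

# 8 channels, levels (1, 2, 4) -> fixed length 8 * (1 + 4 + 16) = 168
vec = spp(np.random.rand(8, 16, 16))
```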
ASPP Model | Number of Parallel Paths | Maximum Dilation Rate | Position |
---|---|---|---|
I | 2 ASPP | 2 | After Conv1 |
II | 3 ASPP | 3 | After Conv1 |
III | 4 ASPP | 4 | After Conv1 |
IV | 5 ASPP | 5 | After Conv1 |
V | 2 ASPP | 2 | After Conv2 |
VI | 3 ASPP | 3 | After Conv2 |
VII | 4 ASPP | 4 | After Conv2 |
VIII | 5 ASPP | 5 | After Conv2 |
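The ASPP variants use atrous (dilated) convolutions, where a larger dilation rate spaces the kernel taps further apart, widening the receptive field without adding parameters. A 1D NumPy sketch of the mechanism (the kernel and rate values are illustrative):

```python
import numpy as np

def atrous_conv1d(x, kernel, rate):
    """Valid-mode 1D atrous convolution sketch: kernel taps are spaced
    `rate` samples apart, so the effective receptive field is
    (k - 1) * rate + 1 with only k weights."""
    k = len(kernel)
    span = (k - 1) * rate + 1  # effective receptive field
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * rate] for j in range(k))
    return out

# A 3-tap kernel with rate 2 spans 5 input samples per output:
out = atrous_conv1d(np.arange(10.0), [1.0, 1.0, 1.0], rate=2)
# out[0] = x[0] + x[2] + x[4] = 0 + 2 + 4 = 6
```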
Hyperparameter | Type/Value | Function |
---|---|---|
Optimizer | Adam [37] | Updates the model weights adaptively to reduce the loss |
Learning rate | 0.0001 | Step size for each weight update |
Batch size | 32 | Number of samples used per model parameter update |
Training/testing split | Leave-One-Subject-Out (LOSO) | Determines the samples used for training in each evaluation fold |
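Under the LOSO protocol, each subject is held out once as the test set while all remaining subjects form the training set. A sketch of the split generation (the function name and sample IDs are illustrative):

```python
def loso_splits(subject_ids):
    """Leave-One-Subject-Out sketch: yields (train_indices, test_indices)
    pairs, holding out each unique subject in turn."""
    subjects = sorted(set(subject_ids))
    for held_out in subjects:
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield train, test

# Four samples from three subjects -> three folds:
splits = list(loso_splits(['a', 'a', 'b', 'c']))
# first fold holds out subject 'a': train = [2, 3], test = [0, 1]
```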
Accuracy (%) by type of SPP model:
Types of Datasets | Original (Without SPP Module) | I | II | III | IV | V | VI | VII | VIII
---|---|---|---|---|---|---|---|---|---
Combined | 77.48 | 77.63 | 77.32 | 77.63 | 77.48 | 77.93 | 78.23 | 77.48 | 79.59 |
CASME II | 88.51 | 91.26 | 87.59 | 88.51 | 88.51 | 89.43 | 87.13 | 91.26 | 89.89 |
SAMM | 67.68 | 67.17 | 69.7 | 71.21 | 70.2 | 69.7 | 72.73 | 67.68 | 73.23 |
SMIC | 75.61 | 73.98 | 74.39 | 73.17 | 73.58 | 74.39 | 74.8 | 73.17 | 75.61 |
F1 score by type of SPP model:
Types of Datasets | Original (Without SPP Module) | I | II | III | IV | V | VI | VII | VIII
---|---|---|---|---|---|---|---|---|---
Combined | 0.6621 | 0.6644 | 0.6599 | 0.6644 | 0.6621 | 0.6689 | 0.6735 | 0.6621 | 0.6939 |
CASME II | 0.8276 | 0.869 | 0.8138 | 0.8276 | 0.8276 | 0.8414 | 0.8069 | 0.869 | 0.8483 |
SAMM | 0.5152 | 0.5076 | 0.5455 | 0.5682 | 0.553 | 0.5455 | 0.5909 | 0.5152 | 0.5985 |
SMIC | 0.6441 | 0.6098 | 0.6159 | 0.5976 | 0.6037 | 0.6159 | 0.622 | 0.5976 | 0.6341 |
Accuracy (%) by type of ASPP model:
Types of Datasets | Original (Without ASPP Module) | I | II | III | IV | V | VI | VII | VIII
---|---|---|---|---|---|---|---|---|---
Combined | 77.48 | 76.11 | 77.02 | 76.11 | 78.53 | 79.14 | 78.08 | 77.48 | 77.63 |
CASME II | 88.51 | 89.89 | 89.97 | 86.67 | 89.42 | 88.97 | 90.8 | 90.8 | 87.59 |
SAMM | 67.68 | 66.16 | 70.71 | 70.71 | 71.21 | 73.74 | 69.19 | 70.2 | 70.2 |
SMIC | 75.61 | 71.95 | 71.54 | 71.14 | 74.8 | 74.8 | 73.98 | 71.54 | 74.8 |
F1 score by type of ASPP model:
Types of Datasets | Original (Without ASPP Module) | I | II | III | IV | V | VI | VII | VIII
---|---|---|---|---|---|---|---|---|---
Combined | 0.6621 | 0.6417 | 0.6553 | 0.6417 | 0.678 | 0.6871 | 0.6712 | 0.6621 | 0.6644 |
CASME II | 0.8276 | 0.8483 | 0.8345 | 0.80 | 0.8414 | 0.8345 | 0.8621 | 0.8621 | 0.8138 |
SAMM | 0.5152 | 0.4924 | 0.5606 | 0.5606 | 0.5682 | 0.6061 | 0.5379 | 0.553 | 0.553 |
SMIC | 0.6441 | 0.5793 | 0.5732 | 0.5671 | 0.622 | 0.622 | 0.6098 | 0.5732 | 0.622 |
Accuracy (%):
Types of Datasets | Original (Without SPP Module) | DSPP-Net | WSPP-Net
---|---|---|---
Combined | 77.48 | 77.93 | 80.20 |
CASME II | 88.51 | 89.43 | 92.18 |
SAMM | 67.68 | 69.7 | 72.73 |
SMIC | 75.61 | 74.39 | 75.61 |
Accuracy (%):
Types of Datasets | Original (Without ASPP Module) | DASPP-Net | WASPP-Net
---|---|---|---
Combined | 77.48 | 78.08 | 80.50 |
CASME II | 88.51 | 90.8 | 92.18 |
SAMM | 67.68 | 69.19 | 71.21 |
SMIC | 75.61 | 73.98 | 77.64 |
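The two network flows compared above differ only in connectivity: the direct flow runs all branches in parallel on the same input, while the waterfall flow chains each branch's output into the next before fusing. A schematic sketch with stand-in branch functions (not the paper's actual layers):

```python
def direct_flow(x, branches):
    """Direct (parallel) flow sketch: every branch sees the same input,
    and the branch outputs are collected for fusion."""
    return [branch(x) for branch in branches]

def waterfall_flow(x, branches):
    """Waterfall flow sketch: each branch feeds the next, so later
    branches see progressively larger effective receptive fields;
    all intermediate outputs are still kept for fusion."""
    outputs = []
    for branch in branches:
        x = branch(x)
        outputs.append(x)
    return outputs

# With toy branches f(v) = v + 1 and g(v) = 2v on input 3:
branches = [lambda v: v + 1, lambda v: v * 2]
print(direct_flow(3, branches))     # [4, 6]
print(waterfall_flow(3, branches))  # [4, 8]
```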
Type of Architecture | Training Time Per Subject (s) | Execution Time (Frames Per Second) |
---|---|---|
Original (Without SPP/ASPP Module) | 520 | 418 |
DSPP-Net | 431 | 510 |
WSPP-Net | 370 | 591 |
DASPP-Net | 548 | 400 |
WASPP-Net | 447 | 460 |
Method | Accuracy (%) | F1-Score |
---|---|---|
VGG-M | 72.34 | 0.5850 |
DualInception | 73.09 | 0.5964 |
AlexNet | 75.51 | 0.6327 |
STSTNet | 77.48 | 0.6621 |
OffApexNet | 78.38 | 0.6757 |
WASPP-Net | 80.50 | 0.7075 |
Types of Models | Number of Parameters |
---|---|
DSPP-Net | 8,378,659 |
WSPP-Net | 8,231,203 |
DASPP-Net | 8,378,659 |
WASPP-Net | 8,117,794 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Stofa, M.M.; Zulkifley, M.A.; Zainuri, M.A.A.M. Micro-Expression-Based Emotion Recognition Using Waterfall Atrous Spatial Pyramid Pooling Networks. Sensors 2022, 22, 4634. https://doi.org/10.3390/s22124634