IRDC-Net: An Inception Network with a Residual Module and Dilated Convolution for Sign Language Recognition Based on Surface Electromyography
Figure 1. Block diagram of the proposed framework for hand gesture recognition.
Figure 2. The ten Chinese sign language signs considered in this study: (a) wait (等), (b) I (我), (c) you (你), (d) ring up (打电话), (e) OK (好的), (f) good (好), (g) goodbye (再见), (h) have (有), (i) morning (早上), (j) hello (你好). Note: the sign for "have" refers to somebody owning something.
Figure 3. Placement positions of the electrodes: (a) the anterior group of antebrachial muscles and (b) the posterior group of antebrachial muscles. Blue solid circles denote the two Ag/AgCl electrodes corresponding to one channel.
Figure 4. Illustration of data segmentation. Steps 1 and 2 describe the segmentation of the 7-channel sEMG signal of one sign language movement; Step 3 describes the construction of the time domain (Step 3.1) and time–frequency domain (Step 3.2) datasets. The orange box in Step 3.1 represents the seven-channel sEMG signal of Repeat 1.
Figure 5. The framework of the proposed IRDC-net: (a) IRDC-net framework, (b) STEM block, (c) Inception-ResNet A block, and (d) Inception-ResNet B block. Each convolutional layer is annotated as "number of filters, kernel size".
Figure 6. Representation of the receptive field.
Figure 7. Residual block of IRDC-net.
Figure 8. Representation of dilated convolution. The "size" at the bottom indicates the receptive field size. The blue box denotes the convolution kernel; the blue-grey box denotes the convolution results. A dilation rate of 1 means that the convolution kernel is still a standard kernel.
Figure 9. Model structures for the ablation experiments. (a–c) show Experiment 1, Experiment 2, and Experiment 3 of Table 5, respectively. The colour of each module is consistent with Figure 5.
Figure 10. Confusion matrix of IRDC-net for Mydata: (a) results for the time domain signal and (b) for the time–frequency domain signal.
Figure 11. Loss curves for the ablation experiments: (a,c) training curves for Mydata and Ninapro DB1, respectively; (b,d) validation curves for Mydata and Ninapro DB1, respectively.
Figure A1. The Inception-V1-based model. Two Inception modules are stacked, and N1–N6 denote the numbers of convolution filters. For the first Inception module, N1–N6 were set to [64, 96, 128, 16, 32, 32], respectively; for the second Inception module, N1–N6 were set to [128, 128, 192, 32, 96, 64], respectively.
Figure A2. The Inception-V2- and V3-based models. The two models share the architecture of the Inception A, Inception B, and Inception C blocks. The colour of each module is consistent with Figure A1.
Abstract
1. Introduction
- (1) One-dimensional discrete Fourier transformation was used to transform the non-stationary time domain sEMG signal into the time–frequency domain, enhancing the characteristics of the time domain signal and further improving the accuracy of the SLR task (a minimal sketch of this step follows the list).
- (2) A novel Inception architecture with a residual module and dilated convolution (IRDC-net) was proposed and applied to SLR tasks for the first time. IRDC-net enriches the sEMG feature map and enlarges the receptive field while avoiding model degradation, making it suitable for sEMG classification tasks with long-term dependent information.
- (3) The public dataset Ninapro DB1 [24] was used to test the generalization ability of the proposed model. The results show that our method outperforms other recent studies that utilized Ninapro DB1, indicating that IRDC-net can be applied to a wider range of SLR tasks.
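As an illustration of contribution (1), the sketch below shows one plausible way to convert a multichannel sEMG window into a time–frequency matrix with a one-dimensional DFT. This is not the authors' published code: the function name, the hop length, and the use of magnitude spectra are assumptions made for the example.

```python
import numpy as np

def to_time_frequency(window: np.ndarray, frame_len: int, hop_len: int) -> np.ndarray:
    """Map a (channels, samples) sEMG window to per-channel magnitude spectrograms.

    Each channel is split into frames of frame_len samples, and a 1-D DFT
    (real FFT) turns every frame into a magnitude spectrum; stacking the
    spectra yields a 2-D time-frequency matrix per channel.
    """
    n_channels, n_samples = window.shape
    n_frames = 1 + (n_samples - frame_len) // hop_len
    spectra = np.empty((n_channels, n_frames, frame_len // 2 + 1))
    for ch in range(n_channels):
        for f in range(n_frames):
            frame = window[ch, f * hop_len : f * hop_len + frame_len]
            spectra[ch, f] = np.abs(np.fft.rfft(frame))
    return spectra

# Example: a 7-channel window of 200 samples, 50-sample frames, no overlap.
rng = np.random.default_rng(0)
tf = to_time_frequency(rng.standard_normal((7, 200)), frame_len=50, hop_len=50)
print(tf.shape)  # (7, 4, 26)
```

Section 3.2 compares DFT frame lengths of 25 ms, 50 ms, and 100 ms; the sample counts above are placeholders, since the frame length in samples depends on the sampling rate.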
2. Materials and Methods
2.1. sEMG Signal Acquisition
2.2. sEMG Signal Pre-Processing and Segmentation
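Figure 4 summarizes how each movement's 7-channel recording is segmented before further processing. A minimal sketch, assuming a sliding window (a common segmentation scheme for sEMG; the window and step sizes below are placeholders, not values from the paper):

```python
import numpy as np

def segment(signal: np.ndarray, win: int, step: int) -> np.ndarray:
    """Slice a (channels, samples) recording into (n, channels, win) windows."""
    starts = range(0, signal.shape[1] - win + 1, step)
    return np.stack([signal[:, s:s + win] for s in starts])

rng = np.random.default_rng(0)
windows = segment(rng.standard_normal((7, 1000)), win=200, step=100)
print(windows.shape)  # (9, 7, 200) -- consecutive windows overlap by 50%
```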
2.3. Signal Matrix Transformation
2.4. Sign Language Recognition: IRDC-Net
2.4.1. Receptive Field
2.4.2. Inception Block
2.4.3. Residual Module
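The residual module adds an identity shortcut around a stack of convolutions, y = F(x) + x, which eases gradient flow and counters the degradation seen in very deep networks [21]. The sketch below is a generic PyTorch-style 1D residual block, not the exact block of Figure 7; the channel count and layer order are assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """1-D residual block: output = ReLU(F(x) + x) with an identity shortcut."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.body(x) + x)  # shortcut keeps gradients flowing

print(ResidualBlock(16)(torch.randn(1, 16, 200)).shape)  # torch.Size([1, 16, 200])
```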
2.4.4. 1D Dilated Convolution
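Dilated convolution spaces out the kernel taps so that the receptive field grows without adding parameters [23] (Figure 8). A minimal PyTorch-style sketch, with kernel size 3 and channel counts chosen purely for illustration:

```python
import torch
import torch.nn as nn

# Two stacked kernel-size-3 convolutions: with dilation rates (1, 2) the
# receptive field is 3 + (3 - 1) * 2 = 7 samples instead of 5, at the same
# parameter count -- the motivation for dilated convolution in IRDC-net.
standard = nn.Sequential(
    nn.Conv1d(7, 16, kernel_size=3, padding=1),              # RF = 3
    nn.Conv1d(16, 16, kernel_size=3, padding=1),             # RF = 5
)
dilated = nn.Sequential(
    nn.Conv1d(7, 16, kernel_size=3, padding=1, dilation=1),  # RF = 3
    nn.Conv1d(16, 16, kernel_size=3, padding=2, dilation=2), # RF = 7
)

x = torch.randn(1, 7, 200)  # (batch, sEMG channels, samples)
print(standard(x).shape, dilated(x).shape)  # both torch.Size([1, 16, 200])
```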
3. Experiments and Results
3.1. Datasets
3.2. Classification Performance with Different DFT Frame Lengths
3.3. Comparison of the Tandem Network Structure and the Parallel Structure
3.4. Comparison of the Inception-Related Networks and IRDC-Net
3.5. Results of the Public Dataset Ninapro DB1
3.6. Ablation Experiments
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
| Hyper-Parameter | IRDC-Net | VGG-Net | ResNet-18 | Inception V1-Based | Inception V2-Based | Inception V3-Based |
|---|---|---|---|---|---|---|
| Batch size | 16 | 64 | 128 | 16 | 32 | 32 |
| Epoch | 200 | 100 | 100 | 100 | 100 | 100 |
| Optimizer | Adam | Adam | Adam | Adam | Adam | Adam |
| Dropout rate | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 |
| λ of L1 regularization | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| λ of L2 regularization | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| Hyper-Parameter | IRDC-Net | VGG-Net | ResNet-18 | Inception V1-Based | Inception V2-Based | Inception V3-Based |
|---|---|---|---|---|---|---|
| Batch size | 16 | 16 | 128 | 16 | 32 | 32 |
| Epoch | 200 | 100 | 100 | 100 | 100 | 100 |
| Optimizer | Adam | Adam | Adam | Adam | Adam | Adam |
| Dropout rate | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 |
| λ of L1 regularization | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| λ of L2 regularization | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| Hyper-Parameter | Value |
|---|---|
| Batch size | 64 |
| Epoch | 300 |
| Optimizer | Adam |
| Dropout rate | 0.8 |
| λ of L1 regularization | 0.0001 |
| λ of L2 regularization | 0.0001 |
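As a hedged illustration of how the hyperparameters above can be wired into training (not the authors' code: the toy model, learning rate, and input shape are assumptions), the sketch below applies L2 regularization through Adam's weight_decay and adds the L1 penalty to the loss with λ = 0.0001. The tables do not state whether the 0.8 dropout rate denotes a drop or keep probability; here it is passed directly as PyTorch's drop probability.

```python
import torch
import torch.nn as nn

# Placeholder classifier standing in for IRDC-net (10 sign classes).
model = nn.Sequential(nn.Flatten(), nn.Dropout(p=0.8), nn.Linear(7 * 200, 10))
# weight_decay implements the lambda = 1e-4 L2 penalty from the tables;
# the learning rate is an assumption (not reported in Appendix A).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

def training_step(x: torch.Tensor, y: torch.Tensor, l1_lambda: float = 1e-4) -> float:
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    # L1 penalty added manually, since Adam's weight_decay only covers L2.
    loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())
    loss.backward()
    optimizer.step()
    return loss.item()

# One step on a random batch of 16 seven-channel, 200-sample windows.
print(training_step(torch.randn(16, 7, 200), torch.randint(0, 10, (16,))))
```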
References
1. Kamal, S.M.; Chen, Y.; Li, S.; Shi, X.; Zheng, J. Technical approaches to Chinese sign language processing: A review. IEEE Access 2019, 7, 96926–96935.
2. World Health Organization. World Report on Hearing; World Health Organization: Geneva, Switzerland, 2021.
3. Li, X.; Zhang, X.; Tang, X.; Chen, M.; Chen, X.; Chen, X.; Liu, A. Decoding muscle force from individual motor unit activities using a twitch force model and hybrid neural networks. Biomed. Signal Process. Control 2022, 72, 103297.
4. Xie, B.; Meng, J.; Li, B.; Harland, A. Biosignal-based transferable attention Bi-ConvGRU deep network for hand-gesture recognition towards online upper-limb prosthesis control. Comput. Methods Programs Biomed. 2022, 224, 106999.
5. Tao, W.; Zhang, X.; Chen, X.; Wu, D.; Zhou, P. Multi-scale complexity analysis of muscle coactivation during gait in children with cerebral palsy. Front. Hum. Neurosci. 2015, 9, 367.
6. Li, Y.; Chen, X.; Zhang, X.; Wong, K.; Wang, Z. A sign-component-based framework for Chinese sign language recognition using accelerometer and sEMG data. IEEE Trans. Biomed. Eng. 2012, 59, 2695–2704.
7. Savur, C.; Sahin, F. American Sign Language Recognition system by using surface EMG signal. In Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, Hungary, 9–12 October 2016; pp. 002872–002877.
8. Yuan, S.; Wang, Y.; Wang, X.; Deng, H.; Sun, S.; Wang, H.; Huang, P.; Li, G. Chinese sign language alphabet recognition based on random forest algorithm. In Proceedings of the 2020 IEEE International Workshop on Metrology for Industry 4.0 & IoT, Roma, Italy, 3–5 June 2020; pp. 340–344.
9. Pereira-Montiel, E.; Pérez-Giraldo, E.; Mazo, J.; Orrego-Metaute, D.; Delgado-Trejos, E.; Cuesta-Frau, D.; Murillo-Escobar, J. Automatic sign language recognition based on accelerometry and surface electromyography signals: A study for Colombian sign language. Biomed. Signal Process. Control 2022, 71, 103201.
10. Akilan, T.; Wu, Q.; Safaei, A.; Wei, J. A late fusion approach for harnessing multi-CNN model high-level features. In Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, 5–8 October 2017; pp. 566–571.
11. Wang, F.; Zhao, S.; Zhou, X.; Li, C.; Li, M.; Zeng, Z. A recognition–verification mechanism for real-time Chinese sign language recognition based on multi-information fusion. Sensors 2019, 19, 2495.
12. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
13. Li, H.; Zhang, Y.; Cao, Q. MyoTac: Real-time recognition of tactical sign language based on lightweight deep neural network. Wirel. Commun. Mob. Comput. 2022, 2022, 17.
14. Li, Y.; Yang, C. Multi time scale inception-time network for soft sensor of blast furnace ironmaking process. J. Process Control 2022, 118, 106–114.
15. Liu, J.; Wang, C.; He, B.; Li, P.; Wu, X. Metric learning for robust gait phase recognition for a lower limb exoskeleton robot based on sEMG. IEEE Trans. Med. Robot. Bionics 2022, 4, 472–479.
16. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 8–10 June 2015; pp. 1–9.
17. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015; pp. 448–456.
18. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2818–2826.
19. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
20. Wu, L.; Zhang, X.; Wang, K.; Chen, X.; Chen, X. Improved high-density myoelectric pattern recognition control against electrode shift using data augmentation and dilated convolutional neural network. IEEE Trans. Neural Syst. Rehabil. Eng. 2020, 28, 2637–2646.
21. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
22. Agrawal, A.; Mittal, N. Using CNN for facial expression recognition: A study of the effects of kernel size and number of filters on accuracy. Vis. Comput. 2020, 36, 405–412.
23. Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122.
24. Atzori, M.; Gijsberts, A.; Castellini, C.; Caputo, B.; Hager, A.G.; Elsig, S.; Giatsidis, G.; Bassetto, F.; Müller, H. Electromyography data for non-invasive naturally-controlled robotic hand prostheses. Sci. Data 2014, 1, 1–13.
25. Jiang, Y.; Chen, C.; Zhang, X.; Chen, X.; Zhou, Y.; Ni, G.; Muh, S.; Lemos, S. Shoulder muscle activation pattern recognition based on sEMG and machine learning algorithms. Comput. Methods Programs Biomed. 2020, 197, 105721.
26. Xu, B.; Wang, N.; Chen, T.; Li, M. Empirical evaluation of rectified activations in convolutional network. arXiv 2015, arXiv:1505.00853.
27. Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323.
28. Olsson, A.E.; Björkman, A.; Antfolk, C. Automatic discovery of resource-restricted convolutional neural network topologies for myoelectric pattern recognition. Comput. Biol. Med. 2020, 120, 103723.
29. Wei, W.; Hong, H.; Wu, X. A hierarchical view pooling network for multichannel surface electromyography-based gesture recognition. Comput. Intell. Neurosci. 2021, 2021, 6591035.
30. Wang, H.; Zhang, Y.; Liu, C.; Liu, H. sEMG based hand gesture recognition with deformable convolutional network. Int. J. Mach. Learn. Cybern. 2022, 13, 1729–1738.
31. Zhang, Y.; Yang, F.; Fan, Q.; Yang, A.; Li, X. Research on sEMG-based gesture recognition by dual-view deep learning. IEEE Access 2022, 10, 32928–32937.
32. Xu, P.; Li, F.; Wang, H. A novel concatenate feature fusion RCNN architecture for sEMG-based hand gesture recognition. PLoS ONE 2022, 17, e0262810.
33. Wang, S.; Huang, L.; Jiang, D.; Sun, Y.; Jiang, G.; Li, J.; Zou, C.; Fan, H.; Xie, Y.; Xiong, H.; et al. Improved multi-stream convolutional block attention module for sEMG-based gesture recognition. Front. Bioeng. Biotechnol. 2022, 10, 909023.
| Metrics | Precision (25 ms) | Recall (25 ms) | f1 Score (25 ms) | Precision (50 ms) | Recall (50 ms) | f1 Score (50 ms) | Precision (100 ms) | Recall (100 ms) | f1 Score (100 ms) |
|---|---|---|---|---|---|---|---|---|---|
| Class 0 | 91.27% | 94.06% | 92.13% | 96.52% | 97.68% | 97.61% | 97.33% | 98.27% | 97.30% |
| Class 1 | 98.14% | 96.25% | 97.21% | 97.68% | 97.68% | 97.68% | 100.00% | 98.34% | 99.13% |
| Class 2 | 83.64% | 85.78% | 84.52% | 91.49% | 93.02% | 92.26% | 93.73% | 92.51% | 93.66% |
| Class 3 | 81.87% | 75.60% | 78.74% | 95.48% | 91.50% | 93.52% | 92.66% | 94.97% | 93.82% |
| Class 4 | 65.53% | 66.32% | 66.85% | 87.49% | 86.78% | 86.13% | 83.68% | 86.90% | 84.14% |
| Class 5 | 83.28% | 83.28% | 83.28% | 90.07% | 92.46% | 91.25% | 87.62% | 83.57% | 85.12% |
| Class 6 | 81.77% | 86.54% | 83.67% | 83.14% | 91.51% | 87.65% | 88.61% | 89.46% | 88.54% |
| Class 7 | 78.22% | 80.18% | 79.43% | 92.70% | 88.44% | 90.57% | 85.79% | 87.27% | 86.55% |
| Class 8 | 89.30% | 81.47% | 85.39% | 87.82% | 88.50% | 87.67% | 91.19% | 88.24% | 89.43% |
| Class 9 | 91.72% | 90.03% | 91.% | 99.38% | 92.47% | 95.42% | 92.50% | 91.42% | 91.46% |
| Average | 84.47% | 83.95% | 84.22% | 92.18% | 92.00% | 91.98% | 91.31% | 91.10% | 90.92% |
| Model | Accuracy (Time Domain) | Accuracy (Time–Frequency Domain) |
|---|---|---|
| IRDC-net | 84.29% | 91.70% |
| VGG-net | 67.33% | 84.29% |
| ResNet-18 | 70.85% | 83.84% |
| Model | Accuracy (Time Domain) | Accuracy (Time–Frequency Domain) |
|---|---|---|
| IRDC-net | 84.29% | 91.70% |
| Inception-V1-based model | 70.00% | 86.81% |
| Inception-V2-based model | 78.29% | 87.11% |
| Inception-V3-based model | 78.95% | 87.58% |
| Year | CNN Model | Accuracy |
|---|---|---|
| 2020 [28] | CNN model with an evolutionary algorithm | 81.4 ± 4.0% |
| 2021 [29] | Hierarchical-view pooling network | 88.4% |
| 2022 [30] | Deformable convolutional network | 83.10% |
| 2022 [31] | Dual-view multiscale convolutional network | 86.72% |
| 2022 [32] | Concatenate feature fusion recurrent convolutional network | 88.87% |
| 2022 [33] | Multi-stream convolutional block attention module–gate recurrent unit | 89.70% |
| Ours | IRDC-net | 89.82% |
Table 5. Results of the ablation experiments.

| Experiment | Method | Accuracy (Mydata) | Accuracy (Ninapro DB1) |
|---|---|---|---|
| Experiment 1 | Inception module (Figure 9a) | 81.67% | 75.99% |
| Experiment 2 | Inception module + residual module (Figure 9b) | 87.01% | 87.83% |
| Experiment 3 | Inception module + dilated convolution (Figure 9c) | 83.17% | 80.55% |
| Experiment 4 | IRDC-net (Figure 5) | 91.70% | 89.82% |
Citation: Wang, X.; Tang, L.; Zheng, Q.; Yang, X.; Lu, Z. IRDC-Net: An Inception Network with a Residual Module and Dilated Convolution for Sign Language Recognition Based on Surface Electromyography. Sensors 2023, 23, 5775. https://doi.org/10.3390/s23135775