
KSRB-Net: a continuous sign language recognition deep learning strategy based on motion perception mechanism

  • Original article
  • Published in: The Visual Computer

Abstract

Continuous sign language recognition (CSLR) is the intricate task of transcribing continuous sign language video streams into sentences. Deep learning-based CSLR systems typically comprise a visual encoder that extracts features from the input and a sequence learning model that maps the resulting feature sequence to sentence-level labels. The complex nature of sign language, with its extensive vocabulary and many similar gestures and motions, makes CSLR particularly challenging. Moreover, because signing glosses are unavailable for frame-level alignment, CSLR is only weakly supervised: each word in a sentence must be labeled in detail, which limits the amount of training data available. In this paper, we propose a CSLR framework named KSRB-Net to address these critical problems. The proposed method incorporates a practical module that efficiently captures frame-wise motion information and spatio-temporal context, and that can be embedded into existing feature extraction modules. In addition, a keyframe extraction algorithm designed around the characteristics of sign language datasets significantly accelerates model training and reduces the risk of overfitting. Finally, connectionist temporal classification is employed as the objective function to capture the alignment proposal. The proposed method is validated on three datasets: the Chinese TJUT-SLRT, the Chinese USTC-CSL, and the German RWTH-Phoenix-Weather-2014. Experimental results demonstrate that KSRB-Net achieves 98.40% accuracy and outperforms state-of-the-art methods in both efficiency and accuracy.
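The abstract's final training step uses connectionist temporal classification (CTC) to align per-frame predictions with an unsegmented sentence-level gloss sequence. As an illustrative sketch only (not the paper's implementation; the function name and the tiny uniform-probability example are hypothetical), the standard CTC forward algorithm below computes the negative log-likelihood of a gloss sequence given per-frame log-probabilities:

```python
import math

def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC forward algorithm.
    log_probs: T x C table of per-frame log-probabilities.
    target: gloss label sequence (no blanks).
    Returns the negative log-likelihood summed over all valid alignments."""
    # Extended label sequence with blanks interleaved: ^ g1 ^ g2 ^ ...
    ext = [blank]
    for g in target:
        ext += [g, blank]
    S = len(ext)
    NEG_INF = float("-inf")

    def logsumexp(*xs):
        m = max(xs)
        if m == NEG_INF:
            return NEG_INF
        return m + math.log(sum(math.exp(x - m) for x in xs))

    # alpha[s] = log-prob of all alignments of the first t frames ending at ext[s]
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            paths = alpha[s]                      # stay on the same symbol
            if s > 0:
                paths = logsumexp(paths, alpha[s - 1])   # advance one symbol
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                paths = logsumexp(paths, alpha[s - 2])   # skip a blank
            new[s] = paths + log_probs[t][ext[s]]
        alpha = new
    # Valid alignments end on the last label or the trailing blank.
    return -logsumexp(alpha[S - 1], alpha[S - 2] if S > 1 else NEG_INF)

# Tiny check: 2 frames, 3 classes, uniform probabilities, target [1].
# Three valid alignments (blank-1, 1-blank, 1-1), each with prob 1/9,
# so P = 1/3 and the NLL is ln 3.
C, T = 3, 2
uniform = [[math.log(1.0 / C)] * C for _ in range(T)]
nll = ctc_neg_log_likelihood(uniform, [1])
print(nll)  # ≈ 1.0986 (ln 3)
```

In practice, a deep-learning CSLR pipeline would obtain `log_probs` from the visual encoder and sequence model and use a library implementation of this loss for training.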


Data availability statement

The datasets used in this article are available at http://home.ustc.edu.cn/~pjh/openresources/cslr-dataset-2015/index.html, https://www-i6.informatik.rwth-aachen.de/~koller/RWTH-PHOENIX/ and http://ylr.tjut.edu.cn/kxyj.htm.


Author information


Corresponding author

Correspondence to Jianhua Zhang.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Xiao, F., Zhu, Y., Liu, R. et al. KSRB-Net: a continuous sign language recognition deep learning strategy based on motion perception mechanism. Vis Comput 40, 7845–7858 (2024). https://doi.org/10.1007/s00371-023-03211-3
