Automatic Segmentation of Sign Language into Subtitle-Units

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12536)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We present baseline results for a new task of automatic segmentation of Sign Language video into sentence-like units. We use a corpus of natural Sign Language video with accurately aligned subtitles to train a spatio-temporal graph convolutional network with a BiLSTM on 2D skeleton data to automatically detect the temporal boundaries of subtitles. In doing so, we segment Sign Language video into subtitle-units that can be translated into phrases in a written language. We achieve a ROC-AUC statistic of 0.87 at the frame level and 92% label accuracy within a time margin of 0.6s of the true labels.
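The two evaluation figures quoted above can be illustrated with a minimal sketch. It assumes, hypothetically, that each frame is labelled 1 when it falls inside a subtitle interval and 0 otherwise, that the frame rate is 25 fps, and that "accuracy within a time margin" means a predicted label counts as correct if the same true label occurs within the margin. None of these details are stated in the abstract, and the function names `frame_labels`, `roc_auc`, and `margin_accuracy` are illustrative, not the authors' code.

```python
from typing import List, Sequence, Tuple

FPS = 25.0  # assumed frame rate; the abstract does not state one


def frame_labels(subtitles: Sequence[Tuple[float, float]],
                 n_frames: int, fps: float = FPS) -> List[int]:
    """Label a frame 1 if it lies inside any subtitle interval [start, end), else 0."""
    labels = [0] * n_frames
    for start, end in subtitles:
        first = max(0, int(round(start * fps)))
        last = min(n_frames, int(round(end * fps)))
        for i in range(first, last):
            labels[i] = 1
    return labels


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Frame-level ROC-AUC via the rank-sum identity, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs both classes present")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def margin_accuracy(labels: Sequence[int], predictions: Sequence[int],
                    margin_s: float = 0.6, fps: float = FPS) -> float:
    """Share of frames whose predicted label matches some true label within the margin."""
    m = int(round(margin_s * fps))
    hits = 0
    for i, p in enumerate(predictions):
        window = labels[max(0, i - m): i + m + 1]
        hits += p in window
    return hits / len(predictions)
```

With these definitions, a boundary predicted a few frames early or late lowers plain frame accuracy but not the margin-tolerant score, which is one plausible reason to report both figures.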

Notes

1.
There are rare video segments where a hearing person is interviewed and this interview is translated into SL.
2.
https://github.com/hannahbull/clean_op_data_sl.

Acknowledgements

This work has been partially funded by the ROSETTA project, financed by the French Public Investment Bank (Bpifrance). Additionally, we thank Média-Pi ! for providing the data and for the useful discussions on this idea.

Author information

Authors and Affiliations

LIMSI-CNRS, Campus universitaire 507, Rue du Belvedère, 91405, Orsay, France
Hannah Bull, Michèle Gouiffès & Annelies Braffort
University of Paris-Saclay, Route de l’Orme aux Merisiers - RD 128, 91190, Saint-Aubin, France
Hannah Bull & Michèle Gouiffès

Authors

Hannah Bull
Michèle Gouiffès
Annelies Braffort

Corresponding author

Correspondence to Hannah Bull.

Editor information

Editors and Affiliations

University of Clermont Auvergne, Clermont Ferrand, France
Adrien Bartoli
Università degli Studi di Udine, Udine, Italy
Andrea Fusiello

About this paper

Cite this paper

Bull, H., Gouiffès, M., Braffort, A. (2020). Automatic Segmentation of Sign Language into Subtitle-Units. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol 12536. Springer, Cham. https://doi.org/10.1007/978-3-030-66096-3_14

DOI: https://doi.org/10.1007/978-3-030-66096-3_14
Published: 03 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66095-6
Online ISBN: 978-3-030-66096-3
eBook Packages: Computer Science, Computer Science (R0)
