
Automatic Segmentation of Sign Language into Subtitle-Units

  • Conference paper
  • Part of: Computer Vision – ECCV 2020 Workshops (ECCV 2020)

Abstract

We present baseline results for a new task of automatic segmentation of Sign Language video into sentence-like units. We use a corpus of natural Sign Language video with accurately aligned subtitles to train a spatio-temporal graph convolutional network with a BiLSTM on 2D skeleton data to automatically detect the temporal boundaries of subtitles. In doing so, we segment Sign Language video into subtitle-units that can be translated into phrases in a written language. We achieve a ROC-AUC statistic of 0.87 at the frame level and 92% label accuracy within a time margin of 0.6s of the true labels.
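The task setup and evaluation in the abstract can be illustrated with a small sketch: aligned subtitle spans are turned into per-frame binary targets, and accuracy is then computed with a temporal tolerance around each frame. The function names, the inclusive-boundary convention, and the exact margin handling are our assumptions for illustration, not the paper's code.

```python
def subtitle_frame_labels(spans, n_frames, fps):
    """Convert aligned subtitle time spans (start_s, end_s) into a
    per-frame binary target: 1 if the frame falls inside a subtitle-unit."""
    labels = []
    for i in range(n_frames):
        t = i / fps
        inside = any(start <= t <= end for start, end in spans)
        labels.append(1 if inside else 0)
    return labels

def margin_accuracy(pred, true, fps, margin_s=0.6):
    """Count a predicted frame label as correct if it matches the true
    label of any frame within +/- margin_s seconds; margin_s=0 reduces
    to plain frame-level accuracy."""
    radius = int(round(margin_s * fps))
    correct = 0
    for i, p in enumerate(pred):
        lo = max(0, i - radius)
        hi = min(len(true), i + radius + 1)
        if p in true[lo:hi]:
            correct += 1
    return correct / len(pred)

# Two subtitle-units in a 4 s clip sampled at 2 fps:
labels = subtitle_frame_labels([(0.0, 1.0), (2.0, 3.0)], n_frames=8, fps=2)
print(labels)                                              # [1, 1, 1, 0, 1, 1, 1, 0]
pred = [1, 1, 0, 0, 1, 1, 1, 0]                            # one boundary frame off
print(margin_accuracy(pred, labels, fps=2, margin_s=0.0))  # 0.875
print(margin_accuracy(pred, labels, fps=2, margin_s=0.6))  # 1.0
```

With a 0.6 s tolerance, the single off-by-one boundary frame no longer counts as an error, which is the spirit of the 92%-within-0.6s figure reported above (the real model scores frames with an ST-GCN + BiLSTM rather than reading labels directly).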


Notes

  1. There are rare video segments where a hearing person is interviewed and this interview is translated into SL.

  2. https://github.com/hannahbull/clean_op_data_sl.


Acknowledgements

This work was partially funded by the ROSETTA project, financed by the French Public Investment Bank (Bpifrance). We also thank Média-Pi ! for providing the data and for useful discussions on this idea.

Author information


Corresponding author

Correspondence to Hannah Bull.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Bull, H., Gouiffès, M., Braffort, A. (2020). Automatic Segmentation of Sign Language into Subtitle-Units. In: Bartoli, A., Fusiello, A. (eds.) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol. 12536. Springer, Cham. https://doi.org/10.1007/978-3-030-66096-3_14


  • DOI: https://doi.org/10.1007/978-3-030-66096-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66095-6

  • Online ISBN: 978-3-030-66096-3

  • eBook Packages: Computer Science, Computer Science (R0)
