Hard and Soft Labeling for Hebrew Paleography: A Case Study

  • Conference paper
Document Analysis Systems (DAS 2022)

Abstract

Paleography studies the writing styles of manuscripts and recognizes different styles and modes of scripts. We explore the applicability of hard and soft labeling for training deep-learning models to classify Hebrew scripts. In contrast to the hard-labeling scheme, where each document image has a single label representing its class, the soft-labeling approach assigns each image a label vector. Each element of the vector is the similarity of the document image to a certain regional writing style or graphical mode. In addition, we introduce a dataset of medieval Hebrew manuscripts that provides complete coverage of the major Hebrew writing styles and modes. A Hebrew paleography expert manually annotated the ground truth for soft labeling. We compare the soft- and hard-labeling approaches on the presented dataset, and analyze and discuss the findings.
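
The difference between the two labeling schemes can be illustrated with a small sketch. It is not the authors' implementation: the class names, the similarity scores, the normalization of the soft label to sum to one, and the use of cross-entropy as the training loss are all assumptions made for illustration, since this preview does not specify them.

```python
import math

# Hypothetical class names, for illustration only; the abstract does not list them.
STYLES = ["style_a", "style_b", "style_c"]


def hard_label(class_index, num_classes):
    """One-hot vector: the image is assigned to exactly one class."""
    return [1.0 if i == class_index else 0.0 for i in range(num_classes)]


def soft_label(similarities):
    """Turn non-negative similarity scores into a vector that sums to 1.

    Normalizing the scores is an assumption of this sketch, not a detail
    taken from the paper.
    """
    total = sum(similarities)
    if total <= 0:
        raise ValueError("at least one similarity must be positive")
    return [s / total for s in similarities]


def softmax(logits):
    """Numerically stable softmax over a list of raw model outputs."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, logits):
    """Cross-entropy between a target vector (hard or soft) and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# Made-up model outputs and expert similarity scores for one document image.
logits = [2.0, 1.0, 0.1]
hard = hard_label(0, len(STYLES))
soft = soft_label([3.0, 1.0, 0.0])

print("hard target:", hard, "loss:", cross_entropy(hard, logits))
print("soft target:", soft, "loss:", cross_entropy(soft, logits))
```

With a hard target only the probability of the single labeled class contributes to the loss, whereas a soft target also penalizes the model for ignoring the other styles the image resembles.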


Notes

  1. http://sfardata.nli.org.il/


Acknowledgment

This research was partially supported by The Frankel Center for Computer Science at Ben-Gurion University.

Author information


Corresponding author

Correspondence to Ahmad Droby.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Droby, A., Shapira, D.V., Rabaev, I., Barakat, B.K., El-Sana, J. (2022). Hard and Soft Labeling for Hebrew Paleography: A Case Study. In: Uchida, S., Barney, E., Eglin, V. (eds) Document Analysis Systems. DAS 2022. Lecture Notes in Computer Science, vol 13237. Springer, Cham. https://doi.org/10.1007/978-3-031-06555-2_33


  • DOI: https://doi.org/10.1007/978-3-031-06555-2_33


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06554-5

  • Online ISBN: 978-3-031-06555-2

  • eBook Packages: Computer Science (R0)
