Abstract
Paleography studies the writing styles of manuscripts and recognizes different styles and modes of scripts. We explore the applicability of hard and soft labeling for training deep-learning models to classify Hebrew scripts. In the hard-labeling scheme, each document image has a single label representing its class. In the soft-labeling approach, by contrast, each image is labeled with a vector whose elements give the similarity of the document image to a particular regional writing style or graphical mode. In addition, we introduce a dataset of medieval Hebrew manuscripts that provides complete coverage of the major Hebrew writing styles and modes. A Hebrew paleography expert manually annotated the ground truth for soft labeling. We compare the applicability of the soft- and hard-labeling approaches on the presented dataset, and we analyze and discuss the findings.
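The difference between the two labeling schemes can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the three class names, the similarity values, the normalization of the soft label so that it sums to 1, and the use of cross-entropy as the training loss are not taken from the paper.

```python
import math

# Hypothetical class list, for illustration only (the abstract does not enumerate classes).
STYLES = ["Ashkenazi", "Italian", "Sephardic"]


def hard_label(style):
    """Hard label: one-hot vector with 1.0 for the single assigned class."""
    return [1.0 if s == style else 0.0 for s in STYLES]


def softmax(logits):
    """Turn raw model scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probs):
    """Cross-entropy between a target vector (hard or soft) and predictions."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


# Hard labeling: the image belongs to exactly one class.
hard = hard_label("Italian")

# Soft labeling: one similarity score per class.
# Made-up values, assumed here to be normalized so that they sum to 1.
soft = [0.2, 0.7, 0.1]

# Made-up model outputs for a single document image.
probs = softmax([0.5, 2.0, -1.0])

loss_hard = cross_entropy(hard, probs)
loss_soft = cross_entropy(soft, probs)
print(f"hard-label loss: {loss_hard:.4f}")
print(f"soft-label loss: {loss_soft:.4f}")
```

With a hard label, only the probability assigned to the single annotated class contributes to the loss. With a soft label, every class with a nonzero similarity contributes, so the model is also penalized for ignoring styles that the annotator judged partially similar.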
Acknowledgment
This research was partially supported by The Frankel Center for Computer Science at Ben-Gurion University.
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Droby, A., Shapira, D.V., Rabaev, I., Barakat, B.K., El-Sana, J. (2022). Hard and Soft Labeling for Hebrew Paleography: A Case Study. In: Uchida, S., Barney, E., Eglin, V. (eds) Document Analysis Systems. DAS 2022. Lecture Notes in Computer Science, vol 13237. Springer, Cham. https://doi.org/10.1007/978-3-031-06555-2_33
DOI: https://doi.org/10.1007/978-3-031-06555-2_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06554-5
Online ISBN: 978-3-031-06555-2
eBook Packages: Computer Science (R0)