Abstract
Document binarization plays a key role in pipelines that extract information from document images. In this paper, we propose a robust binarization algorithm that aims to produce highly accurate binary maps at a reduced computational cost. The proposed technique embeds the classic Sauvola thresholding in a deep neural network (DNN). Our model learns to combine multi-scale Sauvola thresholds using a feature-wise attention module that exploits the visual context of each pixel. The resulting binarization map is further enhanced by a spatial error concealment procedure that recovers missing or severely degraded visual information. Moreover, we propose an automatic color removal module that suppresses any binarization-irrelevant information in the image. This is especially important for structured documents, such as payment forms, where colored structures are used to improve user experience and readability. The resulting model is compact, explainable and end-to-end trainable. The proposed technique outperforms state-of-the-art algorithms in terms of binarization accuracy and the rate of successfully extracted information.
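To make the idea of combining multi-scale Sauvola thresholds concrete, the following is a minimal NumPy sketch, not the model proposed in this paper. It uses the standard Sauvola rule T = m(1 + k(s/R - 1)), where m and s are the local mean and standard deviation in a window. The function names, the window sizes (7, 15, 31), the values k = 0.2 and R = 128, and the uniform per-pixel weights are all assumptions chosen for illustration; in the proposed approach the weights would come from a learned attention module rather than the placeholder used here.

```python
import numpy as np


def local_mean_std(img, window):
    """Local mean and standard deviation over an odd window x window neighborhood."""
    pad = window // 2
    padded = np.pad(img.astype(np.float64), pad, mode="reflect")
    # Integral images of values and squared values, with a leading zero row/column.
    ii = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    ii2 = np.pad(padded ** 2, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = img.shape

    def box_sum(table):
        # Sum over each window using four lookups in the integral image.
        return (table[window:window + h, window:window + w]
                - table[:h, window:window + w]
                - table[window:window + h, :w]
                + table[:h, :w])

    n = window * window
    mean = box_sum(ii) / n
    var = np.maximum(box_sum(ii2) / n - mean ** 2, 0.0)  # clamp rounding errors
    return mean, np.sqrt(var)


def sauvola_threshold(img, window, k=0.2, R=128.0):
    """Per-pixel Sauvola threshold T = m * (1 + k * (s / R - 1)); k and R are assumed defaults."""
    mean, std = local_mean_std(img, window)
    return mean * (1.0 + k * (std / R - 1.0))


def combine_multiscale(img, windows=(7, 15, 31), logits=None):
    """Blend thresholds from several window sizes with per-pixel softmax weights.

    `logits` (shape: scales x H x W) stands in for the output of a learned
    attention module; zeros give a plain average (illustrative placeholder).
    """
    thresholds = np.stack([sauvola_threshold(img, w) for w in windows])
    if logits is None:
        logits = np.zeros_like(thresholds)
    weights = np.exp(logits - logits.max(axis=0, keepdims=True))
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * thresholds).sum(axis=0)


def binarize(img, **kwargs):
    """Return a boolean map: True for background, False for text (dark) pixels."""
    return img > combine_multiscale(img, **kwargs)
```

Calling `binarize(img)` on a 2-D grayscale array returns a boolean background mask of the same shape. Passing a `logits` array lets a caller favor small windows for thin strokes and large windows for heavily degraded regions, which is the role the abstract assigns to the attention module.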
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Koloda, J., Wang, J. (2023). Context Aware Document Binarization and Its Application to Information Extraction from Structured Documents. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14187. Springer, Cham. https://doi.org/10.1007/978-3-031-41676-7_4
DOI: https://doi.org/10.1007/978-3-031-41676-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41675-0
Online ISBN: 978-3-031-41676-7
eBook Packages: Computer Science, Computer Science (R0)