Abstract
Ancient documents are usually degraded by the presence of strong background artifacts. These are often caused by the so-called bleed-through effect, a pattern that interferes with the main text due to seeping of ink from the reverse side. A similar effect, called show-through and due to the imperfect opacity of the paper, may appear in scans of even modern, well-preserved documents. These degradations must be removed to improve human or automatic readability. For this purpose, when a color scan of the document is available, we have shown that a simplified linear pattern overlapping model allows us to use very fast blind source separation techniques. This approach, however, cannot be applied to grayscale scans. This is a serious limitation, since many collections in our libraries and archives are now only available as grayscale scans or microfilms. We propose here a new model for bleed-through in grayscale document images, based on the availability of the recto and verso pages, and show that blind source separation can be successfully applied in this case too. Some experiments with real ancient documents are presented and described.
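The recto/verso idea can be illustrated with a minimal sketch, under assumptions that are not taken from the article: a 2x2 symmetric linear mixing of two synthetic binary "ink" patterns, perfectly registered pages, and symmetric whitening used as one simple decorrelation-based separation step. The names `symmetric_whitening`, `mixing`, `recto_text`, and `verso_text` are hypothetical, and the code is not presented as the authors' algorithm.

```python
import numpy as np


def symmetric_whitening(x):
    """Decorrelate the rows of x (channels x pixels) using the symmetric
    inverse square root of their sample covariance matrix."""
    xc = x - x.mean(axis=1, keepdims=True)
    cov = np.cov(xc)
    eigval, eigvec = np.linalg.eigh(cov)
    w = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return w @ xc


rng = np.random.default_rng(0)
shape = (64, 64)

# Hypothetical binary "ink" patterns standing in for the two texts.
recto_text = (rng.random(shape) < 0.2).astype(float)
verso_text = (rng.random(shape) < 0.2).astype(float)

# Assumed symmetric mixing: each side shows its own text plus an
# attenuated, mirrored copy of the text on the opposite side.
mixing = np.array([[1.0, 0.4],
                   [0.4, 1.0]])
recto_scan = mixing[0, 0] * recto_text + mixing[0, 1] * np.fliplr(verso_text)
verso_scan = mixing[1, 1] * verso_text + mixing[1, 0] * np.fliplr(recto_text)

# Mirror the verso scan so both observations share the recto pixel grid,
# then treat each pixel as one sample of a two-channel signal.
observations = np.stack([recto_scan.ravel(), np.fliplr(verso_scan).ravel()])
estimates = symmetric_whitening(observations)

# Ground-truth sources in the recto frame, for comparison only.
sources = np.stack([recto_text.ravel(), np.fliplr(verso_text).ravel()])
recto_estimate = estimates[0].reshape(shape)
verso_estimate = np.fliplr(estimates[1].reshape(shape))
```

In this toy setting the whitened channels are uncorrelated by construction, and they track the synthetic sources closely because the assumed mixing matrix is symmetric and the sources have nearly equal variance. Real scans would first need registration of the two sides, which this sketch skips.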
Author information
Anna Tonazzini graduated cum laude in Mathematics from the University of Pisa, Italy, in 1981. In 1984 she joined the Istituto di Scienza e Tecnologie dell'Informazione of the Italian National Research Council (CNR) in Pisa, where she is currently a researcher at the Signals and Images Laboratory. She has taken part in special programs for basic and applied research on image processing and computer vision, and is co-author of over 60 scientific papers. Her current research interests include inverse problems theory, image restoration and reconstruction, document analysis and recognition, independent component analysis, neural networks and learning.
Emanuele Salerno graduated in Electronic Engineering from the University of Pisa, Italy, in 1985. In September 1987 he joined the Italian National Research Council (CNR) at the Department of Signal and Image Processing, Information Processing Institute (now Institute of Information Science and Technologies, ISTI, Signals and Images Laboratory), Pisa, Italy, where he has been working on applied inverse problems, image reconstruction and restoration, microwave nondestructive evaluation, and blind signal separation. He has held various responsibilities in research programs on nondestructive testing, robotics, numerical models for image reconstruction and computer vision, and neural network techniques in astrophysical imagery. At present, he is responsible for the local scientific activities within the framework of the European Space Agency's “Planck Surveyor Satellite” mission, and takes part in the European CRAFT project “ISyReADeT”, for document image restoration.
Luigi Bedini graduated cum laude in Electronic Engineering from the University of Pisa, Italy, in 1968. Since 1970 he has been a Researcher of the Italian National Research Council, Istituto di Scienza e Tecnologie dell'Informazione, Pisa, Italy. His interests have been in modelling, identification, and parameter estimation of biological systems applied to non-invasive diagnostic techniques. At present, his research interest is in the field of digital signal processing, image reconstruction and neural networks applied to image processing. He is co-author of more than 80 scientific papers. From 1971 to 1989, he was Associate Professor of System Theory at the Computer Science Department, University of Pisa, Italy.
About this article
Cite this article
Tonazzini, A., Salerno, E. & Bedini, L. Fast correction of bleed-through distortion in grayscale documents by a blind source separation technique. IJDAR 10, 17–25 (2007). https://doi.org/10.1007/s10032-006-0015-z
DOI: https://doi.org/10.1007/s10032-006-0015-z