
Andrei Dvornic

The most important step in automatic content conversion is preprocessing. A high-quality scan makes it very likely that the content of the document can be extracted with a good confidence level. This paper describes preprocessing methods for large documents that must be scanned in pieces because they do not fit entirely within the scanner area.
In this paper we address the binarization of scanned documents, a preprocessing requirement for most algorithms aimed at document image analysis. Two new approaches are presented, both focusing on problem areas such as low-contrast documents, noise, and the reverse side of a page showing through the paper sheet. The first technique consists of an initial preprocessing step followed by a conversion from the continuous space to the bitonal document. The preprocessing stage enhances document characteristics through contrast stretching of each color channel. The conversion stage is a locally adaptive binarization process that uses color thresholding based on a Gaussian blur effect. The second approach is a noise-removal conversion technique that combines the results of a series of threshold masks. Experimental results are given in order to verify the effectiveness of the proposed techniques.
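The two stages of the first technique can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the color channels are combined or how the blurred image is turned into a threshold, so the channel averaging, the `sigma` and `offset` parameters, and all function names below are assumptions made for illustration.

```python
import math

def stretch_channel(channel):
    """Linearly rescale one channel (2-D list of ints) to the full 0..255 range."""
    lo = min(min(row) for row in channel)
    hi = max(max(row) for row in channel)
    if hi == lo:  # flat channel: nothing to stretch
        return [row[:] for row in channel]
    scale = 255.0 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in channel]

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights covering about three sigmas per side."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(line, kernel):
    """Convolve one row or column with the kernel, clamping at the borders."""
    radius = len(kernel) // 2
    n = len(line)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * line[j]
        out.append(acc)
    return out

def gaussian_blur(img, sigma):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in img]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def binarize(rgb_channels, sigma=2.0, offset=10.0):
    """rgb_channels: three 2-D lists of equal size. Returns 0 = ink, 1 = paper."""
    stretched = [stretch_channel(c) for c in rgb_channels]
    h, w = len(stretched[0]), len(stretched[0][0])
    # Assumption: channels are averaged after stretching.
    gray = [[sum(c[y][x] for c in stretched) / 3.0 for x in range(w)]
            for y in range(h)]
    # The blurred image serves as a local background estimate.
    background = gaussian_blur(gray, sigma)
    return [[0 if gray[y][x] < background[y][x] - offset else 1
             for x in range(w)] for y in range(h)]
```

Because the threshold at each pixel comes from the blurred neighborhood rather than from a single global value, a faint stroke on a uniformly grey page can still be separated once the contrast has been stretched.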
Bitonal conversion is a basic preprocessing step in Automatic Content Analysis, a very active research area in recent years. Information retrieval is usually performed on black-and-white documents in order to increase efficiency and allow simpler analysis techniques. This paper presents a number of new conversion algorithms intended as alternatives to the approaches currently used in the industry. The proposed methods are suitable for both scanned images and documents in electronic format. First, an algorithm consisting of a contrast enhancement step followed by a conversion based on adaptive levelling of the document is presented. Then a new multi-threshold technique is suggested as a solution for noise interference, a common feature of scanned books and newspapers. Finally, three more approaches adapted to the particular properties of electronic documents are introduced. Experimental results are given in order to verify the effectiveness ...
This paper describes an approach for obtaining a normalized measure of text resemblance in scanned images. The technique, aimed at automatic content conversion, relies on the detection of standard character features and applies a sequence of procedures and algorithms to the input document. The approach uses only the geometrical characteristics of characters, ignoring context and character recognition.
The most important step in automatic content conversion is preprocessing. A high-quality scan makes it very likely that the content of the document can be extracted with a good confidence level. This paper describes preprocessing methods for large documents that must be scanned in pieces because they do not fit entirely within the scanner area. We propose a novel scheme for merging digital images obtained in multiple scanning passes, intended for newspapers and other historical documents, which allows the content to be exploited efficiently afterwards. The goal is to combine multiple images, with or without overlapping fields of view, in order to produce a segmented panorama or a high-resolution document.
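As an illustration of the overlapping case only, the sketch below joins two side-by-side grayscale strips by trying every candidate overlap width and keeping the one with the smallest mean absolute difference. The abstract does not describe the merge scheme itself, so this brute-force search, the function names `best_overlap` and `merge_strips`, and the assumption that the strips are already aligned vertically and share the same height are all hypothetical.

```python
def best_overlap(left, right, min_overlap=1):
    """Return the overlap width (in columns) that best aligns two strips.

    left, right: 2-D lists of grayscale values with the same number of rows.
    The right edge of `left` is compared with the left edge of `right`.
    """
    h = len(left)
    wl, wr = len(left[0]), len(right[0])
    best_k, best_err = 0, float("inf")
    for k in range(min_overlap, min(wl, wr) + 1):
        # Mean absolute difference over the k overlapping columns.
        err = sum(abs(left[y][wl - k + x] - right[y][x])
                  for y in range(h) for x in range(k)) / (h * k)
        if err < best_err:
            best_k, best_err = k, err
    return best_k

def merge_strips(left, right):
    """Concatenate two strips, dropping the columns of `right` already in `left`."""
    k = best_overlap(left, right)
    return [lrow + rrow[k:] for lrow, rrow in zip(left, right)]
```

This sketch always assumes at least `min_overlap` shared columns, so it does not cover the non-overlapping case mentioned above, and real scans would also need rotation and scale correction before such a comparison is meaningful.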
This paper describes an approach for obtaining a normalized measure of text resemblance in scanned images, for automatic content conversion purposes. It relies on the detection of standard character features and applies a sequence of procedures and algorithms to the input images. The approach relies solely on the geometrical characteristics of the characters, without any information regarding context or the recognition of characters.
In this paper we address the problem of binarization of scanned documents, a preprocessing requirement for most algorithms aimed at document image analysis. Two new approaches are presented, both focusing on problem areas such as low contrast, noise, and the reverse side of a page showing through the paper sheet. The first proposed technique is based on an initial preprocessing step followed by the actual conversion from the continuous space to the bitonal one. The algorithm starts by enhancing document characteristics by means of contrast stretching for each color channel. Then a locally adaptive binarization process is performed, using color thresholding based on the Gaussian blur effect. The second proposed method, a conversion technique aimed at noise removal, is performed by combining the results of a series of threshold masks. Experimental results are given in order to verify the effectiveness of the proposed algorithms.
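One way to see how combining threshold masks can suppress noise is the sketch below. The abstract does not state how the masks are combined, so this example swaps in a simple two-mask hysteresis rule as an assumption: a strict mask selects pixels that are certainly ink, a loose mask selects candidates, and a candidate is kept only if it is connected to a strict pixel. The threshold values and the name `combine_masks` are hypothetical.

```python
from collections import deque

def combine_masks(gray, strict=80, loose=160):
    """Combine a strict and a loose threshold mask of a grayscale image.

    gray: 2-D list of values in 0..255. Returns 0 = ink, 1 = paper.
    Requires strict < loose, so every strict pixel is also a loose pixel.
    """
    h, w = len(gray), len(gray[0])
    loose_mask = [[v < loose for v in row] for row in gray]
    keep = [[False] * w for _ in range(h)]

    # Seed the search with every pixel of the strict mask.
    queue = deque((y, x) for y in range(h) for x in range(w)
                  if gray[y][x] < strict)
    for y, x in queue:
        keep[y][x] = True

    # Grow the seeds through the loose mask (4-connectivity).
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w
                    and loose_mask[ny][nx] and not keep[ny][nx]):
                keep[ny][nx] = True
                queue.append((ny, nx))

    return [[0 if keep[y][x] else 1 for x in range(w)] for y in range(h)]
```

Under this rule, isolated grey specks that pass only the loose threshold are discarded, while faint edges of real strokes survive because they touch pixels that pass the strict threshold.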
This paper presents a new approach for black-and-white conversion of greyscale images. The algorithm can easily be modified to work on colour images by changing the comparison pattern of the peaks. It uses a scanning method for black pixel peaks and an independent threshold associated with a histogram of the scanned area. The method produced good results on different types of documents.
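The idea of deriving an independent threshold from the histogram of a scanned area can be sketched as follows. The abstract does not give the peak-scanning procedure, so this example swaps in Otsu's method, a standard histogram-based threshold, purely as a stand-in; `otsu_threshold` and `binarize_region` are illustrative names.

```python
def otsu_threshold(pixels):
    """Return the grey level t that best splits `pixels` into <= t and > t.

    pixels: flat list of ints in 0..255. Uses Otsu's criterion (maximum
    between-class variance) computed from the histogram.
    """
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * c for i, c in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break             # no light pixels left
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize_region(region):
    """Binarize one area with its own threshold. Returns 0 = ink, 1 = paper."""
    t = otsu_threshold([v for row in region for v in row])
    return [[0 if v <= t else 1 for v in row] for row in region]
```

Calling `binarize_region` separately on each tile of a page gives every area its own threshold, which is the property the abstract attributes to the method.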