Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a method for detecting inverted Chinese text images that is fast and widely applicable. The technical solution is as follows:
A Chinese text image inversion discrimination method based on Chinese character stroke characteristics comprises the following steps:
(1) determining the type of the input scanned text image: if it is a grayscale image, keeping it unchanged; if it is a color image, converting it into a grayscale image; the grayscale image is denoted by I;
(2) performing denoising, contrast enhancement and binarization on the grayscale image I, the binarization result being denoted by B;
(3) dividing B into 36 sub-regions of equal size and selecting the 16 sub-regions nearest the center as candidate text regions, denoted by T_i (i = 1, 2, ..., 16);
(4) for each candidate text region, counting the text points according to the following formula and normalizing the count to obtain the feature R_i of each candidate text sub-region:
wherein (s, t) ∈ T_i denotes the coordinates of each pixel in T_i, and M × N is the total number of pixels in the grayscale image I;
(5) presetting two thresholds TH_2 and TH_3, both less than 1; for a given T_i, if TH_2 < R_i < TH_3, selecting the region as a text region for feature extraction; H_k (k = 1, 2, ..., K) denotes the set of text regions used for feature extraction, K being the total number of text regions;
(6) selecting two templates, wherein template 1 is a "人"-shaped stroke detection template describing the left-falling and right-falling stroke structure in Chinese characters, and template 2 is a stroke detection template for the horizontal-turning shape, describing the horizontal-turning strokes in Chinese characters;
(7) computing the feature values N_1, N_2, M_1 and M_2 of the text regions with a stroke feature extraction algorithm, the algorithm being as follows:
1) for a given text region H_k used for feature extraction, thinning the text points with a morphological thinning technique to obtain a single-pixel-wide character skeleton, the thinning result being denoted by S;
2) performing template matching on S with template 1 and template 2 respectively, the process for template 1 being: aligning a skeleton point in H_k with the reference point of template 1, and applying an XNOR operation between each point in the neighborhood of the skeleton point and the corresponding position of template 1; for each skeleton point in H_k, accumulating the XNOR results and normalizing the sum, the result being denoted by U_k(j), where j is the index of the skeleton point in H_k; for the j-th skeleton point in H_k, if U_k(j) > TH_3, marking the point as a type-1 feature point; for template 2, marking type-2 feature points by the same process;
3) for each H_k, using step 2) to determine whether each skeleton point is a feature point of either type, accumulating separately the numbers of type-1 and type-2 feature points over the text image, and taking the two totals as two feature values, denoted by N_1 and N_2;
4) rotating the image S by 180°, the result being denoted by S'; repeating steps 2) and 3) for S', with M_1 and M_2 denoting the total numbers of type-1 and type-2 feature points of S';
(9) detecting inversion of the text image:
1) computing a composite feature value F from the feature quantities N_1, N_2, M_1 and M_2, namely:
2) presetting a threshold TH_4 greater than 0.6, and judging from the value of F whether the text image is inverted, namely:
Preferably, TH_2 = 0.2 and TH_3 = 0.4; the threshold TH_4 takes a value between 0.6 and 0.8.
Step (2) comprises the following sub-steps:
1) denoising the grayscale image I with a median filter using a 3 × 3 template, the result being denoted by G;
2) performing contrast enhancement on the filtered image G with the CLAHE technique, the result being denoted by E;
3) computing the global threshold TH_1 of E with the Otsu method, and binarizing E with TH_1, the result being denoted by B.
The proposed method for detecting inverted Chinese text images from Chinese character stroke features requires neither text line extraction nor the localization of individual characters. It searches the selected text regions directly with the stroke matching templates, which makes it fast and widely applicable.
Detailed Description
First, the input text image is preprocessed to improve its visual quality; then text regions are selected from the text image; next, the "人"-shaped template and the horizontal-turning stroke template are used to count, within the text regions, the feature points exhibiting each of the two stroke types; finally, the composite feature quantity is used to judge whether the text image is inverted. Fig. 1 shows a block diagram of the proposed method. The procedure consists of four stages: preprocessing, text region selection, stroke feature extraction and inversion judgment.
1. Preprocessing
The purpose of preprocessing is to improve the visual quality of the text image. Because a text image can be degraded in several ways while it is being produced, preprocessing comprises graying, denoising, contrast enhancement and binarization.
(1) Graying:
First determine whether the input text image is a grayscale image. If it is, it is kept unchanged. If it is a color image, let C_R, C_G and C_B denote the red, green and blue channels respectively; the minimum of the three channels is taken as the output gray value, namely:
I(x, y) = min{C_R(x, y), C_G(x, y), C_B(x, y)}    (1)
where I denotes the image after graying, x (x = 1, 2, ..., H) and y (y = 1, 2, ..., W) give the spatial position of a pixel in the image, and H and W are the height and width of the text image, respectively.
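As an illustration of equation (1), the following minimal sketch computes the per-pixel channel minimum in plain Python. The function name `to_gray_min` and the list-of-tuples image layout are assumptions made for this example only; they are not part of the described method.

```python
def to_gray_min(rgb):
    """Gray value = minimum of the R, G and B channels, as in equation (1).

    `rgb` is a list of rows; each pixel is an (R, G, B) tuple of ints in 0..255.
    (This image layout is an assumption made for the example.)
    """
    return [[min(pixel) for pixel in row] for row in rgb]


# Dark text pixels on a light, slightly tinted page.
image = [
    [(250, 245, 230), (20, 25, 30)],
    [(255, 255, 255), (200, 10, 10)],
]
gray = to_gray_min(image)
print(gray)  # [[230, 20], [255, 10]]
```

Taking the minimum rather than a weighted average means a pixel is rendered dark whenever any one channel is dark, so colored ink such as the red pixel above still ends up dark.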
(2) De-noising
Considering that the text image is possibly polluted by noise in the acquisition and digitization processes, and the noise has obvious influence on the post-processing process, the median filtering technology is adopted to carry out denoising processing on the gray level image I. The template used for median filtering is a 3 x 3 square. The denoised image is denoted by G.
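A 3 × 3 median filter can be sketched as follows. The text does not say how image borders are handled; replicating the nearest edge pixel is an assumption of this example, as is the function name `median_filter_3x3`.

```python
def median_filter_3x3(gray):
    """Replace each pixel by the median of its 3x3 neighbourhood.

    Border handling (replicating the nearest edge pixel) is an assumption;
    the source text does not specify it.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for x in range(h):
        for y in range(w):
            window = [
                gray[min(max(x + dx, 0), h - 1)][min(max(y + dy, 0), w - 1)]
                for dx in (-1, 0, 1)
                for dy in (-1, 0, 1)
            ]
            out[x][y] = sorted(window)[4]  # middle element of the 9 values
    return out


noisy = [
    [200, 200, 200],
    [200,   0, 200],  # isolated dark speck
    [200, 200, 200],
]
denoised = median_filter_3x3(noisy)
print(denoised)  # [[200, 200, 200], [200, 200, 200], [200, 200, 200]]
```

The isolated speck disappears because it is never the median of any 3 × 3 window. The filter is a reasonable fit for scanned text because it removes such specks without averaging across stroke edges.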
(3) Contrast enhancement
Depending on the scanning device, a scanned text image may have low contrast or uneven illumination; if left uncorrected, this can degrade the binarization result. The invention applies contrast-limited adaptive histogram equalization (CLAHE) to the filtered image G for contrast enhancement, and the result is denoted by E.
(4) Binarization processing
The proposed method only needs the binary information of the text, so the Otsu method is used to compute the global threshold of E, denoted by TH_1. E is then binarized with TH_1, the result being denoted by B, as follows:
In B, the points with value 1 are text points, and the points with value 0 are background points.
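The sketch below shows one common way to compute an Otsu threshold and apply it. The binarization formula itself is not reproduced in this text, so the comparison direction is an assumption: pixels at or below the threshold are treated as text points, on the premise that text is darker than the page. The function names are hypothetical.

```python
def otsu_threshold(gray):
    """Return the gray level t (0..255) that maximizes between-class variance,
    where the dark class holds the pixels with value <= t."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(v * c for v, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of their gray values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, th):
    """Assumption: text is darker than the background, so value <= th maps to 1."""
    return [[1 if v <= th else 0 for v in row] for row in gray]


enhanced = [
    [30, 220, 225],
    [25, 230, 35],
    [210, 215, 20],
]
th1 = otsu_threshold(enhanced)
binary = binarize(enhanced, th1)
print(binary)  # [[1, 0, 0], [1, 0, 1], [0, 0, 1]]
```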
2. Text region selection
For most text images, the text is located near the center of the image. The proposed method divides the text image into 36 non-overlapping regions of equal area, as shown in Fig. 2. The 16 regions nearest the center of the image are selected as candidate text regions, denoted by T_i (i = 1, 2, ..., 16) and shown as the dark regions in Fig. 2.
From the binary text image B, the number of text points in each candidate text region is computed with formula (3) and normalized; the result is denoted by R_i, that is:
where (s, t) ∈ T_i denotes the coordinates of each pixel in T_i.
For a given T_i, if TH_2 < R_i < TH_3, the region is selected as a text region for feature extraction. H_k (k = 1, 2, ..., K) denotes the set of text regions used for feature extraction, and K is the total number of text regions.
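The region selection can be sketched as below. Two points are assumptions of this example rather than statements from the text: the 16 candidate regions are taken to be the central 4 × 4 blocks of a 6 × 6 grid, and the normalized feature is taken to be the fraction of text points among the pixels of a block. The function name and the default thresholds (the preferred values 0.2 and 0.4) are for illustration only.

```python
def select_text_regions(binary, th2=0.2, th3=0.4):
    """Split B into a 6x6 grid and keep the central 4x4 blocks whose
    text-point ratio lies strictly between th2 and th3.

    Assumes the image is at least 6x6; rows and columns left over after
    integer division into 6 blocks are ignored.
    """
    h, w = len(binary), len(binary[0])
    bh, bw = h // 6, w // 6
    selected = []
    for r in range(1, 5):        # grid rows 1..4: the central blocks (assumed layout)
        for c in range(1, 5):    # grid columns 1..4
            block = [row[c * bw:(c + 1) * bw] for row in binary[r * bh:(r + 1) * bh]]
            ratio = sum(map(sum, block)) / (bh * bw)
            if th2 < ratio < th3:
                selected.append(((r, c), ratio))
    return selected


# 12x12 toy page, so each grid block is 2x2 pixels.
page = [[0] * 12 for _ in range(12)]
page[2][2] = 1                   # block (1, 1): 1 of 4 pixels, ratio 0.25
for x in (4, 5):
    for y in (4, 5):
        page[x][y] = 1           # block (2, 2): fully black, ratio 1.0
regions = select_text_regions(page)
print(regions)  # [((1, 1), 0.25)]
```

The fully black block is rejected as too dense and the empty blocks as too sparse, which mirrors the intent of excluding image-like and nearly blank regions.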
3. Stroke feature extraction
An examination of the stroke structure of 5000 commonly used Chinese characters shows that two stroke features, the "人" shape and the horizontal-turning shape, occur markedly more often in upright characters than in inverted ones. The proposed method therefore uses these two stroke features for inversion discrimination. Fig. 3 shows the "人"-shaped stroke detection template (template 1), which describes the left-falling and right-falling stroke structure of Chinese characters. Fig. 4 shows the stroke detection template for the horizontal-turning shape (template 2), which describes the horizontal-turning strokes of Chinese characters. In Figs. 3 and 4, "O" represents 1, i.e., a text point, and "x" represents 0, i.e., a background point; the dark cell marks the position of the reference point.
The stroke characteristics used for inversion discrimination are calculated using the following algorithm.
The stroke feature extraction algorithm comprises the following steps:
1) For a given text region H_k, the text points are thinned with a morphological thinning technique to obtain a single-pixel-wide character skeleton; the thinning result is denoted by S.
2) Template matching is performed on S with template 1 and template 2, shown in Figs. 3 and 4 respectively. The process is as follows: a skeleton point in H_k is aligned with the reference point of template 1 (or template 2), and an XNOR operation is applied to each pair of corresponding points in the two 5 × 5 areas, i.e., the output is 1 when the two points are both 1 or both 0, and 0 otherwise. For each skeleton point in H_k, the XNOR results are accumulated and the sum is normalized; the result is denoted by U_k(j), where j is the index of the skeleton point in H_k. For the j-th skeleton point in H_k, if U_k(j) > TH_3, the point is marked as a type-1 feature point (or a type-2 feature point).
3) For each H_k, step 2) is used to determine whether each skeleton point is a feature point of either type. The numbers of type-1 and type-2 feature points over the text image are accumulated separately, and the two totals are taken as two feature values, denoted by N_1 and N_2.
4) The image S is rotated by 180°, the result being denoted by S'. Steps 2) and 3) are repeated for S', with M_1 and M_2 denoting the total numbers of type-1 and type-2 feature points of S'.
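Steps 2) to 4) can be sketched as follows. The actual templates appear only in the figures, so the 5 × 5 `CORNER` template here is hypothetical, as are placing the reference point at the template center, treating pixels outside the image as background, and normalizing by the 25 template cells. The threshold of 0.9 is chosen only to keep the toy example unambiguous and is not a value taken from the text.

```python
def match_score(skeleton, x, y, template):
    """Normalized XNOR agreement between the 5x5 neighbourhood of (x, y)
    and the template. Pixels outside the image count as background (0)."""
    h, w = len(skeleton), len(skeleton[0])
    agree = 0
    for i in range(5):
        for j in range(5):
            sx, sy = x + i - 2, y + j - 2  # reference point = template centre (assumption)
            pixel = skeleton[sx][sy] if 0 <= sx < h and 0 <= sy < w else 0
            agree += 1 if pixel == template[i][j] else 0  # XNOR of two binary values
    return agree / 25


def count_feature_points(skeleton, template, threshold):
    """Count the skeleton points whose match score exceeds the threshold."""
    return sum(
        1
        for x, row in enumerate(skeleton)
        for y, v in enumerate(row)
        if v == 1 and match_score(skeleton, x, y, template) > threshold
    )


def rotate_180(image):
    """Rotate a binary image by 180 degrees."""
    return [row[::-1] for row in image[::-1]]


# Hypothetical corner-like template, NOT the template shown in the figures.
CORNER = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

# 7x7 skeleton containing the same corner shape centred at (3, 3).
skeleton = [[0] * 7 for _ in range(7)]
for x, y in [(3, 1), (3, 2), (3, 3), (4, 3), (5, 3)]:
    skeleton[x][y] = 1

n_upright = count_feature_points(skeleton, CORNER, 0.9)
m_rotated = count_feature_points(rotate_180(skeleton), CORNER, 0.9)
print(n_upright, m_rotated)  # 1 0
```

In this toy case the corner is detected once in the upright skeleton and not at all after rotation, which is the asymmetry that the counts N and M are meant to capture.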
4. Text image inversion detection
From the feature quantities N_1, N_2, M_1 and M_2, the composite feature value F is computed with equation (4), i.e.:
Whether the text image is inverted is then judged from the value of F, namely:
MATLAB R2016b running on 64-bit Windows 10 Education was used as the experimental simulation platform; the hardware platform was an Intel i5-4590 CPU with 16 GB of memory.
A test set of 120 scanned text images acquired with a high-resolution scanner was used, of which 58 are inverted and 62 are upright. The documents include books, academic journals, academic papers, PPT documents and other types. The layouts include single-column and double-column formats, and the image content contains not only ordinary text but also formulas, images, tables and the like. When the experimental images were acquired, the paper was not required to be perfectly aligned on the scanner, so some tilt was allowed. In addition, to verify the generality of the method, the selected texts cover four typefaces common in Chinese typesetting, namely Song, Fangsong (imitation Song), Kai (regular script) and Hei (boldface), with character sizes ranging from size 2 to size 6.
The scanned pages were standard A4 size (210 mm × 297 mm), the scanning resolution was 300 dpi, and the scanned images measured 2480 × 3508 pixels. The algorithm was simulated in MATLAB, and the average time to process one image was about 330 ms. Reimplementing the algorithm in C, which executes more efficiently, would be expected to raise the processing speed further and could meet real-time processing requirements.
The proposed method uses four thresholds, TH_1 to TH_4. TH_1 is obtained automatically by the Otsu method from the image being processed. The thresholds TH_2 and TH_3 distinguish text regions from non-text regions: in a text region, the ratio of text points to all pixels of the region lies within a certain range. If the ratio is too small, the region is likely a formula or table region; if it is too large, the region is likely an image region. The proposed algorithm takes TH_2 = 0.2 and TH_3 = 0.4. The threshold TH_4 can take a value between 0.6 and 0.8: the larger TH_4 is, the more precisely the stroke structure is detected, but the more easily detection is disturbed by noise. The proposed algorithm takes TH_4 = 0.7. The experimental results show that the method correctly judged, for all 120 text images, whether the image was inverted. Fig. 5 shows some of the experimental images and their corresponding F values. According to the experimental results, the method judges the inversion of text images quickly and effectively.