CN107609482B

CN107609482B - Chinese text image inversion discrimination method based on Chinese character stroke characteristics

Info

Publication number: CN107609482B
Application number: CN201710695383.2A
Authority: CN
Inventors: 王建; 庞彦伟
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2017-08-15
Filing date: 2017-08-15
Publication date: 2021-02-19
Anticipated expiration: 2037-08-15
Also published as: CN107609482A

Abstract

The invention relates to a method for discriminating inversion of Chinese text images based on Chinese character stroke features. The method comprises: representing the grayscale image of an input scanned text image by I; representing the binarization result by B; dividing B into 36 sub-regions of equal size and selecting the 16 sub-regions close to the center as candidate text regions; for each candidate text region, counting the corresponding Chinese text points and normalizing the count; selecting the text regions used for feature extraction; selecting two templates, where template 1 is the "人"-shaped stroke detection template describing the left-falling and right-falling structure in Chinese characters and template 2 is the stroke detection template describing the horizontal-turning stroke in Chinese characters; calculating the feature values of the text regions with a stroke feature extraction algorithm; and detecting inversion of the text image.

Description

Chinese text image inversion discrimination method based on Chinese character stroke characteristics

Technical Field

The invention relates to a text image processing technology, in particular to an inversion discrimination technology for a Chinese text image.

Background

With the continuous development of computer technology, OCR-based text image digitization technology is widely used. In completing the OCR process, the direction of the characters in the text image has a great influence on the character recognition performance. When the characters are inclined, if the characters are not corrected, the recognition rate of the characters is seriously influenced. Existing OCR technology is almost completely ineffective, especially when the text is inverted (i.e., deviated from the normal orientation by about 180 °). Therefore, before OCR is performed on the text image, it is necessary to determine whether the text image has an inversion condition, so that the subsequent recognition process is performed normally.

The existing inversion discrimination methods for Chinese text images fall into four main types: methods based on OCR recognition results, methods based on image projection features, methods based on text punctuation marks, and methods based on stroke features. (1) The OCR-based method performs OCR on the original image and on the rotated image, and judges whether the original image is inverted by comparing the two recognition results. This method is inefficient because it requires two OCR passes. (2) The projection-based method projects the image and then either classifies the projection data or determines the direction of the text image from the similarity between a text line and an upright-direction data template. However, when the text image contains noise or a complex background, detection accuracy drops significantly. (3) The punctuation-based method judges the direction of the text from the relative positions of punctuation marks and text lines in the typesetting. It improves the efficiency and accuracy of inversion judgment to a certain extent, but its detection accuracy is lower when text distortion shifts the characters and punctuation marks in a text line relative to each other. In addition, the method uses only punctuation features and is ineffective for text images with few punctuation marks. (4) The method based on Chinese character stroke features judges whether the text is upright or inverted from the outline and direction features of the left-falling stroke. This method overcomes the shortcomings of the punctuation-based method and works well for tilted text images. However, it must extract individual Chinese characters and analyze the stroke features of each one, so processing is slow. In addition, if the resolution of the text image is low or the scanning quality is poor, its detection performance degrades.

In the patent literature, cinnarizer et al. (patent application No. 2012103138349) proposed a punctuation-based method for detecting the up-down direction of text. That method determines the direction of the text from the relative positions of punctuation marks and text lines. Because it depends entirely on punctuation features, it is ineffective for text images with few punctuation marks, so its range of application is limited. Wangki et al. (patent application No. 2017100902409) proposed a fast text image inversion detection method based on text line classification. The method divides text lines into three categories (left-indented, right-indented, and non-indented text lines) and judges whether the text image is inverted from the relative numbers of left-indented and right-indented text lines. However, it performs poorly on text images laid out in two or more columns.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and to provide a method for judging the inversion of Chinese text images that is fast and widely applicable. The technical scheme is as follows:

A Chinese text image inversion discrimination method based on Chinese character stroke characteristics comprises the following steps:

(1) judging the type of the input scanning text image, and if the input scanning text image is a gray image, keeping the type unchanged; if the image is a color image, converting the color image into a gray image, and expressing the gray image by I;

(2) denoising, contrast enhancement and binarization processing are carried out on the gray level image I, and an obtained binarization processing result is represented by B;

(3) dividing B into 36 sub-regions of equal size, and selecting the 16 sub-regions near the center as candidate text regions, denoted by T_i (i = 1, 2, ..., 16);

(4) for each candidate text region, calculating the number of corresponding Chinese text points according to the following formula and normalizing it to obtain R_i, which characterizes each candidate text sub-region:

R_i = (36 / (M × N)) · Σ_{(s,t)∈T_i} B(s,t)

where (s, t) ∈ T_i denotes the coordinates of each pixel in T_i, B(s, t) is the binarization result of step (2) at pixel (s, t), and M × N is the total number of pixels of the grayscale image I;

(5) presetting two thresholds TH₂ and TH₃, both less than 1; for a given T_i, if TH₂ < R_i < TH₃ is satisfied, selecting the region as a text region for feature extraction; H_k (k = 1, 2, ..., K) denotes the set of text regions used for feature extraction, K being the total number of text regions;

(6) selecting two templates, where template 1 is the "人"-shaped stroke detection template describing the left-falling and right-falling structure in Chinese characters, and template 2 is the stroke detection template describing the horizontal-turning stroke in Chinese characters;

(7) calculating the feature values corresponding to the text regions, N₁, N₂, M₁ and M₂, using a stroke feature extraction algorithm. The stroke feature extraction algorithm is as follows:

1) for a given text region H_k used for feature extraction, thinning the text points with a morphological thinning technique to obtain single-pixel-wide character skeleton information, the thinning result being denoted by S;

2) performing template matching on S with template 1 and template 2 respectively. For template 1, the process is as follows: a skeleton point in H_k is aligned with the reference point of template 1, and an XNOR operation is performed between each point in the neighborhood of the skeleton point and the point at the corresponding position of template 1; the XNOR results are accumulated for each skeleton point in H_k and normalized, and the result is denoted by U_k(j), where j is the index of the skeleton point in H_k; for the j-th skeleton point in H_k, if U_k(j) > TH₃ is satisfied, the point is marked as a type 1 feature point; for template 2, type 2 feature points are marked by a similar process;

3) for all skeleton points in each H_k, using step 2) to judge whether each point is a feature point of either type, and accumulating the numbers of type 1 and type 2 feature points of the text image separately; the two totals are taken as two feature values, denoted by N₁ and N₂;

4) rotating the thinning result S by 180°, the result being denoted by S'; repeating steps 2) and 3) for S', with M₁ and M₂ denoting the total numbers of type 1 and type 2 feature points corresponding to S';

(8) detecting inversion of the text image:

1) calculating a composite feature value F from the feature quantities N₁, N₂, M₁ and M₂ [formula for F not reproduced in the source text];

2) presetting a threshold TH₄ greater than 0.6, and judging whether the text image is inverted from the value of F [decision formula not reproduced in the source text].

Preferably, TH₂ = 0.2 and TH₃ = 0.4; the threshold TH₄ takes a value between 0.6 and 0.8.

Step (2) comprises the following sub-steps:

1) denoising the grayscale image I with a median filter using a 3 × 3 template, the result being denoted by G;

2) applying contrast enhancement to the filtered image G using the CLAHE technique, the result being denoted by E;

3) calculating the global threshold TH₁ of E by the Otsu method, and binarizing E with TH₁, the result being denoted by B.

The Chinese text image inversion discrimination method based on Chinese character stroke features requires neither text line extraction nor the location of individual Chinese characters. It searches the selected text regions directly with the stroke matching templates, and is therefore fast and widely applicable.

Drawings

FIG. 1 is a flow chart of the proposed method.

FIG. 2 is an example of the text regions.

FIG. 3 is the "人"-shaped stroke detection template.

FIG. 4 is the horizontal-turning stroke detection template.

FIG. 5 shows some of the experimental images and the corresponding feature values F.

Detailed Description

First, the input text image is preprocessed to improve its visual quality; then text regions are selected from the text image; next, the numbers of feature points of the two stroke types in the text regions are calculated using the "人"-shaped and horizontal-turning stroke templates; finally, whether the text image is inverted is judged using the composite feature quantity. Fig. 1 shows a block diagram of the proposed method. The processing comprises four stages: preprocessing, text region selection, stroke feature extraction, and inversion judgment.

1. Preprocessing

The purpose of the pre-processing is to improve the visual quality of the text image. In consideration of various degradation reasons existing in the generation process of the text image, the preprocessing step comprises the steps of graying, denoising, contrast enhancement, binarization and the like.

(1) Graying:

It is first determined whether the input text image is a grayscale image. If it is, it is kept unchanged; if it is a color image, C_R, C_G and C_B denote the red, green and blue channels respectively, and the minimum of the three channels is taken as the output gray value, namely:

I(x,y)＝min{C_R(x,y),C_G(x,y),C_B(x,y)} (1)

where I denotes the image after graying, x (x = 1, 2, ..., H) and y (y = 1, 2, ..., W) denote the spatial position of a pixel in the image, and H and W are the height and width of the text image, respectively.
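As a minimal sketch of equation (1), the following Python function takes the per-pixel minimum over the three channels. The function name `to_gray_min` and the nested-list image layout are illustrative assumptions and do not come from the patent.

```python
def to_gray_min(rgb):
    """Convert an H x W image of (R, G, B) tuples to gray using I = min(R, G, B)."""
    return [[min(pixel) for pixel in row] for row in rgb]


# Tiny 2 x 2 example: near-white paper plus blue and red ink.
rgb = [
    [(250, 248, 245), (20, 30, 120)],
    [(255, 255, 255), (200, 10, 10)],
]
gray = to_gray_min(rgb)
print(gray)  # [[245, 20], [255, 10]]
```

One plausible reason for this choice is that a saturated ink color has at least one low channel, so the minimum tends to keep colored text dark in the gray image.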

(2) De-noising

Considering that the text image is possibly polluted by noise in the acquisition and digitization processes, and the noise has obvious influence on the post-processing process, the median filtering technology is adopted to carry out denoising processing on the gray level image I. The template used for median filtering is a 3 x 3 square. The denoised image is denoted by G.
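The 3 × 3 median filtering step can be sketched in plain Python as follows. The border handling (using only the neighbours that exist) and the choice of the upper median for even-sized border windows are assumptions, since the text does not specify them; the function name is hypothetical.

```python
def median_filter_3x3(img):
    """3 x 3 median filter on a 2-D list of gray values.

    Border pixels use only the neighbours that lie inside the image
    (an assumption; the patent does not describe border handling).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for x in range(h):
        for y in range(w):
            window = sorted(
                img[u][v]
                for u in range(max(0, x - 1), min(h, x + 2))
                for v in range(max(0, y - 1), min(w, y + 2))
            )
            out[x][y] = window[len(window) // 2]
    return out


# A single dark noise pixel on a bright page is removed.
noisy = [[200, 200, 200], [200, 0, 200], [200, 200, 200]]
G = median_filter_3x3(noisy)
print(G)  # [[200, 200, 200], [200, 200, 200], [200, 200, 200]]
```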

(3) Contrast enhancement

During scanning, the scanning device may produce a text image with low contrast or uneven illumination, which affects binarization if left uncorrected. The invention applies contrast limited adaptive histogram equalization (CLAHE) to the filtered image G, and the result is denoted by E.

(4) Binarization processing

The proposed method only needs the binary information of the text, so the Otsu method is used to calculate the global threshold of E, denoted by TH₁. E is binarized with TH₁ and the result is denoted by B, as follows:

B(x,y) = 1 if E(x,y) ≤ TH₁; B(x,y) = 0 otherwise (2)

In B, the set of points with value 1 represents text points and the set of points with value 0 represents background points.
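The Otsu thresholding and binarization step can be illustrated with the sketch below. It assumes 8-bit gray values and assumes that text is darker than the page, so pixels at or below TH₁ become text points (1). The function names and the tie-breaking rule (the lowest threshold that attains the maximum is kept) are illustrative choices.

```python
def otsu_threshold(gray):
    """Return the threshold t in 0..255 that maximizes the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the values <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, th1):
    """Text points (assumed darker than the page) become 1, background 0."""
    return [[1 if v <= th1 else 0 for v in row] for row in gray]


E = [[30, 220, 225], [35, 40, 230]]
TH1 = otsu_threshold(E)
B = binarize(E, TH1)
print(TH1, B)  # 40 [[1, 0, 0], [1, 1, 0]]
```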

2. Text region selection

For most text images, the text is generally located near the center of the image. The proposed method divides the text image into 36 non-overlapping regions of equal area, as shown in Fig. 2. The 16 regions near the center of the image are selected as candidate text regions, denoted by T_i (i = 1, 2, ..., 16) and shown as the dark regions in Fig. 2.

From the text binary image B, the number of text points in each candidate text region is calculated and normalized using formula (3), and the result is denoted by R_i, that is:

R_i = (36 / (H × W)) · Σ_{(s,t)∈T_i} B(s,t) (3)

where (s, t) ∈ T_i denotes the coordinates of each pixel in T_i.

For a given T_i, if TH₂ < R_i < TH₃ is satisfied, the region is selected as a text region for feature extraction. H_k (k = 1, 2, ..., K) denotes the set of text regions used for feature extraction, and K is the total number of text regions.
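The region selection described above can be sketched as follows. The sketch assumes a 6 × 6 grid whose inner 4 × 4 block forms the 16 candidate regions, and it normalizes the text point count by the area of each region, which equals one thirty-sixth of the image when the image dimensions are divisible by 6. All function names are hypothetical.

```python
def candidate_regions(B, grid=6, border=1):
    """Yield (top, bottom, left, right) boxes for the central cells of the grid."""
    h, w = len(B), len(B[0])
    for r in range(border, grid - border):
        for c in range(border, grid - border):
            yield (r * h // grid, (r + 1) * h // grid,
                   c * w // grid, (c + 1) * w // grid)


def text_ratio(B, box):
    """R_i: fraction of pixels in the box that are text points (value 1)."""
    top, bottom, left, right = box
    area = (bottom - top) * (right - left)
    count = sum(B[s][t] for s in range(top, bottom) for t in range(left, right))
    return count / area


def select_text_regions(B, th2=0.2, th3=0.4):
    """Keep the candidate regions whose text ratio lies strictly between th2 and th3."""
    return [box for box in candidate_regions(B) if th2 < text_ratio(B, box) < th3]


# 12 x 12 toy image, so each of the 36 cells is 2 x 2 pixels.
B = [[0] * 12 for _ in range(12)]
B[4][4] = 1                    # one text point in a cell: ratio 0.25, selected
for s in (4, 5):
    for t in (6, 7):
        B[s][t] = 1            # fully filled cell: ratio 1.0, rejected
print(select_text_regions(B))  # [(4, 6, 4, 6)]
```

Empty cells (ratio 0) and the fully filled cell are rejected, mirroring the idea that sparse regions may be formulas or tables and dense regions may be images.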

3. Stroke feature extraction

By observing the stroke structure of 5000 commonly used Chinese characters, it was found that two stroke features, the "人" shape and the horizontal-turning shape, appear noticeably more often in upright characters than in inverted ones. Therefore, the proposed method uses these two stroke features for inversion discrimination. FIG. 3 shows the "人"-shaped stroke detection template (called template 1), which describes the left-falling and right-falling structure in Chinese characters. FIG. 4 shows the horizontal-turning stroke detection template (called template 2), which describes the horizontal-turning stroke in Chinese characters. In Figs. 3 and 4, "O" represents 1, i.e., a text point, and "×" represents 0, i.e., a background point; the dark cell indicates the location of the reference point.

The stroke characteristics used for inversion discrimination are calculated using the following algorithm.

The stroke feature extraction algorithm comprises the following steps:

1) For a given text region H_k, the text points are thinned with a morphological thinning technique to obtain single-pixel-wide character skeleton information, and the thinning result is denoted by S.

2) Template matching is performed on S using template 1 and template 2 shown in Figs. 3 and 4, respectively. The specific process is as follows: a skeleton point in H_k is aligned with the reference point of template 1 (or template 2), and an XNOR operation is performed between the points at corresponding positions in the two 5 × 5 regions, that is, the output is 1 when the two points are both 1 or both 0, and 0 otherwise. The XNOR results are accumulated for each skeleton point in H_k and normalized, and the result is denoted by U_k(j), where j is the index of the skeleton point in H_k. For the j-th skeleton point in H_k, if U_k(j) > TH₃ is satisfied, the point is marked as a type 1 feature point (or type 2 feature point).

3) For all skeleton points in each H_k, step 2) is used to judge whether each point is a feature point of either type, the numbers of type 1 and type 2 feature points of the text image are accumulated separately, and the two totals are taken as two feature values, denoted by N₁ and N₂.

4) The image S is rotated by 180° and the result is denoted by S'. Steps 2) and 3) are repeated for S', and M₁ and M₂ denote the total numbers of type 1 and type 2 feature points corresponding to S'.
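Steps 2) to 4) can be sketched in Python as below, assuming S is already a thinned binary skeleton. The 5 × 5 pattern `TEMPLATE_1` is a hypothetical stand-in for the template of Fig. 3, which is not reproduced in this text. Placing the reference point at the template center, treating out-of-image neighbours as background, and the threshold value 0.95 are all assumptions chosen for this toy example.

```python
# Hypothetical 5 x 5 "人"-like pattern; NOT the actual template of Fig. 3.
TEMPLATE_1 = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]


def xnor_score(S, x, y, template):
    """U(j): normalized count of XNOR agreements around skeleton point (x, y)."""
    size = len(template)
    half = size // 2  # reference point assumed at the template center
    h, w = len(S), len(S[0])
    agree = 0
    for dx in range(size):
        for dy in range(size):
            u, v = x + dx - half, y + dy - half
            pixel = S[u][v] if 0 <= u < h and 0 <= v < w else 0  # outside = background
            agree += 1 if pixel == template[dx][dy] else 0
    return agree / (size * size)


def count_feature_points(S, template, th):
    """Number of skeleton points whose XNOR score exceeds the threshold."""
    return sum(
        1
        for x, row in enumerate(S)
        for y, value in enumerate(row)
        if value == 1 and xnor_score(S, x, y, template) > th
    )


def rotate_180(S):
    """Rotate a 2-D list by 180 degrees."""
    return [row[::-1] for row in S[::-1]]


# 7 x 7 skeleton containing one upright copy of the pattern.
S = [[0] * 7 for _ in range(7)]
for i, row in enumerate(TEMPLATE_1):
    for j, value in enumerate(row):
        S[i + 1][j + 1] = value

N1 = count_feature_points(S, TEMPLATE_1, th=0.95)
M1 = count_feature_points(rotate_180(S), TEMPLATE_1, th=0.95)
print(N1, M1)  # 1 0
```

In this toy case the upright skeleton yields one type 1 feature point and the rotated skeleton yields none. Because a sparse template agrees with mostly empty neighbourhoods on many background cells, the usable threshold depends strongly on the actual template.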

4. Text image inversion detection

From the feature quantities N₁, N₂, M₁ and M₂, the composite feature value F is calculated using equation (4) [equation (4) not reproduced in the source text].

Whether the text image is inverted is then judged from the value of F [decision formula not reproduced in the source text].

The experimental simulation platform is Matlab 2016b under 64-bit Windows 10 Education, and the hardware platform is an Intel i5-4590 CPU with 16 GB of memory.

A test set of 120 scanned text images acquired with a high-resolution scanner was selected, of which 58 are inverted and 62 are upright. The document types include books, academic journals, academic papers, PPT documents and others. The layouts include single-column and double-column modes, and the image content includes not only ordinary text but also formulas, images, tables and the like. When acquiring the experimental images, the paper was not required to be perfectly aligned, so some tilt was allowed. In addition, to verify the generality of the method, the selected text covers four common Chinese fonts, namely Song, FangSong (imitation Song), Kai (regular script) and Hei (boldface), with font sizes from No. 2 to No. 6.

The documents scanned were of standard A4 size (210 mm × 297 mm), the scanning resolution was 300 dpi, and the scanned images were 2480 × 3508 pixels. The algorithm was simulated in Matlab, and the average time to process one image was about 330 ms. Reimplementing the algorithm in a language with higher execution efficiency, such as C, would make processing faster and could meet real-time requirements.

The proposed method uses four thresholds, TH₁ to TH₄. TH₁ is obtained automatically from the processed image by the Otsu method. Thresholds TH₂ and TH₃ are used to distinguish text regions from non-text regions: in a text region, the ratio of text points to all pixels in the region lies within a certain range. If the ratio is too small, the region may be a formula or table region; if it is too large, it may be an image region. The proposed algorithm takes TH₂ = 0.2 and TH₃ = 0.4. Threshold TH₄ can take a value between 0.6 and 0.8; the larger the value of TH₄, the more precisely the stroke structure is detected, but the more easily detection is disturbed by noise. The proposed algorithm takes TH₄ = 0.7. The experimental results show that the proposed method correctly judged whether each of the 120 text images was inverted. Fig. 5 shows some of the experimental images and the corresponding F values. According to the experimental results, the proposed method judges the inversion of text images quickly and effectively.

Claims

1. A Chinese text image inversion discrimination method based on Chinese character stroke features, comprising the following steps:

(1) Judging the type of the input scanned text image, if it is a grayscale image, it will remain unchanged; if it is a color image, it will be converted into a grayscale image, and I will represent the grayscale image;

(2) performing denoising, contrast enhancement and binarization processing on the grayscale image I, and the obtained binarization processing result is represented by B;

(3) Divide B into 36 sub-regions of equal size, and select the 16 sub-regions close to the center as candidate text regions, denoted by T_i, i = 1, 2, ..., 16;

(4) For each candidate text region, the number of corresponding Chinese text points is calculated according to the following formula and normalized to obtain R_i, which characterizes each candidate text sub-region:

R_i = (36 / (M × N)) · Σ_{(s,t)∈T_i} B(s,t)

In the formula, (s, t) ∈ T_i denotes the coordinates of each pixel in T_i, B(s, t) denotes the binarization result at pixel (s, t), and M × N is the total number of pixels of the grayscale image I;

(5) Two thresholds TH₂ and TH₃, both less than 1, are preset. For a given T_i, if TH₂ < R_i < TH₃ is satisfied, the corresponding candidate text region is selected as a text region used for feature extraction; H_k, k = 1, 2, ..., K, denotes the set of text regions used for feature extraction, and K is the total number of text regions;

(6) Select two templates, where template 1 is the "人"-shaped stroke detection template describing the left-falling and right-falling structure in Chinese characters, and template 2 is the stroke detection template describing the horizontal-turning stroke in Chinese characters;

(7) Using the stroke feature extraction algorithm, calculate the feature values corresponding to the text regions: N₁, N₂, M₁ and M₂. The stroke feature extraction algorithm is as follows:

1) For a given text region H_k used for feature extraction, thin the text points with a morphological thinning technique to obtain text skeleton information with a width of one pixel, and denote the thinning result by S;

2) Use template 1 and template 2 to perform template matching on S respectively. For template 1, the process is as follows: a skeleton point in H_k is aligned with the reference point of template 1, and an XNOR operation is performed between each point in the neighborhood of the skeleton point and the point at the corresponding position of template 1; the XNOR results are accumulated for each skeleton point in H_k and normalized, and the result is denoted by U_k(j), where j is the index of the skeleton point in H_k. For the j-th skeleton point in H_k, if U_k(j) > TH₃ is satisfied, the point is marked as a type 1 feature point; for template 2, type 2 feature points are marked by a similar process;

3) For all skeleton points in each H_k, use step 2) to judge whether each point is a feature point of either type, and accumulate the numbers of type 1 and type 2 feature points of the text image separately. The two totals are taken as two feature values, denoted by N₁ and N₂;

4) Rotate the thinning result S by 180° and denote the result by S'; repeat steps 2) and 3) for S', and use M₁ and M₂ to denote the total numbers of type 1 and type 2 feature points corresponding to S';

(8) Text image inversion detection:

1) Calculate the composite feature value F from the feature quantities N₁, N₂, M₁ and M₂ [formula for F not reproduced in the source text];

2) Preset a threshold TH₄ greater than 0.6, and determine from the value of F whether the text image is inverted [decision formula not reproduced in the source text].

2. The method according to claim 1, wherein TH₂ = 0.2 and TH₃ = 0.4, and the threshold TH₄ is between 0.6 and 0.8.

3. The method according to claim 1, wherein step (2) comprises the following sub-steps:

1) The grayscale image I is denoised by the median filtering technique with a template of 3×3, and the processing result is represented by G;

2) The CLAHE technology is used to perform contrast enhancement processing on the filtering result image G, and the processing result is represented by E;

3) Use the Otsu method to calculate the global threshold TH ₁ of E, and use the global threshold TH ₁ to binarize E, and the processing result is represented by B.