Disclosure of Invention
To remedy the deficiencies of the prior art, the invention aims to provide an irregular arc-shaped distortion method for text images, oriented toward OCR character recognition, which automatically generates arc-distorted sample images suitable for training the recognition of arc-distorted text.
In order to achieve the purpose of the invention, the technical scheme adopted by the invention is as follows:
An irregular arc-shaped distortion method for text images oriented toward OCR character recognition comprises the following steps:
step one, acquiring an original text image and an original text box annotation file;
step two, calculating the maximum arc distortion height value of the original text image;
step three, partitioning the image into areas along the width of the original image, and generating a different arc distortion height value and number of arc waves for each area;
step four, calculating the offset distance of the image pixels of each area in the Y-axis direction according to the arc distortion height value and the number of arc waves of that area, to obtain the new coordinates of the image pixels;
step five, recalculating the original text box annotation coordinates according to the arc distortion height values and the numbers of arc waves of the different areas, and generating a new multi-point annotation text box and an annotation file.
Further, in the first step, an original text image and an original text box annotation file are obtained, including:
reading the width (Width) and the height (Height) of the original text image;
reading the original text box annotation file, which comprises the file name, the annotation point coordinates [(x1, y1), (x2, y2), (x3, y3), (x4, y4)], namely the [upper-left, upper-right, lower-right, lower-left] corner annotation point coordinates of the text box, and the annotated text content.
Further, in the second step, calculating a maximum arc distortion height value of the original text image includes:
(2.1) calculating the height values of all text boxes in the annotation file, and taking the minimum height value;
(2.2) calculating the distances from all the text boxes in the annotation file to the upper boundary and to the lower boundary of the original image, and taking the minimum upper boundary distance and the minimum lower boundary distance;
(2.3) taking the minimum value of the three calculated values as the maximum height value max_radius_value of the irregular arc distortion of the image.
Further, in step (2.1), the annotation point coordinates of each text box are read, the minimum circumscribed rectangle of the text box is obtained, and the height of the text box is taken from that rectangle; in this way the height value of each text box in the annotation file is calculated.
Further, in step (2.2), the annotation point coordinates of each text box are read, wherein the values y1 and y2 represent the distance from the text box to the upper boundary of the original text image and the values Height-y3 and Height-y4 represent the distance from the text box to the lower boundary of the original text image; in this way the upper and lower boundary distances of each text box in the annotation file are calculated.
Further, in the third step, different areas generate different arc distortion height values and arc wave numbers, including:
An arc distortion height value is randomly generated for each area: a value random_radius_value is randomly drawn from the range 1 to max_radius_value and used as the arc distortion height value of that area.
The number of arc waves is likewise randomly generated for each area; the recommended number of arc waves is between 1 and 5.
Further, in the fourth step, an offset value of the image pixel point in the Y-axis direction is calculated:
Y' = random_radius_value * sin((X_n / Width) * 2 * 3.14 * num)    (1)
wherein random_radius_value represents the arc distortion height value of the area, num represents the number of arc waves, Width represents the width of the image of the area, and X_n represents the coordinate of the image pixel on the X axis;
the new coordinates of the pixel are (X_n, Y_n + Y'), where Y_n represents the coordinate of the image pixel on the Y axis.
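As an illustration only, formula (1) and the new pixel coordinate can be sketched in Python as follows. The function names y_offset and new_coordinate are hypothetical and not part of the method, and math.pi is used here in place of the constant 3.14 that appears in formula (1).

```python
import math


def y_offset(x_n: float, width: float, radius: float, num: int) -> float:
    """Formula (1): Y-direction offset Y' of a pixel located at column x_n.

    radius plays the role of the arc distortion height value and num the
    number of arc waves; math.pi replaces 3.14 (an assumption of this sketch).
    """
    return radius * math.sin((x_n / width) * 2 * math.pi * num)


def new_coordinate(x_n: float, y_n: float, width: float, radius: float, num: int):
    """New pixel coordinate (X_n, Y_n + Y'); the X coordinate is unchanged."""
    return x_n, y_n + y_offset(x_n, width, radius, num)
```

For an area 200 pixels wide with an arc distortion height of 10 and a single arc wave, the pixel at column 50 lies at the crest of the wave and is shifted by about 10 pixels, while the pixel at column 0 is not shifted.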
Further, in the fifth step, recalculating the original text box annotation coordinates to generate a new multi-point annotation text box, including:
dividing the original four-point annotation into multiple annotation points according to the number of characters in the annotated text content; calculating new coordinates of the divided annotation points, namely the multi-point annotation coordinates, according to the arc distortion height value and the number of arc waves; generating a new text box from the irregular arc-distorted image and the new coordinates; and generating an annotation file from the new coordinates and the original annotated text content.
Further, the original four-point annotation is divided into a multi-point annotation by a single-line-segment point-division method, which comprises the following steps:
connecting every two adjacent annotation points into a line segment to generate four directed line segments, namely upper-left annotation point -> upper-right annotation point, upper-right annotation point -> lower-right annotation point, lower-right annotation point -> lower-left annotation point, and lower-left annotation point -> upper-left annotation point;
taking the line segments upper-left annotation point -> upper-right annotation point and lower-right annotation point -> lower-left annotation point as horizontal line segments, dividing each into multiple points to generate multiple annotation points, and calculating their x values;
taking the line segments upper-right annotation point -> lower-right annotation point and lower-left annotation point -> upper-left annotation point as vertical line segments, dividing each into multiple points to generate multiple annotation points, and calculating their y values;
and combining the calculated x and y values in one-to-one correspondence to generate the multi-point annotation points.
Compared with the prior art, the irregular arc-shaped distortion method for text images oriented toward OCR character recognition has the following advantages: it applies irregular arc distortion to the text image and recalculates the corresponding text boxes, generating irregularly arc-distorted text images together with their annotation files, and thereby provides a large number of training samples for training an OCR arc-text recognition model.
In the invention, the minimum height of all the text boxes, the minimum distance from the text boxes to the upper boundary of the original image and the minimum distance from the text boxes to the lower boundary of the original image are calculated, and the smallest of the three values is taken as the maximum arc distortion height value of the image. The arc distortion height is thus adapted to each image, which avoids image content becoming unrecognizable because the arc distortion is too large.
In the invention, the image is divided into different areas along the image width, and each area is arc-distorted with its own arc distortion height value, so that a large number of different irregularly arc-distorted images are produced.
In the invention, the four annotation points of the original text box are divided into multi-point annotation coordinates according to the number of characters in the annotated text content, and these coordinates are then recalculated according to the arc distortion height, producing a multi-point annotation box that follows the arc-distorted text and fits the characters closely.
Detailed Description
The technical scheme of the application is further described below with reference to the accompanying drawings and examples. The following examples are only for more clearly illustrating the technical aspects of the present application, and are not intended to limit the scope of the present application.
As shown in fig. 1, the irregular arc-shaped distortion method for text images oriented toward OCR character recognition according to the present invention includes the following steps:
Step one, acquiring an original text image and the corresponding original text box annotation file;
The width (Width) and height (Height) of the original text image are read in preparation for the subsequent partitioning of the text image and the calculation of the new coordinates of the image pixels.
The original text box annotation file is read to obtain all the annotation data, which comprises the file name, the annotation point coordinates [(x1, y1), (x2, y2), (x3, y3), (x4, y4)], namely the [upper-left, upper-right, lower-right, lower-left] corner annotation point coordinates of the text box, and the annotated text content.
Step two, calculating the maximum arc distortion height value of the original text image, as shown in fig. 2;
(2.1) calculating the height values of all text boxes in the annotation file, and taking the minimum height value;
From the annotation point coordinates of each text box read in step one, the minimum circumscribed rectangle of each text box is obtained and the height of the text box is taken from that rectangle; the height value of each text box in the annotation file is calculated in this way, and the minimum value is taken.
The minimum value is taken because the text in the original text image is not of uniform size; for example, part of the text may be in the (large) No. 1 font size and part in the (small) No. 7 font size. When the arc distortion height is too large, the effect on large text is minor but the effect on small text is severe: the small text may be distorted beyond recognition and the text content destroyed, as shown in fig. 6. Therefore, based on a review of actual production results, the invention uses the minimum text box height value to bound the arc distortion, which ensures that large fonts are visibly arc-distorted while small fonts are distorted without destroying the text content.
(2.2) Calculating the distances from all the text boxes in the annotation file to the upper boundary and to the lower boundary of the original image, and taking the minimum upper boundary distance and the minimum lower boundary distance;
From the annotation point coordinates of each text box read in step one, the upper and lower boundary distances from each text box in the annotation file to the original text image are calculated, wherein the values y1 and y2 represent the distance to the upper boundary and the values Height-y3 and Height-y4 represent the distance to the lower boundary, and the minimum of each is taken.
These minimum values are taken because, when the distance from a text box to the upper or lower boundary is small and the arc distortion height is too large, the text is shifted beyond the height of the original text image and disappears, as shown in fig. 7. Therefore, based on a review of actual production results, the invention uses the minimum boundary distance values, which ensures that the text is arc-distorted while the text content does not leave the original image because the arc distortion is too large.
(2.3) Taking the minimum value of the three calculated values as the maximum height value of the irregular arc distortion of the image;
By calculating the minimum text box height, the minimum distance from the text boxes to the upper boundary of the original image and the minimum distance from the text boxes to the lower boundary of the original image, and taking the smallest of the three values as the maximum arc distortion height value max_radius_value of the image, the maximum arc distortion height is adapted to each image. This avoids both text content becoming illegible and text content being lost because the image is distorted too strongly.
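Steps (2.1) to (2.3) can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function name is hypothetical, each box is assumed to be a list of four (x, y) tuples in the order (x1, y1) to (x4, y4), and the height of an axis-aligned bounding rectangle is used as a simple stand-in for the height of the minimum circumscribed rectangle.

```python
def compute_max_radius_value(boxes, image_height):
    """Return the maximum arc distortion height for one image.

    boxes: list of four-point boxes [(x1, y1), (x2, y2), (x3, y3), (x4, y4)],
           where points 1 and 2 are the upper corners and points 3 and 4 the
           lower corners (assumed layout).
    image_height: Height of the original text image in pixels.
    """
    # (2.1) minimum text box height; the axis-aligned extent is an assumption
    min_box_height = min(
        max(y for _, y in box) - min(y for _, y in box) for box in boxes
    )
    # (2.2) minimum distance to the upper boundary: y1 and y2
    min_top = min(min(box[0][1], box[1][1]) for box in boxes)
    # (2.2) minimum distance to the lower boundary: Height - y3 and Height - y4
    min_bottom = min(image_height - max(box[2][1], box[3][1]) for box in boxes)
    # (2.3) the smallest of the three values bounds the distortion
    return min(min_box_height, min_top, min_bottom)
```

For rotated text boxes, a true minimum-area rectangle would give a smaller and more accurate height than the axis-aligned extent used in this sketch.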
Step three, partitioning the image into areas along the width of the original image, and generating an arc distortion height value and a number of arc waves for each area, so that different areas have different arc distortion height values and numbers of arc waves;
For example, the original image is divided into two adjacent areas with the center point [Width/2, 0] in the width direction of the original image as the demarcation point, and an arc distortion height value is randomly generated for each area. The original image may also be divided into more adjacent areas as desired.
The arc distortion height value is generated by randomly drawing a value random_radius_value from the range 1 to max_radius_value. Since random_radius_value differs from area to area, the arc distortion height differs from area to area, so that the original image undergoes irregular arc distortion.
The number of arc waves is also randomly generated and controls the frequency of the arc distortion; the recommended range is 1 to 5. If the number of arc waves is too large, the arc distortion frequency of the image is too high, many arc waves appear, and the image content is destroyed, as shown in fig. 8. Therefore, based on a review of actual production results, the invention uses a value between 1 and 5, which distorts the image well without destroying the text content.
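The partitioning and random parameter generation of step three might be sketched as follows. The function name, the dictionary keys and the choice of equal-width areas are assumptions made for illustration; the description only requires adjacent areas along the image width.

```python
import random


def make_regions(width, max_radius_value, n_regions=2, max_waves=5, rng=None):
    """Split the image width into adjacent areas and draw random parameters.

    Each area receives its own random_radius_value in [1, max_radius_value]
    and its own number of arc waves num in [1, max_waves].
    """
    rng = rng or random.Random()
    # Equal-width split (assumption); n_regions=2 splits at Width/2
    bounds = [round(i * width / n_regions) for i in range(n_regions + 1)]
    regions = []
    for x_start, x_end in zip(bounds[:-1], bounds[1:]):
        regions.append({
            "x_start": x_start,  # first column of the area (inclusive)
            "x_end": x_end,      # last column of the area (exclusive)
            "random_radius_value": rng.randint(1, max(1, int(max_radius_value))),
            "num": rng.randint(1, max_waves),
        })
    return regions
```

Passing a seeded random.Random instance as rng makes the generated samples reproducible, which can be convenient when a particular distorted image needs to be regenerated.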
Step four, calculating the offset distance of the image pixels of each area in the Y-axis direction according to the arc distortion height value and the number of arc waves of that area, and calculating the new coordinates of the image pixels;
To apply irregular arc distortion to the original text image, only the offset of each pixel coordinate of the original image in the Y-axis direction needs to be calculated; nothing changes in the X-axis direction, so no calculation is needed there.
The formula for calculating the offset of the image pixel coordinates in the Y-axis direction is as follows:
Y' = random_radius_value * sin((X_n / Width) * 2 * 3.14 * num)    (1)
wherein random_radius_value represents the arc distortion height value of the area, num represents the number of arc waves, Width represents the width of the image of the area, and X_n represents the coordinate of the image pixel on the X axis.
The new Y-axis coordinate of the image pixel is calculated as:
Y_n' = Y_n + Y'    (2)
wherein Y_n represents the coordinate of the pixel on the Y axis, and Y' represents the offset distance of the pixel on the Y axis.
The new coordinates of the pixel are (X_n, Y_n'). At the same time, a blank image with the same width and height as the original image is generated; as each new coordinate is calculated in the above steps, the pixel is copied to the blank image at its new coordinate, and once all the pixels of the original image have been copied to the blank image, the irregular arc-distorted image is obtained.
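The pixel-copy procedure of step four can be sketched with NumPy as follows. This is an illustrative sketch under several assumptions: the function name and the area dictionaries (keys x_start, x_end, random_radius_value, num) are hypothetical, X_n is measured from the left edge of each area, offsets are rounded to whole pixels, and np.pi replaces the constant 3.14.

```python
import numpy as np


def warp_image(image, regions, fill=255):
    """Copy every pixel (X_n, Y_n) to (X_n, Y_n + Y') on a blank canvas."""
    height = image.shape[0]
    warped = np.full_like(image, fill)  # blank image, same width and height
    rows = np.arange(height)
    for region in regions:
        x_start, x_end = region["x_start"], region["x_end"]
        region_width = x_end - x_start
        for x in range(x_start, x_end):
            # Formula (1), with X_n taken relative to the area (assumption)
            offset = region["random_radius_value"] * np.sin(
                ((x - x_start) / region_width) * 2 * np.pi * region["num"]
            )
            # Formula (2): every pixel of the column gets the same Y shift
            new_rows = rows + int(round(float(offset)))
            valid = (new_rows >= 0) & (new_rows < height)
            warped[new_rows[valid], x] = image[rows[valid], x]
    return warped
```

Because a whole number of arc waves fits into each area when X_n is measured from the area's left edge, the offset is zero at every area boundary, so adjacent areas with different distortion heights join without a vertical jump. Pixels shifted outside the image are dropped, which is the loss that the bound max_radius_value is meant to prevent for text pixels.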
Step five, dividing the original four-point annotation into multiple annotation points according to the number of characters in the annotated text content, and recalculating the original text box annotation coordinates according to the arc distortion height values and the numbers of arc waves of the different areas, to generate a new multi-point annotation text box and an annotation file.
After the irregular arc distortion, the image content has changed and the original text box can no longer enclose the arc-distorted text, so the original text box needs to be recalculated into a multi-point annotation text box that fully encloses the text content.
To convert the four-point annotation text box into a multi-point annotation text box, the invention provides a single-line-segment point-division method, the specific steps of which are as follows:
As shown in fig. 3, the four-point annotation text box consists of four annotation points. Every two adjacent annotation points are connected into a line segment, generating four directed line segments, namely upper-left annotation point -> upper-right annotation point, upper-right annotation point -> lower-right annotation point, lower-right annotation point -> lower-left annotation point, and lower-left annotation point -> upper-left annotation point. The original text box is then divided into a multi-point annotation box according to the number of characters in the annotated text content.
The line segments upper-left annotation point -> upper-right annotation point and lower-right annotation point -> lower-left annotation point are taken as horizontal line segments, and each is divided into multiple points to generate multiple annotation points. The calculation is as follows:
x=[x for x in np.linspace(x0,x1,math.ceil(len(word)+1))] (3)
wherein x0 and x1 respectively represent the X-axis coordinates of the two endpoints of a horizontal line segment; len(word)+1 denotes the number of characters in the annotated text plus 1; and np.linspace(x0, x1, math.ceil(len(word)+1)) denotes dividing the line segment between x0 and x1 uniformly into len(word)+1 points.
The line segments upper-right annotation point -> lower-right annotation point and lower-left annotation point -> upper-left annotation point are taken as vertical line segments, and each is divided into multiple points to generate multiple annotation points. The calculation is as follows:
y=[y for y in np.linspace(y0,y1,math.ceil(len(word)+1))] (4)
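wherein y0 and y1 respectively represent the Y-axis coordinates of the two endpoints of a vertical line segment; len(word)+1 denotes the number of characters in the annotated text plus 1; and np.linspace(y0, y1, math.ceil(len(word)+1)) denotes dividing the line segment between y0 and y1 uniformly into len(word)+1 points.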
And combining the calculated x and y values in a one-to-one correspondence manner to generate a multi-point labeling point, as shown in fig. 4.
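One possible reading of the point division is sketched below. It assumes that, along each of the upper and lower edges of the box, the x values of formula (3) and the y values of formula (4) are interpolated between the two edge endpoints and paired one-to-one. The function names and the corner order (upper-left, upper-right, lower-right, lower-left) are assumptions of this sketch, not details confirmed by the description.

```python
import math

import numpy as np


def divide_edge(p0, p1, word):
    """Divide the directed segment p0 -> p1 into len(word) + 1 points."""
    n = math.ceil(len(word) + 1)
    xs = np.linspace(p0[0], p1[0], n)  # formula (3)
    ys = np.linspace(p0[1], p1[1], n)  # formula (4)
    # Pair x and y values in one-to-one correspondence
    return [(float(x), float(y)) for x, y in zip(xs, ys)]


def four_to_multipoint(box, word):
    """Turn a four-point box into a closed multi-point outline.

    The upper edge runs left to right and the lower edge right to left,
    so the returned points follow the directed segments around the box.
    """
    upper_left, upper_right, lower_right, lower_left = box
    top = divide_edge(upper_left, upper_right, word)
    bottom = divide_edge(lower_right, lower_left, word)
    return top + bottom
```

With a four-character annotation, each edge is divided into five points, so the multi-point annotation box has ten points, roughly one per character boundary along the upper and lower edges.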
The new coordinates of the divided annotation points are then calculated according to the arc distortion height value and the number of arc waves; that is, all the coordinate points are substituted into formula (1) and formula (2) to obtain the corresponding new coordinates, namely the multi-point annotation coordinates. A new text box is generated from the irregular arc-distorted image produced in step four and the new coordinates, with the final effect shown in fig. 5, and an annotation file is generated from the new coordinates and the original annotated text content.
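Substituting the annotation points into formulas (1) and (2) might look like the following sketch. The function names, the area dictionaries (keys x_start, x_end, random_radius_value, num) and the layout of the returned annotation record are hypothetical; X_n is assumed to be measured from the left edge of the area containing the point, and math.pi replaces 3.14.

```python
import math


def warp_points(points, regions):
    """Apply formulas (1) and (2) to every multi-point annotation point."""
    warped = []
    for x, y in points:
        # Find the area containing this X coordinate; a point lying on the
        # right image border falls back to the last area.
        region = next(
            (r for r in regions if r["x_start"] <= x < r["x_end"]),
            regions[-1],
        )
        region_width = region["x_end"] - region["x_start"]
        offset = region["random_radius_value"] * math.sin(
            ((x - region["x_start"]) / region_width) * 2 * math.pi * region["num"]
        )
        warped.append((x, y + offset))  # X unchanged, Y shifted by Y'
    return warped


def make_annotation(file_name, points, regions, word):
    """Build one annotation record (illustrative layout only)."""
    return {"file_name": file_name, "points": warp_points(points, regions), "text": word}
```

The annotation points must be warped with the same area parameters that were used for the image pixels; otherwise the multi-point box will not follow the distorted text.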
The applicant has described and illustrated the embodiments of the present invention in detail with reference to the drawings. Those skilled in the art should understand, however, that the above embodiments are only preferred embodiments of the present invention and that the detailed description is intended only to help the reader better understand the spirit of the invention, not to limit its scope; any improvement or modification based on the spirit of the present invention falls within the scope of protection of the present invention.