CN114943973B - Text correction method, device, computer equipment and storage medium
- Publication number: CN114943973B
- Application number: CN202110182043.6A
- Authority: CN (China)
- Prior art keywords: text, image, corrected, line, determining
- Legal status: Active
Classifications
- G06T7/12: Edge-based segmentation (G06T Image data processing or generation; G06T7/00 Image analysis; G06T7/10 Segmentation; Edge detection)
- G06T2207/10004: Still image; Photographic image (G06T2207/00 Indexing scheme for image analysis or image enhancement; G06T2207/10 Image acquisition modality)
Abstract
The invention discloses a text correction method, a text correction device, computer equipment and a storage medium. The method comprises the following steps: acquiring a text line image to be corrected and a text line outline image, wherein the text line outline image is a binarized image corresponding to the text line image to be corrected; determining at least one segmentation quantity value, segmenting the text line outline image according to each segmentation quantity value, and determining the control point set corresponding to the text line outline image after each segmentation operation; and performing perspective transformation on the text image to be corrected according to each control point set to obtain each corresponding corrected text image. The method solves the prior-art problem that a recognition algorithm cannot effectively recognize characters because the text lines in the text line image cropped from the image to be recognized are curved, and effectively corrects the cropped curved text line image so that an accurate character recognition result is obtained after the corrected image is input into the recognition algorithm.
Description
Technical Field
The embodiment of the invention relates to the technical field of image processing, in particular to a text correction method, a text correction device, computer equipment and a storage medium.
Background
In recent years, optical character recognition (OCR) technology has been applied in many industries, such as identity card recognition and invoice recognition. A typical OCR system usually contains two modules, text detection and text recognition, where text detection is one of the core modules; its main purpose is to crop an image of each text line from the input picture.
The current mainstream text detection method obtains text line pictures through a deep learning algorithm: the picture to be detected is fed into a convolutional neural network, the scores of text areas are predicted to obtain, for each pixel point, the score of belonging to text, the scores are binarized to obtain a text mask, the outline of the text mask is taken as the text outline, and a small image of the text region is cropped out according to its circumscribed rectangle.
However, when the characters in the picture to be detected are curved, the characters in the cropped text-region image are also curved, and directly feeding this detection result into a recognition algorithm easily causes misrecognition.
Disclosure of Invention
In view of the above, embodiments of the present invention provide a text correction method, apparatus, computer device, and storage medium, so as to solve the prior-art problem that a recognition algorithm cannot effectively recognize characters because the text lines in the text line image cropped from the image to be recognized are curved.
In a first aspect, an embodiment of the present invention provides a text correction method, including:
acquiring a text line image to be corrected and a text line outline image, wherein the text line outline image is a binarized image corresponding to the text line image to be corrected;
determining at least one segmentation quantity value, respectively segmenting the text line outline image according to each segmentation quantity value, and determining a control point set corresponding to the text line outline image after each segmentation operation;
and performing perspective transformation on the text image to be corrected according to each control point set, respectively, to obtain each corresponding corrected text image.
In a second aspect, an embodiment of the present invention further provides a text correction apparatus, including:
an acquisition module, configured to acquire a text line image to be corrected and a text line outline image, wherein the text line outline image is a binarized image corresponding to the text line image to be corrected;
The determining module is used for determining at least one segmentation quantity value, respectively segmenting the text line outline image according to each segmentation quantity value and determining a corresponding control point set of the text line outline image after each segmentation operation;
and the correction module is used for respectively performing perspective transformation on the text image to be corrected according to each control point set to obtain corresponding corrected text images.
In a third aspect, an embodiment of the present invention further provides a computer device, comprising:
one or more processors;
a memory for storing one or more programs;
The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the text correction method as provided in the first aspect.
In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the text correction method as provided in the first aspect.
In the text correction method, device, computer equipment and storage medium provided above, the method includes: acquiring a text line image to be corrected and a text line outline image, wherein the text line outline image is a binarized image corresponding to the text line image to be corrected; determining at least one segmentation quantity value, segmenting the text line outline image according to each segmentation quantity value, and determining the control point set corresponding to the text line outline image after each segmentation operation; and performing perspective transformation on the text image to be corrected according to each control point set to obtain each corresponding corrected text image. In this method, the text line contour is segmented according to the segmentation quantity values to obtain the control point sets, and perspective transformation is performed on the text image to be corrected according to the control point sets to obtain the corrected text images. Compared with the prior art, the corrected text images obtained in this embodiment are non-curved text images, and performing character recognition on them yields more accurate recognition results, improving the accuracy of picture character recognition.
Drawings
Fig. 1 is a flowchart of a text correction method according to Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a text correction method according to Embodiment 2 of the present invention;
Fig. 3 is a flowchart of an example of the text correction method according to Embodiment 2 of the present invention;
Fig. 4 is a flowchart of determining a control point set in the text correction method according to Embodiment 2 of the present invention;
Fig. 5 is a flowchart of determining a corrected text image in the text correction method according to Embodiment 2 of the present invention;
Fig. 6 is a flowchart of obtaining a text recognition result in the text correction method according to Embodiment 2 of the present invention;
Fig. 7 is a flowchart of obtaining a text line image to be corrected in the text correction method according to Embodiment 2 of the present invention;
Fig. 8 is a schematic structural diagram of a text correction apparatus according to Embodiment 3 of the present invention;
Fig. 9 is a schematic structural diagram of a computer device according to Embodiment 4 of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Embodiment 1
Fig. 1 is a schematic flow chart of a text correction method according to an embodiment of the present invention, where the method is applicable to correcting curved text lines in a text picture, and the method may be performed by a text correction device, where the text correction device may be implemented by software and/or hardware, and the text correction device may be integrated on a computer device.
As shown in fig. 1, a text correction method provided in a first embodiment of the present invention includes the following steps:
step 101: and acquiring a text line image to be corrected and a text line outline image.
The text line outline image is a binarized image corresponding to the text line image to be corrected.
In this embodiment, the text line image to be corrected may be obtained from an original image, where the original image may be any image including text lines, and the text lines may be a line of characters composed of characters or symbols.
The text line outline image in this embodiment is a binarized image corresponding to the text line image to be corrected; it can be further understood as an image cut out from the binarized image of the original image according to the minimum bounding rectangle, where the minimum bounding rectangle is the one corresponding to the text line in the original image.
Specifically, the manner of acquiring the text line image to be corrected may be: inputting the original image into a preset neural network model to obtain a text score map of the original image; binarizing the text score graph to obtain a mask of text lines, and calculating to obtain the outlines of the text lines in the original image through the mask; the minimum circumscribed rectangle of the text line can be obtained according to the coordinates of the outline of the text line; and correspondingly cutting out a text line image from the original image according to the minimum circumscribed rectangle, wherein the text line image is the text line image to be corrected. The text score map may include a probability score that each pixel point in the original image is a text line.
Step 102, determining at least one segmentation quantity value, respectively segmenting the text line outline image according to each segmentation quantity value, and determining a corresponding control point set of the text line outline image after each segmentation operation.
The segmentation number value can be understood as the number of dividing lines determined on the text line contour image; after the text line contour image is divided according to the segmentation number value, it is split into (segmentation number value + 1) sub-image areas. In this step, after a segmentation number value is determined, the text line contour image may be segmented according to it so that the corresponding control point set can be obtained later.
In this embodiment, after the text line contour image is obtained from the text line image to be corrected, at least one segmentation number value may be determined, where the segmentation number value may be determined according to the contour angle of the text line contour. After one segmentation number value is determined, the remaining segmentation number values can be derived from it; for example, after one segmentation number value is determined according to the contour angle, that value plus 1 may be taken as a second segmentation number value and that value minus 1 as a third. The number of segmentation number values obtained is not particularly limited here and may be selected according to the actual situation.
For example, the segmentation of the text line contour image according to each segmentation number value may be: the text line contour image is segmented according to a first segmentation number value, then segmented according to a second segmentation number value, and finally segmented according to a third segmentation number value.
Specifically, the manner of segmenting the text line contour image according to one of the segmentation number values may be: a number of dividing lines equal to that segmentation number value is placed along the long border of the text line outline image, which splits the text line outline image into (segmentation number value + 1) parts.
In this embodiment, the control point set may be a set formed by each control point on the text line contour image, and the control point set may be obtained after the text line contour image is segmented. It will be appreciated that each set of control points may include a set of control points corresponding to each segmentation operation, and illustratively each set of control points may include a set of control points corresponding to the text line outline image after the first segmentation, a set of control points corresponding to the text line outline image after the second segmentation, and a set of control points corresponding to the text line outline image after the third segmentation.
Specifically, determining the corresponding control point set of the text line contour image after the first segmentation operation may include: determining intersection points of the parting lines and the text line profile on the text line profile image, determining the height between the two intersection points on each parting line, taking the average value of all the heights as the height of the text line, and determining a control point set according to the intersection points on each parting line and the height of the text line.
Step 103, performing perspective transformation on the text image to be corrected according to each control point set, respectively, to obtain the corresponding corrected text images.
In this embodiment, perspective transformation can be understood as mapping characters in text lines on one picture onto another blank image according to a mapping relationship. The corrected text image may be an image obtained after the image to be corrected is corrected, and the text lines in the corrected text image have a smaller degree of curvature or no curvature than the degree of curvature of the text lines in the image to be corrected.
In this step, a corresponding number of control point sets may be obtained according to the number of the division number values, and a corresponding number of corrected text images may be obtained by perspective transformation according to each control point set, it being understood that the number of corrected text images may be determined by the number of the division number values.
Further, obtaining a corrected text image by perspective transformation based on one control point set is described. Specifically, corresponding points are found on the text line image to be corrected according to the number and distribution of the control points in the control point set; a blank image with the same size as the text line image to be corrected is created, and control points are uniformly placed on the blank image according to the number and distribution of the corresponding points. Finally, according to the corresponding points on the text line image to be corrected and the control points on the blank image, the characters on the text line image to be corrected can be accurately mapped onto the blank image through perspective transformation, thereby obtaining the corrected text image.
In this embodiment, the principle by which the text on the text line image to be corrected can be mapped onto the blank image through perspective transformation is as follows: any four adjacent corresponding points on the text line image to be corrected that can form a rectangular frame determine a character, and that character is mapped into the rectangular frame formed by the four corresponding control points on the blank image. In this way, every character on the text line image to be corrected is mapped onto the blank image, yielding the corrected text image. Because the control points on the blank image are distributed more uniformly than the corresponding points on the text line image to be corrected, the curvature of the characters is greatly reduced after the mapping. The text correction method can therefore effectively avoid the interference that curved text lines in the text line image to be corrected would cause to text recognition.
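The quad-by-quad mapping described above can be illustrated with a short sketch. This is only an illustrative aid, not the patented implementation: OpenCV and NumPy are assumed, and the helper name `warp_quad`, its argument layout and the masking strategy are choices made for the sketch.

```python
import cv2
import numpy as np

def warp_quad(src_img, dst_img, src_quad, dst_quad):
    """Map the region bounded by src_quad in src_img into dst_quad in dst_img."""
    src = np.float32(src_quad)  # 4 corresponding points on the curved text line
    dst = np.float32(dst_quad)  # 4 evenly placed control points on the blank image
    M = cv2.getPerspectiveTransform(src, dst)
    h, w = dst_img.shape[:2]
    warped = cv2.warpPerspective(src_img, M, (w, h))
    # Copy only the destination quad so neighbouring quads are not overwritten.
    mask = np.zeros((h, w), dtype=np.uint8)
    cv2.fillConvexPoly(mask, dst.astype(np.int32), 255)
    dst_img[mask > 0] = warped[mask > 0]
    return dst_img
```

Calling this once per group of four adjacent points realizes the character-by-character mapping described in the previous paragraph.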
Optionally, after step 103, a text correction method provided in this embodiment may further include the following steps: inputting each corrected text image into a text recognition model to obtain recognition results of characters in text lines of each corrected text image, determining the average recognition rate of the text lines in each corrected text image according to the recognition results, and taking the corrected text image corresponding to the maximum average recognition rate as a final segmentation result.
The above process can be further understood as follows: by the above method, the corrected text image with the best segmentation effect can be determined from the corrected text images, and the character recognition accuracy obtained after that corrected text image is input into the text recognition model is higher. Therefore, with the method corresponding to that corrected text image, the text image to be corrected can be effectively corrected and the curvature of its text lines greatly reduced, so that after the corrected text image is input into the text recognition model, the model can accurately recognize its text lines.
According to the technical scheme provided by the first embodiment, firstly, a text line image to be corrected and a text line outline image are obtained; then determining at least one segmentation quantity value, respectively segmenting the text line outline image according to each segmentation quantity value, and determining a corresponding control point set of the text line outline image after each segmentation operation; and finally, respectively performing perspective transformation on the text image to be corrected according to each control point set to obtain each corresponding corrected text image. In the text correction method, the text line profile is divided according to the division quantity values to obtain the control point sets, perspective transformation is performed on the text images to be corrected according to the control point sets to obtain the corrected text images, and compared with the prior art, the corrected text images obtained in the embodiment are non-curved text images, and more accurate character recognition effects can be obtained by performing character recognition on the basis of the corrected text images, so that the accuracy of picture character recognition is improved.
Embodiment 2
Fig. 2 is a flow chart of a text correction method according to a second embodiment of the present invention, where the second embodiment is optimized based on the above embodiments.
As shown in fig. 2, a text correction method provided in a second embodiment of the present invention includes the following steps:
Step 201, acquiring a text line image to be corrected and a text line outline image.
The text line outline image is a binarized image corresponding to the text line image to be corrected.
In this embodiment, fig. 3 is a flowchart of an example of the text correction method provided in Embodiment 2. To give a better understanding of how the method is executed, the implementation process of fig. 3 is further described in the form of an effect presentation; specifically, the text correction process is recorded in detail in steps a to j of fig. 3. The white area in step a is the text line contour.
Step 202, determining at least one segmentation quantity value, for each segmentation quantity value, vertically segmenting along the long frame of the text line outline image to obtain segmentation line segments formed by intersecting the segmentation quantity value with the text line outline, and determining the line segment length of each segmentation line segment.
In this embodiment, to determine the at least one segmentation number value, the contour angle of the text line contour may be calculated according to the coordinates of the text line contour points in the text line contour image, and the segmentation number value may then be determined according to the magnitude of the contour angle. The contour point coordinates may be the coordinate values corresponding to each contour point.
Optionally, determining the at least one segmentation number value may include: determining the contour angle of the text line contour by a least square method according to the contour point coordinate information of the text line contour in the text line contour image; searching a preset data association table, and determining a corresponding reference value of the contour angle; the reference value, the reference value plus 1, and the reference value minus1 are respectively recorded as the division number values.
It should be noted that calculating the contour angle by the least squares method is prior art and is not described in detail here.
The text line outline may be a peripheral outline of a text line in the text line outline image, and if one text line is inclined, the text line outline has a certain inclination angle. The preset data association table may be a preset association table, and the relationship between the profile angle and the reference value may be recorded in the data association table, that is, one profile angle corresponds to one reference value; the reference value may be a division value used when the picture correction is performed for the first time to obtain the first corrected picture.
Specifically, after the contour angle is determined, the preset data association table can be searched, the corresponding reference value can be determined according to the contour angle in the table, and then each segmentation number value can be determined according to the reference value. The method of determining each segmentation number value from the reference value may be: use the reference value itself as one segmentation number value, the reference value plus 1 as a second, and the reference value minus 1 as a third. For example, if the reference value is N, then N-1, N and N+1 may be used as the three segmentation number values.
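A minimal sketch of this derivation, assuming the contour points are available as an (M, 2) NumPy array; the angle thresholds in `ANGLE_TO_N` are invented for illustration, since the patent presupposes a preset data association table but does not disclose its contents:

```python
import numpy as np

# Illustrative angle -> reference value table (assumed values).
ANGLE_TO_N = [(5.0, 3), (15.0, 5), (30.0, 7), (90.0, 9)]

def segmentation_values(contour_points):
    """contour_points: (M, 2) array of (x, y) text line contour coordinates."""
    x, y = contour_points[:, 0], contour_points[:, 1]
    slope, _ = np.polyfit(x, y, 1)                 # least-squares line fit
    angle = abs(np.degrees(np.arctan(slope)))      # contour angle in degrees
    n = next(v for limit, v in ANGLE_TO_N if angle <= limit)
    return [n - 1, n, n + 1]                       # the three segmentation number values
```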
Further, dividing the text line contour image according to each of the division number values may include: and for each segmentation quantity value, vertically segmenting along the long frame of the text line outline image to obtain segmentation line segments formed by intersecting the segmentation quantity values with the text line outline, and determining the line segment length of each segmentation line segment.
A number of vertical line segments equal to the segmentation number value can be obtained by segmenting between the upper and lower long borders of the text line outline image, and the part of each line segment intersecting the text line contour is taken as a segmentation line segment. The text line contour image is segmented in this way for each segmentation number value, that is, a plurality of segmented text line contour images can be obtained after segmentation.
An example is illustrated for a segmentation number value N: referring to steps b to d in fig. 3, as shown in step b, N segmentation positions are uniformly taken on the text line profile; then, as shown in step c, traversing the pixel points in the horizontal direction of N segmentation positions on the text line profile graph, and finding out the intersection point of the segmentation line and the text line profile, wherein the line segment between the two intersection points is the segmentation line segment; and then, as shown in the step d, taking the distance between two intersection points on each segment line as the segment length of the segmented segment to obtain N segment lengths.
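The traversal in steps b through d can be sketched as a column scan, assuming `mask` is the binarized text line outline image with nonzero pixels marking text:

```python
import numpy as np

def split_segments(mask, n):
    """Return (x, top, bottom, length) for n evenly spaced segmentation lines."""
    h, w = mask.shape
    xs = np.linspace(0, w - 1, n + 2)[1:-1].astype(int)  # n interior positions
    segments = []
    for x in xs:
        ys = np.nonzero(mask[:, x])[0]  # text pixels along this vertical line
        if ys.size:
            top, bottom = ys.min(), ys.max()     # intersections with the contour
            segments.append((x, top, bottom, bottom - top))
    return segments
```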
And 203, determining a corresponding control point set of the text line contour image after the segmentation operation according to the endpoint coordinate information of each segmentation line segment and the corresponding line segment length.
In this embodiment, the endpoint coordinate information of each segment may include endpoint coordinate information of all segments obtained after three segments.
Fig. 4 is a flowchart illustrating a method for determining a control point set in a text correction method according to a second embodiment of the present invention. Here, the control point set corresponding to the text line contour image after the first segmentation operation is determined according to the endpoint coordinate information of the segmented line segment obtained after the first segmentation and the corresponding line segment length, as shown in fig. 4, step 203 may include the following steps:
Step 2031, determining the average value of the segment lengths as the text height, and determining the midpoint of each segment according to the endpoint coordinate information of each segment.
In this step, the average value of the line segment lengths can be calculated after each line segment length is obtained, and the average value is used as the height of the characters in the text line. The midpoint of each segment can be further calculated according to the coordinate information of the upper and lower endpoints in each segment.
Taking the calculation of the midpoint of a segment as an example, the average value of the abscissa of the upper endpoint coordinate and the abscissa of the lower endpoint coordinate of the segment is calculated as the abscissa of the midpoint of the segment, the average value of the ordinate of the upper endpoint coordinate and the ordinate of the lower endpoint coordinate of the segment is calculated as the ordinate of the midpoint of the segment, and the position information of the midpoint can be determined according to the abscissa of the midpoint. The midpoint of each segment may be calculated according to the above procedure.
By way of example, as shown in step e of fig. 3, the average value of the lengths of the respective line segments can be calculated as the text height by a formula.
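A plausible form of that formula, with $l_i$ the length of the $i$-th segmentation line segment and $N$ the segmentation number value:

$$h = \frac{1}{N} \sum_{i=1}^{N} l_i$$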
Step 2032, sequentially connecting the midpoints and extending the line segments corresponding to the first midpoint and the last midpoint respectively until intersecting the text line contour to form a first number of midpoint line segments.
Where the first number is the number of partitions plus 1, illustratively, if the current number of partitions is N, the first number is n+1.
In this step, each midpoint is sequentially connected in turn to obtain a transverse line segment, the first midpoint on the transverse line segment extends leftwards until intersecting the text line contour, and the last midpoint on the transverse line segment extends rightwards until intersecting the text line contour, so that a plurality of midpoint connecting line segments can be determined on the formed transverse line segment.
Exemplary, as shown in step f in fig. 3, the midpoints of two intersecting points on the segmentation line are taken, the midpoints are connected and extend in the left-right direction to intersect with the text line outline to obtain intersecting points of two ends of the text line, the two intersecting points and all the midpoints are taken as end points, and a line segment between every two end points is a midpoint line segment.
Step 2033, taking the starting point of each midpoint connecting segment as a target point, and determining the vertical lines that pass through each target point and are perpendicular to the midpoint connecting segment where the corresponding target point is located.
In this step, the leftmost starting point of each midpoint line segment is taken as the target point, and a vertical line perpendicular to the midpoint line segment where the target point is located is made by the target point, so that the segmentation number value plus 2 vertical lines can be obtained.
Illustratively, as shown in step g in fig. 3, vertical lines are made at each end point, and 10 vertical lines in the figure are vertical lines.
Step 2034, for each vertical line, determining a coordinate point pair whose two points are at a distance of half the text height from the target point on that vertical line.
In this step, on each vertical line, a coordinate point pair is formed by taking two coordinate points at the upper and lower ends of the target point with the target point as the center on the vertical line, and a plurality of coordinate point pairs can be obtained. Wherein, the distance between two coordinate points included in one coordinate point pair and the target point is half of the height of the characters.
Illustratively, as shown in step h of FIG. 3, a coordinate point pair is taken on each vertical line, each coordinate point in the coordinate point pair being at a distance h/2 from the target point.
Step 2035, marking each coordinate point pair as a control point pair, the control point pairs together forming the control point set corresponding to the text line outline image after the segmentation operation.
In this step, the coordinate point pairs obtained according to the above steps are recorded as control point pairs, and all the control point pairs on the text line contour image form a corresponding control point set after the segmentation operation.
For example, if the text line contour image is subjected to three segmentation operations according to different segmentation quantity values, three control point sets can be obtained, and a plurality of control point pairs can be obtained in each segmentation operation.
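Steps 2031 to 2035 might be sketched as follows, continuing from the column-scan sketch above. The extension of the skeleton to the left and right contour boundaries is simplified here to reusing the direction of the nearest midpoint segment, which is an assumption of this sketch:

```python
import numpy as np

def control_point_set(segments, text_height):
    """Build control point pairs from the segment midpoints (steps 2032-2035)."""
    mids = np.array([(x, (top + bottom) / 2.0) for x, top, bottom, _ in segments])
    pairs = []
    for i, (x, y) in enumerate(mids):
        j = min(i + 1, len(mids) - 1)            # midpoint segment at this point
        dx, dy = mids[j] - mids[max(j - 1, 0)]
        norm = np.hypot(dx, dy)
        if norm == 0:                            # degenerate skeleton: vertical fallback
            nx, ny = 0.0, 1.0
        else:
            nx, ny = -dy / norm, dx / norm       # unit normal to the skeleton
        off = text_height / 2.0
        pairs.append(((x - nx * off, y - ny * off),   # upper control point
                      (x + nx * off, y + ny * off)))  # lower control point
    return pairs
```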
Step 204, for each control point set, sequentially associating each control point pair included in the control point set to the text image to be corrected.
In this step, it is necessary to sequentially associate each control point pair included in the three control point sets obtained after the three dividing operations to the text image to be corrected, respectively. Taking a control point set as an example, all control points are associated to the text image to be corrected according to the coordinates of all control points included in the control point set.
The image on the left side in step i in fig. 3 is an image of a text to be corrected, and all points on the image are points obtained by associating a control point set to the image of the text to be corrected.
Step 205, generating a blank image with the same height as the text image to be corrected, uniformly adding a second number of reference points on two long frames of the blank image, and determining two reference points with the same abscissa as a reference point pair.
The second number is the same as the number of control points contained in the set of control points.
In this embodiment, this step is performed for each of the three control point sets obtained in step 204 and associated with the text image to be corrected.
In this step, a description is given of a text image to be corrected having a control point set, firstly, a blank image with the same size as the text image to be corrected is generated, and then, reference points with the same number as the control points in the control point set can be uniformly selected on two long frames of the blank image, as shown in the right image in step i in fig. 3.
Wherein two reference points having the same abscissa can be determined as one reference point pair, 10 reference point pairs can be obtained as shown in step i in fig. 3.
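A sketch of this step, assuming NumPy; the blank image size and the uniform point layout follow the description above:

```python
import numpy as np

def reference_point_pairs(width, height, num_pairs):
    """Blank image plus evenly spaced reference point pairs on the long borders."""
    blank = np.zeros((height, width, 3), dtype=np.uint8)
    xs = np.linspace(0, width - 1, num_pairs)
    pairs = [((x, 0.0), (x, float(height - 1))) for x in xs]  # same abscissa per pair
    return blank, pairs
```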
Step 206, determining the corrected text image corresponding to the control point set according to the control point pairs and reference point pairs.
Similarly, the corrected text image obtained in this step may include three corrected text images obtained by performing perspective transformation three times respectively after the segmentation operation according to the three different segmentation number values.
Fig. 5 is a flowchart illustrating a text image correction determining process according to a second embodiment of the present invention. The following description will take an example of obtaining a corrected text image. As shown in fig. 5, step 206 may include the steps of:
Step 2061, sequentially adopting two adjacent control point pairs in the control point set to form a source perspective transformation point group.
In this step, two adjacent control point pairs are sequentially acquired in the text line contour image to form a source perspective transformation point group, for example, 9 source perspective transformation point groups are included in the left image in step i in fig. 3.
Step 2062, for each source perspective transformation point group, adopting two adjacent reference point pairs in the same sequence on the blank image to form a corresponding target perspective transformation point group, and determining the corresponding source text of the source perspective transformation point group on the image to be corrected.
In this step, a plurality of target perspective transformation point groups are formed from adjacent reference point pairs on the blank image according to each source perspective transformation point group correspondence. The right image in step i of fig. 3 includes 9 target perspective transformation point groups.
The determining that the source text corresponding to the source perspective transformation point set on the text image to be corrected may be understood that one source perspective transformation point set may form a rectangular frame, where text may be corresponding to the rectangular frame, as shown in step i of fig. 3, two adjacent control point pairs on the leftmost side in the left image in step i, that is, text corresponding to a rectangle formed by one source perspective transformation point set is "me".
In this manner, all characters of the text lines in the text image to be corrected can be covered by the source perspective transformation point groups.
Step 2063, mapping the source text to the blank image by perspective transformation performed on the source perspective transformation point set and the target perspective transformation point set.
In the step, the source text in the text line of the text image to be corrected corresponding to each source perspective transformation point group is mapped into a rectangular frame formed by each target perspective transformation point group on the blank image.
Step 2064, determining a blank image containing each source text in the image to be corrected as a corrected text image corresponding to the control point set.
In this step, the corrected text image may be an image obtained by mapping the source text on the blank image. It should be noted that each control point set may correspond to one corrected text image. As shown in step j of fig. 3, step j is a corrected text image obtained from a set of control points.
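Steps 2061 to 2064 can be sketched by looping the `warp_quad` helper from the Embodiment 1 sketch over adjacent control point pairs; the (upper, lower) ordering of each pair is an assumption of the sketch:

```python
def piecewise_correct(src_img, blank, src_pairs, dst_pairs):
    """Warp each source quad into the matching reference quad on the blank image."""
    for (s0, s1), (d0, d1) in zip(zip(src_pairs, src_pairs[1:]),
                                  zip(dst_pairs, dst_pairs[1:])):
        src_quad = [s0[0], s1[0], s1[1], s0[1]]  # upper-left, upper-right,
        dst_quad = [d0[0], d1[0], d1[1], d0[1]]  # lower-right, lower-left
        blank = warp_quad(src_img, blank, src_quad, dst_quad)
    return blank
```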
Step 207, selecting a target text image from the corrected text images, and obtaining a text recognition result corresponding to the target text image.
In this embodiment, three corrected text images may be obtained by executing the above steps, and it is then necessary to determine, from among the three, the corrected image whose text recognition result is most accurate as the target text image.
Fig. 6 is a flowchart illustrating a text recognition result obtained in a text correction method according to a second embodiment of the present invention. As shown in fig. 6, step 207 may include the steps of:
Step 2071, for each corrected text image, inputting the corrected text image into a preset text recognition model to obtain the text contained in the corrected text image and the corresponding prediction probability.
In this step, the preset text recognition model may be any preset model with a text recognition function, for example, the text recognition model may be a full convolutional neural network (Fully Convolutional Network, FCN), a recognition network CRNN, or the like.
Three corrected text images are input into a preset text recognition model, and all the characters included in each corrected text image and the prediction probability of each character can be output.
Step 2072, counting the number of characters contained in the text, and determining the average value of the prediction probabilities of the corrected text image according to the prediction probabilities of the characters and the number of characters.
In the step, the number of characters included in the three corrected text images is counted respectively, and the character prediction probability corresponding to the three corrected text images is calculated. Wherein the text may comprise characters.
For example, taking the calculation of the text prediction probability for one corrected text image: if the text line of the corrected text image includes 4 characters whose prediction probabilities are 0.3, 0.7, 0.9 and 0.5, respectively, then (0.3 + 0.7 + 0.9 + 0.5) / 4 = 0.6 may be calculated, and 0.6 taken as the average prediction probability of that corrected text image.
Step 2073, using the corrected text image corresponding to the maximum prediction probability average value as the target text image, and using the text contained in the target text image as the text recognition result.
In the step, the maximum prediction probability average value can be determined according to the prediction probability average values corresponding to the three correction images, and the correction text image corresponding to the maximum prediction probability average value is the target text image.
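A sketch of this selection, where `recognize` stands in for the preset text recognition model and its (characters, probabilities) return interface is assumed rather than specified by the patent:

```python
import numpy as np

def pick_best(corrected_images, recognize):
    """Return the corrected image with the highest mean prediction probability."""
    means = []
    for img in corrected_images:
        _, probs = recognize(img)                      # per-character probabilities
        means.append(np.mean(probs) if len(probs) else 0.0)
    best = int(np.argmax(means))
    return corrected_images[best], means[best]
```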
The following describes the above steps 201 to 207 with a specific example. As shown in fig. 3, it mainly comprises the following steps:
Step a, cutting out a text line outline image from the binarized image according to the minimum circumscribed rectangle of the text line in the text line image to be corrected.
Step b, calculating the contour angle of the text line contour by using a least square method according to the coordinates of each contour point in the text line contour image; n segmentation positions are uniformly taken in the text line contour image.
Step c, traversing the pixel points in the horizontal direction at each segmentation position on the text line contour image, and finding the intersection points of each segmentation line (i.e. segmentation line segment) with the text line contour.
Step d, taking the distance between the two intersection points on each segmentation line to obtain N distances, i.e. the line segment lengths.
Step e, taking the average value of the N distances as the text height h.
Step f, taking the midpoint between the two intersection points on each segmentation line (the midpoint of the segmentation line segment) and connecting the midpoints in sequence to obtain the central skeleton of the text line, that is, the connecting line segments between adjacent midpoints; extending the skeleton to the left and right until it intersects the left and right boundaries of the text line outline image, and taking these two intersection points as the two ends of the text line; the midpoints and the two intersection points together serve as the endpoints on the central skeleton, and the line between every two adjacent endpoints, i.e. a midpoint connecting segment, forms an endpoint connecting line.
Step g, drawing a vertical line at each endpoint of the central skeleton, that is, taking the starting endpoint of each midpoint connecting segment as a target point and determining the vertical lines that pass through each target point and are perpendicular to the midpoint connecting segment where the corresponding target point is located.
Step h, taking two control points (a coordinate point pair) on each vertical line, located above and below the endpoint at a distance of h/2 from it.
Step i, determining the corresponding control points on the text line image to be corrected, that is, sequentially associating all control point pairs included in the control point set to the text line image to be corrected; generating a blank image with the same size as the text line image to be corrected, and uniformly adding the second number of reference points on the blank image (the same number as the control points) as the points corresponding to the control points.
Step j, taking every four adjacent control points as one set of perspective transformation control points (a source perspective transformation point group), performing perspective transformation N+1 times, and mapping the text line image to be corrected onto the blank image to obtain a corrected image.
Step k, setting N to N-1 and re-executing steps b through j to obtain a new corrected text image; then setting N to N+1 and re-executing steps b through j to obtain another new corrected text image, so that three corrected text images are obtained in total; the three corrected text images are input into the text recognition model to obtain three recognition results.
And finally, calculating the prediction probability average value of all characters in each corrected text image, and taking the corrected text image corresponding to the maximum prediction probability average value as a final segmentation result.
Further, the text correction method may further include the following: inputting the original image as input data into a preset neural network model to obtain a text score graph of the original image; and determining the text line image to be corrected and the text line outline image of the text line image to be corrected in the original image through binarization processing and matting processing of the text score map.
The preset neural network model may be a convolutional neural network, which may include networks such as a fully convolutional network or U-Net; the text score map may include a probability score that each pixel point in the original image belongs to a text line, and the text score map may be identical in size to the original image; the original image may be an unprocessed image containing text lines.
The above-described process of determining the text line image to be corrected is described below in a specific example. Fig. 7 is a flowchart illustrating a process of obtaining a text line image to be corrected according to a second embodiment of the present invention, where, as shown in fig. 7, a process of obtaining the text line image to be corrected may include the following steps:
Step 1, inputting the original image into a convolutional neural network to obtain a text score map.
Step 2, performing binarization processing on the text score map: a threshold value is set according to the application scene, pixels whose score values are larger than the threshold are determined as text, and pixels whose scores are smaller than the threshold are determined as background, so that a binarized image with the same size as the original image is obtained.
Step 3, acquiring text lines from the binarized image: adjacent points above, below, to the left, to the right, and to the upper-left, lower-left, upper-right and lower-right of the text pixels are connected into a region to obtain the connected region of each text line, and the coordinates of the edge of the connected region are extracted to obtain the outline coordinates of the text line.
Step 4, determining the minimum circumscribed rectangle of the text line according to its outline coordinates.
Step 5, cutting out the text line image to be corrected from the original image according to the coordinates of the minimum circumscribed rectangle.
Step 6, cutting out the text line outline image from the binarized image according to the coordinates of the minimum circumscribed rectangle.
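Steps 2 to 6 can be sketched with OpenCV as below. For simplicity the sketch crops by the upright bounding rectangle, whereas the minimum circumscribed rectangle in the patent may be rotated, and `thresh` is an assumed scene-dependent value; the 8-connected region grouping of step 3 is delegated to cv2.findContours:

```python
import cv2
import numpy as np

def crop_text_lines(original, score_map, thresh=0.5):
    """Crop each text line image to be corrected and its outline image."""
    binary = (score_map > thresh).astype(np.uint8) * 255   # step 2: binarize
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)  # step 3: contours
    crops = []
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)                 # steps 4-5: rectangle
        crops.append((original[y:y + h, x:x + w],          # text line image
                      binary[y:y + h, x:x + w]))           # step 6: outline image
    return crops
```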
The text correction method provided by Embodiment 2 of the present invention further refines the scheme of Embodiment 1, and further obtains the text recognition result after the corresponding corrected text images are obtained. By segmenting the text line image to be corrected and determining the control point sets, the characters in the text line image to be corrected can be located accurately; the characters are then perspective-transformed according to the source and target perspective transformation point groups and mapped onto the blank image, which greatly reduces the curvature of the text lines in the corrected text image. Inputting the corrected text image into the text recognition model thus yields a higher character recognition rate, greatly improving the accuracy of character recognition.
Embodiment 3
Fig. 8 is a schematic structural diagram of a text correction apparatus according to a third embodiment of the present invention, where the apparatus is applicable to correcting curved text lines in a text picture, and the text correction apparatus may be implemented by software and/or hardware and is generally integrated on a computer device.
As shown in fig. 8, the apparatus includes the following modules: an acquisition module 81, a determination module 82 and a correction module 83.
An acquiring module 81, configured to acquire a text line image to be corrected and a text line contour image.
Optionally, the obtaining module 81 is specifically configured to input the original image as input data to a preset neural network model, and obtain a text score map of the original image; and determining the text line image to be corrected and the text line outline image of the text line image to be corrected in the original image through binarization processing and matting processing of the text score map.
The determining module 82 is configured to determine at least one segmentation number value, segment the text line contour image according to each segmentation number value, and determine a set of control points corresponding to the text line contour image after each segmentation operation.
Optionally, the determining module 82 includes a first determining module for determining at least one segmentation number value. The first determining module is specifically used for determining the contour angle of the text line contour through a least square method according to the contour point coordinate information of the text line contour in the text line contour image; searching a preset data association table, and determining a corresponding reference value of the profile angle; the reference value, the reference value plus 1, and the reference value minus 1 are respectively recorded as the division number values.
Optionally, the determining module 82 includes a first determining unit, configured to determine a set of control points corresponding to the text line contour image after each segmentation operation.
The first determining unit is specifically configured to: for each segmentation quantity value, vertically segmenting along a long frame of the text line outline image to obtain segmentation line segments formed by intersecting the segmentation quantity values with the text line outline, and determining the line segment length of each segmentation line segment; and determining a control point set corresponding to the text line contour image after the segmentation operation according to the endpoint coordinate information of each segmentation line segment and the corresponding line segment length.
Further, the first determining unit further includes a first subunit, configured to determine, according to the endpoint coordinate information of each segmented line segment and the corresponding line segment length, a control point set corresponding to the text line contour image after the segmentation operation.
The first subunit is specifically configured to: determining the average value of the lengths of all the line segments as the height of the characters, and determining the middle point of each divided line segment according to the endpoint coordinate information of each divided line segment; sequentially connecting the midpoints, and respectively extending the connecting line segments corresponding to the first midpoint and the last midpoint until the connecting line segments intersect with the text line profile to form a first number of midpoint connecting line segments, wherein the first number is the dividing number value plus 1; taking the initial end point of each midpoint connecting line segment as a target point, and respectively determining each vertical line which passes through each target point and is perpendicular to the midpoint connecting line segment where the corresponding target point is located; for each vertical line, determining a coordinate point pair with a distance value of half of the height of the characters from a target point contained in the vertical line; and marking each coordinate point pair as a control point pair respectively to form a control point set corresponding to the text line outline image after the segmentation operation.
The correction module 83 is configured to obtain each corresponding corrected text image by performing perspective transformation on the text image to be corrected according to each control point set.
Optionally, the correction module 83 is specifically configured to: for each control point set, sequentially associating each control point pair included in the control point set to the text image to be corrected; generating a blank image with the same height as the text image to be corrected, uniformly adding a second number of reference points on two long frames of the blank image, determining two reference points with the same abscissa as a reference point pair, wherein the second number is the same as the number of control points contained in a control point set; and determining a corrected text image corresponding to the control point set according to each control point pair and the reference point pair.
Optionally, the correction module 83 includes a correction unit, configured to determine, according to each control point pair and the reference point pair, a corrected text image corresponding to the control point set.
The correction unit is specifically used for: sequentially adopting two adjacent control point pairs in the control point set to form a source perspective transformation point group; for each source perspective transformation point group, adopting two adjacent reference point pairs in the same sequence on a blank image to form a corresponding target perspective transformation point group, and determining the corresponding source characters of the source perspective transformation point groups on the image to be corrected; mapping the source text to the blank image through perspective transformation performed on the source perspective transformation point set and the target perspective transformation point set; and determining a blank image containing each source text in the image to be corrected as a corrected text image corresponding to the control point set.
Optionally, the text correction apparatus further includes a screening module, configured to screen a target text image out of the corrected text images and obtain the text recognition result corresponding to the target text image.
The screening module is specifically configured to: for each corrected text image, input the corrected text image into a preset text recognition model to obtain the characters contained in the corrected text image and their corresponding prediction probabilities; count the number of characters contained in the corrected text image, and determine the prediction probability average of the corrected text image according to the prediction probability of each character and the number of characters; and take the corrected text image corresponding to the largest prediction probability average as the target text image, and take the characters contained in the target text image as the text recognition result.
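A compact sketch of this selection rule follows; `recognize` is a stand-in for the preset text recognition model and is assumed to return the recognized characters together with one prediction probability per character, so its interface is hypothetical:

```python
# Sketch: pick the corrected image whose recognition is most confident.
def pick_target_image(corrected_images, recognize):
    best = None  # (mean probability, image, recognized text)
    for img in corrected_images:
        chars, probs = recognize(img)
        if not chars:
            continue
        mean_p = sum(probs) / len(chars)   # prediction probability average
        if best is None or mean_p > best[0]:
            best = (mean_p, img, "".join(chars))
    return best
```

The corrected image produced with the best-fitting segmentation quantity value tends to yield the highest average character confidence, which is why the maximum is taken.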
The text correction apparatus can execute the text correction method provided by any embodiment of the present invention, and has functional modules and beneficial effects corresponding to the executed method.
Example Four
Fig. 9 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. As shown in Fig. 9, the computer device includes one or more processors 91 and a memory 92. The number of processors 91 in the computer device may be one or more; one processor 91 is taken as an example in Fig. 9. The processor 91 and the memory 92 of the computer device may be connected by a bus or in another manner; a bus connection is taken as an example in Fig. 9.
The memory 92, as a computer-readable storage medium, may be used to store software programs, computer-executable programs and modules corresponding to the text correction method provided in the embodiments of the present invention (for example, the modules in the text correction apparatus: the acquisition module 81, the determination module 82 and the correction module 83). The processor 91 executes the various functional applications and data processing of the computer device by running the software programs, instructions and modules stored in the memory 92, that is, implements the text correction method in the above method embodiments.
The memory 92 may include a program storage area and a data storage area; the program storage area may store an operating system and at least one application required by a function, and the data storage area may store data created according to the use of the computer device, and the like. In addition, the memory 92 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 92 may further include memory remotely located relative to the processor 91, which may be connected to the device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
When the one or more programs included in the above computer device are executed by the one or more processors 91, the programs perform the following operations:
acquiring a text line image to be corrected and a text line contour image, wherein the text line contour image is a binarized image corresponding to the text line image to be corrected;
determining at least one segmentation quantity value, segmenting the text line contour image according to each segmentation quantity value, and determining the control point set corresponding to the text line contour image after each segmentation operation (a sketch of this step follows these operations);
and performing perspective transformation on the text line image to be corrected according to each control point set, so as to obtain each corresponding corrected text image.
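The segmentation quantity values in the second operation come from a least-squares contour angle and a preset data association table, as described earlier. The sketch below is one reading of that step; the table entries are invented for illustration, since the text does not disclose the actual mapping, and all names are hypothetical:

```python
# Sketch: candidate segmentation quantity values from the contour angle.
import math
import numpy as np

# Assumed association table: (upper angle bound in degrees, reference value).
ANGLE_TO_REFERENCE = [(5, 2), (15, 3), (30, 4), (90, 6)]

def segmentation_quantity_values(contour_points):
    xs, ys = np.asarray(contour_points, dtype=np.float64).T
    slope, _ = np.polyfit(xs, ys, 1)                 # least-squares line fit
    angle = abs(math.degrees(math.atan(slope)))      # contour angle
    ref = next(r for bound, r in ANGLE_TO_REFERENCE if angle <= bound)
    return [ref, ref + 1, max(ref - 1, 1)]           # three candidates
```

Trying the reference value and its two neighbours hedges against the table being slightly off for a given line, at the cost of two extra warps per text line.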
Example Five
A fifth embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, performing a text correction method comprising:
acquiring a text line image to be corrected and a text line contour image, wherein the text line contour image is a binarized image corresponding to the text line image to be corrected;
determining at least one segmentation quantity value, segmenting the text line contour image according to each segmentation quantity value, and determining the control point set corresponding to the text line contour image after each segmentation operation;
and performing perspective transformation on the text line image to be corrected according to each control point set, so as to obtain each corresponding corrected text image.
Of course, the computer-executable instructions contained in the storage medium provided by the embodiments of the present invention are not limited to the method operations described above, and may also perform related operations in the text correction method provided by any embodiment of the present invention.
From the above description of the embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software and necessary general-purpose hardware, and certainly also by means of hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present invention, essentially or the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a computer floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the above embodiments of the text correction apparatus, the units and modules included are divided only according to functional logic and are not limited to the above division, as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present invention.
It should also be noted that the above are merely preferred embodiments of the present invention and the technical principles applied thereto. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, readjustments and substitutions can be made without departing from the scope of the invention. Therefore, while the invention has been described in some detail through the above embodiments, it is not limited to those embodiments and may include other equivalent embodiments without departing from the concept of the invention, the scope of which is determined by the appended claims.
Claims (11)
1. A text correction method, comprising:
acquiring a text line image to be corrected and a text line contour image, wherein the text line contour image is a binarized image corresponding to the text line image to be corrected;
determining at least one segmentation quantity value, segmenting the text line contour image according to each segmentation quantity value, and determining the control point set corresponding to the text line contour image after each segmentation operation;
performing perspective transformation on the text line image to be corrected according to each control point set, so as to obtain each corresponding corrected text image;
wherein the determining at least one segmentation quantity value comprises:
determining the contour angle of the text line contour by the least squares method according to the contour point coordinate information of the text line contour in the text line contour image;
searching a preset data association table to determine the reference value corresponding to the contour angle; and
recording the reference value, the reference value plus 1, and the reference value minus 1 as segmentation quantity values, respectively.
2. The method according to claim 1, wherein the segmenting the text line contour image according to each segmentation quantity value and determining the control point set corresponding to the text line contour image after each segmentation operation comprises:
for each segmentation quantity value, performing vertical segmentation along a long edge of the text line contour image to obtain a number of segmentation line segments equal to the segmentation quantity value, each formed by the intersection of a vertical cut with the text line contour, and determining the line segment length of each segmentation line segment; and
determining the control point set corresponding to the text line contour image after the segmentation operation according to the endpoint coordinate information of each segmentation line segment and the corresponding line segment length.
3. The method according to claim 2, wherein the determining the control point set corresponding to the text line contour image after the segmentation operation according to the endpoint coordinate information of each segmentation line segment and the corresponding line segment length comprises:
determining the average of the line segment lengths as the character height, and determining the midpoint of each segmentation line segment according to the endpoint coordinate information of the segmentation line segment;
connecting the midpoints in sequence, and extending the connecting segments corresponding to the first and last midpoints until they intersect the text line contour, so as to form a first number of midpoint connecting segments, wherein the first number is the segmentation quantity value plus 1;
taking the initial endpoint of each midpoint connecting segment as a target point, and determining, for each target point, the vertical line passing through the target point and perpendicular to the midpoint connecting segment on which the target point lies;
for each vertical line, determining the coordinate point pair located on the vertical line at a distance of half the character height from the target point contained in the vertical line; and
recording each coordinate point pair as a control point pair, so as to form the control point set corresponding to the text line contour image after the segmentation operation.
4. The method according to claim 1, wherein the performing perspective transformation on the text line image to be corrected according to each control point set, so as to obtain each corresponding corrected text image, comprises:
for each control point set, sequentially associating each control point pair included in the control point set with the text line image to be corrected;
generating a blank image with the same height as the text line image to be corrected, uniformly adding a second number of reference points on each of the two long edges of the blank image, and determining every two reference points sharing the same abscissa as a reference point pair, wherein the second number is the same as the number of control point pairs contained in the control point set; and
determining the corrected text image corresponding to the control point set according to each control point pair and each reference point pair.
5. The method according to claim 4, wherein the determining the corrected text image corresponding to the control point set according to each control point pair and each reference point pair comprises:
sequentially taking two adjacent control point pairs in the control point set to form a source perspective transformation point group;
for each source perspective transformation point group, taking the two adjacent reference point pairs at the same positions in sequence on the blank image to form the corresponding target perspective transformation point group, and determining the source text corresponding to the source perspective transformation point group on the text line image to be corrected;
mapping the source text onto the blank image through a perspective transformation performed on the source perspective transformation point group and the target perspective transformation point group; and
determining the blank image containing each source text of the text line image to be corrected as the corrected text image corresponding to the control point set.
6. The method according to claim 1, wherein after the obtaining each corresponding corrected text image by performing perspective transformation on the text line image to be corrected according to each control point set, the method further comprises:
screening a target text image out of the corrected text images, and obtaining the text recognition result corresponding to the target text image.
7. The method according to claim 6, wherein the screening a target text image out of the corrected text images and obtaining the text recognition result corresponding to the target text image comprises:
for each corrected text image, inputting the corrected text image into a preset text recognition model to obtain the characters contained in the corrected text image and their corresponding prediction probabilities;
counting the number of characters contained in the corrected text image, and determining the prediction probability average of the corrected text image according to the prediction probability of each character and the number of characters; and
taking the corrected text image corresponding to the largest prediction probability average as the target text image, and taking the characters contained in the target text image as the text recognition result.
8. The method according to any one of claims 1 to 7, further comprising:
inputting an original image as input data into a preset neural network model to obtain a text score map of the original image; and
determining, in the original image, the text line image to be corrected and the text line contour image of the text line image to be corrected through binarization processing and matting processing of the text score map.
9. A text correction apparatus, comprising:
an acquisition module, configured to acquire a text line image to be corrected and a text line contour image, wherein the text line contour image is a binarized image corresponding to the text line image to be corrected;
a determining module, configured to determine at least one segmentation quantity value, segment the text line contour image according to each segmentation quantity value, and determine the control point set corresponding to the text line contour image after each segmentation operation; and
a correction module, configured to perform perspective transformation on the text line image to be corrected according to each control point set, so as to obtain each corresponding corrected text image;
wherein the determining module comprises a first determining module configured to determine the at least one segmentation quantity value, the first determining module being specifically configured to: determine the contour angle of the text line contour by the least squares method according to the contour point coordinate information of the text line contour in the text line contour image; search a preset data association table to determine the reference value corresponding to the contour angle; and record the reference value, the reference value plus 1, and the reference value minus 1 as segmentation quantity values, respectively.
10. A computer device, comprising:
one or more processors; and
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the text correction method according to any one of claims 1 to 8.
11. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the text correction method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110182043.6A CN114943973B (en) | 2021-02-09 | 2021-02-09 | Text correction method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114943973A CN114943973A (en) | 2022-08-26 |
CN114943973B true CN114943973B (en) | 2024-10-18 |
Family
ID=82905998
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110182043.6A Active CN114943973B (en) | 2021-02-09 | 2021-02-09 | Text correction method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114943973B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110796082A (en) * | 2019-10-29 | 2020-02-14 | 上海眼控科技股份有限公司 | Nameplate text detection method and device, computer equipment and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3311551B2 (en) * | 1995-09-26 | 2002-08-05 | 日本電信電話株式会社 | Image data input processing method and apparatus |
CN109685059B (en) * | 2018-11-06 | 2024-06-28 | 平安科技(深圳)有限公司 | Text image labeling method, text image labeling device and computer readable storage medium |
CN111553344B (en) * | 2020-04-17 | 2023-05-12 | 携程旅游信息技术(上海)有限公司 | Inclination correction method, system, device and storage medium for text image |
CN111967474B (en) * | 2020-09-07 | 2024-04-26 | 凌云光技术股份有限公司 | Text line character segmentation method and device based on projection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||