Detailed Description
Exemplary embodiments of the present disclosure will be described hereinafter with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an actual implementation are described in this specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions must be made in order to achieve the developer's specific goals, such as compliance with system- and business-related constraints, and that these constraints will vary from one implementation to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.
It is also noted herein that, in order to avoid obscuring the disclosure with unnecessary details, only the device structures and/or processing steps closely related to the solution according to the present disclosure are shown in the drawings, while other details having little bearing on the present disclosure are omitted.
As described above, the existing text recognition method is not satisfactory in terms of word positioning performance.
In order to overcome the defects of the prior art, the present invention provides a novel single-character positioning algorithm based on a CRNN and an over-segmentation algorithm. First, a handwritten text line is recognized using the CRNN, and the position of each character is located; this positioning result is relatively coarse and provides only a rough character range. Meanwhile, the image is over-segmented using an over-segmentation algorithm, i.e., the image is segmented into strokes. Then, the end position of each character given by the CRNN is corrected using the softmax scores generated during CRNN recognition, so that the positioning accuracy of the CRNN is improved. Next, the core strokes of each character in the text line are found and marked, so that each character in the text line is matched to at least one stroke as accurately as possible. Finally, based on these core strokes and some prior knowledge, each unlabeled stroke is labeled with a suitable algorithm so that every stroke is assigned to a character identified by the CRNN, thereby obtaining the strokes corresponding to each character, i.e., the distribution area of each character.
In the following, it is first described, in connection with FIG. 1 and FIGS. 2a to 2c, how the end position of each character in the recognized text line, obtained using the CRNN and the over-segmentation algorithm, is corrected before the core strokes are marked.
FIG. 1 illustrates a flowchart of a method of correcting the end position of each character in a recognized text line prior to marking a core stroke, according to one embodiment.
First, in step 101, an image text line is recognized using the CRNN algorithm, and the recognized characters in the text line are segmented into strokes using the over-segmentation algorithm.
Specifically, in the present embodiment, as shown in FIGS. 2a and 2b, the image text line is first recognized using the CRNN algorithm to obtain the content of the text line, the possible candidates for each character with their corresponding confidences, and the end timestamp position of each character. FIG. 2a shows the top ten candidates predicted at each of the 12th to 14th timestamps and their corresponding confidence values. The vertical lines in FIG. 2b show the end timestamp position given by the CRNN for each character, i.e., the position with the highest probability among the adjacent timestamps. For example, in FIG. 2b, the probability of the character "mountain" is highest at the 12th timestamp, so the end position of the character "mountain" is the 12th timestamp. As can be seen from FIG. 2b, the CRNN provides only a coarse and less trusted end position for each character.
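For illustration, the per-timestamp output of FIG. 2a can be viewed as a matrix of softmax scores. The following is a minimal sketch (Python with NumPy; the (T, V) layout of probs and the function name are our assumptions, not part of the CRNN output specification) of how the top candidates at a timestamp would be read off:

```python
import numpy as np

def top_candidates(probs, t, k=10):
    """Top-k (class_index, confidence) pairs at timestamp t.

    probs is assumed to be a (T, V) array of per-timestamp softmax
    scores produced by the CRNN, as illustrated in FIG. 2a.
    """
    order = np.argsort(probs[t])[::-1][:k]
    return [(int(i), float(probs[t, i])) for i in order]
```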
Next, each character identified by the CRNN is segmented into discrete strokes using the over-segmentation algorithm; the result is shown in the box in FIG. 2b.
As can be seen from FIG. 2b, the character end positions provided by the CRNN are not accurate. Preferably, a fine correction may be applied in advance to the end positions provided by the CRNN in order to fix some relatively obvious and simple errors.
Next, in step 102, a character in the text line identified by the CRNN is selected as the current character, and the process proceeds to step 103 to determine whether the current character is the same as its next character.
In the case where the current character is different from its next character (i.e., "no" in step 103), the process proceeds to step 1031. In step 1031, the next timestamp after the end position of the current character is found, and, for the recognition result at that timestamp, it is determined whether the current character exists among the first P candidates and whether the corresponding confidence is greater than a threshold TH2.
If so, the process proceeds to step 1033, where that timestamp is re-assigned as the end position of the current character. The process then proceeds to step 1036, in which it is determined whether the timestamp following the new end position is the end position of the next character (as identified by the CRNN). If so, the end position of the current character has been corrected, and the process proceeds to step 104. If not, the process returns to step 1031 and continues searching for the end position of the current character.
If the result of step 1031 is "no", the process proceeds to step 1035 without correcting the end position of the current character, and then proceeds to step 104.
In step 104, it is determined whether the current character is the last of the characters to be corrected. If so, the process ends, and the end positions of all characters to be corrected have each been corrected once. If not, the process proceeds to step 105, in which the next of the characters to be corrected is selected as the new current character and the process returns to step 103; the iterative judgment proceeds according to the same logic until the end positions of all the characters to be corrected have been corrected.
Preferably, P=2 and TH2=0.01. However, it should be understood that the values of P and TH2 are not limited thereto, but may be set as needed.
For example, as shown in FIG. 2a, the 12th timestamp has the highest confidence for the character "mountain", and thus this timestamp is also the end position of the character "mountain" as identified by the CRNN. However, as can be seen from FIG. 2b, this end position is clearly inaccurate, since part of the strokes of the character "mountain" is not included. In practice, the 13th timestamp still belongs to the character "mountain", while the 14th does not.
In order to correct the end position of the character "mountain", in steps 1031 and 1033, each timestamp after its end timestamp 12 is examined in turn, based on the CRNN recognition result (as shown in FIG. 2a). The search finds that the character "mountain" appears as the 2nd candidate at the 13th timestamp with a confidence of 0.02, which is higher than the threshold 0.01, so timestamp 13 is considered to still belong to the character "mountain". It is further found that at timestamp 14 the confidence of "mountain" is lower than the threshold 0.01. Therefore, timestamp 13, rather than timestamp 14, is marked as the end position of the character "mountain"; that is, the end position of the character "mountain" is extended backward to the 13th timestamp. The corrected end position is shown in FIG. 2c.
In the case where the current character is the same as its next character (i.e., "yes" in step 103), the process proceeds to step 1032. In step 1032, for the timestamps between the end positions of the two identical characters, it is determined whether the current character is present among the first Q candidates at every such timestamp. If so, the end position of the current character is left unchanged, and the process proceeds to step 104 to determine whether the current character is the last of the characters to be corrected; if it is, the process ends, and the end positions of all characters to be corrected have each been corrected once; if it is not, the process proceeds to step 105, in which the next of the characters to be corrected is selected as the new current character and the process returns to step 103, the iterative judgment proceeding according to the same logic until the end positions of all the characters to be corrected have been corrected. If the current character is not present among the first Q candidates at every such timestamp, the same processing as in the case of differing characters is performed.
It should be noted that, in order to prevent repeated characters from confusing the correction method, the text recognized by the CRNN needs to be examined in advance. If two consecutive identical characters occur at some position, the confusion detection of step 1032 must be performed: if the character is present among the first Q candidates at all timestamps between the end positions of the two characters, the information is considered confused and cannot be corrected accurately by this method, so the end position of the first character of the pair (i.e., the current character) is not corrected.
Preferably, Q=5, where P and Q are positive integers greater than 1 and P is less than Q. However, it should be understood that the values of P and Q are not limited thereto, but may be set as needed.
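For concreteness, the correction loop of steps 102 to 105 might be sketched as follows. This is a non-authoritative sketch under stated assumptions: probs is the assumed (T, V) per-timestamp softmax matrix from above, char_to_class is a hypothetical mapping from a recognized character to its class index, and the defaults mirror the preferred values P=2, Q=5, and TH2=0.01.

```python
import numpy as np

def correct_end_positions(chars, ends, probs, char_to_class, P=2, Q=5, TH2=0.01):
    """Sketch of steps 102-105: refine the coarse CRNN end timestamps.

    chars: recognized characters, left to right.
    ends:  ends[i] is the coarse end timestamp of chars[i] (modified in place).
    """
    T = probs.shape[0]
    for i, ch in enumerate(chars):
        cls = char_to_class[ch]
        # Step 1032: if the next character is identical and ch stays in the
        # top-Q candidates throughout the gap, the two occurrences cannot be
        # separated reliably, so the end position is left uncorrected.
        if i + 1 < len(chars) and chars[i + 1] == ch:
            gap = range(ends[i] + 1, ends[i + 1])
            if all(cls in np.argsort(probs[t])[::-1][:Q] for t in gap):
                continue
        # Steps 1031/1033/1036: extend the end position while ch remains in
        # the top-P candidates of the following timestamp with confidence
        # above TH2, stopping before the next character's end position.
        next_end = ends[i + 1] if i + 1 < len(chars) else T
        t = ends[i] + 1
        while t < next_end:
            if cls in np.argsort(probs[t])[::-1][:P] and probs[t, cls] > TH2:
                ends[i] = t
                t += 1
            else:
                break
    return ends
```

On the example of FIGS. 2a to 2c, this loop extends the end position of "mountain" from timestamp 12 to timestamp 13 and stops at timestamp 14, matching the behavior described above.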
By this correction method, the end positions of the characters in the text line identified by the CRNN can be corrected to the proper end positions, thereby improving the positioning performance of the CRNN.
A method 300 for locating each character in an identified line of text according to an embodiment of the invention is described below in conjunction with FIGS. 3 to 7e.
As shown in FIG. 3, first, in step S1, a core stroke is marked for each character in the text line, the mark indicating which character in the text line the stroke belongs to.
Preferably, in this embodiment, the text line is a text line identified by the CRNN. However, it should be understood that the present invention is not limited thereto; the text line may also be one identified by any other existing recognition engine, whether or not it has undergone position correction.
In order to establish a correspondence between each over-segmented stroke and the characters identified by the CRNN, at least one core stroke first needs to be found and marked for each character identified by the CRNN. A core stroke is the stroke most likely, among all strokes of a character, to belong to that character. Since the way the core strokes are selected seriously affects the performance of the subsequent steps, only those strokes most likely to belong to the character are selected as core strokes in step S1.
Based on the end position of each character given by the CRNN, an approximate distribution range of each character, referred to herein as the recognition range of that character, can be obtained. While this range is often judged less accurately in the edge regions, it is often judged more accurately in a core range (e.g., the middle-right region). Further, the larger the overlap between a stroke and the recognition range of a character, the more likely the stroke is to belong to that character.
FIG. 4 illustrates a flowchart of marking a core stroke (i.e., step S1) according to one embodiment.
In step 401, a stroke whose range contains, or is contained in, the recognition range of a character in the text line and that overlaps the core range of that character is marked as a core stroke of the character.
Specifically, in the present embodiment, if a stroke overlaps the core region of a character and is completely contained within the CRNN recognition range of the character or, conversely, its distribution range completely covers the CRNN recognition range of the character, the stroke is marked as a core stroke of the character.
Preferably, if the CRNN recognition range of a character is normalized to 0 to 1 in the lateral direction, the interval from 0.4 to 0.8 of the recognition range is generally regarded as the core region of the character. It should be understood that this interval is only an example, and the present invention is not limited thereto.
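As a concrete illustration of this convention, the core region can be obtained by scaling the normalized interval into pixel coordinates (a trivial sketch; the function name is ours):

```python
def core_range(rec_left, rec_right, lo=0.4, hi=0.8):
    """Map the normalized core interval [lo, hi] of a character's CRNN
    recognition range [rec_left, rec_right] into pixel coordinates."""
    width = rec_right - rec_left
    return rec_left + lo * width, rec_left + hi * width

# Example: a character recognized between pixels 100 and 200 has its
# core region at pixels (140.0, 180.0).
```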
Next, in step 402, it is determined whether there are characters in the text line for which no core stroke has yet been marked. If so, in step 403, the unmarked strokes that overlap the core range of such a character are marked as core strokes of the character. If not, the process proceeds to step S2, which will be described in detail below.
Specifically, in this embodiment, if after step 401 there are still characters in the text line without core strokes, the constraint needs to be relaxed slightly and the remaining unmarked strokes searched, so that a most probable stroke is found for each such character as its core stroke. Accordingly, in step 403, the unmarked strokes that overlap the core range of the character are marked as core strokes of the character.
Next, in step 404, it is determined whether there are characters in the text line for which no core stroke has yet been marked. If so, in step 405, the unmarked stroke having the largest overlap ratio with the recognition range of the character is marked as the core stroke of the character. If not, the process proceeds to step S2, which will be described in detail below.
It should be noted that if, after step 405, there is still a character in the text line without a marked core stroke, the CRNN is considered to have a serious recognition or positioning error for that character, and the character is therefore deleted and not positioned.
It should also be noted that in step 403, if a stroke would be marked as a core stroke of multiple characters at the same time, that stroke is not marked as a core stroke of any character. In steps 401 and 405, by contrast, if a stroke would be marked as the core stroke of multiple characters at the same time, the stroke is marked as the core stroke of the leftmost character.
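Steps 401 and 405, together with the tie-breaking rules just described, might be sketched as follows (a non-authoritative sketch: step 403 is only indicated in a comment, core_range is the helper sketched above, and strokes and recognition ranges are assumed to be (left, right) pixel intervals ordered left to right):

```python
def overlap_ratio(stroke, rec):
    """Fraction of the recognition range rec covered by the stroke (1-D)."""
    inter = min(stroke[1], rec[1]) - max(stroke[0], rec[0])
    return max(inter, 0) / (rec[1] - rec[0])

def mark_core_strokes(strokes, rec_ranges):
    """Sketch of steps 401-405; returns a dict stroke index -> char index."""
    labels = {}
    # Step 401: the stroke contains, or is contained in, the recognition
    # range and overlaps the core region.  Scanning characters left to
    # right and skipping already-labeled strokes makes the leftmost
    # character win when a stroke qualifies for several characters.
    for c, rec in enumerate(rec_ranges):
        core = core_range(*rec)
        for s, st in enumerate(strokes):
            if s in labels:
                continue
            contains = st[0] <= rec[0] and rec[1] <= st[1]
            contained = rec[0] <= st[0] and st[1] <= rec[1]
            if (contains or contained) and min(st[1], core[1]) > max(st[0], core[0]):
                labels[s] = c
    # Step 403 (omitted here): relax the constraint to any unlabeled stroke
    # overlapping the core region, leaving ambiguous strokes unlabeled.
    # Step 405: fall back to the unlabeled stroke with the largest overlap
    # ratio with the recognition range.
    for c, rec in enumerate(rec_ranges):
        if c not in labels.values():
            free = [s for s in range(len(strokes)) if s not in labels]
            scored = [(overlap_ratio(strokes[s], rec), s) for s in free]
            ratio, best = max(scored, default=(0, None))
            if best is not None and ratio > 0:
                labels[best] = c   # characters left without any core stroke
                                   # are deleted and not positioned (see text)
    return labels
```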
FIG. 7a shows an example of a text line labeled with core strokes by the method shown in FIG. 4, where dark and light colors represent core strokes corresponding to different characters, and gray represents unmarked strokes.
Returning to FIG. 3, after the core strokes have been marked for each character in the text line, the process continues with step S2. In step S2, unmarked sticky strokes, which are stuck to marked strokes, and unmarked isolated strokes are marked based on the marked strokes, wherein an isolated stroke refers to the only unmarked stroke between two marked strokes.
Specifically, in the present embodiment, after each character has been marked with a core stroke as far as possible, the other unmarked strokes need to be marked based on these marked strokes. Strokes with the same label may be considered as a whole, i.e., as the set of labeled strokes of one category.
FIG. 5 illustrates a flowchart of marking sticky strokes and isolated strokes, according to an embodiment.
As shown in FIG. 5, in step 501, if an unmarked stroke sticks to a marked stroke, the unmarked stroke is merged into the set of that marked stroke.
It should be noted that in some special cases, a stroke may stick to two or more marked stroke sets. In that case, since its category cannot be determined, it is temporarily left unclassified and unmerged.
FIG. 7b illustrates an example of sticky stroke classification. As shown in FIG. 7b, all the sticky strokes are classified to the recognized characters to which they belong.
Next, in step 502, each unmarked isolated stroke is merged into the marked stroke closest to it.
Specifically, in this embodiment, an isolated stroke is the only unmarked stroke between two marked strokes. Clearly, an isolated stroke belongs either to the set of the stroke on its left or to the set of the stroke on its right. Thus, isolated strokes may be classified using the following distance formula:
D = max(s1, c1) - min(s2, c2) + 1    (1)
where s1 and s2 denote the pixel coordinates of the left and right edges of the isolated stroke, respectively, and c1 and c2 denote the pixel coordinates of the left and right edges of the object to which the distance is calculated.
It should be noted that this object may be another marked stroke, a set of marked strokes, or the recognition range of a character as provided by the CRNN.
It should also be noted that when D > 1, there is no overlap between the two objects; when D ≤ 1, the two objects overlap, and the width of the overlap region is 2 - D pixels.
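Formula (1) translates directly into code; the following sketch encodes it verbatim, with each object given as a (left, right) pixel interval:

```python
def stroke_distance(a, b):
    """Formula (1): D = max(s1, c1) - min(s2, c2) + 1, where a = (s1, s2)
    and b = (c1, c2) are the left/right pixel edges of the two objects.
    D > 1 means no overlap; D <= 1 means an overlap of width 2 - D pixels."""
    return max(a[0], b[0]) - min(a[1], b[1]) + 1

# Example: for (10, 20) and (25, 30), D = 25 - 20 + 1 = 6 (no overlap);
# for (10, 20) and (18, 30), D = 18 - 20 + 1 = -1 (overlap of 3 pixels).
```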
It should be understood that, because of the randomness of writing, when an isolated stroke is almost equally distant from its two neighboring strokes, i.e., when the absolute value of the difference between the two distances is less than a threshold TH1, the attribution of the isolated stroke cannot be judged from the distance information alone, and additional information must be introduced. In this case, the method according to the present embodiment computes, according to formula (1), the overlap widths between the isolated stroke and the recognition ranges of the two adjacent characters, and assigns the isolated stroke to the character with the largest overlap width.
Preferably, the threshold TH1 is 8 pixels. However, it is to be understood that the present invention is not limited thereto, and the value of TH1 may be set as needed.
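Combining the distance rule of step 502 with the TH1 tie-break gives the following sketch (reusing stroke_distance from formula (1); the argument layout and function name are our assumptions):

```python
def assign_isolated_stroke(stroke, left, right, rec_left, rec_right, TH1=8):
    """Sketch of step 502: assign an isolated stroke to its left or right
    neighbor.  All arguments are (left, right) pixel intervals; left and
    right are the neighboring labeled strokes (or stroke sets), rec_left
    and rec_right the recognition ranges of the corresponding characters."""
    d_left = stroke_distance(stroke, left)
    d_right = stroke_distance(stroke, right)
    if abs(d_left - d_right) >= TH1:
        # The distances differ clearly: take the nearer neighbor.
        return "left" if d_left < d_right else "right"
    # Tie-break: compare overlap widths (2 - D) with the recognition
    # ranges and take the character with the larger overlap.
    w_left = 2 - stroke_distance(stroke, rec_left)
    w_right = 2 - stroke_distance(stroke, rec_right)
    return "left" if w_left >= w_right else "right"
```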
Finally, in step 503, it is determined whether all sticky strokes and isolated strokes have been marked. If so, the process continues with step S3, described below. If not, the process returns to step 501.
In particular, in this embodiment, since classifying sticky and isolated strokes may produce new sticky or isolated strokes, steps 501 and 502 need to be iterated until no new sticky or isolated strokes are generated.
FIG. 7c illustrates an example of isolated stroke classification. In this example, the stroke "1" located in the middle of the right side of the character "rest" is closer to the character "rest" and is thus classified into the category of the character "rest".
Returning to FIG. 3, the process continues with step S3. In step S3, the first N pairs, or the first M% of pairs, of mutually closest adjacent strokes are merged together, provided the two adjacent strokes are not marked to different characters, where N is an integer greater than or equal to 1 and M is any number between 0 and 100.
Specifically, in this embodiment, some strokes may remain unclassified after steps S1 and S2; at this point they can only be classified using a greedy strategy.
FIG. 6 illustrates a flowchart for merging together the first N pairs or the first M% of pairs of mutually closest adjacent strokes, according to one embodiment.
First, in step 601, all strokes having the same mark obtained through steps S1 and S2 are merged together and regarded as one stroke.
Next, in step 602, the distance between adjacent strokes is calculated. Specifically, in the present embodiment, the distances between all pairs of adjacent strokes may be calculated according to formula (1).
Next, in step 603, the first N pairs, or the first M% of pairs, of mutually closest adjacent strokes are merged together, provided the two adjacent strokes are not marked to different characters, where N is an integer greater than or equal to 1 and M is any number between 0 and 100. Specifically, in this embodiment, all distances calculated in step 602 are sorted by size, the N smallest or the smallest M% of distances are selected, and the adjacent stroke pairs corresponding to these distances are merged into one category, i.e., marked as belonging to the same character.
It should be noted that if a selected adjacent stroke pair has already been marked to two different CRNN-recognized characters, this distance is skipped and the pair is not selected. In addition, the strategy for classifying and merging strokes is similar to that for isolated strokes: a stroke is first merged into the nearer of its adjacent strokes according to the distance between them; if its distances to the left and right strokes are close to each other, it is classified to the character with the larger overlap.
It should also be noted that, when computing distances, several unmarked strokes may be merged with each other, creating a new category that does not correspond to any character identified by the CRNN. Such a new category is merely an intermediate variable of the merging process. Over multiple iterations, these temporary categories are gradually merged into the recognized-character categories generated by the CRNN, so that the final categories correspond only to the characters recognized by the CRNN.
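The greedy selection of steps 601 to 603 might be sketched as follows (again reusing stroke_distance; the data layout is assumed, and the function merely reports which adjacent pairs to fuse in the current pass):

```python
def nearest_pairs_to_merge(groups, labels, n=1):
    """Sketch of steps 601-603.  groups is a left-to-right list of
    (left, right) intervals, each the bounding interval of all strokes
    sharing one mark (step 601); labels[i] is the character index of
    group i, or None for a temporary, still unmarked category."""
    dists = []
    for i in range(len(groups) - 1):
        # Skip pairs already marked to two different characters.
        if labels[i] is not None and labels[i + 1] is not None \
                and labels[i] != labels[i + 1]:
            continue
        dists.append((stroke_distance(groups[i], groups[i + 1]), i))
    # Keep the n smallest distances (or, alternatively, the smallest m%).
    return [(i, i + 1) for _, i in sorted(dists)[:n]]
```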
FIG. 7d shows an example of merging the first N pairs or the first M% of pairs of closest strokes. In this example, N is chosen to be 1; the stroke "1" is closest to the stroke set "ke", so they are marked as the same category and merged.
Returning to FIG. 3, finally, in step S4, it is determined whether there are still unlabeled strokes. If so, return to step S2. If not, the method 300 ends.
Specifically, in the present embodiment, it is determined whether there are strokes that have not yet been marked to a character identified by the CRNN. If so, the process returns to step S2. After multiple iterations, all strokes are classified, and the newly generated temporary categories are merged away and disappear. Thus, all strokes are classified and marked to the characters recognized by the CRNN; in other words, the position information corresponding to all recognized characters is accurately located.
FIG. 7e illustrates an example where all strokes are marked to recognized characters. As shown in FIG. 7e, all strokes corresponding to the recognized characters are found.
The method 300 according to the present embodiment assigns each stroke to its corresponding recognized character with high accuracy, based on the end positions given by the CRNN and the spatial distribution of the strokes, thereby achieving stroke-level localization of each character. The localization results also make it possible to fuse the recognition results of multiple recognition engines.
Testing the method 300 according to the present embodiment on several handwritten Japanese datasets (AIBU, Questionnaire, Cogent, and PFU) yields the accuracy shown in Table 1 below. As can be seen from Table 1, the average accuracy of the method 300 is about 95%, i.e., it provides a high degree of accuracy.
TABLE 1

Data set (number of images)    Accuracy
AIBU (897)                     95.0%
Questionnaire (319)            94.2%
Cogent (26)                    93.5%
PFU (25)                       94.9%
Aggregate (1267)               95.0%
The methods discussed above may be implemented entirely by a computer-executable program, or partially or entirely using hardware and/or firmware. When they are implemented in hardware and/or firmware, or when a computer-executable program is loaded into a hardware device capable of running it, an apparatus for locating each character in an identified text line, as described below, is implemented. Hereinafter, an overview of these apparatuses is given without repeating details already discussed above; it should be noted, however, that while these apparatuses may perform the methods described previously, those methods do not necessarily employ, and are not necessarily performed by, the components of the described apparatuses.
Fig. 8 shows an apparatus 800 for locating each character in a recognized text line, which includes a first marking device 801, a second marking device 802, and a merging device 803, according to one embodiment. The first marking means 801 is used to mark each character in the text line with a core stroke, the mark indicating which character in the text line the stroke belongs to. The second marking means 802 is for marking, based on the marked strokes, unmarked sticky strokes that are sticky to the marked strokes and unmarked isolated strokes, wherein an isolated stroke refers to only one unmarked stroke between two marked strokes. The merging means 803 is used to merge together two adjacent strokes of the first N pairs or the first M% pairs that are closest to each other, wherein the two adjacent strokes are not marked to different characters, and wherein N is an integer greater than or equal to 1 and M is any value between 0 and 100. After processing by the second marking means and the merging means, all strokes are marked to characters in the text line.
The apparatus 800 for locating each character in the identified text line shown in FIG. 8 corresponds to the method 300 for locating each character in the identified text line shown in FIG. 3. Accordingly, relevant details of the various means in the apparatus 800 for locating each character in the identified lines of text have been given in detail in the description of the method 300 for locating each character in the identified lines of text of fig. 3 and are not repeated here.
The individual constituent modules and units in the apparatus described above may be configured by means of software, firmware, hardware, or a combination thereof. The specific means or manner of such configuration is well known to those skilled in the art and will not be described in detail herein. In the case of implementation by software or firmware, a program constituting the software is installed from a storage medium or a network onto a computer having a dedicated hardware structure (for example, the general-purpose computer 900 shown in FIG. 9), and the computer, when various programs are installed thereon, can execute various functions.
FIG. 9 is a block diagram of an exemplary architecture of a general-purpose personal computer in which methods and/or apparatus according to embodiments of the present invention may be implemented. As shown in FIG. 9, a Central Processing Unit (CPU) 901 performs various processes according to a program stored in a Read-Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. The RAM 903 also stores, as needed, data required when the CPU 901 executes the various processes. The CPU 901, the ROM 902, and the RAM 903 are connected to one another via a bus 904. An input/output interface 905 is also connected to the bus 904.
An input section 906 (including a keyboard, a mouse, and the like), an output section 907 (including a display such as a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like), the storage section 908 (including a hard disk and the like), and a communication section 909 (including a network interface card such as a LAN card, a modem, and the like) are connected to the input/output interface 905. The communication section 909 performs communication processing via a network such as the Internet. A drive 910 may also be connected to the input/output interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as needed, so that a computer program read therefrom is installed into the storage section 908 as needed.
In the case of implementing the above-described series of processes by software, a program constituting the software is installed from a network such as the internet or a storage medium such as the removable medium 911.
It will be understood by those skilled in the art that such a storage medium is not limited to the removable medium 911 shown in FIG. 9, in which the program is stored and which is distributed separately from the device in order to provide the program to the user. Examples of the removable medium 911 include magnetic disks (including floppy disks (registered trademark)), optical disks (including compact disc read-only memories (CD-ROMs) and Digital Versatile Discs (DVDs)), magneto-optical disks (including MiniDiscs (MDs) (registered trademark)), and semiconductor memories. Alternatively, the storage medium may be the ROM 902, a hard disk contained in the storage section 908, or the like, in which programs are stored and which is distributed to users together with the device containing it.
The invention also proposes corresponding computer program code and a computer program product storing machine-readable instruction code. When read and executed by a machine, the instruction code can perform the method 300 described above according to embodiments of the present invention.
Accordingly, a storage medium configured to carry the above-described program product storing machine-readable instruction code is also included in the disclosure of the present invention. Such storage media include, but are not limited to, floppy disks, optical disks, magneto-optical disks, memory cards, memory sticks, and the like.
Through the above description, the embodiments of the present disclosure provide the following technical solutions, but are not limited thereto.
Supplementary note 1. A method for locating each character in an identified text line, comprising:
step S1, marking each character in the text line with a core stroke, wherein the mark indicates which character in the text line the stroke belongs to;
Step S2, marking, based on the labeled strokes, unlabeled sticky strokes that are stuck to labeled strokes and unlabeled isolated strokes, wherein an isolated stroke refers to the only unlabeled stroke between two labeled strokes, and
Step S3, merging together two adjacent strokes of the first N pairs or the first M% pairs that are closest to each other, wherein the two adjacent strokes are not marked to different characters, and wherein N is an integer greater than or equal to 1, and M is any number between 0 and 100,
Steps S2 and S3 are repeated until all strokes have been marked to the character in the text line.
Supplementary note 2. The method according to supplementary note 1, wherein marking the core strokes sequentially comprises:
the strokes whose stroke ranges contain or are contained in the recognition range of the character in the text line and overlap the core range of the character are marked as core strokes of the character,
For a character in the text line that has not yet marked a core stroke, marking the unmarked stroke that overlaps the core range of the character as the core stroke of the character, and
For a character in the text line that has not yet marked a core stroke, marking the unmarked stroke having the largest overlapping proportion with the recognition range of the character as the core stroke of the character.
Supplementary note 3. According to the method of supplementary note 2,
Wherein if a stroke overlapping the core range of the character is marked as a core stroke of multiple characters at the same time, the stroke is not marked as a core stroke of any character,
And wherein if a stroke whose range contains or is contained in the recognition range of the character and that overlaps the core range of the character, or a stroke having the greatest overlap ratio with the recognition range of the character, is marked as a core stroke of a plurality of characters at the same time, the stroke is marked as the core stroke of the leftmost character.
Supplementary note 4. The method according to supplementary note 3, wherein, if the recognition range is normalized to 0 to 1, the core range is 0.4 to 0.8.
Supplementary note 5. The method of supplementary note 1, wherein marking the unlabeled sticky strokes and unlabeled isolated strokes based on the marked strokes includes:
Merging the unlabeled sticky stroke into the marked stroke if the unlabeled sticky stroke is sticky with the marked stroke, and
Merging said unlabeled isolated stroke to the nearest labeled stroke, and
The above steps are repeated until all the sticky strokes and isolated strokes are marked.
Supplementary note 6. The method according to supplementary note 1, wherein said step S3 further comprises:
Merging all strokes with the same mark obtained in steps S1 and S2;
calculating the distance between two adjacent strokes, and
The two adjacent strokes of the first N pairs or the first M% pairs that are closest to each other are merged together.
Supplementary note 7. The method according to supplementary note 5 or 6, wherein the distance is based on the difference between the maximum of the leftmost positions of the two adjacent strokes and the minimum of their rightmost positions.
Supplementary note 8 the method according to any one of supplementary notes 1 to 6, wherein:
If the absolute value of the difference between the distances from one stroke to its two adjacent strokes is less than a first threshold, the stroke is marked to the character whose recognition range has the greatest overlap with the range of the stroke; if the absolute value of the difference is greater than the first threshold, the stroke is marked to the character nearest to it.
Supplementary note 9. The method according to supplementary note 8, wherein the first threshold is 8 pixels.
Supplementary note 10 the method according to any one of supplementary notes 1 to 6, wherein the identified text line is obtained by a convolutional recurrent neural network and the strokes of the character are obtained by an over-segmentation algorithm.
Supplementary note 11. The method according to supplementary note 10, further comprising correcting an end position of each character in the recognized text line obtained by the convolutional recurrent neural network prior to marking the core stroke.
Supplementary note 12 the method of supplementary note 11, wherein the modifying includes:
in case the character is different from its next character:
If it is determined that the character exists in the first P candidates of the recognition result of the next timestamp after the end position of the character and the corresponding confidence is greater than the second threshold, the next timestamp is re-divided into the end positions of the character,
Iterating the re-division step until the timestamp before the end position of the next character is reached, and
In the case that the character is identical to its next character:
If the character is present in the first Q candidates for each timestamp between the two identical character ending locations, the ending location of the character is not changed, and
If the character is not present in the first Q candidates at every timestamp between the end positions of the two identical characters, the same processing as in the case where the character is different from its next character is performed,
Wherein P and Q are positive integers greater than 1 and P is less than Q.
Supplementary note 13. The method according to supplementary note 12, wherein the second threshold is 0.01.
Supplementary note 14. The method according to any one of supplementary notes 1 to 6, further comprising fusing text lines identified by different recognition models with the final localization result to obtain an accurate recognition result.
Supplementary note 15. An apparatus for locating each character in an identified line of text, comprising:
first marking means configured to mark a core stroke for each character in the text line, the mark indicating to which character in the text line the stroke belongs;
A second marking device configured to mark, based on marked strokes, unmarked sticky strokes that are stuck to the marked strokes and unmarked isolated strokes, wherein an isolated stroke refers to the only unmarked stroke between two marked strokes, and
Merging means configured to merge together two adjacent strokes of the first N pairs or the first M% pairs that are closest to each other, wherein the two adjacent strokes are not marked to different characters, and wherein N is an integer greater than or equal to 1, and M is any number between 0 and 100,
Wherein all strokes are marked to characters in said text line after processing by said second marking means and said merging means.
Supplementary note 16 the apparatus according to supplementary note 15, wherein the first marking device is further configured to:
the strokes whose stroke ranges contain or are contained in the recognition range of the character in the text line and overlap the core range of the character are marked as core strokes of the character,
For a character in the text line that has not yet marked a core stroke, marking the unmarked stroke that overlaps the core range of the character as the core stroke of the character, and
For a character in the text line that has not yet marked a core stroke, marking the unmarked stroke having the largest overlapping proportion with the recognition range of the character as the core stroke of the character.
Supplementary note 17. The apparatus according to supplementary note 15 or 16, wherein the second marking device is further configured to:
Merging the unlabeled sticky stroke into the marked stroke if the unlabeled sticky stroke is sticky with the marked stroke, and
Merging said unlabeled isolated stroke to the nearest labeled stroke, and
The above process is repeated until all sticky strokes and isolated strokes are marked.
Supplementary note 18. The apparatus of supplementary note 17, wherein the merging means is further configured to:
merging all strokes with the same mark together;
calculating the distance between two adjacent strokes, and
The two adjacent strokes of the first N pairs or the first M% pairs that are closest to each other are merged together.
Supplementary note 19 the apparatus according to supplementary note 15, further comprising a correction device configured to correct an end position of each character in the identified text line obtained by convolutionally recurrent neural network prior to marking the core stroke
Supplementary note 20. A computer-readable storage medium storing a program executable by a processor to:
an operation S1, marking each character in the text line with a core stroke, the mark indicating which character in the text line the stroke belongs to;
Operation S2, marking, based on the marked strokes, unmarked sticky strokes that are stuck to marked strokes and unmarked isolated strokes, wherein an isolated stroke is the only unmarked stroke between two marked strokes, and
Operation S3 of merging together two adjacent strokes of the first N pairs or the first M% pairs that are closest to each other, wherein the two adjacent strokes are not marked to different characters, and wherein N is an integer greater than or equal to 1, and M is any number between 0 and 100,
Operations S2 and S3 are repeated until all strokes are marked to a character in the text line.
Finally, it is also noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Furthermore, without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
Although the embodiments of the present invention have been described in detail above with reference to the accompanying drawings, it should be understood that the above-described embodiments are merely illustrative of the present invention and do not constitute a limitation thereof. Various modifications and alterations to the above-described embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention. The scope of the invention is, therefore, defined only by the appended claims and their equivalents.