Embodiment
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. For clarity and conciseness, not all features of an actual implementation are described in this specification. It should be understood, however, that developing any such actual implementation requires many implementation-specific decisions in order to achieve the developer's specific goals, for example compliance with system-related and business-related constraints, and that these constraints may vary from one implementation to another. It should also be appreciated that, although such development work may be complex and time-consuming, it is merely a routine task for those skilled in the art having the benefit of this disclosure.
It should also be noted that, to avoid obscuring the present invention with unnecessary detail, the drawings show only the device structures and/or processing steps closely related to the solution of the present invention, and omit other details of little relevance to it.
To aid understanding of the principle of the invention, the example shown in Fig. 1 is used below to explain how a marked character image is separated into a character image and a mark image, and how the character image is then recognized to obtain the recognized characters. Fig. 1(a) shows an example of a marked character image to be recognized; Fig. 1(b) shows the character image output after mark and character separation has been performed, according to an embodiment of the invention, on the marked character image of Fig. 1(a); and Fig. 1(c) shows the mark image output by the same separation.
The basic working principle of a character recognition method according to an embodiment of the present invention is first described below with reference to Figs. 2 to 14.
As shown in Fig. 2, the character recognition method according to this embodiment comprises: a marked-character selection step S210 of selecting candidate regions of the character image to be recognized as marked characters; a partial mark pixel extraction step S220 of extracting part of the mark pixels of a mark according to the position and shape features of the mark on a marked character in the character image to be recognized; an expansion step S230 of expanding the extracted partial mark pixels into mark line segments by including neighboring pixels having the same direction; a thinned-image acquisition step S240 of obtaining a thinned image of the character image to be recognized; a growing step S250 of growing the expanded mark line segments along the trajectory of the thinned image into the identified mark; a separation step S260 of separating the identified mark from the character image; and a recognition step S270 of recognizing the separated character image.
It should be noted that the marked-character selection step S210 described above is optional. That is, the partial mark pixel extraction step S220 and the subsequent processing can be applied directly to the marked character image to be recognized without first selecting marked characters. The mark can still be separated from the character image and the separated character image recognized, thereby improving the accuracy and reliability of recognition.
Next, the processing in each step of the character recognition method shown in Fig. 2, namely the marked-character selection step S210, the partial mark pixel extraction step S220, the expansion step S230, the thinned-image acquisition step S240, the growing step S250, the separation step S260, and the recognition step S270, is described in detail with reference to Figs. 3 to 14.
Fig. 3 is a flowchart of the detailed processing in the marked-character selection step S210 of Fig. 2 according to an embodiment of the invention. As shown in Fig. 3, when marked characters are selected, first, in step S310, the text block in the character image to be recognized is divided into character regions by alternately projecting the text block in the horizontal and vertical directions.
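The projection-based segmentation of step S310 can be illustrated with a minimal sketch. It assumes a binary image stored as nested lists (1 for ink, 0 for background) and performs only one horizontal pass followed by one vertical pass; the names `runs_of_ink` and `segment` are hypothetical and do not come from the specification.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed encoding: 1 = ink, 0 = background


def runs_of_ink(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open [start, end) index ranges where the profile is non-zero."""
    runs, start = [], None
    for idx, value in enumerate(profile):
        if value and start is None:
            start = idx
        elif not value and start is not None:
            runs.append((start, idx))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment(image: Image) -> List[Tuple[int, int, int, int]]:
    """Split a text block into (top, bottom, left, right) character boxes."""
    boxes = []
    # Projection of ink counts per row separates the text lines.
    row_profile = [sum(row) for row in image]
    for top, bottom in runs_of_ink(row_profile):
        band = image[top:bottom]
        # Projection of ink counts per column separates regions within a line.
        col_profile = [sum(col) for col in zip(*band)]
        for left, right in runs_of_ink(col_profile):
            boxes.append((top, bottom, left, right))
    return boxes
```

A fuller implementation would presumably repeat the two projections alternately until no region can be split further; the single pass here is a simplification.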
Then, in step S320, the sizes of the character regions obtained in step S310 are compared, and the regions are classified into three classes: contact regions, large-size regions, and normal-size regions. Fig. 4 shows an example of a character image after segmentation and classification according to this embodiment. Finally, in step S330, the contact regions and large-size regions are taken as marked characters, and the normal-size regions are labeled as unmarked character regions.
In addition, reference characters are identified for each marked character; reference characters are the characters located in the same row or the same column as the marked character. As shown in Fig. 4, two reference characters are identified in the illustrated large-size case, whereas the contact case has only one reference character.
Furthermore, if all character regions are normal-size regions, the character image to be recognized is classified as an unmarked character image. In that case, steps S220 to S260 shown in Fig. 2 (partial mark pixel extraction, expansion, thinned-image acquisition, growing, and separation) need not be performed, and processing proceeds directly to step S270 for character recognition.
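The size comparison of steps S320 and S330 might look like the following sketch. The specification does not give the classification rule, so the median-based comparison and both ratio thresholds are assumptions chosen purely for illustration, as are the function names.

```python
from statistics import median
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, bottom, left, right), half-open


def classify_regions(boxes: List[Box],
                     contact_ratio: float = 1.6,   # assumed threshold
                     large_ratio: float = 1.2) -> List[str]:  # assumed threshold
    """Label each region 'contact', 'large', or 'normal' by size."""
    widths = [right - left for _, _, left, right in boxes]
    heights = [bottom - top for top, bottom, _, _ in boxes]
    typical_w, typical_h = median(widths), median(heights)
    labels = []
    for w, h in zip(widths, heights):
        if w >= contact_ratio * typical_w:
            # Much wider than usual: assumed to be characters joined by a mark.
            labels.append("contact")
        elif w >= large_ratio * typical_w or h >= large_ratio * typical_h:
            labels.append("large")
        else:
            labels.append("normal")
    return labels


def has_marked_characters(labels: List[str]) -> bool:
    """False means the image is unmarked and can go straight to recognition."""
    return any(label != "normal" for label in labels)
```

When `has_marked_characters` returns `False`, the flow corresponds to skipping steps S220 to S260 and proceeding directly to step S270.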
After the marked characters have been selected by the process shown in Fig. 3, part of the mark pixels of the mark are extracted according to the position and shape features of the mark on the selected marked character. The extraction can be handled differently depending on the position and shape features of the mark. Several specific cases are analyzed and described below.
According to one embodiment of the invention, as shown in Fig. 5, the partial mark pixels outside the rectangular frame enclosing the character can be extracted. Using this feature, partial mark pixels are easily extracted when the mark lies close to the character, as shown in Fig. 5(a). This approach also gives good results for the contact case in which no reference character is available.
Fig. 6 is a flowchart of a first example process in the partial mark pixel extraction step S220 of Fig. 2 according to an embodiment of the invention. As shown in Fig. 6, first, in step S610, a group of candidate mark pixels is selected by separating the side waves on both sides of the projections in the horizontal direction and the vertical direction, respectively.
Figs. 7(a) and 7(b) show a marked character image and an example of its projection waveform in the vertical direction; the two vertical lines on the left and right in Fig. 7(b) correspond to the two vertical lines on either side of the character in Fig. 7(a). Figs. 7(c) and 7(d) show a marked character image and an example of its projection waveform in the horizontal direction; the two vertical lines on the left and right in Fig. 7(d) correspond to the two horizontal lines on either side of the character in Fig. 7(c).
Thus, for the example shown in Figs. 7(a) to 7(d), the pixels outside the two vertical lines in Fig. 7(a) (corresponding to the two waves outside the left and right vertical lines in Fig. 7(b)) and the pixels outside the two horizontal lines in Fig. 7(c) (corresponding to the two waves outside the left and right vertical lines in Fig. 7(d)) can be selected as candidate mark pixels.
Then, in step S620, a curve model is built by least-squares curve fitting to fit the group of candidate mark pixels, and in step S630 the fitting error of the group is calculated to determine whether the group consists of mark pixels. If the fitting error is small, the pixels in the group can be regarded as mark pixels. Steps S620 and S630 thereby eliminate false mark pixels, that is, pixels that might be judged to be mark pixels but are actually character pixels. For example, for the pixels outside the right vertical line in Fig. 7(a), the fitting error between the actual pixel values and the fitted curve model exceeds a predetermined threshold, so they are determined not to be mark pixels.
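The fit-and-threshold test of steps S620 and S630 can be sketched as follows. The specification speaks of a curve model without naming it; this sketch substitutes a straight line as the simplest least-squares model, and the RMS threshold `max_rms` is an assumed value, not one taken from the source.

```python
from math import sqrt
from typing import List, Tuple

Point = Tuple[float, float]


def line_fit_rms(points: List[Point]) -> float:
    """Least-squares straight-line fit; returns the RMS residual."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Fit along the axis with the larger spread so near-vertical groups work.
    if max(xs) - min(xs) < max(ys) - min(ys):
        xs, ys = ys, xs
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # all points coincide, so the fit is trivially exact
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    squared = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return sqrt(squared / n)


def is_mark_group(points: List[Point], max_rms: float = 1.0) -> bool:
    """Accept the candidate group as mark pixels if the fit error is small."""
    return line_fit_rms(points) <= max_rms
```

A smooth mark stroke tends to produce a small residual, while the irregular outline of a character stroke group tends to produce a large one; a higher-order polynomial could replace the line where marks are curved.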
In addition, for the contact case described above, the crossing feature of the contact segment can also be used to determine the mark pixels. Fig. 8 is a flowchart of a second example process in the partial mark pixel extraction step S220 of Fig. 2 according to an embodiment of the invention.
As shown in Fig. 8, in the partial mark pixel extraction according to this embodiment, first, in step S810, the stroke width is estimated by analyzing run lengths; then, in step S820, the crossing feature of the contact segment is examined along the direction orthogonal to the contact direction; and in step S830, the pixels on line segments whose crossing feature shows two parts on the ruler, each with a width comparable to the stroke width, are determined to be mark pixels.
Fig. 9 shows an example of partial mark pixel extraction using the crossing feature according to this embodiment. The darker gray portions of the mark in Fig. 9 are the two parts on the ruler whose widths are comparable to the stroke width, so these pixels are determined to be mark pixels.
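Steps S810 to S830 can be sketched under a few assumptions: the stroke width is taken as the most frequent ink run length over all rows and columns, a scan line stands in for the ruler, and a `tolerance` of one pixel defines "comparable" width. None of these specifics is stated in the specification, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def run_lengths(line: Sequence[int]) -> List[int]:
    """Lengths of consecutive ink (non-zero) runs along one scan line."""
    lengths, run = [], 0
    for pixel in list(line) + [0]:  # the trailing 0 flushes the last run
        if pixel:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    return lengths


def estimate_stroke_width(image: List[List[int]]) -> int:
    """Most frequent ink run length over all rows and columns (0 if no ink)."""
    counts = Counter()
    for row in image:
        counts.update(run_lengths(row))
    for column in zip(*image):
        counts.update(run_lengths(column))
    return counts.most_common(1)[0][0] if counts else 0


def has_mark_crossing(scan_line: Sequence[int], stroke_width: int,
                      tolerance: int = 1) -> bool:
    """True if the scan line crosses exactly two ink parts of stroke-like width."""
    parts = run_lengths(scan_line)
    return len(parts) == 2 and all(
        abs(part - stroke_width) <= tolerance for part in parts
    )
```

Scan lines would be taken orthogonally to the contact direction; those for which `has_mark_crossing` holds contribute their ink pixels as mark pixels.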
In addition, for the large-size case described above, the partial mark pixels can be extracted by analyzing the layout of the reference characters. Fig. 10 is a flowchart of a third example process in the partial mark pixel extraction step S220 of Fig. 2 according to an embodiment of the invention.
As shown in Fig. 10, to extract the partial mark pixels, first, in step S1010, reference characters are determined for each marked character, the reference characters being the characters located in the same row or the same column as the marked character; then, in step S1020, reference coordinates are calculated from the reference characters. After the reference coordinates have been determined, the pixels outside the range of the reference coordinates are extracted as mark pixels in step S1030.
When the reference coordinates are calculated in step S1020, if the reference characters lie along the horizontal direction, only the vertical coordinates of the reference characters are used to calculate the reference coordinates. Similarly, if the reference characters lie along the vertical direction, only the horizontal coordinates of the reference characters are used.
Fig. 11 shows an example of partial mark pixel extraction using the reference coordinates as a reference according to this embodiment. As shown in Fig. 11, the pixels outside the two vertical dashed lines in the character image are extracted as mark pixels.
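A minimal sketch of steps S1020 and S1030 follows. It assumes that boxes are half-open `(top, bottom, left, right)` tuples and that the reference range is simply the extreme coordinates of the reference characters; the specification does not say how the coordinates are combined, so that choice and the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, bottom, left, right), half-open


def reference_range(ref_boxes: List[Box], layout: str) -> Tuple[int, int]:
    """Coordinate range spanned by the reference characters.

    layout='horizontal': references share a row, so use vertical coordinates.
    layout='vertical':   references share a column, so use horizontal ones.
    """
    if layout == "horizontal":
        return min(b[0] for b in ref_boxes), max(b[1] for b in ref_boxes)
    if layout == "vertical":
        return min(b[2] for b in ref_boxes), max(b[3] for b in ref_boxes)
    raise ValueError("layout must be 'horizontal' or 'vertical'")


def pixels_outside_range(image: List[List[int]], box: Box,
                         ref_range: Tuple[int, int],
                         layout: str) -> List[Tuple[int, int]]:
    """Ink pixels of the marked character lying outside the reference range."""
    top, bottom, left, right = box
    low, high = ref_range
    outside = []
    for r in range(top, bottom):
        for c in range(left, right):
            if image[r][c]:
                coord = r if layout == "horizontal" else c
                if coord < low or coord >= high:
                    outside.append((r, c))
    return outside
```

For vertically laid out text, as in Fig. 11, `layout="vertical"` yields the range between the two vertical dashed lines.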
After the partial mark pixels have been extracted as described above, in the expansion step S230 of Fig. 2 the extracted partial mark pixels are expanded into mark line segments by including neighboring pixels having the same direction. Fig. 12 is a flowchart of the processing in the expansion step S230 of Fig. 2 according to this embodiment of the invention.
As shown in Fig. 12, to expand the extracted partial mark pixels, the direction map of the marked character is first obtained in step S1210; then, in step S1220, the previously selected mark pixels are expanded by including the pixels having the same value within a local region of the direction map.
Fig. 13 shows the direction map of a marked character according to a specific example of the invention. As shown in Fig. 13, the direction map of the marked character region can be obtained by calculating, for each pixel, the gradient in each direction according to the following formulas, where in(i,j) denotes the value of the pixel at row i and column j.
C_horizontal=|in(i,j)-in(i,j-1)|+|in(i,j)-in(i,j+1)|+|in(i-1,j)-in(i-1,j-1)|+|in(i-1,j)-in(i-1,j+1)|+|in(i+1,j)-in(i+1,j-1)|+|in(i+1,j)-in(i+1,j+1)|
C_vertical=|in(i,j)-in(i-1,j)|+|in(i,j)-in(i+1,j)|+|in(i,j-1)-in(i-1,j-1)|+|in(i,j-1)-in(i+1,j-1)|+|in(i,j+1)-in(i-1,j+1)|+|in(i,j+1)-in(i+1,j+1)|
C_diagonal135=|in(i,j)-in(i-1,j-1)|+|in(i,j)-in(i+1,j+1)|+2*|in(i,j+1)-in(i-1,j)|+2*|in(i,j-1)-in(i+1,j)|
C_diagonal45=|in(i,j)-in(i-1,j+1)|+|in(i,j)-in(i+1,j-1)|+2*|in(i,j-1)-in(i-1,j)|+2*|in(i,j+1)-in(i+1,j)|
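The four formulas above can be evaluated directly, as in the following sketch. The formulas are transcribed term by term; how the four values are turned into a direction label is not stated in the specification, so choosing the direction with the smallest value (the direction along which the pixel values change least) is an assumption, as are the function names.

```python
from typing import Dict, List, Optional

Image = List[List[int]]


def direction_costs(img: Image, i: int, j: int) -> Dict[str, int]:
    """Evaluate the four directional formulas at interior pixel (i, j)."""
    p = lambda r, c: img[r][c]
    horizontal = sum(abs(p(r, j) - p(r, j - 1)) + abs(p(r, j) - p(r, j + 1))
                     for r in (i, i - 1, i + 1))
    vertical = sum(abs(p(i, c) - p(i - 1, c)) + abs(p(i, c) - p(i + 1, c))
                   for c in (j, j - 1, j + 1))
    diagonal135 = (abs(p(i, j) - p(i - 1, j - 1)) + abs(p(i, j) - p(i + 1, j + 1))
                   + 2 * abs(p(i, j + 1) - p(i - 1, j))
                   + 2 * abs(p(i, j - 1) - p(i + 1, j)))
    diagonal45 = (abs(p(i, j) - p(i - 1, j + 1)) + abs(p(i, j) - p(i + 1, j - 1))
                  + 2 * abs(p(i, j - 1) - p(i - 1, j))
                  + 2 * abs(p(i, j + 1) - p(i + 1, j)))
    return {"horizontal": horizontal, "vertical": vertical,
            "diagonal135": diagonal135, "diagonal45": diagonal45}


def direction_map(img: Image) -> List[List[Optional[str]]]:
    """Label each interior ink pixel with its assumed local stroke direction."""
    height, width = len(img), len(img[0])
    labels: List[List[Optional[str]]] = [[None] * width for _ in range(height)]
    for i in range(1, height - 1):
        for j in range(1, width - 1):
            if img[i][j]:
                costs = direction_costs(img, i, j)
                # Assumption: the smallest value marks the stroke direction.
                labels[i][j] = min(costs, key=costs.get)
    return labels
```

Border pixels are left unlabeled here because the formulas need a full 3x3 neighborhood; padding the image would be one way to cover them.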
When the previously selected mark pixels are expanded, if a selected mark line segment lies on a line portion of uniform direction in the direction map, the whole of that line portion is labeled as mark pixels, thereby expanding the extracted partial mark pixels.
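The expansion described here can be sketched as a flood fill over the direction map. The sketch assumes the map is a 2D list of direction labels with `None` for unlabeled pixels and uses 8-connectivity as the local region; both choices, and the name `expand_marks`, are assumptions rather than details from the specification.

```python
from collections import deque
from typing import Iterable, List, Optional, Set, Tuple

Pixel = Tuple[int, int]


def expand_marks(direction_map: List[List[Optional[str]]],
                 seeds: Iterable[Pixel]) -> Set[Pixel]:
    """Grow seed mark pixels over connected pixels sharing the same direction."""
    height, width = len(direction_map), len(direction_map[0])
    marked: Set[Pixel] = set(seeds)
    queue = deque(marked)
    while queue:
        r, c = queue.popleft()
        label = direction_map[r][c]
        if label is None:  # an unlabeled seed cannot be expanded
            continue
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if ((dr or dc) and 0 <= nr < height and 0 <= nc < width
                        and (nr, nc) not in marked
                        and direction_map[nr][nc] == label):
                    marked.add((nr, nc))
                    queue.append((nr, nc))
    return marked
```

Because growth stops wherever the direction label changes, a horizontal mark stroke does not leak into a vertical character stroke that it touches.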
Returning to Fig. 2, after the extracted partial mark pixels have been expanded in step S230, the thinned image of the character image to be recognized is obtained in step S240. Fig. 14 shows the marked character image to be recognized after thinning, according to a specific example of the invention.
Then, in step S250, connected pixels on the trajectory of the thinned image are included one by one until a junction is encountered, so that the mark line segments expanded in step S230 are grown into the identified mark. Then, in step S260, the identified mark is separated from the character image, and in step S270 the separated character image is recognized.
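The growing of step S250 can be sketched as a walk along the thinned image. The sketch assumes the skeleton is given as a set of `(row, column)` pixels, uses 8-connectivity, and treats any skeleton pixel with three or more skeleton neighbors as a junction; this junction test and the function names are assumptions, not details from the specification.

```python
from typing import Iterable, List, Set, Tuple

Pixel = Tuple[int, int]


def skeleton_neighbors(skeleton: Set[Pixel], pixel: Pixel) -> List[Pixel]:
    """8-connected neighbors of a pixel that also lie on the skeleton."""
    r, c = pixel
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr or dc) and (r + dr, c + dc) in skeleton]


def grow_along_skeleton(skeleton: Set[Pixel],
                        seeds: Iterable[Pixel]) -> Set[Pixel]:
    """Follow the skeleton from the seeds, stopping at junction pixels."""
    grown = {seed for seed in seeds if seed in skeleton}
    stack = list(grown)
    while stack:
        pixel = stack.pop()
        neighbors = skeleton_neighbors(skeleton, pixel)
        if len(neighbors) >= 3:
            # Assumed junction: keep the pixel but do not grow through it.
            continue
        for neighbor in neighbors:
            if neighbor not in grown:
                grown.add(neighbor)
                stack.append(neighbor)
    return grown
```

In this sketch a mark stroke that touches a character stroke is followed up to the point of contact and no further, which is the spatial constraint the thinned image is meant to provide.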
The processing and detailed working principle of the character recognition method according to an embodiment of the invention have been described above with reference to Figs. 2 to 14. The structure and working principle of a character recognition device according to an embodiment of the invention are described below with reference to Fig. 15.
As shown in Fig. 15, the character recognition device according to this embodiment comprises: a marked-character selection unit 1510 configured to select candidate regions of the character image to be recognized as marked characters; a mark pixel extraction unit 1520 configured to extract part of the mark pixels of a mark according to the position and shape features of the mark on a marked character in the character image to be recognized; an expansion unit 1530 configured to expand the extracted partial mark pixels into mark line segments by including neighboring pixels having the same direction; a thinned-image acquisition unit 1540 configured to obtain a thinned image of the character image to be recognized; a mark line segment growing unit 1550 configured to grow the expanded mark line segments along the trajectory of the thinned image into the identified mark; a separation unit 1560 configured to separate the identified mark from the character image; and a recognition unit 1570 configured to recognize the separated character image.
The detailed processing in the marked-character selection unit 1510, the mark pixel extraction unit 1520, the expansion unit 1530, the thinned-image acquisition unit 1540, the mark line segment growing unit 1550, the separation unit 1560, and the recognition unit 1570 of the character recognition device according to this embodiment is similar, respectively, to the processing in steps S210, S220, S230, S240, S250, S260, and S270 of the character recognition method described with reference to Figs. 2 to 14, and further detailed description is omitted here.
It should likewise be noted that the marked-character selection unit 1510 is optional. A character recognition device according to an embodiment of the invention may omit the marked-character selection unit 1510 and consist only of the mark pixel extraction unit 1520, the expansion unit 1530, the thinned-image acquisition unit 1540, the mark line segment growing unit 1550, the separation unit 1560, and the recognition unit 1570. Such a device can still separate the character image from the mark image and thereby improve the accuracy of recognition.
Thus, with the character recognition method and character recognition device according to the embodiments of the invention described above, a mark present on the character image to be recognized can be detected accurately and all or part of the mark pixels can be separated from the characters, so that the characters can be recognized accurately.
In addition, because the character recognition method and character recognition device according to the embodiments of the invention use the position and shape features of the mark, which are stable and reliable, to separate the mark on the character image, and because position and shape features apply equally to characters, it can be ensured that the extracted pixels belong to the mark. All or part of the mark pixels can therefore be reliably separated from the character image, and the character image can be recognized accurately.
Furthermore, because the character recognition method and character recognition device according to the embodiments of the invention use the direction map and the trajectory of the thinned image as references when expanding the mark line segments, a spatial constraint is provided. This helps avoid misclassifying character pixels as mark pixels, so that the character image and the mark image are separated accurately, which in turn supports accurate recognition of the character image.
The basic principle of the present invention has been described above in connection with specific embodiments. It should be noted, however, that those of ordinary skill in the art will understand that all or any of the steps or components of the method and device of the invention can be implemented in hardware, firmware, software, or a combination thereof, in any computing device (including a processor, a storage medium, and the like) or any network of computing devices. Those of ordinary skill in the art can accomplish this with their basic programming skills after reading the description of the invention.
Accordingly, the object of the invention can also be achieved by running a program or a set of programs on any computing device. The computing device may be a well-known general-purpose device. The object of the invention can therefore also be achieved merely by providing a program product containing program code that implements the method or device. That is, such a program product also constitutes the invention, and a storage medium storing such a program product also constitutes the invention. Obviously, the storage medium may be any known storage medium or any storage medium developed in the future.
Where the embodiments of the invention are implemented by software and/or firmware, a program constituting the software is installed from a storage medium or a network onto a computer having a dedicated hardware structure, for example the general-purpose personal computer 700 shown in Fig. 16, which can perform various functions when various programs are installed.
In Fig. 16, a central processing unit (CPU) 701 performs various kinds of processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores, as needed, data required when the CPU 701 performs the various kinds of processing. The CPU 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output interface 705 is also connected to the bus 704.
The following components are connected to the input/output interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display, such as a cathode-ray tube (CRT) or a liquid crystal display (LCD), and a loudspeaker and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, and the like. The communication section 709 performs communication processing via a network such as the Internet.
A drive 710 is also connected to the input/output interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it is installed into the storage section 708 as needed.
Where the above series of processing is implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 711.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 711 shown in Fig. 16, in which the program is stored and which is distributed separately from the device to provide the program to the user. Examples of the removable medium 711 include a magnetic disk (including a floppy disk (registered trademark)), an optical disc (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disc (including a MiniDisc (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 702, a hard disk included in the storage section 708, or the like, in which the program is stored and which is distributed to the user together with the device containing it.
It should also be noted that, in the device and method of the invention, the components or steps can obviously be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalents of the invention. Moreover, the steps of the above series of processing may naturally be performed chronologically in the order described, but need not necessarily be performed in that order. Some steps may be performed in parallel or independently of one another.
Although the invention and its advantages have been described in detail, it should be understood that various changes, substitutions, and alterations can be made without departing from the spirit and scope of the invention as defined by the appended claims. Furthermore, the terms "comprise", "include", and any other variants thereof as used in this application are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
Supplementary Notes
Supplementary note 1. A character recognition method, comprising:
extracting part of the mark pixels of a mark according to the position and shape features of the mark on a marked character in a character image to be recognized;
expanding the extracted partial mark pixels into mark line segments by including neighboring pixels having the same direction;
obtaining a thinned image of the character image to be recognized;
growing the expanded mark line segments along the trajectory of the thinned image into an identified mark;
separating the identified mark from the character image; and
recognizing the separated character image.
Supplementary note 2. The character recognition method according to supplementary note 1, further comprising:
selecting a candidate region of the character image to be recognized as the marked character.
Supplementary note 3. The character recognition method according to supplementary note 2, wherein selecting the candidate region comprises:
dividing a text block in the character image to be recognized into character regions by alternately projecting the text block in the horizontal and vertical directions;
classifying the divided character regions into contact regions, large-size regions, and normal-size regions by comparing the sizes of the divided character regions; and
taking the contact regions and the large-size regions as the marked characters.
Supplementary note 4. The character recognition method according to supplementary note 3, wherein extracting the partial mark pixels comprises extracting the partial mark pixels outside a rectangular frame enclosing the character.
Supplementary note 5. The character recognition method according to supplementary note 4, wherein extracting the partial mark pixels comprises:
selecting a group of candidate mark pixels by separating the side waves on both sides of the projections in the horizontal direction and the vertical direction, respectively;
building a curve model by least-squares curve fitting to fit the group of candidate mark pixels; and
calculating the fitting error of the group of candidate mark pixels to determine whether the group consists of mark pixels.
Supplementary note 6. The character recognition method according to supplementary note 3, wherein extracting the partial mark pixels comprises:
estimating a stroke width by analyzing run lengths;
examining the crossing feature of a contact segment along the direction orthogonal to the contact direction; and
determining as mark pixels the pixels on line segments whose crossing feature shows two parts on the ruler, each with a width comparable to the stroke width.
Supplementary note 7. The character recognition method according to supplementary note 3, wherein extracting the partial mark pixels comprises:
determining reference characters for each marked character, the reference characters being the characters located in the same row or the same column as the marked character;
calculating reference coordinates from the reference characters; and
extracting the pixels outside the range of the reference coordinates as mark pixels.
Supplementary note 8. The character recognition method according to supplementary note 7, wherein
when the reference characters lie along the horizontal direction, only the vertical coordinates of the reference characters are used to calculate the reference coordinates; and
when the reference characters lie along the vertical direction, only the horizontal coordinates of the reference characters are used to calculate the reference coordinates.
Supplementary note 9. The character recognition method according to any one of supplementary notes 1 to 8, wherein expanding the extracted partial mark pixels comprises:
obtaining a direction map of the marked character; and
expanding the previously selected mark pixels by including the pixels having the same value within a local region of the direction map.
Supplementary note 10. The character recognition method according to any one of supplementary notes 1 to 8, wherein growing the expanded mark line segments comprises:
including connected pixels on the trajectory of the thinned image one by one until a junction is encountered.
Supplementary note 11. A character recognition device, comprising:
a mark pixel extraction unit configured to extract part of the mark pixels of a mark according to the position and shape features of the mark on a marked character in a character image to be recognized;
an expansion unit configured to expand the extracted partial mark pixels into mark line segments by including neighboring pixels having the same direction;
a thinned-image acquisition unit configured to obtain a thinned image of the character image to be recognized;
a mark line segment growing unit configured to grow the expanded mark line segments along the trajectory of the thinned image into an identified mark;
a separation unit configured to separate the identified mark from the character image; and
a recognition unit configured to recognize the separated character image.
Supplementary note 12. The character recognition device according to supplementary note 11, further comprising:
a marked-character selection unit configured to select a candidate region of the character image to be recognized as the marked character.
Supplementary note 13. The character recognition device according to supplementary note 12, wherein the marked-character selection unit is further configured to:
divide a text block in the character image to be recognized into character regions by alternately projecting the text block in the horizontal and vertical directions;
classify the divided character regions into contact regions, large-size regions, and normal-size regions by comparing the sizes of the divided character regions; and
take the contact regions and the large-size regions as the marked characters.
Supplementary note 14. The character recognition device according to supplementary note 13, wherein the mark pixel extraction unit is further configured to extract the partial mark pixels outside a rectangular frame enclosing the character.
Supplementary note 15. The character recognition device according to supplementary note 14, wherein the mark pixel extraction unit is further configured to:
select a group of candidate mark pixels by separating the side waves on both sides of the projections in the horizontal direction and the vertical direction, respectively;
build a curve model by least-squares curve fitting to fit the group of candidate mark pixels; and
calculate the fitting error of the group of candidate mark pixels to determine whether the group consists of mark pixels.
Supplementary note 16. The character recognition device according to supplementary note 13, wherein the mark pixel extraction unit is further configured to:
estimate a stroke width by analyzing run lengths;
examine the crossing feature of a contact segment along the direction orthogonal to the contact direction; and
determine as mark pixels the pixels on line segments whose crossing feature shows two parts on the ruler, each with a width comparable to the stroke width.
Supplementary note 17. The character recognition device according to supplementary note 13, wherein the mark pixel extraction unit is further configured to:
determine reference characters for each marked character, the reference characters being the characters located in the same row or the same column as the marked character;
calculate reference coordinates from the reference characters; and
extract the pixels outside the range of the reference coordinates as mark pixels.
Supplementary note 18. The character recognition device according to supplementary note 17, wherein
when the reference characters lie along the horizontal direction, only the vertical coordinates of the reference characters are used to calculate the reference coordinates; and
when the reference characters lie along the vertical direction, only the horizontal coordinates of the reference characters are used to calculate the reference coordinates.
Supplementary note 19. The character recognition device according to any one of supplementary notes 11 to 18, wherein the expansion unit is further configured to:
obtain a direction map of the marked character; and
expand the previously selected mark pixels by including the pixels having the same value within a local region of the direction map.
Supplementary note 20. The character recognition device according to any one of supplementary notes 11 to 18, wherein the mark line segment growing unit is further configured to include connected pixels on the trajectory of the thinned image one by one until a junction is encountered.