
CN102024138B - Character identification method and character identification device - Google Patents


Info

Publication number
CN102024138B
Authority
CN
China
Prior art keywords
character
pixels
marker
mark
marked pixels
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 200910173692
Other languages
Chinese (zh)
Other versions
CN102024138A
Inventor
常兰兰
孙俊
小泽宪秋
武部浩明
于浩
直井聪
堀田悦伸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Application filed by Fujitsu Ltd
Priority to CN 200910173692
Priority to JP2010200193A (JP2011065643A)
Publication of CN102024138A
Application granted
Publication of CN102024138B
Expired - Fee Related

Landscapes

  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a character recognition method and a character recognition device. A character recognition method according to one embodiment of the invention comprises: extracting partial mark pixels of a mark on a marked character in a character image to be recognized, according to the position and shape features of the mark; expanding the extracted partial mark pixels into mark line segments by including neighboring pixels that have the same direction; obtaining a thinned image of the character image to be recognized; growing the expanded mark line segments along the tracks of the thinned image into a recognized mark; separating the recognized mark from the character image; and recognizing the separated character image.

Description

Character recognition method and character recognition device
Technical field
The present invention relates generally to a character recognition method and a character recognition device. More particularly, it relates to a character recognition method and a character recognition device capable of separating marks from a character image.
Background art
OCR (Optical Character Recognition) systems have become increasingly widespread and increasingly important in computer applications. An OCR system converts paper documents into electronic files, which simplifies data entry and makes it easy to edit, manage and distribute large volumes of documents. The recognition capability of the OCR engine is a key factor in its application cost: only highly accurate recognition guarantees its practical value. For ordinary printed text documents, especially those with standardized characters, most current OCR engines already achieve high recognition rates.
However, in some documents, such as registration forms, questionnaires and bills, certain characters are marked to indicate a selection, and these marks pose new challenges for OCR engines. First, some marks connect two or more characters into a single connected component, which often causes the OCR engine's character segmentation to fail. Second, a mark may occupy a larger area than the character itself, so the character becomes smaller when the OCR engine normalizes the character size, which leads to subsequent recognition failure.
To address this, one prior-art method uses color filtering to extract the pixels of marks whose color differs from that of the characters, but this method fails when the mark and the characters have the same color. Another existing method separates marks from characters according to their difference in gray level before recognition, but it is also unreliable, because marks and characters often have the same gray level and then cannot be separated.
Summary of the invention
In view of the above, the present invention proposes a character recognition method and a character recognition device that separate marks from characters by using spatial position and shape features, which apply to both marks and characters, and thereby achieve character recognition. With the character recognition method and device according to the present invention, marks that overlap the character image to be recognized can be easily detected and separated, so that the character image can be restored for recognition.
A brief summary of the present invention is given below to provide a basic understanding of some of its aspects. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical parts of the invention, nor to limit its scope. Its sole purpose is to present some concepts in simplified form as a prelude to the more detailed description that follows.
According to one aspect of the present invention, a character recognition method is provided, comprising: extracting partial mark pixels of a mark on a marked character in a character image to be recognized, according to the position and shape features of the mark; expanding the extracted partial mark pixels into mark line segments by including neighboring pixels that have the same direction; obtaining a thinned image of the character image to be recognized; growing the expanded mark line segments along the tracks of the thinned image into a recognized mark; separating the recognized mark from the character image; and recognizing the separated character image.
The character recognition method according to an embodiment of the present invention further comprises selecting candidate regions of the character image to be recognized as the marked characters.
According to another aspect of the present invention, a character recognition device is provided, comprising: a mark-pixel extraction unit configured to extract partial mark pixels of a mark on a marked character in a character image to be recognized, according to the position and shape features of the mark; an expansion unit configured to expand the extracted partial mark pixels into mark line segments by including neighboring pixels that have the same direction; a thinned-image acquisition unit configured to obtain a thinned image of the character image to be recognized; a mark-line-segment growing unit configured to grow the expanded mark line segments along the tracks of the thinned image into a recognized mark; a separation unit configured to separate the recognized mark from the character image; and a recognition unit configured to recognize the separated character image.
The character recognition device according to an embodiment of the present invention further comprises a marked-character selection unit configured to select candidate regions of the character image to be recognized as the marked characters.
Preferably, selecting the candidate regions comprises: dividing a text block in the character image to be recognized into character regions by alternately projecting the text block onto the horizontal and vertical directions; classifying the resulting character regions into contact regions, large-size regions and normal-size regions by comparing their sizes; and taking the contact regions and the large-size regions as the marked characters.
According to one embodiment of the present invention, extracting the partial mark pixels comprises extracting partial mark pixels located outside the rectangular frame that contains the character. Specifically, it comprises: selecting a group of candidate mark pixels by separating the side waves on both sides of the projections in the horizontal and vertical directions, respectively; building a curve model that fits the group of candidate mark pixels by least-squares curve fitting; and calculating the fitting error of the group to determine whether it consists of mark pixels.
According to another embodiment of the present invention, extracting the partial mark pixels comprises: estimating the stroke width by run-length analysis; examining the crossing feature of the contact fragment along the direction orthogonal to the contact direction; and determining as mark pixels those pixels on scan lines that cross exactly two parts, each with a width comparable to the stroke width.
According to yet another embodiment of the present invention, extracting the partial mark pixels comprises: determining reference characters for each marked character, the reference characters being the characters located in the same row or the same column as the marked character; calculating reference coordinates from the reference characters; and extracting pixels outside the range of the reference coordinates as mark pixels. Preferably, when the reference characters are arranged horizontally, only their vertical coordinates are used to calculate the reference coordinates; and when they are arranged vertically, only their horizontal coordinates are used.
According to one embodiment of the present invention, expanding the extracted partial mark pixels comprises: obtaining a direction map of the marked character; and expanding the previously selected mark pixels by including pixels that have the same value within a local region of the direction map.
According to one embodiment of the present invention, growing the expanded mark line segments comprises: including connected pixels on the tracks of the thinned image one by one until a junction is encountered.
As can be seen, the character recognition method and device according to the present invention use spatial position and shape features, which apply to both marks and characters, to separate marks from characters easily, so that the character image can be easily restored for recognition.
In addition, the present invention also provides a computer program for implementing the above character recognition method.
In addition, the present invention also provides a computer program product, at least in the form of a computer-readable medium, on which computer program code for implementing the above character recognition method is recorded.
Brief description of the drawings
The present invention can be better understood with reference to the following description given in conjunction with the accompanying drawings, in which the same or similar reference numerals denote the same or similar parts throughout. The drawings, together with the following detailed description, are included in and form part of this specification, and serve to further illustrate the preferred embodiments of the present invention and to explain its principles and advantages. In the drawings:
Fig. 1(a) shows an example of a marked character image to be recognized;
Fig. 1(b) shows the character image output after the marks are separated from the characters in the marked character image of Fig. 1(a), according to an embodiment of the present invention;
Fig. 1(c) shows the mark image output after the marks are separated from the characters in the marked character image of Fig. 1(a), according to an embodiment of the present invention;
Fig. 2 is a flowchart of the processing of a character recognition method according to an embodiment of the present invention;
Fig. 3 is a flowchart of the detailed processing in the marked-character selection step of Fig. 2, according to an embodiment of the present invention;
Fig. 4 shows an example of a character image after segmentation and classification, according to an embodiment of the present invention;
Fig. 5(a) shows an example in which a mark tightly surrounds a character;
Fig. 5(b) shows an example of a contact case with no available reference character;
Fig. 6 is a flowchart of a first example process in the partial-mark-pixel extraction step of Fig. 2, according to an embodiment of the present invention;
Figs. 7(a) and 7(b) show a marked character image and its projection waveform in the vertical direction;
Figs. 7(c) and 7(d) show a marked character image and its projection waveform in the horizontal direction;
Fig. 8 is a flowchart of a second example process in the partial-mark-pixel extraction step of Fig. 2, according to an embodiment of the present invention;
Fig. 9 shows an example of extracting partial mark pixels using the crossing feature, according to an embodiment of the present invention;
Fig. 10 is a flowchart of a third example process in the partial-mark-pixel extraction step of Fig. 2, according to an embodiment of the present invention;
Fig. 11 shows an example of extracting partial mark pixels using reference coordinates, according to an embodiment of the present invention;
Fig. 12 is a flowchart of the processing in the extracted-mark-pixel expansion step of Fig. 2, according to an embodiment of the present invention;
Fig. 13 shows an example of the direction map of a marked character;
Fig. 14 shows an example of a thinned marked character image to be recognized;
Fig. 15 is a block diagram of the configuration of a character recognition device according to an embodiment of the present invention; and
Fig. 16 is a block diagram of the structure of an information processing apparatus for implementing the character recognition method according to the present invention.
Those skilled in the art will appreciate that elements in the drawings are shown for simplicity and clarity only and are not necessarily drawn to scale. For example, the dimensions of some elements may be exaggerated relative to other elements to help improve understanding of the embodiments of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. For clarity and conciseness, not all features of an actual implementation are described in the specification. It should be understood, however, that in developing any such actual implementation, many implementation-specific decisions must be made to achieve the developer's specific goals, such as compliance with system- and business-related constraints, and these constraints may vary from one implementation to another. Moreover, although such development work may be complex and time-consuming, it is nevertheless a routine undertaking for those skilled in the art having the benefit of this disclosure.
It should also be noted that, to avoid obscuring the present invention with unnecessary detail, the drawings show only the device structures and/or processing steps closely related to the solution according to the present invention, and other details of little relevance to the invention are omitted.
To aid understanding of the principles of the invention, the example of Fig. 1 is used below to explain how a marked character image is separated into a character image and a mark image, and how the character image is then recognized to obtain the recognized characters. Fig. 1(a) shows an example of a marked character image to be recognized, Fig. 1(b) shows the character image output after the marks are separated from the characters according to an embodiment of the present invention, and Fig. 1(c) shows the mark image output after the same separation.
The basic operating principle of the character recognition method according to an embodiment of the present invention is first described below with reference to Figs. 2 to 14.
As shown in Fig. 2, the character recognition method according to this embodiment comprises: a marked-character selection step S210 for selecting candidate regions of the character image to be recognized as marked characters; a partial-mark-pixel extraction step S220 for extracting partial mark pixels of the marks on the marked characters in the character image according to the position and shape features of the marks; an extracted-mark-pixel expansion step S230 for expanding the extracted partial mark pixels into mark line segments by including neighboring pixels that have the same direction; a thinned-image acquisition step S240 for obtaining a thinned image of the character image to be recognized; an expanded-mark-line-segment growing step S250 for growing the expanded mark line segments along the tracks of the thinned image into recognized marks; a character-and-mark separation step S260 for separating the recognized marks from the character image; and a separated-character recognition step S270 for recognizing the separated character image.
It should be noted that the marked-character selection step S210 is optional. That is, step S220 and the subsequent processing can be applied directly to the marked character image to be recognized without first selecting marked characters; the marks can still be separated from the character image and the separated character image recognized, which improves the accuracy and reliability of recognition.
The processing in each step of the method shown in Fig. 2, namely the marked-character selection step S210, the partial-mark-pixel extraction step S220, the extracted-mark-pixel expansion step S230, the thinned-image acquisition step S240, the expanded-mark-line-segment growing step S250, the character-and-mark separation step S260 and the separated-character recognition step S270, is described in detail below with reference to Figs. 3 to 14.
Fig. 3 is a flowchart of the detailed processing in the marked-character selection step S210 of Fig. 2 according to one embodiment of the present invention. As shown in Fig. 3, to select the marked characters, first, in step S310, the text block in the character image to be recognized is divided into character regions by alternately projecting it onto the horizontal and vertical directions.
Then, in step S320, the sizes of the character regions obtained in step S310 are compared, and the regions are classified into three classes: contact regions, large-size regions and normal-size regions. Fig. 4 shows an example of a character image after segmentation and classification according to this embodiment. Finally, in step S330, the contact regions and the large-size regions are taken as marked characters, and the normal-size regions are labeled as non-marked character regions.
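Steps S310 to S330 can be illustrated with the following minimal sketch; it is not the patented implementation. It assumes a binary image given as a list of rows (1 = foreground), splits regions at all-zero gaps of the column and row projections in alternation (a recursive XY-cut), and applies a hypothetical size rule: a region taller than `ratio` times the median height is "large", and one wider than `ratio` times the median width is "contact". The patent does not specify these thresholds.

```python
from statistics import median


def nonzero_runs(profile):
    """Return (start, end) pairs, end-exclusive, of consecutive nonzero entries."""
    runs, start = [], None
    for k, v in enumerate(profile):
        if v and start is None:
            start = k
        elif not v and start is not None:
            runs.append((start, k))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def xy_cut(img, top, bottom, left, right, by_columns=True, stalled=False):
    """Split a region into character boxes by alternating column/row projections.

    Boxes are (top, bottom, left, right) with end-exclusive bounds.
    """
    if by_columns:
        profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(left, right)]
        pieces = [(top, bottom, left + s, left + e) for s, e in nonzero_runs(profile)]
    else:
        profile = [sum(img[r][c] for c in range(left, right)) for r in range(top, bottom)]
        pieces = [(top + s, top + e, left, right) for s, e in nonzero_runs(profile)]
    if pieces == [(top, bottom, left, right)]:
        # Neither split nor trimmed in this direction: try the other one once, then stop.
        if stalled:
            return pieces
        return xy_cut(img, top, bottom, left, right, not by_columns, True)
    boxes = []
    for box in pieces:
        boxes.extend(xy_cut(img, *box, not by_columns))
    return boxes


def classify_regions(boxes, ratio=1.5):
    """Label each box 'large', 'contact' or 'normal' relative to the median size."""
    med_h = median(b - t for t, b, _, _ in boxes)
    med_w = median(r - l for _, _, l, r in boxes)
    labels = []
    for t, b, l, r in boxes:
        if b - t > ratio * med_h:
            labels.append("large")    # e.g. a mark enclosing a character
        elif r - l > ratio * med_w:
            labels.append("contact")  # e.g. a mark joining neighboring characters
        else:
            labels.append("normal")
    return labels
```

Under these assumptions, boxes labeled "large" or "contact" would become the marked characters of step S330, and "normal" boxes in the same row or column could serve as their reference characters.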
In addition, reference characters are labeled for each marked character; the reference characters are the characters located in the same row or the same column as the marked character. As shown in Fig. 4, two reference characters are labeled for the large-size case shown, whereas only one reference character is available for the contact case.
If all character regions are normal-size regions, the character image to be recognized is classified as a non-marked character image. In that case, the partial-mark-pixel extraction step S220, the extracted-mark-pixel expansion step S230, the thinned-image acquisition step S240, the expanded-mark-line-segment growing step S250 and the character-and-mark separation step S260 of Fig. 2 are skipped, and the flow proceeds directly to step S270 for character recognition.
After the marked characters have been selected according to the process of Fig. 3, partial mark pixels of the marks are extracted according to the position and shape features of the marks on the selected marked characters. The extraction can be tailored to the different positions and shape features of the marks. Several specific cases are analyzed and described below.
According to one embodiment of the present invention, as shown in Fig. 5, partial mark pixels located outside the rectangular frame that contains the character can be extracted. This feature makes it easy to extract partial mark pixels when a mark tightly surrounds a character, as shown in Fig. 5(a). The approach also works well for the contact case in which no reference character is available, as shown in Fig. 5(b).
Fig. 6 is a flowchart of a first example process in the partial-mark-pixel extraction step S220 of Fig. 2 according to an embodiment of the present invention. As shown in Fig. 6, first, in step S610, a group of candidate mark pixels is selected by separating the side waves on both sides of the projections in the horizontal and vertical directions, respectively.
Figs. 7(a) and 7(b) show a marked character image and its projection waveform in the vertical direction; the two vertical lines on the left and right in Fig. 7(b) correspond to the two vertical lines on either side of the character in Fig. 7(a). Figs. 7(c) and 7(d) show a marked character image and its projection waveform in the horizontal direction; the two vertical lines on the left and right in Fig. 7(d) correspond to the two horizontal lines on either side of the character in Fig. 7(c).
Thus, for the example of Figs. 7(a) to 7(d), the pixels outside the two vertical lines in Fig. 7(a) (corresponding to the two waves outside the left and right vertical lines in Fig. 7(b)) and the pixels outside the two horizontal lines in Fig. 7(c) (corresponding to the two waves outside the left and right vertical lines in Fig. 7(d)) can be selected as candidate mark pixels.
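Step S610 can be sketched as follows under stated assumptions. A "wave" is taken to be a maximal run of columns whose projection exceeds a hypothetical `level` parameter (for a mark drawn around a character, the valleys between the mark's sides and the character contain only the thin top and bottom arcs), and the outermost waves, when at least three exist, are treated as the side waves. Applying the same functions to the transposed image (`[list(r) for r in zip(*img)]`) would handle the horizontal projection, with returned coordinates then in (col, row) order.

```python
def runs_above(profile, level):
    """Return (start, end) pairs, end-exclusive, where the profile exceeds level."""
    runs, start = [], None
    for k, v in enumerate(profile):
        if v > level and start is None:
            start = k
        elif v <= level and start is not None:
            runs.append((start, k))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def side_wave_columns(img, level):
    """Columns in the leftmost and rightmost waves of the vertical projection.

    Returns [] unless the projection has at least three waves
    (left side wave, central character wave, right side wave).
    """
    profile = [sum(col) for col in zip(*img)]
    waves = runs_above(profile, level)
    if len(waves) < 3:
        return []
    (l0, l1), (r0, r1) = waves[0], waves[-1]
    return list(range(l0, l1)) + list(range(r0, r1))


def candidate_mark_pixels(img, level):
    """Foreground (row, col) pixels lying in the side-wave columns."""
    cols = set(side_wave_columns(img, level))
    return [(i, j) for i, row in enumerate(img) for j, v in enumerate(row) if v and j in cols]
```

The candidates are deliberately permissive; the fitting test of steps S620 and S630 is what rejects side-wave pixels that actually belong to the character.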
Then, in step S620, a curve model is built by least-squares curve fitting to fit the group of candidate mark pixels, and in step S630 the fitting error of the group is calculated to determine whether the group consists of mark pixels. If the fitting error is small, the pixels in the group can be regarded as mark pixels. Steps S620 and S630 exclude false mark pixels, that is, pixels that were tentatively judged to be mark pixels but actually belong to the character. For example, for the pixels outside the right vertical line in Fig. 7(a), the fitting error between the actual pixel values and the fitted curve model exceeds a predetermined threshold, so these pixels are determined not to be mark pixels.
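Steps S620 and S630 can be sketched as a polynomial least-squares fit. The patent does not state the curve model; this sketch assumes, hypothetically, that one side of a mark is modeled as a quadratic col = c0 + c1*row + c2*row^2 solved through the normal equations, and that a group is accepted as mark pixels when its root-mean-square residual is at most a hypothetical `threshold`. The group must contain at least degree + 1 distinct rows, otherwise the system is singular.

```python
def polyfit(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] of y = sum(c_k * x**k).

    Solves the normal equations with Gaussian elimination (partial pivoting).
    Raises ZeroDivisionError if there are fewer than degree + 1 distinct xs.
    """
    n = degree + 1
    ata = [[sum(x ** (p + q) for x in xs) for q in range(n)] for p in range(n)]
    aty = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(n)]
    m = [row + [b] for row, b in zip(ata, aty)]  # augmented matrix
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[pivot] = m[pivot], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    coeffs = [0.0] * n
    for k in reversed(range(n)):  # back substitution
        tail = sum(m[k][c] * coeffs[c] for c in range(k + 1, n))
        coeffs[k] = (m[k][n] - tail) / m[k][k]
    return coeffs


def fitting_error(points, degree=2):
    """RMS residual of col fitted as a polynomial of row, for (row, col) points."""
    rows = [float(r) for r, _ in points]
    cols = [float(c) for _, c in points]
    coeffs = polyfit(rows, cols, degree)
    residuals = [c - sum(a * r ** k for k, a in enumerate(coeffs)) for r, c in zip(rows, cols)]
    return (sum(e * e for e in residuals) / len(residuals)) ** 0.5


def is_mark_group(points, threshold=1.0, degree=2):
    """Accept a candidate group as mark pixels when the fitting error is small."""
    return fitting_error(points, degree) <= threshold
```

A smooth mark side fits the model almost exactly, while character strokes caught in a side wave generally scatter around any single low-order curve and so exceed the threshold.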
In addition, for the contact case described above, the crossing feature of the contact fragment can also be used to determine mark pixels when extracting partial mark pixels. Fig. 8 is a flowchart of a second example process in the partial-mark-pixel extraction step S220 of Fig. 2 according to an embodiment of the present invention.
As shown in Fig. 8, in the partial-mark-pixel extraction according to this embodiment, the stroke width is first estimated by run-length analysis in step S810; then, in step S820, the crossing feature of the contact fragment is examined along the direction orthogonal to the contact direction; and in step S830, the pixels on scan lines that cross exactly two parts, each with a line-segment width comparable to the stroke width, are determined to be mark pixels.
Fig. 9 shows an example of extracting partial mark pixels using the crossing feature according to this embodiment. The darker gray parts of the mark in Fig. 9 are the two parts on each scan line whose widths are comparable to the stroke width, so these pixels are determined to be mark pixels.
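Steps S810 to S830 can be sketched as follows. The sketch assumes a binary image given as a list of rows, a horizontal contact direction (so the orthogonal scan lines are columns), the most frequent horizontal or vertical run length as the stroke-width estimate, and a hypothetical tolerance `tol` for "comparable" widths. These are illustrative choices; the patent does not fix them.

```python
from collections import Counter


def run_lengths(line):
    """Lengths of consecutive foreground runs in a sequence of 0/1 values."""
    lengths, n = [], 0
    for v in line:
        if v:
            n += 1
        elif n:
            lengths.append(n)
            n = 0
    if n:
        lengths.append(n)
    return lengths


def estimate_stroke_width(img):
    """Most frequent run length over all rows and columns (step S810)."""
    counts = Counter()
    for row in img:
        counts.update(run_lengths(row))
    for col in zip(*img):
        counts.update(run_lengths(col))
    return counts.most_common(1)[0][0]


def crossing_mark_pixels(img, c0, c1, stroke_width, tol=1):
    """Scan columns c0..c1-1 of a horizontal contact fragment (steps S820-S830).

    A column whose scan line crosses exactly two foreground parts, each about
    one stroke wide, is taken to cut through the two arms of a mark.
    """
    marked = []
    for j in range(c0, c1):
        column = [row[j] for row in img]
        parts = run_lengths(column)
        if len(parts) == 2 and all(abs(p - stroke_width) <= tol for p in parts):
            marked.extend((i, j) for i, v in enumerate(column) if v)
    return marked
```

Columns that cut through a character stroke typically show one long run, or a run pattern that does not match two stroke-wide parts, and are left unmarked.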
In addition, for the large-size case described above, partial mark pixels can be extracted by analyzing the layout of the reference characters. Fig. 10 is a flowchart of a third example process in the partial-mark-pixel extraction step S220 of Fig. 2 according to an embodiment of the present invention.
As shown in Fig. 10, first, in step S1010, reference characters are determined for each marked character, the reference characters being the characters located in the same row or the same column as the marked character; then, in step S1020, reference coordinates are calculated from the reference characters. Once the reference coordinates have been determined, in step S1030 the pixels outside the range of the reference coordinates are extracted as mark pixels.
When the reference coordinates are calculated in step S1020, if the reference characters are arranged horizontally, only their vertical coordinates are used. Similarly, if the reference characters are arranged vertically, only their horizontal coordinates are used.
Fig. 11 shows an example of extracting partial mark pixels using reference coordinates according to this embodiment. As shown in Fig. 11, the pixels outside the two vertical dashed lines in the character image are extracted as mark pixels.
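Steps S1020 and S1030 can be sketched as follows. The patent does not say how the reference coordinates are computed from the reference characters; this sketch assumes the range spans from the smallest start to the largest end of the reference boxes along the relevant axis. Boxes are (top, bottom, left, right) with end-exclusive bounds, and the function names are hypothetical.

```python
def reference_range(ref_boxes, horizontal=True):
    """Reference coordinate range from reference-character boxes (step S1020).

    For a horizontal line of reference characters only vertical coordinates
    (top, bottom) are used; for a vertical line only horizontal ones.
    """
    if horizontal:
        return min(b[0] for b in ref_boxes), max(b[1] for b in ref_boxes)
    return min(b[2] for b in ref_boxes), max(b[3] for b in ref_boxes)


def pixels_outside_range(img, box, lo, hi, horizontal=True):
    """Foreground pixels of a marked-character box outside [lo, hi) (step S1030)."""
    top, bottom, left, right = box
    out = []
    for i in range(top, bottom):
        for j in range(left, right):
            coord = i if horizontal else j
            if img[i][j] and not lo <= coord < hi:
                out.append((i, j))
    return out
```

For the vertical dashed lines of Fig. 11, the same functions would be called with `horizontal=False`, so that columns outside the reference range are extracted.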
After the partial mark pixels have been extracted as described above, in the extracted-mark-pixel expansion step S230 of Fig. 2 they are expanded into mark line segments by including neighboring pixels that have the same direction. Fig. 12 is a flowchart of the processing in step S230 of Fig. 2 according to this embodiment.
As shown in Fig. 12, to expand the extracted partial mark pixels, a direction map of the marked character is first obtained in step S1210; then, in step S1220, the previously selected mark pixels are expanded by including pixels that have the same value within a local region of the direction map.
Fig. 13 shows the direction map of a marked character according to a specific example of the present invention. As shown in Fig. 13, the direction map of the marked character region can be obtained by calculating, for each pixel, the gradient in each direction according to the following formulas, where in(i,j) denotes the value of the input image at position (i,j):
C_horizontal=|in(i,j)-in(i,j-1)|+|in(i,j)-in(i,j+1)|+|in(i-1,j)-in(i-1,j-1)|+|in(i-1,j)-in(i-1,j+1)|+|in(i+1,j)-in(i+1,j-1)|+|in(i+1,j)-in(i+1,j+1)|
C_vertical=|in(i,j)-in(i-1,j)|+|in(i,j)-in(i+1,j)|+|in(i,j-1)-in(i-1,j-1)|+|in(i,j-1)-in(i+1,j-1)|+|in(i,j+1)-in(i-1,j+1)|+|in(i,j+1)-in(i+1,j+1)|
C_diagonal135=|in(i,j)-in(i-1,j-1)|+|in(i,j)-in(i+1,j+1)|+2*|in(i,j+1)-in(i-1,j)|+2*|in(i,j-1)-in(i+1,j)|
C_diagonal45=|in(i,j)-in(i-1,j+1)|+|in(i,j)-in(i+1,j-1)|+2*|in(i,j-1)-in(i-1,j)|+2*|in(i,j+1)-in(i+1,j)|
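The four formulas can be transcribed directly, as in the sketch below. Here `img[i][j]` plays the role of in(i,j), with i taken as the row index and j as the column index; only interior foreground pixels are labeled; and each pixel receives the direction with the smallest variation, on the assumption that intensity changes least along a stroke. That labeling rule and the direction names are assumptions; the patent gives only the formulas.

```python
def direction_costs(img, i, j):
    """Evaluate C_horizontal, C_vertical, C_diagonal135 and C_diagonal45 at (i, j)."""
    a = img  # a[i][j] corresponds to in(i,j)
    horizontal = (abs(a[i][j] - a[i][j - 1]) + abs(a[i][j] - a[i][j + 1])
                  + abs(a[i - 1][j] - a[i - 1][j - 1]) + abs(a[i - 1][j] - a[i - 1][j + 1])
                  + abs(a[i + 1][j] - a[i + 1][j - 1]) + abs(a[i + 1][j] - a[i + 1][j + 1]))
    vertical = (abs(a[i][j] - a[i - 1][j]) + abs(a[i][j] - a[i + 1][j])
                + abs(a[i][j - 1] - a[i - 1][j - 1]) + abs(a[i][j - 1] - a[i + 1][j - 1])
                + abs(a[i][j + 1] - a[i - 1][j + 1]) + abs(a[i][j + 1] - a[i + 1][j + 1]))
    diagonal135 = (abs(a[i][j] - a[i - 1][j - 1]) + abs(a[i][j] - a[i + 1][j + 1])
                   + 2 * abs(a[i][j + 1] - a[i - 1][j]) + 2 * abs(a[i][j - 1] - a[i + 1][j]))
    diagonal45 = (abs(a[i][j] - a[i - 1][j + 1]) + abs(a[i][j] - a[i + 1][j - 1])
                  + 2 * abs(a[i][j - 1] - a[i - 1][j]) + 2 * abs(a[i][j + 1] - a[i + 1][j]))
    return {"horizontal": horizontal, "vertical": vertical,
            "diagonal135": diagonal135, "diagonal45": diagonal45}


def direction_map(img):
    """Label each interior foreground pixel with its least-variation direction."""
    rows, cols = len(img), len(img[0])
    dmap = [[None] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            if img[i][j]:
                costs = direction_costs(img, i, j)
                dmap[i][j] = min(costs, key=costs.get)
    return dmap
```

The doubled weight on the two off-center terms of each diagonal cost brings its total weight to six, matching the six terms of the horizontal and vertical costs, so the four values are comparable.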
When the selected mark pixels are expanded, if a selected mark line segment lies on a direction-line part with a single direction in the direction map, the whole direction-line part is labeled as mark pixels, thereby expanding the extracted partial mark pixels.
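This expansion rule can be sketched as a breadth-first flood fill over a direction map: starting from the extracted mark pixels, it absorbs 8-connected pixels carrying the same direction label, so each seed spreads over the whole direction-line part it lies on. The map format (a grid of labels with None for background), the 8-connectivity and the absence of a size limit on the "local region" are assumptions of this sketch; in practice the labels would come from the direction formulas above.

```python
from collections import deque


def expand_mark_pixels(dmap, seeds):
    """Grow seed pixels over 8-connected pixels that share their direction label."""
    rows, cols = len(dmap), len(dmap[0])
    marked = {s for s in seeds if dmap[s[0]][s[1]] is not None}
    queue = deque(marked)
    while queue:
        i, j = queue.popleft()
        label = dmap[i][j]
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < rows and 0 <= nj < cols
                        and (ni, nj) not in marked and dmap[ni][nj] == label):
                    marked.add((ni, nj))
                    queue.append((ni, nj))
    return marked
```

Because growth stops wherever the label changes, a seed on a horizontal mark stroke does not leak into an adjoining vertical character stroke.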
Returning to Fig. 2, after the extracted partial mark pixels have been expanded in step S230, a thinned image of the character image to be recognized is obtained in step S240. Fig. 14 shows the thinned marked character image to be recognized according to a specific example of the present invention.
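The patent does not name a thinning algorithm for step S240. As one standard choice, swapped in here rather than taken from the source, the Zhang-Suen algorithm repeatedly peels boundary pixels in two sub-iterations until only a one-pixel-wide skeleton remains. The sketch assumes a binary image (list of rows, 1 = foreground) whose outermost rows and columns are background.

```python
def zhang_suen(img):
    """Return a thinned copy of a binary image using Zhang-Suen thinning."""
    im = [row[:] for row in img]
    rows, cols = len(im), len(im[0])

    def ring(i, j):
        # Neighbors P2..P9, clockwise starting from north.
        return [im[i - 1][j], im[i - 1][j + 1], im[i][j + 1], im[i + 1][j + 1],
                im[i + 1][j], im[i + 1][j - 1], im[i][j - 1], im[i - 1][j - 1]]

    changed = True
    while changed:
        changed = False
        for first_pass in (True, False):
            to_clear = []
            for i in range(1, rows - 1):
                for j in range(1, cols - 1):
                    if not im[i][j]:
                        continue
                    p = ring(i, j)
                    b = sum(p)  # number of foreground neighbors
                    a = sum(p[k] == 0 and p[(k + 1) % 8] == 1 for k in range(8))  # 0->1 transitions
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if first_pass:  # removes south-east boundary and north-west corners
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:  # removes north-west boundary and south-east corners
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((i, j))
            for i, j in to_clear:  # delete only after the whole sub-iteration
                im[i][j] = 0
            changed = changed or bool(to_clear)
    return im
```

Deleting pixels only after each full sub-iteration keeps the result independent of scan order, which is the usual reason for this two-pass structure.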
Then, in step S250, connected pixels on the tracks of the thinned image are included one by one until a junction is encountered, so that the mark line segments expanded in step S230 grow into recognized marks. Next, in step S260, the recognized marks are separated from the character image, and in step S270 the separated character image is recognized.
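Step S250 can be sketched as a breadth-first trace along the skeleton. The sketch assumes that the seeds are the skeleton pixels covered by the expanded mark line segments, that a junction is a skeleton pixel whose crossing number (the count of 0-to-1 transitions around its eight neighbors) is at least 3, and that growth stops at any pixel touching a junction, leaving the junction itself to the character. These conventions are illustrative assumptions.

```python
from collections import deque

# Neighbor offsets P2..P9, clockwise starting from north.
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def pixel(skel, i, j):
    """Skeleton value at (i, j), treating out-of-bounds positions as background."""
    return skel[i][j] if 0 <= i < len(skel) and 0 <= j < len(skel[0]) else 0


def crossing_number(skel, i, j):
    """Number of 0->1 transitions in the clockwise ring of neighbors."""
    ring = [pixel(skel, i + di, j + dj) for di, dj in OFFSETS]
    return sum(ring[k] == 0 and ring[(k + 1) % 8] == 1 for k in range(8))


def grow_marks(skel, seeds):
    """Grow seed pixels along the skeleton until a junction is reached."""
    grown = {s for s in seeds if pixel(skel, *s)}
    frontier = deque(grown)
    while frontier:
        i, j = frontier.popleft()
        nbrs = [(i + di, j + dj) for di, dj in OFFSETS if pixel(skel, i + di, j + dj)]
        if any(crossing_number(skel, *n) >= 3 for n in nbrs):
            continue  # this pixel touches a junction: stop growing this branch
        for n in nbrs:
            if n not in grown:
                grown.add(n)
                frontier.append(n)
    return grown
```

Testing whether a pixel touches a junction, rather than only whether it is one, prevents 8-connected growth from slipping diagonally past the junction into a character stroke.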
Below 2 describe according to an embodiment of the invention processing procedure and the detailed operation principle thereof of character identifying method in detail to accompanying drawing 14 by reference to the accompanying drawings.Below in conjunction with Figure 15 according to an embodiment of the invention structure and the principle of work thereof of character recognition device are described.
As shown in figure 15, comprise according to the character recognition device of this embodiment: tab character selected cell 1510 is configured to select the candidate region of the character picture that will the identify character that serves as a mark; Marked pixels extraction unit 1520 is configured to according to the position of the mark on the tab character in the character picture that will identify and the part marked pixels of the described mark of Shape Feature Extraction; Expanding element 1530 is configured to by comprising neighbor with equidirectional the part marked pixels of described extraction be expanded to the mark line segment; Refined image acquiring unit 1540 is configured to obtain the refined image of the described character picture that will identify; Mark line segment growing element 1550 is configured to along the track of described refined image the mark line segment of described expansion is grown to the mark of identification; Separative element 1560 is configured to the mark of described identification is separated with described character picture; And recognition unit 1570, be configured to identify the character picture of described separation.
The specific processing performed by the marked character selection unit 1510, marker pixel extraction unit 1520, expansion unit 1530, thinned image acquisition unit 1540, marker line segment growing unit 1550, separation unit 1560, and recognition unit 1570 of the device according to this embodiment is similar, respectively, to the processing in the following steps of the character recognition method described with reference to Figures 2 to 14: step S210 (selecting marked characters), step S220 (extracting partial marker pixels), step S230 (expanding the extracted marker pixels), step S240 (obtaining the thinned image), step S250 (growing the expanded marker line segment), step S260 (separating the characters and the marker), and step S270 (recognizing the separated characters). A further detailed description is therefore omitted here.
It should likewise be noted that the marked character selection unit 1510 is optional. A device according to an embodiment of the present invention may omit unit 1510 and consist only of the marker pixel extraction unit 1520, expansion unit 1530, thinned image acquisition unit 1540, marker line segment growing unit 1550, separation unit 1560, and recognition unit 1570; such a device can equally separate the character image from the marker image and thereby improve recognition accuracy.
Thus, with the character recognition method and character recognition device according to the embodiments described above, a marker present on the character image to be recognized can be detected accurately, and all or part of the marker pixels can be separated from the characters, so that the characters can be recognized accurately.
Furthermore, the character recognition method and device according to embodiments of the present invention separate the marker from the character image using position and shape features of the marker that are stable and reliable, and these position and shape features are also applicable to characters. The extracted pixels can therefore be guaranteed to belong to the marker, so all or part of the marker pixels can be separated reliably from the character image and the character image can be recognized accurately.
In addition, the character recognition method and device according to embodiments of the present invention use the direction map and the trajectory of the thinned image as references when extending the marker line segment. This provides a spatial constraint that helps avoid misclassifying character pixels as marker pixels, so the character image and the marker image are separated accurately, which in turn supports accurate recognition of the character image in the subsequent step.
The basic principle of the present invention has been described above in connection with specific embodiments. It should be noted, however, that those of ordinary skill in the art will understand that all or any of the steps or components of the method and device of the present invention can be implemented in hardware, firmware, software, or a combination thereof, in any computing device (including processors, storage media, and the like) or network of computing devices. Those of ordinary skill in the art can achieve this using their basic programming skills after reading the description of the present invention.
Therefore, the object of the present invention can also be achieved by running a program or a set of programs on any computing device, which may be a well-known general-purpose device. The object of the present invention can thus also be achieved merely by providing a program product containing program code that implements the method or device. In other words, such a program product also constitutes the present invention, as does a storage medium storing such a program product. Obviously, the storage medium may be any known storage medium or any storage medium developed in the future.
When the embodiments of the present invention are implemented by software and/or firmware, a program constituting the software is installed from a storage medium or a network onto a computer having a dedicated hardware structure, for example the general-purpose personal computer 700 shown in Figure 16, which can perform various functions when various programs are installed on it.
In Figure 16, a central processing unit (CPU) 701 performs various processes according to programs stored in a read-only memory (ROM) 702 or programs loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores, as needed, data required when the CPU 701 performs the various processes. The CPU 701, ROM 702, and RAM 703 are connected to one another via a bus 704, to which an input/output interface 705 is also connected.
The following components are connected to the input/output interface 705: an input section 706, including a keyboard, a mouse, and the like; an output section 707, including a display such as a cathode-ray tube (CRT) or liquid crystal display (LCD), a loudspeaker, and the like; a storage section 708, including a hard disk and the like; and a communication section 709, including a network interface card such as a LAN card, a modem, and the like. The communication section 709 performs communication processing via a network such as the Internet.
A drive 710 is also connected to the input/output interface 705 as needed. A removable medium 711, such as a magnetic disk, optical disc, magneto-optical disk, or semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it is installed into the storage section 708 as needed.
When the above series of processes is implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 711.
Those skilled in the art will understand that the storage medium is not limited to the removable medium 711 shown in Figure 16, which stores the program and is distributed separately from the device to provide the program to the user. Examples of the removable medium 711 include a magnetic disk (including a floppy disk (registered trademark)), an optical disc (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disk (including a MiniDisc (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 702, a hard disk included in the storage section 708, or the like, which stores the program and is distributed to the user together with the device containing it.
It should also be noted that, in the device and method of the present invention, the components or steps can obviously be decomposed and/or recombined, and such decompositions and/or recombinations should be regarded as equivalent solutions of the present invention. Moreover, the steps of the above series of processes may naturally be performed in chronological order in the order described, but need not be; some steps may be performed in parallel or independently of one another.
Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions, and alterations can be made without departing from the spirit and scope of the present invention as defined by the appended claims. Moreover, the terms "comprise", "include", and any variants thereof in this application are intended to cover non-exclusive inclusion, so that a process, method, article, or device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises that element.
Remarks
Remarks 1. A character recognition method, comprising:
Extracting part of the marker pixels of a marker according to the position and shape features of the marker on a marked character in the character image to be recognized;
Expanding the extracted partial marker pixels into a marker line segment by including adjacent pixels having the same direction;
Obtaining a thinned image of the character image to be recognized;
Growing the expanded marker line segment into a recognized marker along the trajectory of the thinned image;
Separating the recognized marker from the character image; and
Recognizing the separated character image.
Remarks 2. The character recognition method according to Remarks 1, further comprising:
Selecting candidate regions of the character image to be recognized as the marked characters.
Remarks 3. The character recognition method according to Remarks 2, wherein selecting the candidate regions comprises:
Segmenting text blocks in the character image to be recognized into character regions by alternately projecting the text blocks in the horizontal and vertical directions;
Classifying the segmented character regions into contact regions, large-size regions, and normal-size regions by comparing the sizes of the segmented character regions; and
Taking the contact regions and the large-size regions as the marked characters.
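The region selection of Remarks 3 can be sketched with a recursive XY-cut, which alternates row and column projection profiles and cuts at empty gaps. This is an illustrative sketch, not the embodiment's exact procedure: the source says only that region sizes are compared, so comparing each region with the median height and width, the `ratio` of 1.5, and the rule "much taller than the median = large, much wider = contact" are all assumptions, and the function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, bottom, left, right), inclusive bounds


def _runs(profile: Sequence[int]) -> List[Tuple[int, int]]:
    """(start, end) index pairs of consecutive non-zero projection values."""
    runs, start = [], None
    for i, v in enumerate(list(profile) + [0]):
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i - 1))
            start = None
    return runs


def _split(img, box: Box, by_rows: bool) -> List[Box]:
    """Cut a box at the empty gaps of its row (or column) projection profile."""
    top, bottom, left, right = box
    if by_rows:
        profile = [sum(img[y][left:right + 1]) for y in range(top, bottom + 1)]
        return [(top + a, top + b, left, right) for a, b in _runs(profile)]
    profile = [sum(img[y][x] for y in range(top, bottom + 1)) for x in range(left, right + 1)]
    return [(top, bottom, left + a, left + b) for a, b in _runs(profile)]


def xy_cut(img, box: Box, by_rows: bool = True, prev_split: bool = True) -> List[Box]:
    """Alternate row/column cuts until neither direction splits a region any further."""
    parts = _split(img, box, by_rows)
    if len(parts) <= 1 and not prev_split:
        return parts
    regions: List[Box] = []
    for part in parts:
        regions.extend(xy_cut(img, part, not by_rows, len(parts) > 1))
    return regions


def select_marked_characters(img, ratio: float = 1.5):
    """Return ('large' | 'contact', box) for regions that are not normal-size."""
    regions = xy_cut(img, (0, len(img) - 1, 0, len(img[0]) - 1))
    med_h = median(b - t + 1 for t, b, _, _ in regions)
    med_w = median(r - l + 1 for _, _, l, r in regions)
    marked = []
    for t, b, l, r in regions:
        if b - t + 1 > ratio * med_h:
            marked.append(("large", (t, b, l, r)))
        elif r - l + 1 > ratio * med_w:
            marked.append(("contact", (t, b, l, r)))
    return marked
```

A region that is unusually wide but of normal height is the typical signature of neighbouring characters joined by a marker, while an unusually tall region suggests a marker extending above or below the text line; both are kept as marked characters, as Remarks 3 requires.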
Remarks 4. The character recognition method according to Remarks 3, wherein extracting the partial marker pixels comprises extracting partial marker pixels outside the rectangular box enclosing a character.
Remarks 5. The character recognition method according to Remarks 4, wherein extracting the partial marker pixels comprises:
Selecting a group of candidate marker pixels by separating the side waves on both sides of the projections in the horizontal and vertical directions, respectively;
Building a curve model by least-squares curve fitting to fit the group of candidate marker pixels; and
Calculating the fitting error of the group of candidate marker pixels to determine whether the group consists of marker pixels.
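The fitting and error test in Remarks 5 can be illustrated as follows. This sketch assumes a roughly horizontal marker, so each candidate pixel is modelled as row = c0 + c1·col + c2·col², fitted by solving the least-squares normal equations directly; the quadratic model, the RMS error measure, and the threshold `max_rms = 1.0` are assumptions, and the side-wave separation that produces the candidate group is not shown. At least three distinct column values are needed, otherwise the normal equations are singular.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def fit_quadratic(points: Sequence[Tuple[float, float]]) -> Tuple[List[float], float]:
    """Least-squares fit of y = c0 + c1*x + c2*x**2; returns (coefficients, RMS error)."""
    s = [sum(x ** k for x, _ in points) for k in range(5)]      # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in points) for k in range(3)]  # sums of y * x^0 .. x^2
    # Augmented normal-equation matrix [A^T A | A^T y] for rows A_i = [1, x_i, x_i^2]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    coeffs = [m[i][3] / m[i][i] for i in range(3)]
    residuals = [y - (coeffs[0] + coeffs[1] * x + coeffs[2] * x * x) for x, y in points]
    return coeffs, sqrt(sum(e * e for e in residuals) / len(points))


def is_marker_group(points: Sequence[Tuple[float, float]], max_rms: float = 1.0) -> bool:
    """Accept the candidate group as marker pixels if the curve fits it closely."""
    return fit_quadratic(points)[1] <= max_rms
```

A thin line, even a slightly curved one, leaves residuals of about half a pixel, whereas a block of character pixels spread over many rows at each column leaves a large residual and is rejected.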
Remarks 6. The character recognition method according to Remarks 3, wherein extracting the partial marker pixels comprises:
Estimating the stroke width by run-length analysis;
Examining the crossing feature of a contact fragment along the direction orthogonal to the contact direction; and
Determining, as marker pixels, the pixels on a line segment whose crossing feature has two parts on the crossing line, each part having a width comparable to the stroke width.
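Remarks 6 can be illustrated with the following sketch. The assumptions are: the stroke width is taken as the most frequent foreground run length over all rows and columns of the fragment; the contact direction is horizontal, so crossings are examined column by column; and "comparable to the stroke width" means within a hypothetical tolerance `tol` of one pixel. The function returns the columns whose crossing shows exactly two such runs; which pixels in those columns are finally labelled as marker pixels is not modelled.

```python
from collections import Counter
from typing import List, Sequence


def _run_lengths(line: Sequence[int]) -> List[int]:
    """Lengths of consecutive foreground runs along one row or column."""
    runs, n = [], 0
    for v in list(line) + [0]:
        if v:
            n += 1
        elif n:
            runs.append(n)
            n = 0
    return runs


def estimate_stroke_width(img: List[List[int]]) -> int:
    """Most frequent run length over rows and columns (run-length analysis)."""
    counts = Counter()
    for row in img:
        counts.update(_run_lengths(row))
    for col in zip(*img):
        counts.update(_run_lengths(col))
    return counts.most_common(1)[0][0]


def marker_columns(fragment: List[List[int]], tol: int = 1) -> List[int]:
    """Columns whose vertical crossing has two parts, each about one stroke wide."""
    width = estimate_stroke_width(fragment)
    hits = []
    for c, col in enumerate(zip(*fragment)):
        runs = _run_lengths(col)
        if len(runs) == 2 and all(abs(r - width) <= tol for r in runs):
            hits.append(c)
    return hits
```

In a fragment made of two parallel one-pixel lines joined by a vertical stroke, every column except the joining one crosses two thin parts, so only the joining column is excluded.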
Remarks 7. The character recognition method according to Remarks 3, wherein extracting the partial marker pixels comprises:
Determining reference characters for each marked character, the reference characters being those characters located in the same row or the same column as the marked character;
Calculating reference coordinates from the reference characters; and
Extracting pixels outside the range of the reference coordinates as marker pixels.
Remarks 8. The character recognition method according to Remarks 7, wherein:
When the reference characters run along the horizontal direction, only the vertical coordinates of the reference characters are used to calculate the reference coordinates; and
When the reference characters run along the vertical direction, only the horizontal coordinates of the reference characters are used to calculate the reference coordinates.
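Remarks 7 and 8 can be sketched as below. The source does not say how the reference coordinates are computed from the reference characters, so taking the median top/bottom (or left/right) of their bounding boxes is an assumption, as are the function names; boxes are (top, bottom, left, right) with inclusive bounds.

```python
from statistics import median
from typing import List, Set, Tuple

Box = Tuple[int, int, int, int]  # (top, bottom, left, right), inclusive bounds


def reference_range(ref_boxes: List[Box], horizontal_line: bool = True) -> Tuple[float, float]:
    """Reference coordinates from same-row (or same-column) characters.

    For a horizontal text line only vertical coordinates are used; for a vertical
    line only horizontal coordinates are used (Remarks 8).
    """
    if horizontal_line:
        return median(b[0] for b in ref_boxes), median(b[1] for b in ref_boxes)
    return median(b[2] for b in ref_boxes), median(b[3] for b in ref_boxes)


def pixels_outside_reference(img: List[List[int]], box: Box, ref_boxes: List[Box],
                             horizontal_line: bool = True) -> Set[Tuple[int, int]]:
    """Foreground pixels of a marked character lying outside the reference range."""
    lo, hi = reference_range(ref_boxes, horizontal_line)
    top, bottom, left, right = box
    outside = set()
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            coord = y if horizontal_line else x
            if img[y][x] and not lo <= coord <= hi:
                outside.add((y, x))
    return outside
```

Using only the coordinate perpendicular to the text line keeps the test independent of how wide the neighbouring characters are, which matches the restriction stated in Remarks 8.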
Remarks 9. The character recognition method according to any one of Remarks 1 to 8, wherein expanding the extracted partial marker pixels comprises:
Obtaining a direction map of the marked character; and
Expanding the previously selected marker pixels with pixels having the same value in a local region of the direction map.
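The expansion of Remarks 9 can be sketched as a region growing over a direction map. The way the map is built here is an assumption, not taken from the source: each foreground pixel is labelled with the quantised direction (0, 45, 90, or 135 degrees) along which its foreground run is longest. The "local region" is likewise assumed to be the 8-neighbourhood, seeds are assumed to be foreground pixels, and all names are hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple

# Quantised directions: 0 = horizontal, 1 = 45 degrees, 2 = vertical, 3 = 135 degrees
DIRECTIONS = [(0, 1), (1, 1), (1, 0), (1, -1)]


def _run_length_through(img, y: int, x: int, dy: int, dx: int) -> int:
    """Length of the foreground run through (y, x) along +/-(dy, dx)."""
    h, w = len(img), len(img[0])
    n = 1
    for s in (1, -1):
        yy, xx = y + s * dy, x + s * dx
        while 0 <= yy < h and 0 <= xx < w and img[yy][xx]:
            n += 1
            yy, xx = yy + s * dy, xx + s * dx
    return n


def direction_map(img: List[List[int]]) -> List[List[int]]:
    """Label each foreground pixel with its longest-run direction; background is -1."""
    return [
        [max(range(4), key=lambda d: _run_length_through(img, y, x, *DIRECTIONS[d])) if v else -1
         for x, v in enumerate(row)]
        for y, row in enumerate(img)
    ]


def expand_seeds(img: List[List[int]], seeds: Set[Tuple[int, int]]) -> Set[Tuple[int, int]]:
    """Grow seed marker pixels into 8-neighbours sharing the same direction value."""
    dmap = direction_map(img)
    grown = set(seeds)
    queue = deque(seeds)
    while queue:
        y, x = queue.popleft()
        d = dmap[y][x]
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                q = (y + dy, x + dx)
                if (0 <= q[0] < len(img) and 0 <= q[1] < len(img[0])
                        and q not in grown and dmap[q[0]][q[1]] == d):
                    grown.add(q)
                    queue.append(q)
    return grown
```

When a horizontal marker crosses a vertical character stroke, the stroke pixels carry the vertical label, so expansion from a seed on the marker follows the horizontal line and does not leak into the character.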
Remarks 10. The character recognition method according to any one of Remarks 1 to 8, wherein growing the expanded marker line segment comprises:
Including, one by one, the connected pixels on the trajectory of the thinned image until a junction point is encountered.
Remarks 11. A character recognition device, comprising:
A marker pixel extraction unit configured to extract part of the marker pixels of a marker according to the position and shape features of the marker on a marked character in the character image to be recognized;
An expansion unit configured to expand the extracted partial marker pixels into a marker line segment by including adjacent pixels having the same direction;
A thinned image acquisition unit configured to obtain a thinned image of the character image to be recognized;
A marker line segment growing unit configured to grow the expanded marker line segment into a recognized marker along the trajectory of the thinned image;
A separation unit configured to separate the recognized marker from the character image; and
A recognition unit configured to recognize the separated character image.
Remarks 12. The character recognition device according to Remarks 11, further comprising:
A marked character selection unit configured to select candidate regions of the character image to be recognized as the marked characters.
Remarks 13. The character recognition device according to Remarks 12, wherein the marked character selection unit is further configured to:
Segment text blocks in the character image to be recognized into character regions by alternately projecting the text blocks in the horizontal and vertical directions;
Classify the segmented character regions into contact regions, large-size regions, and normal-size regions by comparing the sizes of the segmented character regions; and
Take the contact regions and the large-size regions as the marked characters.
Remarks 14. The character recognition device according to Remarks 13, wherein the marker pixel extraction unit is further configured to extract partial marker pixels outside the rectangular box enclosing a character.
Remarks 15. The character recognition device according to Remarks 14, wherein the marker pixel extraction unit is further configured to:
Select a group of candidate marker pixels by separating the side waves on both sides of the projections in the horizontal and vertical directions, respectively;
Build a curve model by least-squares curve fitting to fit the group of candidate marker pixels; and
Calculate the fitting error of the group of candidate marker pixels to determine whether the group consists of marker pixels.
Remarks 16. The character recognition device according to Remarks 13, wherein the marker pixel extraction unit is further configured to:
Estimate the stroke width by run-length analysis;
Examine the crossing feature of a contact fragment along the direction orthogonal to the contact direction; and
Determine, as marker pixels, the pixels on a line segment whose crossing feature has two parts on the crossing line, each part having a width comparable to the stroke width.
Remarks 17. The character recognition device according to Remarks 13, wherein the marker pixel extraction unit is further configured to:
Determine reference characters for each marked character, the reference characters being those characters located in the same row or the same column as the marked character;
Calculate reference coordinates from the reference characters; and
Extract pixels outside the range of the reference coordinates as marker pixels.
Remarks 18. The character recognition device according to Remarks 17, wherein:
When the reference characters run along the horizontal direction, only the vertical coordinates of the reference characters are used to calculate the reference coordinates; and
When the reference characters run along the vertical direction, only the horizontal coordinates of the reference characters are used to calculate the reference coordinates.
Remarks 19. The character recognition device according to any one of Remarks 11 to 18, wherein the expansion unit is further configured to:
Obtain a direction map of the marked character; and
Expand the previously selected marker pixels with pixels having the same value in a local region of the direction map.
Remarks 20. The character recognition device according to any one of Remarks 11 to 18, wherein the marker line segment growing unit is further configured to include, one by one, the connected pixels on the trajectory of the thinned image until a junction point is encountered.

Claims (9)

1. A character recognition method, comprising:
extracting part of the marker pixels of a marker according to the position and shape features of the marker on a marked character in a character image to be recognized;
expanding the extracted partial marker pixels into a marker line segment by including adjacent pixels having the same direction;
obtaining a thinned image of the character image to be recognized;
growing the expanded marker line segment into a recognized marker along the trajectory of the thinned image;
separating the recognized marker from the character image; and
recognizing the separated character image.
2. The character recognition method according to claim 1, further comprising:
segmenting text blocks in the character image to be recognized into character regions by alternately projecting the text blocks in the horizontal and vertical directions;
classifying the segmented character regions into contact regions, large-size regions, and normal-size regions by comparing the sizes of the segmented character regions; and
taking the contact regions and the large-size regions as the marked characters.
3. The character recognition method according to claim 2, wherein extracting the partial marker pixels comprises:
selecting a group of candidate marker pixels by separating the side waves on both sides of the projections in the horizontal and vertical directions, respectively;
building a curve model by least-squares curve fitting to fit the group of candidate marker pixels; and
calculating the fitting error of the group of candidate marker pixels to determine whether the group consists of marker pixels.
4. The character recognition method according to claim 2, wherein extracting the partial marker pixels comprises:
estimating the stroke width by run-length analysis;
examining the crossing feature of a contact fragment along the direction orthogonal to the contact direction; and
determining, as marker pixels, the pixels on a line segment whose crossing feature has two parts on the crossing line, each part having a width comparable to the stroke width.
5. The character recognition method according to claim 2, wherein extracting the partial marker pixels comprises:
determining reference characters for each marked character, the reference characters being those characters located in the same row or the same column as the marked character;
calculating reference coordinates from the reference characters; and
extracting pixels outside the range of the reference coordinates as marker pixels.
6. The character recognition method according to any one of claims 1 to 5, wherein expanding the extracted partial marker pixels comprises:
obtaining a direction map of the marked character; and
expanding the previously selected marker pixels with pixels having the same value in a local region of the direction map.
7. The character recognition method according to any one of claims 1 to 5, wherein growing the expanded marker line segment comprises:
including, one by one, the connected pixels on the trajectory of the thinned image until a junction point is encountered.
8. A character recognition device, comprising:
a marker pixel extraction unit configured to extract part of the marker pixels of a marker according to the position and shape features of the marker on a marked character in a character image to be recognized;
an expansion unit configured to expand the extracted partial marker pixels into a marker line segment by including adjacent pixels having the same direction;
a thinned image acquisition unit configured to obtain a thinned image of the character image to be recognized;
a marker line segment growing unit configured to grow the expanded marker line segment into a recognized marker along the trajectory of the thinned image;
a separation unit configured to separate the recognized marker from the character image; and
a recognition unit configured to recognize the separated character image.
9. The character recognition device according to claim 8, further comprising a marked character selection unit configured to:
segment text blocks in the character image to be recognized into character regions by alternately projecting the text blocks in the horizontal and vertical directions;
classify the segmented character regions into contact regions, large-size regions, and normal-size regions by comparing the sizes of the segmented character regions; and
take the contact regions and the large-size regions as the marked characters.
CN 200910173692 2009-09-15 2009-09-15 Character identification method and character identification device Expired - Fee Related CN102024138B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN 200910173692 CN102024138B (en) 2009-09-15 2009-09-15 Character identification method and character identification device
JP2010200193A JP2011065643A (en) 2009-09-15 2010-09-07 Method and apparatus for character recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200910173692 CN102024138B (en) 2009-09-15 2009-09-15 Character identification method and character identification device

Publications (2)

Publication Number Publication Date
CN102024138A CN102024138A (en) 2011-04-20
CN102024138B true CN102024138B (en) 2013-01-23

Family

ID=43865419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200910173692 Expired - Fee Related CN102024138B (en) 2009-09-15 2009-09-15 Character identification method and character identification device

Country Status (2)

Country Link
JP (1) JP2011065643A (en)
CN (1) CN102024138B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102184396A (en) * 2011-06-13 2011-09-14 北方工业大学 Document image tilt correction method based on OCR recognition feedback
CN102867178B (en) * 2011-07-05 2015-06-10 富士通株式会社 Method and device for Chinese character recognition
CN102567725A (en) * 2011-12-23 2012-07-11 国网电力科学研究院 Soft segmentation method of financial OCR system handwritten numerical strings
JP6089401B2 (en) * 2012-01-06 2017-03-08 富士ゼロックス株式会社 Image processing apparatus, designated mark estimation apparatus, and program
CN104021385B (en) * 2013-03-02 2017-11-21 北京信息科技大学 Video caption thinning method based on template matches and curve matching
US9087272B2 (en) 2013-07-17 2015-07-21 International Business Machines Corporation Optical match character classification
CN106845473B (en) * 2015-12-03 2020-06-02 富士通株式会社 Method and device for determining whether image is image with address information
CN109542285A (en) * 2018-11-16 2019-03-29 北京小米移动软件有限公司 Image processing method and device
DE102019211984A1 (en) * 2019-08-09 2021-02-11 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Device, method for controlling the same and device network or swarm

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1066335A (en) * 1992-05-12 1992-11-18 浙江大学 Character Recognition Method and System
CN1276077A (en) * 1997-09-15 2000-12-06 卡艾尔公司 Automatic language identification system for multilingual optical character recognition
CN1347060A (en) * 2000-10-04 2002-05-01 富士通株式会社 Word identifying device and method, and memory medium

Also Published As

Publication number Publication date
CN102024138A (en) 2011-04-20
JP2011065643A (en) 2011-03-31

Similar Documents

Publication Publication Date Title
CN102024138B (en) Character identification method and character identification device
CN111460927B (en) Method for extracting structured information of house property evidence image
JP2951814B2 (en) Image extraction method
Mi et al. A two-stage approach for road marking extraction and modeling using MLS point clouds
Zahour et al. Text line segmentation of historical arabic documents
CN106446769A (en) Systems and techniques for sign based localization
CN104751187A (en) Automatic meter-reading image recognition method
CN107392141A (en) A kind of airport extracting method based on conspicuousness detection and LSD straight-line detections
JP2005523530A (en) System and method for identifying and extracting character string from captured image data
CN102479332A (en) Image processing apparatus, image processing method and computer-readable medium
CN113095267B (en) Data extraction method of statistical chart, electronic device and storage medium
CN108509950B (en) Railway contact net support number plate detection and identification method based on probability feature weighted fusion
CN117437647B (en) Oracle bone text detection method based on deep learning and computer vision
CN116824608A (en) Answer sheet layout analysis method based on target detection technology
CN112241730A (en) Form extraction method and system based on machine learning
CN111507287A (en) Method and system for extracting road zebra crossing corner points in aerial image
CN115273108A (en) Artificial intelligence recognition automatic collection method and system
CN117373050B (en) Method for identifying drawing pipeline with high precision
CN111461132A (en) Method and device for assisting in labeling OCR image data
KR101411893B1 (en) Automatic Recognition Method of Direction Information in Road Sign Image
Milleville et al. Improving toponym recognition accuracy of historical topographic maps
JP2010020421A (en) Character recognizing apparatus, character recognizing method, computer program, and storage medium
CN110728723B (en) Automatic road extraction method for tile map
KR100834602B1 (en) Character Recognition Device and Character Recognition Method
Ma et al. Automated extraction of driving lines from mobile laser scanning point clouds

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130123

Termination date: 20130915