CN1186744C - Chinese character recognizing method based on structure model - Google Patents
- Publication number
- CN1186744C (also listed as CNB021259496A, CN02125949A)
- Authority
- CN
- China
- Prior art keywords
- stroke
- model
- strokes
- standard
- recognized
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Character Discrimination (AREA)
Abstract
The present invention relates to a Chinese character recognition method based on a structure model and belongs to the fields of pattern recognition, artificial intelligence and Chinese information processing. Using two primitives, the stroke segment and the stroke, the invention establishes two mathematical models for describing Chinese character structure: a center-point model of stroke segments and a relation-matrix model of strokes. From these it derives a stroke-segment center-point recognition method and a stroke relation-matrix recognition method. The two methods are combined into one complete recognition method: the center-point method performs coarse classification and the relation-matrix method performs fine classification. Printed and handwritten Chinese characters are handled by a single mechanism, and the method can be used for both offline and online recognition. The invention offers high recognition accuracy and stable performance.
Description
Technical field
The present invention relates to a Chinese character recognition method based on a structural model. The claimed technical solution belongs to the fields of pattern recognition, artificial intelligence and Chinese information processing.
Background technology
After decades of development, Chinese character recognition technology has made great progress. However, unconstrained handwritten Chinese character recognition, and offline handwritten recognition in particular, still falls short of expectations. To address offline handwritten recognition, statistical methods and neural network methods are now mostly used; these adapt to character deformation by learning from large numbers of handwritten samples. This approach requires collecting massive sample sets and spending a great deal of training time, yet the results are not very satisfactory. Structural methods adapt well to deformation and carry no burden of sample collection or training. However, although existing structural methods have been quite successful in online handwritten Chinese character recognition, they are difficult to apply to offline recognition.
Summary of the invention
The technical problem to be solved by the present invention is to provide an effective structural method for recognizing Chinese characters. The method has high recognition accuracy and good stability, and can be used for handwritten as well as printed Chinese character recognition, and for offline as well as online recognition.
The first problem in recognizing Chinese characters with a structural method is to establish a structural model of the Chinese character image. The invention provides two mathematical models for describing Chinese character structure: the sub-stroke center model and the stroke relation matrix model.
The sub-stroke center model takes the stroke segment (also called a pen section) as the primitive that makes up a Chinese character and describes a character by the types and positions of its stroke segments. Here, a stroke segment is a set of foreground pixels in the Chinese character image that matches people's understanding of one of the four basic strokes: horizontal, vertical, left-falling and right-falling (other strokes can be composed from these four). The sub-stroke center model is expressed as follows:
1) Segment type
According to the direction vector of the stroke segment, segments are divided into four types: horizontal, vertical, left-falling and right-falling.
2) Segment position
The position of a segment is given by the Euclidean coordinates of its midpoint, called the center-point coordinates. These coordinates are computed on the normalized Chinese character image.
3) Model composition
$$H = \{(X_i, Y_i, T_i)\}, \quad i = 1, 2, \ldots, N \tag{1}$$
Here, $H$ denotes the Chinese character, $X_i$ is the abscissa of the center point of the $i$-th stroke segment, $Y_i$ is the ordinate of the center point of the $i$-th stroke segment, $T_i$ is the type of the $i$-th stroke segment (one of horizontal, vertical, left-falling and right-falling), and $N$ is the number of stroke segments that make up the character.
Formula (1) states that if a normalized Chinese character image has, at every specified position (given by $X_i$ and $Y_i$), a stroke segment of the specified type (given by $T_i$), then the image is the particular Chinese character $H$; otherwise it is not.
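The model of formula (1) can be pictured as a plain list of (x, y, type) records. The following is a minimal sketch under stated assumptions: the `Segment` class, the `satisfies_model` helper, the tolerance `tol`, and the coordinates given for the character 十 on a 64×64 grid are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass

# The four basic segment types named in the text.
SEGMENT_TYPES = ("horizontal", "vertical", "left-falling", "right-falling")


@dataclass(frozen=True)
class Segment:
    """One stroke segment: center-point coordinates plus type (X_i, Y_i, T_i)."""
    x: float
    y: float
    kind: str


def satisfies_model(model, observed, tol=0.0):
    """Return True if every model segment has an observed segment of the
    same type whose center lies within `tol` on both axes.
    `tol` is an illustrative assumption; formula (1) itself is exact."""
    return all(
        any(
            o.kind == m.kind
            and abs(o.x - m.x) <= tol
            and abs(o.y - m.y) <= tol
            for o in observed
        )
        for m in model
    )


# Hypothetical model of the character 十 on a 64x64 normalized image:
# one horizontal and one vertical segment sharing the same center.
TEN_MODEL = [Segment(32, 32, "horizontal"), Segment(32, 32, "vertical")]
```

For instance, `satisfies_model(TEN_MODEL, [Segment(31, 33, "horizontal"), Segment(32, 32, "vertical")], tol=2)` accepts a slightly shifted horizontal segment, while an input lacking the vertical segment is rejected.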
Based on the sub-stroke center model, the invention provides the following Chinese character recognition method, called the sub-stroke center recognition method.
First, the standard sub-stroke center model corresponding to each Chinese character class is determined. During recognition, the distance between the sub-stroke center model of the character to be recognized and every standard sub-stroke center model is calculated. The class with the smallest distance, or the N classes with the smallest distances, is taken as the recognition result. The distance is computed by formula (2).
In formula (2), $D(SP, RP)$ is the distance between the standard center-point set and the center-point set to be recognized; $Q$ is the maximum number of stroke segments that can be matched between the two sets; $I$ is the number of stroke segments in the standard center-point set; $J$ is the number of stroke segments in the set to be recognized; and $J'$ is the number of stroke segments remaining after the segments judged during matching to be connecting strokes are removed from the input segment set. $(G_{iX}, G_{iY})$ are the center-point coordinates in the standard set and $(H_{jX}, H_{jY})$ are those in the set to be recognized. $MS_i$ is the subset of segments in the set to be recognized that have been matched to the first $i-1$ segments of the standard set. $\mathrm{Simi}(ST_i, PT_j)$ is the similarity between the type of the $i$-th segment in the standard set and the type of the $j$-th segment in the set to be recognized. $V$ is the threshold on the allowed difference in segment count, $T$ is the threshold giving the maximum distance assigned to a segment that cannot be matched, and $W$ is the threshold on the minimum distance between segments that are allowed to match.
The concrete steps of the sub-stroke center recognition method are as follows:
(1) Establish the standard sub-stroke center set of each Chinese character.
(2) Normalize the character to be recognized to the standard size, then extract all of its stroke segments to form the center-point set to be recognized.
(3) Use formula (2) to calculate the distance between each standard center-point set and the center-point set to be recognized, and take it as the distance between the corresponding standard character and the character to be recognized.
(4) Among all standard characters, take the one with the smallest distance to the character to be recognized, or the N with the smallest distances, as the recognition result.
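Steps (3) and (4) can be sketched as follows. Formula (2) is not available in this text, so the scoring below is a simplified greedy stand-in, not the patent's formula: it matches same-type segments by nearest center point and charges a fixed penalty for anything left unmatched. The parameter names `count_tol`, `unmatched_penalty` and `max_match_dist` are hypothetical; they only loosely echo the roles of the thresholds V, T and W described above.

```python
import math


def center_distance(standard, observed, count_tol=3,
                    unmatched_penalty=20.0, max_match_dist=16.0):
    """Greedy stand-in for formula (2). Segments are (x, y, type) tuples.

    Assumption: only segments of identical type may match, and each
    observed segment is used at most once.
    """
    # Reject outright when the segment counts differ too much.
    if abs(len(standard) - len(observed)) > count_tol:
        return math.inf
    free = list(observed)
    total = 0.0
    for sx, sy, st in standard:
        best, best_d = None, max_match_dist
        for cand in free:
            ox, oy, ot = cand
            d = math.hypot(sx - ox, sy - oy)
            if ot == st and d <= best_d:
                best, best_d = cand, d
        if best is None:
            total += unmatched_penalty   # no acceptable partner found
        else:
            free.remove(best)
            total += best_d
    total += unmatched_penalty * len(free)  # leftover input segments
    return total / max(len(standard), 1)


def recognize_coarse(templates, observed, top_n=10):
    """Return up to `top_n` class labels, nearest first."""
    scored = sorted(
        (center_distance(segs, observed), label)
        for label, segs in templates.items()
    )
    return [label for dist, label in scored[:top_n] if dist < math.inf]
```

Here `templates` is assumed to be a dictionary from class label to its standard center-point set. A production matcher would likely use a graded type similarity and an optimal assignment rather than this greedy pass.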
The stroke relation matrix model takes the stroke as the primitive that makes up a Chinese character and describes a character by the types of its strokes and their positions relative to one another. Here, a stroke means a Chinese character stroke as commonly understood. The concrete form of the stroke relation matrix model is:
(1) Stroke types
See Fig. 1.
(2) Relative positions between strokes
To capture as much as possible what the various written forms of a Chinese character have in common, while ignoring factors that may vary sharply, the relative position between any two strokes is fuzzified into six kinds: above, below, left, right, intersecting and connected.
(3) Combined model
Because a Chinese character image is two-dimensional, expressing the strokes and their relative positions in two-dimensional form reflects the structural features more accurately. A matrix form is therefore adopted:
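$$
\begin{array}{c|ccccc}
 & S_1 & S_2 & \cdots & S_{N-1} & S_N \\ \hline
S_1 & R_{11} & R_{12} & \cdots & R_{1(N-1)} & R_{1N} \\
S_2 & R_{21} & R_{22} & \cdots & R_{2(N-1)} & R_{2N} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
S_{N-1} & R_{(N-1)1} & R_{(N-1)2} & \cdots & R_{(N-1)(N-1)} & R_{(N-1)N} \\
S_N & R_{N1} & R_{N2} & \cdots & R_{N(N-1)} & R_{NN}
\end{array}
$$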
Here, $S$ denotes a stroke, $R$ denotes a relation, and $N$ is the number of strokes. $S_1$ to $S_N$ label the rows and columns and give the stroke types. $R_{11}$ to $R_{NN}$ are the matrix elements; each gives the relative position between the two strokes of its row and its column.
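A relation matrix of this shape can be built as in the sketch below. The text does not say how the six relations are decided, so the geometric rules here are assumptions made only for illustration: each stroke is reduced to a straight line from `start` to `end`, "connected" is tested only endpoint to endpoint, R_ij is read as the position of stroke i relative to stroke j, and diagonal entries are left as `None`. The `Stroke` type and all coordinates are hypothetical.

```python
import math
from typing import NamedTuple, Tuple

Point = Tuple[float, float]


class Stroke(NamedTuple):
    kind: str      # stroke type label (S_i)
    start: Point   # image coordinates, y grows downward
    end: Point


def _orient(p, q, r):
    """Signed area test: >0 or <0 depending on which side of pq r lies."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _crosses(a, b):
    """True when the two straight strokes properly cross each other."""
    d1, d2 = _orient(b.start, b.end, a.start), _orient(b.start, b.end, a.end)
    d3, d4 = _orient(a.start, a.end, b.start), _orient(a.start, a.end, b.end)
    return d1 * d2 < 0 and d3 * d4 < 0


def relation(a, b, touch_tol=2.0):
    """Fuzzy position of stroke a relative to stroke b (assumed rules)."""
    if _crosses(a, b):
        return "intersect"
    gap = min(math.dist(p, q) for p in (a.start, a.end) for q in (b.start, b.end))
    if gap <= touch_tol:
        return "connected"
    dx = (a.start[0] + a.end[0]) / 2 - (b.start[0] + b.end[0]) / 2
    dy = (a.start[1] + a.end[1]) / 2 - (b.start[1] + b.end[1]) / 2
    if abs(dx) >= abs(dy):
        return "left" if dx < 0 else "right"
    return "above" if dy < 0 else "below"


def relation_matrix(strokes):
    """N x N matrix of relations; diagonal entries are None."""
    return [
        [None if i == j else relation(a, b) for j, b in enumerate(strokes)]
        for i, a in enumerate(strokes)
    ]
```

With two stacked horizontal strokes, as in 二, the off-diagonal entries come out as "above" and "below"; with a crossing horizontal and vertical stroke, as in 十, both come out as "intersect".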
Based on the stroke relation matrix model, the invention provides the following Chinese character recognition method, called the stroke relation matrix recognition method.
First, the standard stroke relation matrix model corresponding to each Chinese character class is determined. During recognition, the similarity between the stroke-segment set of the character to be recognized and every standard stroke relation matrix model is calculated. The class with the largest similarity value is taken as the recognition result. The similarity value is computed by formula (3).
In formula (3), $S(SP, RP)$ is the similarity between the standard matrix and the matrix to be recognized; $BN(SP)$ is the number of stroke segments corresponding to the standard matrix; $BN(RP)$ is the number corresponding to the matrix to be recognized; and $BN(RP')$ is the number of stroke segments remaining after the segments judged during matching to be connecting strokes are removed from those corresponding to the matrix to be recognized. $SS(S_k, T_k)$ is the type similarity between the $k$-th stroke of the standard matrix and the $k$-th stroke of the matrix to be recognized ($k$ is $i$ or $j$). $RS(R_{ij}, G_{ij})$ is the similarity between the element in row $i$, column $j$ of the standard matrix and the corresponding element of the matrix to be recognized. $V$ is the threshold on the allowed difference in segment count.
The concrete steps of the stroke relation matrix recognition method are as follows:
(1) Establish the standard stroke relation matrix model of each Chinese character.
(2) Normalize the character to be recognized to the standard size, then extract all of its stroke segments to form the input segment set.
(3) Use formula (3) to calculate the similarity between each standard matrix and the input segment set, and take it as the similarity between the corresponding standard character and the character to be recognized.
(4) Among all standard characters, take the one with the largest similarity to the character to be recognized as the recognition result.
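Step (3) can be illustrated with a deliberately simple stand-in. Formula (3) is not available in this text, so the score below is an assumption, not the patent's formula: it takes the type similarity SS and the relation similarity RS to be 1 for equal labels and 0 otherwise, and returns the fraction of agreeing entries. It also assumes that stroke k of the input has already been aligned with stroke k of the standard, and that counts are compared per stroke rather than per segment. The name `matrix_similarity` and the parameter `count_tol` are hypothetical.

```python
def matrix_similarity(std_types, std_rel, in_types, in_rel, count_tol=2):
    """Fraction of agreeing stroke types and pairwise relations.

    std_types / in_types: lists of stroke type labels (S_k, T_k).
    std_rel / in_rel: square matrices of relation labels (R_ij, G_ij).
    Illustrative stand-in only; binary SS and RS are assumed.
    """
    n, m = len(std_types), len(in_types)
    if n == 0 or m == 0 or abs(n - m) > count_tol:
        return 0.0
    k = min(n, m)          # strokes present on both sides
    size = max(n, m)       # missing strokes count as disagreements
    type_hits = sum(1 for a, b in zip(std_types, in_types) if a == b)
    rel_hits = sum(
        1
        for i in range(k)
        for j in range(k)
        if i != j and std_rel[i][j] == in_rel[i][j]
    )
    return (type_hits + rel_hits) / (size + size * (size - 1))
```

Two identical descriptions score 1.0. If the stroke types agree but every off-diagonal relation differs, a two-stroke character scores 0.5.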
The sub-stroke center recognition method and the stroke relation matrix recognition method each have their own strengths: the stroke relation matrix method is more accurate, and the sub-stroke center method is faster. The Chinese character recognition method provided by the invention therefore uses the sub-stroke center method for coarse classification and the stroke relation matrix method for fine classification. The sub-stroke center method is also satisfactorily accurate on characters of fairly regular shape, so when the invention is applied to such characters the sub-stroke center method may be used alone for fine classification.
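The coarse-to-fine combination can be sketched as a two-stage ranking. This is a minimal sketch under stated assumptions: the function name `recognize`, the `templates` dictionary, and the caller-supplied `distance` and `similarity` callables are hypothetical, and the default of ten coarse candidates is only an illustrative choice.

```python
def recognize(observed, templates, distance, similarity, top_n=10):
    """Two-stage recognition.

    Stage 1 (coarse): keep the `top_n` classes with the smallest distance.
    Stage 2 (fine): among those, return the class with the largest similarity.
    `distance(template, observed)` and `similarity(template, observed)` are
    supplied by the caller; returns None when there are no templates.
    """
    if not templates:
        return None
    ranked = sorted(templates, key=lambda label: distance(templates[label], observed))
    candidates = ranked[:top_n]
    return max(candidates, key=lambda label: similarity(templates[label], observed))
```

Because the fine stage only scores the shortlist, the more expensive matrix comparison runs on at most `top_n` classes per input rather than on the whole character set.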
The present invention has the following advantages:
1. The Chinese character recognition method provided by the invention performs recognition with a unified mechanism. It can be used for both offline and online recognition, and for both handwritten and printed characters.
2. The Chinese character recognition method provided by the invention has high recognition accuracy, adapts well to deformation, and is stable.
Description of drawings
Fig. 1 is a diagram of the stroke types in the stroke relation matrix model;
Fig. 2 is a schematic diagram of the sub-stroke center model;
Fig. 3 is a schematic diagram of the stroke relation matrix model;
Fig. 4 is the overall block diagram of the Chinese character recognition method;
Fig. 5 is the flowchart of Chinese character recognition by the sub-stroke center recognition method;
Fig. 6 is the flowchart of Chinese character recognition by the stroke relation matrix recognition method.
Embodiment
The invention can be implemented wherever Chinese character recognition is needed. Preferred forms are online handwritten Chinese character recognition systems and devices, offline printed Chinese character recognition systems and devices, and offline handwritten Chinese character recognition systems and devices. In one embodiment, unconstrained free handwritten characters within the 6763 Chinese characters specified by GB2312-80 were recognized. The sub-stroke center classifier placed the correct character among its top ten candidates more than 99% of the time, at an average speed of 1 second per character. The stroke relation matrix classifier achieved a recognition accuracy above 91.2%, at an average speed of 0.2 second per character.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB021259496A CN1186744C (en) | 2002-08-06 | 2002-08-06 | Chinese character recognizing method based on structure model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB021259496A CN1186744C (en) | 2002-08-06 | 2002-08-06 | Chinese character recognizing method based on structure model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1474351A (en) | 2004-02-11 |
CN1186744C (en) | 2005-01-26 |
Family
ID=34143156
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB021259496A Expired - Fee Related CN1186744C (en) | 2002-08-06 | 2002-08-06 | Chinese character recognizing method based on structure model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1186744C (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1315090C (en) * | 2005-02-08 | 2007-05-09 | 华南理工大学 | Method for identifying hand-writing characters |
CN102375994B (en) * | 2010-08-10 | 2013-05-29 | 广东因豪信息科技有限公司 | Method and device for detecting and reducing correctness of order of strokes of written Chinese character |
CN107844740A (en) * | 2017-09-05 | 2018-03-27 | 中国地质调查局西安地质调查中心 | A kind of offline handwriting, printing Chinese character recognition methods and system |
CN110909563B (en) * | 2018-09-14 | 2023-07-28 | 新方正控股发展有限责任公司 | Method, apparatus, device and computer readable storage medium for extracting text skeleton |
CN109740415B (en) * | 2018-11-19 | 2021-02-09 | 深圳市华尊科技股份有限公司 | Vehicle attribute identification method and related product |
Also Published As
Publication number | Publication date |
---|---|
CN1474351A (en) | 2004-02-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C19 | Lapse of patent right due to non-payment of the annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |