CN1093966C - Apparatus and method for character identification - Google Patents
- Publication number
- CN1093966C CN98108373A
- Authority
- CN
- China
- Legal status
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/32—Digital ink
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/142—Image acquisition using hand-held instruments; Constructional details of the instruments
- G06V30/1423—Image acquisition using hand-held instruments; Constructional details of the instruments the instrument generating sequences of position coordinates corresponding to handwriting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Discrimination (AREA)
Abstract
The object is to recognize characters with high accuracy and at high speed. A handwritten character (input pattern) is entered through input unit 1, and stroke feature extraction unit 2 extracts the features of the strokes that make up the handwritten character. Based on these features, stroke classification unit 4 classifies each stroke as a linear stroke, a non-linear stroke, or a continuous-writing stroke. Linear stroke matching unit 5, non-linear stroke matching unit 7, and continuous-writing stroke matching unit 8 match the features of the classified strokes against the stroke features of the recognition target characters (standard patterns) stored in stroke feature dictionary 3, and character evaluation unit 10 recognizes the handwritten character from the result.
Description
The present invention relates to a character recognition device and a character recognition method that read handwritten characters written with a pen on a tablet and recognize them. In this method, a handwritten character is divided into its constituent strokes, and the recognition target character is identified from the features of the classified strokes.
A prior online handwritten character recognition device that absorbs variations in both stroke order and stroke count describes each stroke of a handwritten character as a string of feature points. On the assumption that strokes are merged in writing order, it uses two complementary optimal matching algorithms, an excess-correspondence-removal type and a deficient-correspondence-removal type. These put the strokes of a handwritten character pattern, whose stroke order and stroke count may vary, into one-to-one correspondence with the strokes of a standard character pattern written in regular script with the standard stroke order and stroke count, thereby absorbing stroke-order variation. For strokes left unmatched, the device absorbs stroke-count variation by choosing the correspondence that minimizes the distance when the stroke is merged with the preceding or following stroke, and then performs character recognition ("Online handwritten Chinese character recognition by stroke matching", NTT R&D, Vol. 45, No. 11, 1996).
For example, Fig. 34 is a block diagram showing the structure of the prior-art device that absorbs stroke-order and stroke-count variations. In Fig. 34, 201 is an input unit that receives the input pattern, i.e. a handwritten character written with a pen on a tablet; 202 is a preprocessing unit that samples the input pattern and normalizes its position and size; 203 is a feature point extraction unit that extracts, as feature points, points marked at equal intervals along each stroke of the input pattern; 204 is a standard pattern dictionary storing standard patterns, i.e. average character patterns built from multiple regular-script characters written with the standard stroke order and stroke count; 205 is an inter-stroke distance calculation unit that computes the distance between a stroke of the input pattern and a stroke of a standard pattern; 206 is a one-to-one stroke matching unit that absorbs stroke-order variation by matching the strokes of the input pattern one-to-one with the strokes of the standard pattern using two complementary search algorithms; 207 is a selective stroke merging unit that absorbs stroke-count variation by merging the strokes left unmatched in the standard pattern with the preceding and following strokes; 208 is an inter-pattern distance calculation unit that computes the distance between patterns from the matching result; and 209 is a control unit that controls these units.
The operation of the prior-art example is described with reference to the block diagram of Fig. 34. First, control unit 209 instructs input unit 201 to acquire the input pattern.
Next, control unit 209 sends the input pattern received by input unit 201 to preprocessing unit 202, which samples it and normalizes its position and size.
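As an illustration of the position and size normalization performed by preprocessing unit 202, the following Python sketch shifts a character's bounding box to the origin and scales its longer side to a fixed length. The source does not specify the normalization rule; the function name `normalize_strokes`, the `size` parameter, and the aspect-preserving bounding-box scaling are assumptions made for this example.

```python
def normalize_strokes(strokes, size=100.0):
    """Shift the character to the origin and scale its longer side to `size`.

    `strokes` is a list of strokes, each a list of (x, y) points.
    The bounding-box rule used here is an assumption, not the patent's rule.
    """
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    if not xs:
        return []
    min_x, min_y = min(xs), min(ys)
    # Use the longer side of the bounding box so the aspect ratio is preserved.
    extent = max(max(xs) - min_x, max(ys) - min_y)
    scale = size / extent if extent > 0 else 1.0
    return [
        [((x - min_x) * scale, (y - min_y) * scale) for x, y in stroke]
        for stroke in strokes
    ]
```

After this step, characters written at different places and sizes on the tablet share one coordinate frame, which is what makes point-to-point distances comparable.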
Control unit 209 sends the preprocessed input pattern to feature point extraction unit 203, which converts it into strings of feature points marked at equal intervals along each stroke.
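Marking feature points at equal intervals along a stroke can be sketched in Python as arc-length resampling of a polyline. This is a minimal sketch under stated assumptions: the name `resample_equal_interval`, the `spacing` parameter, and the choice to always keep the first pen point are illustrative and not taken from the source.

```python
import math


def resample_equal_interval(points, spacing):
    """Return points placed every `spacing` units of arc length along a stroke."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    if not points:
        return []
    out = [points[0]]          # assumption: the first pen point is always kept
    dist_to_next = spacing     # arc length remaining until the next feature point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0              # distance already consumed on this segment
        while seg - pos >= dist_to_next:
            pos += dist_to_next
            t = pos / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            dist_to_next = spacing
        dist_to_next -= seg - pos
    return out
```

Because every stroke is reduced to points at the same spacing, a long stroke yields more feature points than a short one, which is why the distance measures in the prior art have to reconcile differing point counts.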
Control unit 209 sends the feature point strings of the input pattern to one-to-one stroke matching unit 206. Taking the strokes of whichever pattern has fewer strokes as the reference, unit 206 matches the input pattern against a standard pattern from standard pattern dictionary 204 so that no feature point has an excess correspondence (the excess-correspondence-removal matching algorithm).
Next, taking the strokes of whichever pattern has more strokes as the reference, one-to-one stroke matching unit 206 matches the input pattern against the standard pattern from standard pattern dictionary 204 so that no feature point is left without a correspondence (the deficient-correspondence-removal matching algorithm).
Of the results produced by these two matching algorithms, the one with the smaller distance is taken as the final result of one-to-one stroke matching unit 206.
Here, two distance measures are computed by inter-stroke distance calculation unit 205 during matching. The first is the endpoint matching distance, which is the sum of the distance between the start points and the distance between the end points, divided by two. The second is the partial matching distance, in which the points of the stroke with fewer points are matched in order starting from the first point of the stroke with more points, the point-to-point distances are summed, and the sum is multiplied by the ratio of the point counts.
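The endpoint matching distance described above can be sketched in Python as follows. The function name `endpoint_matching_distance` and the use of Euclidean distance are assumptions; the source states only that the start-point and end-point distances are summed and halved.

```python
import math


def endpoint_matching_distance(stroke_a, stroke_b):
    """Half the sum of the start-to-start and end-to-end distances.

    Each stroke is a non-empty list of (x, y) feature points.
    Euclidean distance is assumed; the source does not name the metric.
    """
    (ax0, ay0), (ax1, ay1) = stroke_a[0], stroke_a[-1]
    (bx0, by0), (bx1, by1) = stroke_b[0], stroke_b[-1]
    start_dist = math.hypot(ax0 - bx0, ay0 - by0)
    end_dist = math.hypot(ax1 - bx1, ay1 - by1)
    return (start_dist + end_dist) / 2.0
```

Only two points per stroke are inspected, so the cost is constant regardless of stroke length. The measure is direction-sensitive: a stroke written in reverse produces a large distance even if its shape is identical.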
Next, selective stroke merging unit 207 reorders the strokes matched by one-to-one stroke matching unit 206 so that they follow the stroke order of the pattern with more strokes. Strokes that unit 206 left unmatched are merged on the assumption that "stroke merging occurs in stroke order".
Specifically, unmatched strokes written before the first matched stroke are merged with that first stroke in stroke order. Likewise, unmatched strokes written after the last matched stroke are merged with that last stroke in stroke order. Any other unmatched stroke lies between two matched strokes, so it is temporarily merged with the preceding and following strokes and then split into two strokes at the point that minimizes the distance between the two strokes.
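The three cases above can be sketched in Python as a partition of the unmatched stroke indices. This is only the bookkeeping step, not the merging or splitting itself; the function name `partition_unmatched` and the boolean-list representation are assumptions made for this example.

```python
def partition_unmatched(matched):
    """Group unmatched stroke indices by where they fall in writing order.

    `matched[i]` is True if stroke i (in writing order) was matched one-to-one.
    Returns (leading, interior, trailing):
      leading  - written before the first matched stroke, merged into it
      interior - written between matched strokes, merged with neighbours then split
      trailing - written after the last matched stroke, merged into it
    """
    matched_idx = [i for i, m in enumerate(matched) if m]
    if not matched_idx:
        # Assumption: the scheme presupposes at least one matched stroke.
        raise ValueError("at least one stroke must be matched")
    first, last = matched_idx[0], matched_idx[-1]
    leading = [i for i, m in enumerate(matched) if not m and i < first]
    interior = [i for i, m in enumerate(matched) if not m and first < i < last]
    trailing = [i for i, m in enumerate(matched) if not m and i > last]
    return leading, interior, trailing
```

Restricting merges to neighbours in writing order keeps the search small, which is the computational saving the prior art relies on and also the limitation the invention later criticizes.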
Here, selective stroke merging unit 207 uses the whole matching distance: points are removed at equal intervals from the stroke with more points until its point count equals that of the stroke with fewer points, the points are then matched in order, the point-to-point distances are summed, and the sum is divided by the smaller point count. For high-speed processing, however, the endpoint matching distance is used in the coarse classification stage.
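A minimal Python sketch of the whole matching distance follows. The name `whole_matching_distance`, the Euclidean metric, and the exact rule for picking evenly spaced indices (which always keeps the first and last points) are assumptions; the source says only that points are removed at equal intervals.

```python
import math


def whole_matching_distance(stroke_a, stroke_b):
    """Mean point-to-point distance after thinning the longer stroke.

    Each stroke is a non-empty list of (x, y) feature points.
    """
    short, long_ = sorted((stroke_a, stroke_b), key=len)
    n_short, n_long = len(short), len(long_)
    if n_short == 1:
        picked = [long_[0]]
    else:
        # Evenly spaced indices into the longer stroke (rounded to nearest).
        step_den = n_short - 1
        picked = [
            long_[(i * (n_long - 1) + step_den // 2) // step_den]
            for i in range(n_short)
        ]
    total = sum(
        math.hypot(px - qx, py - qy)
        for (px, py), (qx, qy) in zip(short, picked)
    )
    return total / n_short
```

Unlike the endpoint matching distance, every retained point contributes, so the measure reflects the stroke's overall shape at a cost proportional to the shorter stroke's point count.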
Finally, based on the matching result, the distances of merged strokes are normalized appropriately, the final distance is computed, and the character with the smallest distance is output as the recognition result.
The prior-art character recognition device described above, which absorbs stroke-order and stroke-count variations, computes inter-stroke distances from feature point strings marked at equal intervals on each stroke of the handwritten character. Consequently, shifts in feature point coordinates caused by noise components such as left-falling and right-falling flicks, by stroke deformation, and by positional offset affect the stroke distance and cause misrecognition.
Moreover, in characters with many strokes, such as Chinese characters, most strokes are simple straight lines, so the feature points within a stroke carry little important information. Because matching is nevertheless performed on feature point values, matching takes time and is easily affected by coordinate shifts.
In addition, to reduce computation when absorbing stroke-count variation, the assumption that "stroke merging occurs in stroke order" is used, so the device cannot cope when a stroke-count variation does not follow the stroke order.
The present invention was made to solve the above problems. Its object is to provide a character recognition device and a character recognition method that can reliably match the strokes of a handwritten character whose stroke order and stroke count vary against the strokes in a stroke feature dictionary, that are resistant to local stroke deformation and positional offset and therefore achieve high recognition accuracy, and that can recognize characters at high speed.
A first aspect of the present invention comprises: an input unit for online input of a handwritten character; a stroke feature dictionary that stores in advance, for a plurality of recognition target characters, the features of the linear strokes and non-linear strokes constituting each recognition target character; a stroke feature extraction unit that extracts, from the handwritten character entered through the input unit, the features of the strokes constituting that character; a stroke classification unit that, based on the features extracted by the stroke feature extraction unit, classifies each stroke of the handwritten character as one of a linear stroke, a non-linear stroke, or a continuous-writing stroke that cannot be matched to the strokes constituting the recognition target characters; a linear stroke matching unit that matches the features of the linear strokes classified by the stroke classification unit against the features of the linear strokes of the recognition target characters stored in the stroke feature dictionary; a non-linear stroke matching unit that matches the features of the non-linear strokes classified by the stroke classification unit against the features of the non-linear strokes of the recognition target characters stored in the stroke feature dictionary; a continuous-writing stroke matching unit that divides each continuous-writing stroke classified by the stroke classification unit into linear strokes or non-linear strokes and matches the features of the divided strokes against the features of the linear or non-linear strokes of the recognition target characters stored in the stroke feature dictionary; and a character evaluation unit that recognizes the handwritten character from the matching results obtained by the linear stroke matching unit, the non-linear stroke matching unit, or the continuous-writing stroke matching unit.
A second aspect of the present invention further comprises a stroke matchable-region determination unit that determines, from the features extracted by the stroke feature extraction unit, the region in which each stroke of the handwritten character lies. The linear stroke matching unit, the non-linear stroke matching unit, or the continuous-writing stroke matching unit matches against the features of those linear or non-linear strokes of the recognition target characters stored in the stroke feature dictionary that correspond to the region determined for each stroke by the stroke matchable-region determination unit.
In a third aspect of the present invention, the non-linear stroke matching unit performs matching for the recognition target characters that have already been matched by the linear stroke matching unit.
In a fourth aspect of the present invention, the continuous-writing stroke matching unit performs matching for the recognition target characters that have already been matched by the linear stroke matching unit or the non-linear stroke matching unit.
A fifth aspect of the present invention comprises: an input step of entering a handwritten character online; a stroke feature extraction step of extracting, from the handwritten character entered in the input step, the features of the strokes constituting that character; a stroke classification step of classifying, based on the features extracted in the stroke feature extraction step, each stroke of the handwritten character as one of a linear stroke, a non-linear stroke, or a continuous-writing stroke that cannot be matched to the strokes constituting a recognition target character; a linear stroke matching step of matching the features of the linear strokes classified in the stroke classification step against the features of the linear strokes of the recognition target characters stored in a stroke feature dictionary; a non-linear stroke matching step of matching the features of the non-linear strokes classified in the stroke classification step against the features of the non-linear strokes of the recognition target characters stored in the stroke feature dictionary; a continuous-writing stroke matching step of dividing each continuous-writing stroke classified in the stroke classification step into linear strokes or non-linear strokes and matching the features of the divided strokes against the features of the linear or non-linear strokes of the recognition target characters stored in the stroke feature dictionary; and a character evaluation step of recognizing the handwritten character from the matching results obtained in the linear stroke matching step, the non-linear stroke matching step, or the continuous-writing stroke matching step.
A sixth aspect of the present invention further comprises a stroke matchable-region determination step of determining, from the features extracted in the stroke feature extraction step, the region in which each stroke of the handwritten character lies. The linear stroke matching step, the non-linear stroke matching step, or the continuous-writing stroke matching step matches against the features of those linear or non-linear strokes of the recognition target characters stored in the stroke feature dictionary that correspond to the region determined for each stroke in the stroke matchable-region determination step.
In a seventh aspect of the present invention, the non-linear stroke matching step performs matching for the recognition target characters that have already been matched in the linear stroke matching step.
In an eighth aspect of the present invention, the continuous-writing stroke matching step performs matching for the recognition target characters that have already been matched in the linear stroke matching step or the non-linear stroke matching step.
Fig. 1 is a block diagram of the character recognition device of Embodiment 1.
Fig. 2 is a flowchart showing the processing flow of the character recognition device of Embodiment 1.
Fig. 3 is a diagram showing the input pattern "亞" written in input unit 1 of Embodiment 1.
Fig. 4 is a diagram showing the stroke features of the input pattern extracted by stroke feature extraction unit 2 of Embodiment 1.
Fig. 5 is an assignment chart used to quantize the stroke direction and virtual stroke direction among the stroke features of Embodiment 1 into 16 directions.
Fig. 6 is a flowchart showing the processing flow of the stroke classification unit of Embodiment 1.
Fig. 7 is the 8-direction component distribution chart used when the stroke classification unit of Embodiment 1 obtains the direction component distribution.
Fig. 8 is a flowchart showing the processing flow of linear stroke matching unit 5 of Embodiment 1.
Fig. 9 is the region division chart used by stroke matchable-region determination unit 6 of Embodiment 1 when matching linear strokes.
Fig. 10 is a diagram showing the information used by stroke matchable-region determination unit 6 of Embodiment 1 to determine matchable regions.
Fig. 11 is a diagram showing the matchable regions of the input strokes in Embodiment 1.
Fig. 12 is a diagram showing the stroke sequence of the standard pattern of the character "亞" in the stroke feature dictionary of Embodiment 1.
Fig. 13 is a diagram showing the stroke features of the standard pattern of the character "亞" in the stroke feature dictionary of Embodiment 1.
Fig. 14 is a diagram showing whether matching is permitted according to stroke shape, as used in linear stroke matching unit 5, non-linear stroke matching unit 7, and continuous-writing stroke matching unit 8 of Embodiment 1.
Fig. 15 is a diagram showing the form of the candidate stroke list in linear stroke matching unit 5 of Embodiment 1.
Fig. 16 is a flowchart showing the processing flow of non-linear stroke matching unit 7 of Embodiment 1.
Fig. 17 is a flowchart showing the processing flow of continuous-writing stroke matching unit 8 of Embodiment 1.
Fig. 18 is the region division chart used by stroke matchable-region determination unit 6 of Embodiment 1 when matching strokes other than linear strokes.
Fig. 19 is a diagram showing in which of the regions used by stroke matchable-region determination unit 6 each stroke of the input pattern of Embodiment 1 lies.
Fig. 20 is a flowchart showing the detailed processing flow of step S54 of Embodiment 1.
Fig. 21 is a diagram showing the feature points of the strokes of the standard pattern of Embodiment 1.
Fig. 22 is a diagram showing the sampling points obtained by sampling the strokes of the input pattern of Embodiment 1 at fixed intervals.
Fig. 23 is a diagram showing the result of matching the feature points of the standard pattern with the sampling points of the input pattern in Embodiment 1.
Fig. 24 is a diagram showing the strokes obtained by decomposing the input pattern of Embodiment 1 to agree with the strokes of the standard pattern.
Fig. 25 is a diagram showing the strokes of the character "言" written in input unit 1 of Embodiment 1.
Fig. 26 is a diagram showing in which of the regions used by stroke matchable-region determination unit 6 each stroke of the input pattern of Embodiment 1 lies.
Fig. 27 is a diagram showing the standard pattern of the character "言" in stroke feature dictionary 3 of Embodiment 1.
Fig. 28 is a flowchart showing the detailed processing flow of partial stroke matching in Embodiment 1.
Fig. 29 is a diagram showing the decomposition candidate points of a continuous-writing stroke of the input pattern of Embodiment 1.
Fig. 30 is a diagram showing the partial strokes of Embodiment 1.
Fig. 31 is a flowchart showing the detailed processing flow of step S72 in Fig. 28.
Fig. 32 is a diagram showing the decomposed strokes obtained from the partial strokes in Fig. 30.
Fig. 33 is a diagram showing the decomposed strokes obtained from partial strokes.
Fig. 34 is a block diagram of a conventional online character recognition device.
(Embodiment 1)
Embodiment 1 is described below with reference to the drawings.
Fig. 1 is a block diagram of the character recognition device of the present invention.
In the figure, 1 is an input unit for handwriting characters on a tablet-type input device; 2 is a stroke feature extraction unit that extracts, from the handwritten character entered through input unit 1, the features of the strokes constituting it; 3 is a stroke feature dictionary that stores in advance, for a plurality of recognition target characters, the stroke features of the linear and non-linear strokes constituting the standard pattern of each character, i.e. the character written in the correct stroke order; 4 is a stroke classification unit that classifies each stroke of the handwritten character as a linear stroke, a non-linear stroke, or a continuous-writing stroke based on the features extracted by stroke feature extraction unit 2; 5 is a linear stroke matching unit that matches the features of the linear strokes classified by stroke classification unit 4 against the features of the linear strokes of the standard patterns stored in stroke feature dictionary 3; 6 is a stroke matchable-region determination unit that determines the region in which each stroke of the handwritten character lies from the features extracted by stroke feature extraction unit 2; 7 is a non-linear stroke matching unit that matches the features of the non-linear strokes classified by stroke classification unit 4 against the features of the non-linear strokes of the standard patterns stored in stroke feature dictionary 3; 8 is a continuous-writing stroke matching unit that divides each continuous-writing stroke classified by the stroke classification unit into linear strokes or non-linear strokes and matches the features of the divided strokes against the features of the linear or non-linear strokes of the standard patterns stored in stroke feature dictionary 3; and 9 is a partial stroke generation unit that generates partial strokes from continuous-writing strokes.
10 is the character evaluation unit (called the character recognition unit in this passage of the source), which computes character evaluation values from the matching results obtained by linear stroke matching unit 5, non-linear stroke matching unit 7, and continuous-writing stroke matching unit 8 and recognizes the handwritten character; 11 is a control unit that controls these units.
FIG. 2 is a flowchart showing the processing flow of the character recognition unit shown in FIG. 1.
An example of the processing procedure is described below using the flowchart of FIG. 2.
First, in step S1, a character is written with a pen on the tablet-type input device through input unit 1.
FIG. 3 shows the input pattern "亞" of a handwritten character entered through input unit 1; 60 to 65 are the strokes in the order in which they were entered. In this example the character was written with both an irregular stroke order and an irregular stroke count, and consists of six strokes.
Proceeding to step S2, stroke feature extraction unit 2 extracts the features of each stroke of the input pattern of the handwritten character. In this example, the stroke features of strokes 60 to 65 of the input pattern in FIG. 3 are extracted.
FIG. 4 shows the stroke features of the input pattern extracted by stroke feature extraction unit 2, obtained from strokes 60 to 65 of the input pattern in FIG. 3. In these stroke features, the stroke direction, and the direction of the virtual stroke (the unwritten segment that would join one stroke to the next), are calculated with the 16-direction map shown in FIG. 5, which quantizes direction into 16 levels.
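The 16-level quantization described above can be sketched as follows. This is only an illustration: the function name `quantize_direction` is hypothetical, and the numbering (1 pointing up, increasing clockwise, with the y axis growing downward as on a tablet) is an assumption chosen so that directions 5 and 13 are horizontal, which agrees with the horizontal-stroke directions mentioned for FIG. 10. The actual layout of FIG. 5 may differ.

```python
import math

def quantize_direction(start, end, levels=16):
    """Quantize the direction from start to end into codes 1..levels.

    Assumed convention (not confirmed by the source): code 1 points up,
    codes increase clockwise, and y grows downward.
    """
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    # Angle measured clockwise from "up", in degrees within [0, 360).
    angle = math.degrees(math.atan2(dx, -dy)) % 360.0
    sector = 360.0 / levels
    # Shift by half a sector so each code is centred on its direction.
    return int((angle + sector / 2) // sector) % levels + 1

# The same function serves for a written stroke (start point to end point)
# and for a virtual stroke (end of one stroke to start of the next).
print(quantize_direction((0, 0), (10, 0)))  # left-to-right stroke
```

Under this assumed numbering, a left-to-right stroke gets code 5 and a right-to-left stroke gets code 13.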
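Next, in step S3, stroke classification unit 4 classifies the strokes of the input pattern by shape.
FIG. 6 is a flowchart showing the processing flow of stroke classification unit 4. The operation of stroke classification unit 4 is described in detail using the flowchart of FIG. 6.
First, in step S10 of FIG. 6, stroke classification unit 4 examines the strokes one by one, starting from the first, and judges whether the current stroke is a straight stroke. If it is, processing goes to step S11; otherwise it goes to step S13. In the input pattern of FIG. 3, the five strokes 60, 61, 62, 63 and 65 are judged to be straight strokes from the shapes in the stroke features of FIG. 4, and processing goes to step S11.
In step S11, stroke classification unit 4 extracts the direction component distribution of the stroke. The distribution is calculated from the directions of the line segments that join the points sampled from the input stroke at suitable intervals. For this calculation, stroke classification unit 4 uses the 8-direction component distribution shown in FIG. 7.
Proceeding to step S12, stroke classification unit 4 checks whether the direction components obtained lie in a single direction only. If they do, processing goes to step S15 and the stroke is judged to be a straight stroke; otherwise processing goes to step S13. In the input pattern of FIG. 3, every straight stroke has components in a single direction only, so processing goes to step S15 and each is judged to be a straight stroke.
When the direction components do not lie in a single direction only, in step S13 the stroke classification unit checks, for straight strokes with several direction components and for strokes other than straight lines, whether the stroke shape exists in stroke feature dictionary 3. If the stroke does not exist in the dictionary, processing goes to step S16 and the stroke is judged to be a run-on stroke. If it does exist, processing goes to step S14 and the stroke is judged to be a non-straight stroke. In the input pattern of FIG. 3, the unknown stroke 64 does not exist in the dictionary, so processing goes to step S16 and it is judged to be a run-on stroke.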
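The three-way classification of steps S10 to S16 might be sketched as below. The names `direction_codes`, `classify_stroke` and `dictionary_shapes` are hypothetical. Encoding a stroke shape as its sequence of 8-direction codes with consecutive repeats collapsed is an assumption made for illustration only; the source does not say how shapes are stored in the stroke feature dictionary.

```python
import math

def direction_codes(points, levels=8):
    """8-direction code (0..7, 0 = +x; assumed numbering) of each segment
    joining consecutive sample points."""
    sector = 360.0 / levels
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (x0, y0) == (x1, y1):
            continue  # skip repeated sample points
        angle = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360.0
        codes.append(int((angle + sector / 2) // sector) % levels)
    return codes

def classify_stroke(points, dictionary_shapes):
    """Return 'straight', 'non-straight' or 'run-on' for one input stroke."""
    codes = direction_codes(points)
    if len(set(codes)) == 1:
        return "straight"            # components in one direction only (S15)
    # Collapse consecutive repeats to get a coarse shape signature.
    shape = tuple(c for i, c in enumerate(codes) if i == 0 or c != codes[i - 1])
    if shape in dictionary_shapes:
        return "non-straight"        # shape exists in the dictionary (S14)
    return "run-on"                  # unknown shape (S16)

known = {(0, 2)}  # hypothetical dictionary entry: one corner shape
print(classify_stroke([(0, 0), (5, 0), (10, 0)], known))
```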
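Next, in step S4 of FIG. 2, control unit 11 sends the strokes judged to be straight strokes by stroke classification unit 4 to straight-stroke matching unit 5, which matches the straight strokes of the input pattern against the straight strokes of the standard pattern in stroke feature dictionary 3. In the input pattern of FIG. 3, straight strokes 60, 61, 62, 63 and 65 are sent to straight-stroke matching unit 5.
FIG. 8 is a flowchart showing the processing flow of the straight-stroke matching unit. The operation of straight-stroke matching unit 5 is described using the flowchart of FIG. 8.
First, in step S21, straight-stroke matching unit 5 divides the straight strokes into three classes by length (long, medium, short). The length of a stroke is the Euclidean distance between its start point and end point. In this example, a long stroke is defined as one longer than 70, a short stroke as one shorter than 30, and a medium stroke as one in between. From the length information in the stroke features of FIG. 4, strokes 60, 61, 62 and 65 are classified as long strokes and stroke 63 as a short stroke.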
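The length classification of step S21 can be sketched directly from the thresholds given in the text. The function name `length_class` and the parameter names are hypothetical, and the coordinate units are whatever the tablet reports.

```python
import math

def length_class(start, end, long_over=70, short_under=30):
    """Classify a straight stroke as 'long', 'medium' or 'short' by the
    Euclidean distance between its start and end points."""
    length = math.dist(start, end)
    if length > long_over:
        return "long"
    if length < short_under:
        return "short"
    return "medium"

print(length_class((0, 0), (30, 40)))  # length 50
```

A stroke of length exactly 70 or 30 falls in the medium class here, since the text defines long as greater than 70 and short as less than 30.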
Proceeding to step S22, straight-stroke matching unit 5 has stroke matchable-area determination unit 6 determine the area in which each stroke lies, from the stroke direction, the coordinates of the stroke's start and end points, and the stroke length.
FIG. 9 shows an example of the area division used in matching straight strokes. Here, the area in which a stroke lies is determined using the areas obtained by dividing the circumscribed rectangle of the character into four in the horizontal direction and in the vertical direction.
FIG. 10 shows the information used for area determination in stroke matchable-area determination unit 6. The stroke direction in FIG. 10 is the direction from the start point to the end point of the stroke, quantized with the 16-direction map of FIG. 5. For example, for the horizontal straight stroke in FIG. 10, the stroke direction entry indicates that it may match a horizontal stroke of the standard pattern whose direction is 4, 5, 6, 12, 13 or 14. Likewise, a horizontal straight stroke lying in area 1 may match horizontal straight strokes lying in area 1 or area 2.
As FIG. 10 shows, stroke matchable-area determination unit 6 judges that the position of an input stroke varies little and is comparatively stable in the direction perpendicular to the stroke direction, and therefore treats the areas up to and including the adjacent area as the area in which the stroke may lie. When a stroke spans several areas, the areas adjacent to each of those areas are all included.
FIG. 11 shows the existence areas and matchable areas of the strokes of the input pattern of FIG. 3.
Proceeding to step S23, straight-stroke matching unit 5 determines, for each stroke of the input pattern, the strokes of the standard pattern in stroke feature dictionary 3 that are to be matching targets.
FIG. 12 shows the strokes of the standard pattern of the character "亞" stored in stroke feature dictionary 3. FIG. 13 shows the stroke features of the standard pattern of FIG. 12 stored in stroke feature dictionary 3.
The standard-pattern strokes to be matched are determined by three conditions: 1) the stroke shape condition, 2) the stroke length condition, and 3) the stroke matchable-area condition.
FIG. 14 is a table showing whether matching is permitted according to stroke shape; it is used by straight-stroke matching unit 5, non-straight-stroke matching unit 7 and run-on-stroke matching unit 8.
Specifically, the stroke shape condition is the correspondence shown in FIG. 14 between the stroke shape of the input pattern and the stroke shape of the standard pattern: combinations marked "○" may be matched, and combinations marked "×" may not.
The stroke length condition is that a long stroke may be matched with a long or medium stroke, a medium stroke with a long or short stroke, and a short stroke with a medium or short stroke. The stroke matchable-area condition uses the result of stroke matchable-area determination unit 6.
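The length condition can be written as a small lookup table. In this sketch the names `COMPATIBLE` and `lengths_compatible` are hypothetical. The text lists long and short as partners for a medium stroke; allowing medium with medium as well is an assumption added here, since strokes of the same class would presumably be matchable.

```python
# Which standard-stroke length classes each input-stroke class may match.
COMPATIBLE = {
    "long": {"long", "medium"},
    "medium": {"long", "medium", "short"},  # medium/medium is an assumption
    "short": {"medium", "short"},
}

def lengths_compatible(input_class, standard_class):
    """True if the stroke length condition permits the pairing."""
    return standard_class in COMPATIBLE[input_class]

print(lengths_compatible("long", "short"))
```

The effect is that classes may differ by at most one step, so a long input stroke is never paired with a short standard stroke.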
In this example, the stroke features of FIG. 13 are stored in stroke feature dictionary 3 for the standard pattern of FIG. 12. Based on these stroke features, the standard-pattern strokes that can be matched are determined for each stroke of the input pattern.
In this embodiment, the matchable standard-pattern strokes are determined for each stroke of the input pattern of FIG. 3.
FIG. 15 shows how the matchable standard-pattern strokes are narrowed down by the stroke shape condition, the stroke matchable-area condition and the stroke length condition. Each condition is applied in turn only to the candidates that survived the preceding conditions.
According to FIG. 15, stroke 60 is narrowed to strokes 70 and 73, stroke 61 to strokes 74 and 75, stroke 62 to stroke 75, stroke 63 to stroke 71, and stroke 65 to strokes 73 and 76. A stroke narrowed to a single candidate has no other correspondence, and a standard stroke that is the single candidate of one input stroke cannot be used in matching any other input stroke. Since strokes 71 and 75 are each such a single candidate, they are removed from the candidates of the other input strokes. Stroke 61 is therefore narrowed to stroke 74, and in the end only strokes 60 and 65 have two candidates each.
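The elimination step just described (a standard stroke that is the sole candidate of one input stroke is removed from every other input stroke's candidates) can be sketched as a small fixed-point loop. The function name `prune_unique_candidates` is hypothetical; the data is the example from FIG. 15 as stated in the text.

```python
def prune_unique_candidates(candidates):
    """Remove each uniquely assigned standard stroke from the candidate
    sets of all other input strokes, repeating until nothing changes."""
    cands = {stroke: set(c) for stroke, c in candidates.items()}
    changed = True
    while changed:
        changed = False
        # Standard strokes that are the only candidate of some input stroke.
        fixed = {next(iter(c)): s for s, c in cands.items() if len(c) == 1}
        for stroke, c in cands.items():
            for std, owner in fixed.items():
                if owner != stroke and std in c:
                    c.discard(std)
                    changed = True
    return cands

fig15 = {60: {70, 73}, 61: {74, 75}, 62: {75}, 63: {71}, 65: {73, 76}}
print(prune_unique_candidates(fig15))
```

With the FIG. 15 data, stroke 61 loses candidate 75 and is left with 74, while strokes 60 and 65 keep two candidates each, as in the text.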
Next, in step S24, straight-stroke matching unit 5 performs detailed matching of each stroke of the input pattern against its candidate strokes in the standard pattern; the matches that succeed form the matching result for all the straight strokes.
In the detailed matching check, the stroke direction, stroke width, stroke height and the coordinates of the stroke's start and end points are compared against predetermined matching thresholds to check whether, and with which stroke, a match is possible. When input strokes are matched with consecutive strokes of the standard pattern, the matching result of the virtual strokes is also used to check the match.
In addition, to absorb variation in the strokes of the input pattern, each input stroke is matched against all of its candidate standard-pattern strokes, even when the stroke order of the input pattern differs from that of the standard pattern.
Specifically, the stroke features of the input pattern in FIG. 4 are compared with the stroke features of the standard pattern in FIG. 13.
In this example, the candidates for strokes 61, 62 and 63 of the input pattern have already been narrowed to one each, and each of these strokes matches correctly, so matching succeeds. Strokes 60 and 65 have two candidates each, so each is matched against both of its candidates: stroke 60 of the input pattern is matched against strokes 70 and 73 of the standard pattern, and stroke 65 against strokes 73 and 76.
First, the matching of stroke 60 of the input pattern is described. Stroke 60 is matched against strokes 70 and 73 of the standard pattern. When stroke 60 is matched with stroke 73, the difference in the Y coordinates of the start and end points is large. The match is also checked using the virtual stroke, which expresses the positional relationship between strokes. From FIG. 13, for stroke 73 the direction of the virtual stroke toward the next stroke 74 is "16". For stroke 60, the input stroke matched with standard stroke 74 is stroke 61, so the virtual stroke direction is the direction from the end point of stroke 60 to the start point of stroke 61, which is "13". The difference in virtual stroke direction is therefore 16 - 13 = 3.
When stroke 60 is matched with stroke 70, on the other hand, the difference in the coordinates of the start and end points is small and the difference in virtual stroke direction is small (the distance of the matching result is small), so this is the correct match.
Next, stroke 65 is matched against strokes 73 and 76 of the standard pattern. When stroke 65 is matched with stroke 73, the difference in the coordinates of the start and end points (the difference in stroke features) is large. When stroke 65 is matched with stroke 76, the difference in stroke features is small, so this is the correct match.
As a result, the straight strokes are narrowed to a single matching: stroke 60 of the input pattern with stroke 70 of the standard pattern, stroke 61 with stroke 74, stroke 62 with stroke 75, stroke 63 with stroke 71, and stroke 65 with stroke 76.
As this example shows, the matching process described above can match the strokes of the input pattern even when their stroke order differs from that of the standard pattern.
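Proceeding to step S25, if every straight stroke of the input pattern corresponds to some stroke of the standard pattern, straight-stroke matching unit 5 goes to step S27 and the matching is judged successful (OK). If even one stroke is left unmatched, it goes to step S28 and the matching with the current standard pattern is judged unsuccessful (NG).
In this example, all the straight strokes of the input pattern were correctly matched with strokes of the standard pattern.
Next, in step S5 of FIG. 2, control unit 11 checks the result of straight-stroke matching. If the matching was unsuccessful, processing goes to step S8 and moves on to matching with the next standard pattern. If the matching succeeded, processing goes to step S6 and the non-straight strokes are matched.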
FIG. 16 is a flowchart showing the processing flow of non-straight-stroke matching unit 7. The operation of non-straight-stroke matching unit 7 is described in detail using the flowchart of FIG. 16.
In step S30, as with straight strokes, non-straight-stroke matching unit 7 has stroke matchable-area determination unit 6 determine, for each non-straight stroke, the standard-pattern strokes that can be matched with the stroke of the input pattern. As in straight-stroke matching, the area-determination information table of FIG. 10 is used.
Here, non-straight strokes of the input pattern are matched only against standard patterns whose straight strokes were matched by straight-stroke matching unit 5.
Proceeding to step S31, non-straight-stroke matching unit 7 matches the stroke of the input pattern against those standard-pattern strokes, taking the stroke shape into account as well.
First, for the standard-pattern strokes found in step S30 to be matchable with each stroke of the input pattern, the match is checked by stroke shape. The shape matching rule uses the table of FIG. 14, which shows whether matching is permitted for each stroke shape.
Next, the strokes judged to be matchable undergo the same detailed matching check as the straight strokes.
Proceeding to step S32, when non-straight-stroke matching unit 7 determines that the stroke of the input pattern matches one of the strokes of the standard pattern, it goes to step S34 and judges the matching successful. When no match is found, it is assumed that no stroke of the standard pattern agrees with the stroke, and processing goes to step S33, where the stroke is judged to be a run-on stroke.
Proceeding to step S35, non-straight-stroke matching unit 7 checks whether all non-straight strokes have been matched. If not, it returns to step S30 and matches the next non-straight stroke. When all matching is finished, the processing of non-straight-stroke matching unit 7 ends.
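Next, in step S7 of FIG. 2, control unit 11 has run-on-stroke matching unit 8 match the run-on strokes.
FIG. 17 is a flowchart showing the processing flow of run-on-stroke matching unit 8. The operation of run-on-stroke matching unit 8 is described below using the flowchart of FIG. 17.
First, in step S50, run-on-stroke matching unit 8 has stroke matchable-area determination unit 6 determine the matchable area of the run-on stroke of the input pattern, and finds the candidate standard-pattern strokes that can be matched with it.
Here, run-on strokes of the input pattern are matched only against standard patterns whose straight strokes and non-straight strokes were matched by straight-stroke matching unit 5 and non-straight-stroke matching unit 7.
FIG. 18 shows an example of the area division that stroke matchable-area determination unit 6 uses when matching strokes other than straight strokes. For strokes other than those shown in the area-determination information table of FIG. 10, stroke matchable-area determination unit 6 determines the matchable area using the area division of FIG. 18. The matchable area covers the areas that contain the circumscribed rectangle of the input stroke, together with the areas one step outside them. In the input pattern of FIG. 3, stroke 64 was judged to be a run-on stroke and is not one of the stroke shapes in FIG. 10, so the areas containing stroke 64 are determined first.
FIG. 19 shows an example in which the input pattern of FIG. 3 is divided by the areas of FIG. 18 and the circumscribed rectangle 80 of stroke 64 is obtained; it shows where each stroke of the input pattern lies among the areas used by stroke matchable-area determination unit 6. As the figure shows, the circumscribed rectangle 80 of stroke 64 lies within areas A to H, so the matchable area of the stroke is areas A to L.
Next, run-on-stroke matching unit 8 finds the standard-pattern strokes that lie in the matchable area so determined. For the standard pattern of FIG. 12, areas A to L contain the four strokes 70, 71, 72 and 73; however, standard-pattern strokes that have already been matched one-to-one with input strokes by straight-stroke matching unit 5 or non-straight-stroke matching unit 7 are excluded, because their matching is already fixed. In this example, strokes 70 and 71 were fixed by the straight-stroke matching unit, so they are excluded. As a result, strokes 72 and 73 are selected as candidate strokes.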
Proceeding to step S57, run-on-stroke matching unit 8 checks whether the matchable area has been determined for every run-on stroke. If so, it goes to step S51; if not, it returns to step S50.
The input pattern of FIG. 3 has only one run-on stroke, so processing goes to step S51, which determines the candidate standard-pattern strokes for the start stroke and the end stroke that make up the run-on stroke of the input pattern.
Specifically, the areas in which the start point and end point of the run-on stroke lie are first found using the area division of FIG. 18, and for each point the area in which it lies together with the areas one step outside it is taken as the matchable area.
Next, among the standard-pattern strokes corresponding to the matchable area of the run-on stroke's start point, those whose start point lies in that area are selected as candidates for the start stroke. Similarly, among the standard-pattern strokes corresponding to the matchable area of the run-on stroke's end point, those whose end point lies in that area are selected as candidates for the end stroke.
In the example of FIG. 19, the start point of run-on stroke 64 of the input pattern lies in area A, so the matchable areas for the start point are areas A, B, E and F. The start points of strokes 72 and 73, the standard-pattern candidates found in step S50, both lie in these areas, so the start stroke candidates are strokes 72 and 73. Similarly, the end point of run-on stroke 64 lies in area H, so its matchable areas are areas C, D, G, H, K and L. The end stroke candidates are thus also strokes 72 and 73.
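The "area plus one step outside" rule can be sketched on a 4×4 grid. The row-major labelling (A to D in the top row, E to H in the next, and so on to P) is an assumption: it is not shown in the text, but it reproduces the worked values above (A gives A, B, E, F; H gives C, D, G, H, K, L; A to H gives A to L). The function name `matchable_areas` is hypothetical.

```python
GRID = 4  # 4 x 4 division, areas labelled 'A'..'P' row by row (assumed)

def matchable_areas(occupied):
    """Return the occupied areas plus every area adjacent to them
    (including diagonally), as a sorted list of labels."""
    result = set()
    for name in occupied:
        row, col = divmod(ord(name) - ord("A"), GRID)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                r, c = row + dr, col + dc
                if 0 <= r < GRID and 0 <= c < GRID:
                    result.add(chr(ord("A") + r * GRID + c))
    return sorted(result)

print(matchable_areas("A"))   # start point of stroke 64
print(matchable_areas("H"))   # end point of stroke 64
```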
Proceeding to step S52, run-on-stroke matching unit 8 checks whether the total number of start and end stroke candidates found in step S51 (counting each stroke only once) is 3 or fewer. If so, it goes to step S54; if there are more than 3, it goes to step S53. For the run-on stroke of FIG. 3 the number of candidate strokes is 2, so processing goes to step S54.
In step S54, the run-on stroke of the input pattern is matched against the candidate strokes of the standard pattern. FIG. 20 is a flowchart showing the matching process of step S54. The matching of run-on strokes is described in detail using the flowchart of FIG. 20.
First, in step S60, run-on-stroke matching unit 8 determines the stroke orders of the candidate strokes of the standard pattern. At step S60 the number of candidate strokes has always been narrowed to three or fewer. Since a run-on stroke consists of at least two strokes, the number of stroke orders for three candidate strokes is C(3,2) × 2! + 3! = 6 + 6 = 12.
Similarly, for two candidate strokes there are 2 stroke orders.
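The count of stroke orders can be checked by enumeration: every ordered selection of at least two of the candidate strokes is one stroke order. The function name `stroke_orders` is hypothetical, and the stroke number 74 in the three-candidate call is only a placeholder, not taken from the example.

```python
from itertools import permutations

def stroke_orders(candidates, min_strokes=2):
    """All ordered selections of at least min_strokes candidate strokes."""
    orders = []
    for n in range(min_strokes, len(candidates) + 1):
        orders.extend(permutations(candidates, n))
    return orders

print(stroke_orders([72, 73]))           # the two orders for stroke 64
print(len(stroke_orders([72, 73, 74])))  # three candidates
```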
Since the worst case is 12 orders, the combinations can be processed in practical time; the number of combinations is nevertheless reduced further by the following two constraints.
1) The start stroke candidates and end stroke candidates determined in step S51.
2) Strokes that cannot be deleted.
Here, a stroke that cannot be deleted is a standard-pattern stroke that is a candidate only for the current input stroke. Because step S50 determines the candidate standard-pattern strokes for each input stroke, the strokes that can be deleted can be identified from that information.
For stroke 64 of the input pattern there are two candidate strokes, each of which is a candidate for both the start stroke and the end stroke, and there are no deletable strokes, so there are two stroke orders. In this example, matching is performed first in the order strokes 72, 73.
Proceeding to step S61, run-on-stroke matching unit 8 matches the run-on stroke of the input pattern against the strokes of the standard pattern. Matching is performed on feature point values. For a standard-pattern stroke, a straight stroke has feature points only at its start and end points, while any other stroke has feature points at its start point, its end point and near its bend points; these feature points, stored in stroke feature dictionary 3, are used. The feature points of an input stroke are the sampling points at suitable intervals obtained after preprocessing.
From the stroke features of the standard pattern in FIG. 13, stroke 72 has three feature points (its start point, one point within the stroke, and its end point) and stroke 73 has two (its start point and end point). The standard-pattern strokes therefore have five feature points in total. The "○" marks in FIG. 21 show these standard-pattern feature points, and the "○" marks in FIG. 22 show the feature points of run-on stroke 64 of the input pattern.
Next, run-on-stroke matching unit 8 finds the matching that minimizes the distance between the feature points of the input stroke and those of the standard-pattern strokes. In general, the matching of stroke A of the input pattern with stroke B of the standard pattern is determined by a mapping ω from the index set {0, 1, ..., I} of the feature points of stroke A to the index set {0, 1, ..., J} of the feature points of stroke B: ω: {0, 1, 2, ..., I} → {0, 1, 2, ..., J} (Formula 1).
Here, ω maps the two end points of stroke A onto those of stroke B, satisfying ω(0) = 0 and ω(I) = J (that is, the start points coincide and the end points coincide), and is a monotonic mapping that does not allow the order of the elements to be reversed; it is therefore called an elastic mapping. Let a_i be the i-th element of stroke A, b_j the j-th element of stroke B, d(a_i, b_j) the Euclidean distance between feature points a_i and b_j, and g(i, j) the cumulative distance from the i-th element of stroke A to the j-th element of stroke B. The recurrence is then
g(i, j) = d(a_i, b_j) + min{ g(i-1, k) | 0 ≤ k ≤ j } (Formula 2)
where g(0, 0) = d(a_0, b_0) and g(0, j) = ∞ for j > 0.
In this way, the matching distance g(I, J) between stroke A of the input pattern and stroke B of the standard pattern can be obtained by dynamic programming. If the value of k that gives min{ g(i-1, k) | 0 ≤ k ≤ j } is recorded at each step, the optimal correspondence between the feature points is also obtained. That is, it becomes known which feature point of the run-on stroke of the input pattern corresponds to each feature point of the standard-pattern strokes.
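A minimal sketch of the dynamic-programming matching, following Formula 2 with d taken between a_i and b_j and with the stated boundary conditions. The function name `match_feature_points` and the toy coordinates are hypothetical and are not taken from the figures. Back-pointers record the minimizing k so that the mapping ω can be read back.

```python
import math

def match_feature_points(a, b):
    """Return (g(I, J), omega) for input feature points a and standard
    feature points b, using g(i,j) = d(a_i,b_j) + min_{k<=j} g(i-1,k)."""
    n, m = len(a), len(b)
    inf = math.inf
    g = [[inf] * m for _ in range(n)]
    back = [[-1] * m for _ in range(n)]
    g[0][0] = math.dist(a[0], b[0])          # g(0, j) stays infinite for j > 0
    for i in range(1, n):
        best, best_k = inf, -1
        for j in range(m):
            if g[i - 1][j] < best:           # running minimum over k <= j
                best, best_k = g[i - 1][j], j
            if best < inf:
                g[i][j] = math.dist(a[i], b[j]) + best
                back[i][j] = best_k
    # Trace the optimal mapping back from (I, J).
    omega = [0] * n
    j = m - 1
    for i in range(n - 1, -1, -1):
        omega[i] = j
        j = back[i][j]
    return g[n - 1][m - 1], omega

# Toy data: an L-shaped input stroke against three standard feature points.
inp = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
std = [(0, 0), (2, 0), (2, 2)]
print(match_feature_points(inp, std))
```

In `omega`, entry i is the index of the standard feature point to which input point i is mapped; the input points mapped to the end points of each standard stroke are where the run-on stroke would be split.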
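When the feature points of the standard-pattern strokes in FIG. 21 are matched by dynamic programming against the feature points (sampling points) of the input stroke in FIG. 22, the evaluation value of the matching and the matching result shown by the arrows in FIG. 23 are obtained.
Proceeding to step S62, run-on-stroke matching unit 8 uses the matching result to have stroke feature extraction unit 2 obtain the stroke features of the split strokes. To obtain the stroke features, the run-on stroke of the input pattern is first split at the feature points matched with the start and end points of the standard-pattern strokes, and each portion between a start point and an end point is extracted as a split stroke.
Next, stroke feature extraction unit 2 extracts the stroke features of the split strokes. FIG. 24 shows an example in which split strokes 130 and 131 are obtained from stroke 64 of the input pattern.
Proceeding to step S63, run-on-stroke matching unit 8 checks whether the run-on stroke of the input pattern has been matched for every stroke order of the current candidate strokes. If not all combinations have been matched, processing returns to step S60 and another stroke order is matched.
For stroke 64 of the input pattern, processing returns to step S60 and matching is performed in the order strokes 73, 72. The matching process is the same as for the order 72, 73, so its description is omitted.
Here, the order strokes 72, 73 gives the smaller distance between the feature points of the input stroke and those of the standard-pattern strokes, so it is adopted as the correct partial-stroke matching result.
When matching has finished for all stroke orders, processing goes to step S56 of FIG. 17, where run-on-stroke matching unit 8 checks whether all run-on strokes of the current input pattern have been matched. If matching is finished, the processing of run-on-stroke matching unit 8 ends and processing goes to step S8 of FIG. 2. If not, processing returns to step S51 of FIG. 17. The input pattern of FIG. 3 has only one run-on stroke, so the condition is satisfied and processing goes to step S8.
In step S8, control unit 11 checks whether matching has been performed against all standard patterns in stroke feature dictionary 3. If it has, processing goes to step S9; if not, it returns to step S4 and matching is performed against the next standard pattern.
For the input pattern of FIG. 3, matching with all standard patterns has finished, so processing goes to step S9.
In step S9, control unit 11 sends the matching results obtained by straight-stroke matching unit 5, non-straight-stroke matching unit 7 and run-on-stroke matching unit 8 to character evaluation unit 10.
Character evaluation unit 10 calculates the evaluation value of each character from the matching results it receives, and selects the recognition target character (standard pattern) with the smallest distance as the final recognition result.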
The evaluation value D of a character is obtained from Formula 3 below.
D = (Wd×Dd + Ww×Dw + Wh×Dh + Wvw×Dvw + Wvh×Dvh + Wvd×Dvd) / k (Formula 3)
Here, Dd is the normalized distance of the stroke direction, Dw the normalized distance of the stroke width, Dh the normalized distance of the stroke height, Dvw the normalized distance of the virtual stroke width, Dvh the normalized distance of the virtual stroke height, and Dvd the normalized distance of the virtual stroke direction. Wd is the weight for stroke direction, Ww the weight for stroke width, Wh the weight for stroke height, Wvw the weight for virtual stroke width, Wvh the weight for virtual stroke height, Wvd the weight for virtual stroke direction, and k is the stroke count of the current standard pattern. In addition, Wd + Ww + Wh + Wvw + Wvh + Wvd = 1.
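Formula 3 is a weighted sum of six normalized distances divided by the stroke count. The sketch below treats the W terms as weights, which the requirement that they sum to 1 suggests; the function name `evaluation_value`, the dictionary keys and the sample numbers are hypothetical.

```python
KEYS = ("d", "w", "h", "vw", "vh", "vd")  # direction, width, height, virtual ...

def evaluation_value(distances, weights, stroke_count):
    """Evaluation value D of Formula 3 for one standard pattern.

    distances and weights are dicts keyed by KEYS; the weights must sum to 1.
    """
    if abs(sum(weights[k] for k in KEYS) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = sum(weights[k] * distances[k] for k in KEYS)
    return total / stroke_count

weights = {"d": 0.25, "w": 0.25, "h": 0.25, "vw": 0.25, "vh": 0.0, "vd": 0.0}
distances = {"d": 4.0, "w": 8.0, "h": 0.0, "vw": 4.0, "vh": 1.0, "vd": 1.0}
print(evaluation_value(distances, weights, stroke_count=2))
```

The standard pattern with the smallest D would then be chosen as the recognition result, for example with `min()` over the candidate patterns.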
However, when there are several matching results for the straight strokes, non-straight strokes and run-on strokes, care is taken to avoid contradictions, namely a standard-pattern stroke that is matched with no stroke of the input pattern, or a single standard-pattern stroke that is matched with several strokes of the input pattern.
Next, partial stroke extraction unit 9 is described. FIG. 25 shows an example of an input pattern in which the seven-stroke character "言" is written in four strokes.
The operation of partial stroke extraction unit 9 is described below, taking the input pattern of FIG. 25 as an example. In this example, strokes 141, 142 and 143 are matched by straight-stroke matching unit 5 and non-straight-stroke matching unit 7, and their description is omitted.
Stroke 140 is a run-on stroke, so processing goes to step S7 of FIG. 2 and run-on-stroke matching unit 8 performs the matching.
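First, in step S50 of FIG. 17, run-on-stroke matching unit 8 has stroke matchable-area determination unit 6 determine the matchable area of the run-on stroke of the input pattern, and finds the candidate standard-pattern strokes that can be matched with it.
Stroke matchable-area determination unit 6 determines the matchable area using the area division of FIG. 18. The matchable area covers the areas containing the circumscribed rectangle of the input stroke, together with the areas one step outside them. FIG. 26 shows the input pattern of FIG. 25 divided into areas as in FIG. 18. As FIG. 26 shows, the circumscribed rectangle 144 of stroke 140 covers areas A to L. As a result, the matchable area of stroke 140 is all of areas A to P.
Run-on-stroke matching unit 8 finds the standard-pattern strokes that lie in the matchable area so determined. FIG. 27 shows the standard pattern of the character "言". Areas A to P contain all of its strokes, but the standard-pattern strokes already matched one-to-one with input strokes by straight-stroke matching unit 5 and non-straight-stroke matching unit 7 are excluded, because their matching is already fixed. In this example, strokes 154 and 155 were fixed by straight-stroke matching unit 5 and stroke 155 by non-straight-stroke matching unit 7, so they are excluded. As a result, the four strokes 150, 151, 152 and 153 are selected as candidate strokes.
Next, in step S57 of FIG. 17, run-on-stroke matching unit 8 checks whether the matchable area has been determined for every run-on stroke. If so, it goes to step S51; if not, it goes to step S50. The input pattern of FIG. 25 has only one run-on stroke, so processing goes to step S51, which determines the candidate standard-pattern strokes for the start stroke and end stroke that make up the run-on stroke.
From FIG. 26, the start point of stroke 140 of the input pattern lies in area B, so the matchable areas are areas A, B, C, D, F and G, and standard-pattern strokes 150, 151 and 152, whose start points lie in these areas, become the start stroke candidates. Similarly, strokes 152 and 153 become the end stroke candidates.
Proceeding to step S52, run-on-stroke matching unit 8 checks the number of candidate standard-pattern strokes. In this example the number of candidate strokes is 4, so processing goes to step S53.
In step S53, run-on-stroke matching unit 8 has partial stroke extraction unit 9 extract partial strokes from the run-on stroke of the input pattern.
The operation of partial stroke extraction unit 9 is described using the flowchart shown in FIG. 28.
In step S70, partial stroke extraction unit 9 extracts the bend points of the input stroke as break candidate points for partial strokes, and takes the start point of the input stroke as the start point of the partial stroke.
FIG. 29 shows the break candidate points of the run-on stroke of the input pattern. In FIG. 29, 160 to 165 are the break candidate points of stroke 140 of the input pattern, 166 is the start point of the partial stroke, and 167 is the end point of the partial stroke.
Proceeding to step S71, partial stroke extraction unit 9 searches for a break point backward from the end point of the input stroke, finds the break point for which the stroke from the start point of the current partial stroke to that break point has 3 or fewer candidate strokes, and takes that break point as the end point of the partial stroke. In the example of FIG. 29, the partial stroke from start point 166 through break candidate points 160, 161, 162, 163, 164 and 165 is selected first. This partial stroke lies in areas A, B, C, D, E, F, G, I, J and K, so its matchable area is all areas and the number of candidate strokes is still 4; the search therefore moves to break candidate point 164, the one before break candidate point 165. In the same way, the break point is moved step by step toward the start point, a partial stroke is obtained each time, and its candidate strokes are found. In this example, the partial stroke from start point 166 through break candidate points 160, 161 and 162 has 3 candidate strokes, so it is selected as the partial stroke.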
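The backward search of step S71 might look like the sketch below. The names `find_partial_end` and `count_candidates` are hypothetical, and the threshold is taken as "3 or fewer", which is how the worked example behaves. The counts in the usage example are illustrative: the text states only that ending at point 165 gives 4 candidates and ending at point 162 gives 3.

```python
def find_partial_end(break_points, count_candidates, max_candidates=3):
    """Scan break candidate points from the end of the stroke toward the
    start, and return the first one whose partial stroke (from the current
    start to that point) has at most max_candidates candidate strokes."""
    for point in reversed(break_points):
        if count_candidates(point) <= max_candidates:
            return point
    return None  # no break point narrows the candidates enough

# Illustrative candidate counts per end point (assumed except 165 and 162).
counts = {160: 2, 161: 3, 162: 3, 163: 4, 164: 4, 165: 4}
print(find_partial_end([160, 161, 162, 163, 164, 165], counts.get))
```

Searching from the far end makes each partial stroke as long as the candidate limit allows, which keeps the number of partial strokes small.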
Proceeding to step S72, run-on-stroke matching unit 8 matches the partial stroke obtained against the candidate strokes.
FIG. 31 is a flowchart showing the processing flow of step S72. The processing of step S72 is described using the flowchart of FIG. 31.
In step S80, the matchable candidates are narrowed down using the circumscribed rectangle of the partial stroke and the circumscribed rectangle of each candidate stroke. Specifically, the candidate strokes whose circumscribed rectangles are contained in the area obtained by expanding the partial stroke's circumscribed rectangle by a predetermined threshold are selected. 180 in FIG. 30 is the expanded circumscribed rectangle of the partial stroke. In this example, stroke 152 is removed from the candidates 150, 151 and 152.
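The rectangle test of step S80 can be sketched as follows. All names (`bounding_box`, `expand`, `contains`, `filter_by_bbox`), the margin value and the coordinates are hypothetical and chosen only so that the outcome mirrors the example, in which stroke 152 is dropped.

```python
def bounding_box(points):
    """Circumscribed rectangle (x_min, y_min, x_max, y_max) of a point list."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def expand(box, margin):
    """Grow a rectangle by margin on every side."""
    x0, y0, x1, y1 = box
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)

def contains(outer, inner):
    """True if rectangle inner lies entirely inside rectangle outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def filter_by_bbox(partial_points, candidates, margin):
    """Keep candidate strokes whose rectangle fits in the expanded rectangle
    of the partial stroke."""
    region = expand(bounding_box(partial_points), margin)
    return [sid for sid, pts in candidates.items()
            if contains(region, bounding_box(pts))]

partial = [(10, 10), (60, 10), (20, 30), (70, 30)]
candidates = {
    150: [(30, 5), (35, 12)],
    151: [(12, 28), (72, 28)],
    152: [(20, 50), (60, 50)],
}
print(filter_by_bbox(partial, candidates, margin=8))
```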
Proceeding to step S81, run-on-stroke matching unit 8 determines the stroke orders of the candidate strokes of the standard pattern, in the same way as when the number of candidate strokes is 3 or fewer.
Here, when the current partial stroke includes the start point of the run-on stroke of the input pattern, the start stroke candidate information is also used in determining the stroke order combinations, and when it includes the end point, the end stroke candidate information is used. The current partial stroke has two candidate strokes, both of which satisfy the start stroke condition, so there are two stroke orders.
Here, matching is performed first in the order strokes 150, 151.
Proceeding to step S82, run-on-stroke matching unit 8 matches the partial stroke against the candidate strokes of the standard pattern. The matching method is the same as in step S61 of FIG. 20 (the same as in the run-on stroke matching described above). Strokes 150 and 151 are both straight strokes, so their start and end points are the feature points, and these are matched with the feature points of the partial stroke by dynamic programming.
Proceeding to step S83, run-on-stroke matching unit 8 splits the partial stroke according to the matching result obtained, and stroke feature extraction unit 2 obtains the stroke features of the split strokes. FIG. 32 shows an example in which split strokes 190 and 191 are obtained from the partial stroke of FIG. 30.
Proceeding to step S84, run-on-stroke matching unit 8 checks whether the partial stroke of the input pattern has been matched for every stroke order of the current candidate strokes. If not all combinations have been matched, processing returns to step S81 and another stroke order is matched.
In this example, processing returns to step S81 and matching is performed in the other order, strokes 151, 150. The matching process is the same as for the order 150, 151, so its description is omitted. Here, the order strokes 150, 151 gives the smaller distance between the feature points of the input stroke and those of the standard-pattern strokes, so it is adopted as the correct partial-stroke matching result.
Next, in step S73 of FIG. 28, run-on-stroke matching unit 8 judges whether matching of the current run-on stroke is finished. If it is, processing goes to step S75; if not, it goes to step S74.
In the example of FIG. 29, matching is not finished, so processing goes to step S74. In step S74, the end point of the current partial stroke is taken as the start point of the next partial stroke. Here, break candidate point 162 of FIG. 29 becomes the start point of the next partial stroke.
Proceeding to step S71, run-on-stroke matching unit 8 has partial stroke extraction unit 9 make the next partial stroke. Specifically, for the new partial stroke, partial stroke extraction unit 9 searches for a break point backward from the end point, finds the break point for which the stroke from the start point of the current partial stroke to that break point has 3 or fewer candidate strokes, and takes that break point as the end point of the partial stroke.
In the example of FIG. 29, the strokes 150 and 151 already fixed are removed from the candidates, so strokes 152 and 153 are both candidates. The whole remaining portion, consisting of break candidate points 162, 163, 164 and 165 and the end point 167, is therefore selected as the partial stroke.
Proceeding to step S72, run-on-stroke matching unit 8 matches the partial stroke obtained against the candidate strokes.
The matching of the partial stroke is described using the flowchart of FIG. 31.
In step S80, the matchable candidates are narrowed down using the circumscribed rectangle of the partial stroke and the circumscribed rectangle of each candidate stroke. In this example, candidate strokes 152 and 153 both satisfy the condition and become matching targets.
Next, in step S81, run-on-stroke matching unit 8 determines the stroke orders. The current partial stroke has two candidate strokes, both of which satisfy the end stroke condition, so there are two stroke orders.
Here, matching is performed first in the order strokes 152, 153.
Proceeding to step S82, run-on-stroke matching unit 8 matches the partial stroke against the candidate strokes of the standard pattern. The matching method is the same as in step S61 of FIG. 20. Strokes 152 and 153 are both straight strokes, so their start and end points are the feature points, and these are matched with the feature points of the partial stroke by dynamic programming.
Here, the start point 162 of the current partial stroke coincides with the end point of the preceding partial stroke, so a virtual stroke component, which would not actually be written between separate strokes, may lie between them. Matching is therefore performed both with break candidate point 162 as the start point and with the next break candidate point 163 as the start point, and the one with the smaller distance is taken as the final result. In this example, the matching with break candidate point 163 as the start point is selected.
Proceeding to step S83, run-on-stroke matching unit 8 splits the partial stroke according to the matching result obtained, and stroke feature extraction unit 2 obtains its stroke features. FIG. 33 shows an example in which split strokes 192 and 193 are obtained from the partial stroke.
Proceeding to step S84, run-on-stroke matching unit 8 checks whether the partial stroke of the input pattern has been matched for every stroke order of the current candidate strokes. If not all combinations have been matched, processing returns to step S81 and another stroke order is matched.
In this example, processing returns to step S81 and matching is performed in the order strokes 153, 152. The matching process is the same as for the order 152, 153, so its description is omitted. Here, the order strokes 152, 153 gives the smaller distance between the feature points of the input stroke and those of the standard-pattern strokes, so it is adopted as the correct partial-stroke matching result.
Next, in step S73 of FIG. 28, run-on-stroke matching unit 8 judges whether matching of the current run-on stroke is finished. If it is, processing goes to step S75; if not, it goes to step S74. In this example it is finished, so processing goes to step S75.
In step S75, run-on-stroke matching unit 8 obtains the matching result for the run-on stroke from the matching results of the individual partial strokes. Since two standard-pattern strokes are matched in each partial stroke, the strokes arranged in the order of the partial strokes are adopted as the final result for the run-on stroke.
In this example no standard-pattern stroke is left unmatched. When an unmatched stroke does exist, matching is performed again, with the unmatched stroke included, against the stroke formed by joining the consecutive partial strokes, and if the matching distance becomes smaller, that result is adopted. In that case, at most one standard-pattern stroke is inserted between partial strokes.
With the above processing, the matching of the partial strokes extracted from the run-on stroke is complete, and character evaluation unit 10 calculates the final evaluation value of the character and determines the recognition result.
还有,在本实施例中,在后继字笔划的对应连接中使用的是特征点的坐标信息,但也可以以一定间距对笔划进行采样、使用该采样点间的方向码进行对应连接。Also, in this embodiment, the coordinate information of feature points is used in the corresponding connection of subsequent character strokes, but it is also possible to sample the strokes at a certain interval and use the direction codes between the sampling points for corresponding connection.
In this embodiment, the regions in which a stroke can be matched are determined by dividing the character into 4×1, 1×4, and 4×4 regions (horizontal divisions × vertical divisions), but any number of divisions may be used.
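The region division described above can be sketched as a simple grid lookup. The grid sizes come from the text; everything else is an assumption for illustration: the use of the stroke midpoint as its representative point, the bounding-box format `(left, top, width, height)`, and the names `cell_of` and `stroke_cells`.

```python
# (horizontal, vertical) divisions named in the text.
GRIDS = [(4, 1), (1, 4), (4, 4)]


def cell_of(point, bbox, cols, rows):
    """Return the (column, row) of the grid cell that contains `point`."""
    x, y = point
    left, top, width, height = bbox
    col = int((x - left) / width * cols)
    row = int((y - top) / height * rows)
    # Clamp so that points on or beyond the border fall in an edge cell.
    return (min(max(col, 0), cols - 1), min(max(row, 0), rows - 1))


def stroke_cells(start, end, bbox, grids=GRIDS):
    """Cells occupied by a stroke's midpoint in each grid (an assumed rule)."""
    mid = ((start[0] + end[0]) / 2, (start[1] + end[1]) / 2)
    return [cell_of(mid, bbox, cols, rows) for cols, rows in grids]


# A short horizontal stroke near the top of a 100 x 100 character box.
cells = stroke_cells((50, 10), (70, 10), bbox=(0, 0, 100, 100))
```

A standard-pattern stroke would then be considered for matching only if its own cells agree with, or lie next to, those of the input stroke; that filtering rule is not spelled out in this passage and is left out of the sketch.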
In this embodiment, the segmentation candidate points of a subsequent-character stroke are its bending points, but points sampled at regular intervals may also be used.
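One way to obtain bending points as segmentation candidates is to look for large changes of direction along the stroke. The sketch below is an assumption-laden illustration: the 45-degree threshold, the three-point turning-angle test, and the names `bending_points` and `split_at` do not come from the text.

```python
from math import atan2, pi


def bending_points(points, min_turn_deg=45.0):
    """Indices of interior points where the stroke turns by at least min_turn_deg."""
    threshold = min_turn_deg * pi / 180
    indices = []
    for i in range(1, len(points) - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[i + 1]
        a_in = atan2(y1 - y0, x1 - x0)
        a_out = atan2(y2 - y1, x2 - x1)
        # Wrap the difference into [-pi, pi) before taking its magnitude.
        turn = abs((a_out - a_in + pi) % (2 * pi) - pi)
        if turn >= threshold:
            indices.append(i)
    return indices


def split_at(points, indices):
    """Cut the stroke into partial strokes that share the cut points."""
    parts, start = [], 0
    for i in indices:
        parts.append(points[start:i + 1])
        start = i
    parts.append(points[start:])
    return parts


stroke = [(0, 0), (5, 0), (10, 0), (10, 5), (10, 10)]
partials = split_at(stroke, bending_points(stroke))
```

For the L-shaped example, the single corner is detected and the stroke is cut into a horizontal and a vertical partial stroke. Using regularly sampled points instead, as the text allows, would replace `bending_points` with a fixed-stride index list.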
As described above, according to this embodiment, strokes that are not written continuously are matched using stroke features only; only the strokes left unmatched are matched using fine feature-point information; and the final evaluation uses stroke features only. Character recognition can therefore be performed at high speed while remaining robust to character deformation.
Moreover, stroke-order matching is performed not only for strokes that are not written continuously but also for continuously written strokes, so character recognition can be performed with high accuracy.
Continuously written strokes are divided into partial strokes before matching, so matching that accounts for stroke-order variation can be performed without increasing the processing time.
Furthermore, the input strokes of the handwritten character are classified as straight strokes, non-straight strokes, or subsequent-character strokes, and matching is performed in the order straight-stroke matching, non-straight-stroke matching, subsequent-character stroke matching. A standard pattern that fails at any stage is excluded from all later stages, so no matching is attempted against standard patterns that differ from the correct answer. Matching is therefore efficient, and character recognition can be performed at high speed.
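The staged matching with early rejection described above can be sketched as a filter cascade. The stage order follows the text; the toy acceptance rule (a template passes a stage if it has at least as many strokes of that kind as the input) and the names `cascade_match` and `count_matcher` are hypothetical stand-ins for the real stroke matchers.

```python
STAGE_ORDER = ["straight", "nonstraight", "subsequent"]  # order given in the text


def count_matcher(kind):
    """Toy matcher: pass if the template has enough strokes of this kind."""
    def matcher(strokes, template):
        return len(strokes) <= template[kind]
    return matcher


def cascade_match(input_strokes, templates, stages):
    """Run the stages in order, dropping templates as soon as one stage fails."""
    survivors = list(templates)
    for kind, matcher in stages:
        strokes = [s for s in input_strokes if s["kind"] == kind]
        # Templates rejected here are never examined by later stages.
        survivors = [t for t in survivors if matcher(strokes, t)]
    return survivors


input_strokes = [{"kind": "straight"}, {"kind": "straight"}, {"kind": "subsequent"}]
templates = [
    {"name": "T1", "straight": 2, "nonstraight": 0, "subsequent": 1},
    {"name": "T2", "straight": 1, "nonstraight": 2, "subsequent": 1},
    {"name": "T3", "straight": 3, "nonstraight": 0, "subsequent": 0},
]
stages = [(kind, count_matcher(kind)) for kind in STAGE_ORDER]
remaining = cascade_match(input_strokes, templates, stages)
```

In this toy run, `T2` is dropped at the straight-stroke stage and `T3` at the subsequent-character stage, so the most expensive stage only sees templates that passed the cheaper ones.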
In addition, subsequent-character strokes are matched using the feature points at the start and end of each stroke, so matching can be performed at high speed and the size of the stroke feature dictionary can be kept small.
The present invention is configured as described above and therefore achieves the following effects.
In the first or fifth aspect of the present invention, the strokes of a handwritten character (input pattern) written on the tablet are classified by shape into straight strokes, non-straight strokes, and subsequent-character strokes, and the features of each stroke are matched against the stroke features of the standard patterns (characters to be recognized). Character recognition can therefore be performed at high speed and with high accuracy even for input patterns whose stroke order or stroke count varies.
In the second or sixth aspect, the region in which each stroke of the input pattern lies is determined, and the strokes of the standard pattern corresponding to that region are matched with the strokes of the input pattern, so character recognition can be performed at high speed.
In the third or seventh aspect, the non-straight strokes of the input pattern are matched only against the strokes of standard patterns that were already matched by straight-stroke matching, so character recognition can be performed at high speed.
In the fourth or eighth aspect, the subsequent-character strokes of the input pattern are matched only against the strokes of standard patterns that were already matched by straight-stroke matching and non-straight-stroke matching, so character recognition can be performed at high speed.
Claims (8)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP14763497A (JP3419251B2) | 1997-06-05 | 1997-06-05 | Character recognition device and character recognition method |
JP147634/97 | 1997-06-05 | | |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1201955A | 1998-12-16 |
CN1093966C | 2002-11-06 |
Family
ID=15434776
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN98108373A (CN1093966C, Expired - Fee Related) | 1997-06-05 | 1998-05-13 | Apparatus and method for character identification |
Country Status (4)
Country | Link |
---|---|
JP (1) | JP3419251B2 (en) |
KR (1) | KR100299725B1 (en) |
CN (1) | CN1093966C (en) |
TW (1) | TW385414B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100930802B1 (en) | 2007-06-29 | 2009-12-09 | 엔에이치엔(주) | Browser control method and system using images |
US8094941B1 (en) | 2011-06-13 | 2012-01-10 | Google Inc. | Character recognition for overlapping textual user input |
JP6797869B2 (en) * | 2018-08-08 | 2020-12-09 | シャープ株式会社 | Book digitization device and book digitization method |
Date | Country | Application | Patent | Status |
---|---|---|---|---|
1997-06-05 | JP | JP14763497A | JP3419251B2 | Expired - Lifetime |
1997-12-17 | TW | TW086119128A | TW385414B | IP Right Cessation |
1998-03-10 | KR | KR1019980007800A | KR100299725B1 | IP Right Cessation |
1998-05-13 | CN | CN98108373A | CN1093966C | Expired - Fee Related |
Also Published As
Publication number | Publication date |
---|---|
JPH10334187A (en) | 1998-12-18 |
KR100299725B1 (en) | 2001-11-30 |
KR19990006359A (en) | 1999-01-25 |
CN1201955A (en) | 1998-12-16 |
JP3419251B2 (en) | 2003-06-23 |
TW385414B (en) | 2000-03-21 |
Legal Events
Code | Title | Description |
---|---|---|
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C06 | Publication | |
PB01 | Publication | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
C17 | Cessation of patent right | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20021106 |