CN114495117B - A method for extracting strokes from handwritten Chinese characters - Google Patents

Info

Publication number
CN114495117B
Authority
CN
China
Prior art keywords
point
inflection point
angle
points
skeleton
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210137230.7A
Other languages
Chinese (zh)
Other versions
CN114495117A (en
Inventor
李振江
王轶群
张倩雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gansu University Of Political Science And Law
Original Assignee
Gansu University Of Political Science And Law
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gansu University Of Political Science And Law
Priority to CN202210137230.7A
Publication of CN114495117A
Application granted
Publication of CN114495117B
Legal status: Active
Anticipated expiration

Landscapes

  • Character Discrimination (AREA)

Abstract

The invention discloses a method for extracting strokes from handwritten Chinese characters, comprising: character image preprocessing; calculation of the direction point angle features of pixels; calculation of the direction point angle features of inflection point groups; and stroke extraction based on the stroke skeleton and contour. The method converts offline handwritten Chinese characters into their corresponding strokes, so that downstream tasks such as handwritten Chinese character recognition and writer identification can be carried out on the strokes, which greatly improves the interpretability of the whole processing pipeline. Because the strokes are extracted more accurately, the accuracy of these downstream tasks also improves.

Description

Method for extracting strokes from handwritten Chinese characters
Technical Field
The invention relates to character processing technology, and in particular to a method for extracting strokes from handwritten Chinese characters.
Background
The invention of writing was an important driving force in the development of human civilization. Since the start of the information age, the processing of written characters has been a focus of research in related disciplines, including the recognition of online and offline handwritten characters, identity verification based on writing style, automatic document processing, and intelligent education. With the rapid development of deep learning, character processing technologies have been widely applied. However, because a deep neural network is a black box, the resulting models are hard to interpret and hard to relate intuitively to prior knowledge, which hinders deeper application.
Chinese characters are a treasure of Chinese culture: they carry the cultural spirit of generations of Chinese people and have distinctive shapes. According to the brush-writing rules of Chinese calligraphy, the "Eight Principles of Yong", every Chinese character can be decomposed into eight basic strokes: dot, horizontal, vertical, hook, rising, left-falling, short left-falling, and right-falling. Basic strokes of different types and numbers are arranged in the character space according to a certain compositional logic to form the concise, elegant square characters. This decomposition is clear and intuitive to human perception and is easy to accept and understand.
In the skeleton-based method, the skeleton of the character is first obtained, usually with a thinning algorithm. The intersection points in the skeleton are then detected and used to split the skeleton into skeleton line segments. Finally, the segments are combined according to certain rules to restore the skeleton of each individual stroke, and the stroke is restored from its skeleton. The main problems in this process are handling strokes of different widths, skeleton burrs and spurious branches, and stroke deformation. A schematic flow of this type of method is shown in FIG. 1.
In the contour-based method, the contour of the character is first obtained, usually with an edge detection algorithm. The direction attributes of the points on the contour are then computed, and the stroke inflection points are located on the contour. Inflection points that belong to the same stroke intersection are merged, the exact position of the intersection is determined, and the intersections are used to split the contour into disjoint contour fragments. Finally, the fragments are combined according to certain rules to obtain the contour of each individual stroke, which is filled to obtain the stroke. The key points in this process are finding and identifying the inflection points and determining their relative positions. A schematic flow of this type of method is shown in FIG. 2.
The two methods share the same overall flow: first, higher-level information (skeleton or contour) is extracted from the original character image; next, this information is used to find the intersections and split the character into stroke segments; finally, the stroke segments are combined according to certain rules to obtain the final strokes. The technical focus is how to describe the higher-level information effectively and how to define the rules for combining stroke segments.
According to the related literature, the prior art mainly targets printed characters or neatly written handwriting, and works well on such regular characters. On general handwriting, however, connected strokes and deformed writing lead to large stroke extraction errors, which seriously affect the performance of downstream tasks. The difficulty lies in how the contour information and the skeleton information are obtained and described.
Disclosure of Invention
The main object of the invention is to provide a method for extracting strokes from handwritten Chinese characters.
The technical solution adopted by the invention is a method for extracting strokes from handwritten Chinese characters, comprising:
character image preprocessing;
calculation of the direction point angle features of pixels;
calculation of the direction point angle features of inflection point groups;
stroke extraction based on the stroke skeleton and contour.
Further, the character image preprocessing includes:
denoting the original image as I, binarizing the input original image, and denoting the binarized character image as Ibw;
extracting the edges of the character in the binary image Ibw with the Canny algorithm, and denoting the edge image as Iedge;
extracting the skeleton of the character in the binary image Ibw with the Rosenfeld algorithm, and denoting the skeleton image as Isk.
Still further, the calculation of the direction point angle features of a pixel includes:
taking the square region with neighborhood radius N centered on the current pixel as the region to be calculated, localRect, and retaining only the pixels in the region that are connected to the current pixel;
wherein the value of N depends on the size of the character image, and N is usually 3 for a character image of about 100 x 100;
deleting the points of localRect that lie within radius N-1, the remaining points being the direction points of the current pixel; if several direction points are adjacent to one another, only the one farthest from the current pixel is retained;
taking the current pixel as the coordinate origin and calculating the angle θ between the x-axis and the line connecting the current pixel to a direction point, θ being given by formula (1):
$$\theta = \operatorname{atan2}(x - x_0,\ y - y_0) \tag{1}$$
where $(x_0, y_0)$ are the coordinates of the current pixel and $(x, y)$ are the coordinates of the current direction point; θ takes values in [-180, 180), and the value of θ is the first-order direction point angle feature of the current pixel;
if a pixel has 2 or more direction points, calculating the relative angle difference δ between any two of its direction points according to formula (2), in which the subscripts a and b denote any two direction points of the same pixel and θ is the angle of the corresponding direction point; because δ is a relative value, it is normalized to the range [0, 180); if a pixel has m direction points, m(m-1)/2 values of δ are obtained, and these form the second-order direction point angle feature of the current pixel.
Still further, the calculation of the direction point angle features of an inflection point group includes:
Inflection point and skeleton line segment extraction
calculating the first-order and second-order direction point angle features of all pixels on the skeleton image Isk, and deleting the pixels that have 0 first-order direction points to obtain a new Isk;
extracting all inflection points in Isk, a pixel being an inflection point if it meets either of two conditions: it has exactly 2 direction points and its second-order direction point angle is less than 145, or it has more than 2 direction points; all inflection points form the inflection point map Isk-i, and each connected region of Isk-i is one inflection point group;
subtracting Isk-i from Isk to obtain the skeleton line segment map Isk-l, in which every pixel has fewer than 3 direction points and a second-order direction point angle greater than 135;
Direction point angle feature calculation of the inflection point group
calculating the distance between every pixel in the inflection point group and the center pixel of the group, and constructing the square region to be calculated, localRect, with the farthest distance as radius N;
deleting the points of localRect within radius N-1, the remaining points being the direction points of the current inflection point group, as shown in FIG. 9 (b); if several direction points are adjacent to one another, only the one farthest from the center of the current inflection point group is retained;
for each direction point, calculating the angle θ of the line connecting the direction point to the center of the current inflection point group, and searching for a skeleton line segment connected to the direction point; if no skeleton line segment is connected to the direction point, θ is taken as the first-order direction point angle feature of the current inflection point group; if a skeleton line segment is connected to the direction point, the direction angle φ of that skeleton line segment is taken as the first-order direction point angle feature of the current inflection point group; because φ takes values in [-90, 90), it is normalized to [-180, 180) according to the value of θ, the normalization being given by formula (3);
calculating the relative angle difference δ between any two direction points of the inflection point group according to formula (2); if an inflection point group has m direction points, m(m-1)/2 values of δ are obtained, and these form the second-order direction point angle feature of the current inflection point group.
Still further, the stroke extraction based on the stroke skeleton and contour includes:
for every inflection point group with more than three direction points, finding the two direction points DPa and DPb with the largest second-order direction point angle, constructing a new inflection point group from the original center point together with DPa and DPb, and removing these two direction points from the original inflection point group;
for an inflection point group with three direction points, if the variance of its three second-order direction point angles is small, judging the group to be a Y-shaped intersection and splitting it into three new inflection point groups, each containing exactly one direction point of the original group; otherwise judging the group to be a T-shaped intersection and splitting it into two new inflection point groups, one containing the two direction points with the largest second-order direction point angle and the other containing the remaining direction point;
examining the second-order direction point angle feature of each inflection point group: if it is greater than 135, merging the skeleton line segments corresponding to the two direction points of the group; otherwise leaving the relation between the skeleton line segments unchanged;
searching Iedge for the nearest edges parallel to the stroke skeleton, filling the blank area between the edges and the stroke skeleton, and filling the corresponding skeleton intersection areas to obtain the split strokes.
The invention has the advantages that:
The method converts offline handwritten Chinese characters into their corresponding strokes, so that downstream tasks such as handwritten Chinese character recognition and writer identification can be carried out on the strokes. This greatly improves the interpretability of the whole processing pipeline and makes it easier to understand and further study how Chinese character information is processed.
The method converts offline handwritten Chinese characters into strokes more accurately, which improves the accuracy of downstream tasks such as handwritten Chinese character recognition and writer identification.
In addition to the objects, features and advantages described above, the present invention has other objects, features and advantages. The present invention will be described in further detail with reference to the drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application.
FIG. 1 is a flow chart of the skeleton-based method;
FIG. 2 is a flow chart of the contour-based method;
FIG. 3 shows samples of the character images to be processed;
FIG. 4 is a flow chart of the method of the invention;
FIG. 5 shows the region to be calculated and the direction points of the current pixel (where (a) is the region localRect and (b) shows the direction points);
FIG. 6 shows the relative angle difference between direction points, with the current pixel as the coordinate origin;
FIG. 7 shows the results of inflection point and skeleton line segment extraction on the skeleton image Isk (where (a) is the new Isk obtained, (b) shows the inflection point groups, and (c) is the skeleton line segment map Isk-l);
FIG. 8 shows the calculated direction of each skeleton line segment;
FIG. 9 shows the region to be calculated for an inflection point group (where (a) is the square region localRect constructed with the farthest distance as radius N, and (b) shows the direction points of the current inflection point group);
FIG. 10 shows the stroke skeletons obtained once all inflection point groups have been processed;
FIG. 11 shows the resulting split strokes.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention processes images of individual Chinese characters; earlier stages of document image processing, such as denoising and enhancement, layout analysis, line and character segmentation, and normalization, are outside the scope of this description. Samples of the character images to be processed are shown in FIG. 3.
The invention is a computer algorithm comprising four steps: image preprocessing, calculation of the direction point angle features of pixels, calculation of the direction point angle features of inflection point groups, and stroke extraction. The flow of these steps is shown in FIG. 4.
First, character image preprocessing
1.1 The original image is denoted as I; the input original image is binarized, and the binarized character image is denoted as Ibw.
1.2 The edges of the character in the binary image Ibw are extracted with the Canny algorithm, and the edge image is denoted as Iedge.
1.3 The skeleton of the character in the binary image Ibw is extracted with the Rosenfeld algorithm, and the skeleton image is denoted as Isk.
Binarization, edge extraction, and skeleton extraction of a single character image are basic methods in digital image processing, and implementations are available in many image processing libraries, so their technical details are not described separately here.
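As a rough illustration of step 1.1 and of what an edge image contains, the following Python sketch binarizes a grayscale image with a fixed threshold and marks the ink pixels that touch the background. Both choices are stand-ins assumed for illustration only: the fixed threshold of 128 and the 4-neighbor boundary test are not the Canny or Rosenfeld algorithms named above, for which an image processing library would normally be used, and the function names `binarize` and `boundary` are hypothetical.

```python
def binarize(gray, threshold=128):
    """Return 1 for ink (dark) pixels and 0 for background.

    The fixed threshold is an assumption; the text does not name a
    binarization method.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def boundary(bw):
    """Mark ink pixels that touch the background or the image border.

    This 4-neighbour test is a simple stand-in for an edge detector,
    not the Canny algorithm.
    """
    h, w = len(bw), len(bw[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not bw[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not bw[ny][nx]:
                    out[y][x] = 1
                    break
    return out


# Usage: a dark 3x3 square on a white 5x5 background.
gray = [[0 if 1 <= y <= 3 and 1 <= x <= 3 else 255 for x in range(5)]
        for y in range(5)]
Ibw = binarize(gray)
Iedge = boundary(Ibw)
```

In this toy input the only ink pixel that is not on the boundary is the center of the square.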
Second, the direction point angle features of a pixel
To describe the structure of a character more accurately, the invention proposes a descriptive feature called the direction point angle feature. For a single pixel on the skeleton image Isk, it is calculated as follows:
2.1 Take the square region with neighborhood radius N centered on the current pixel as the region to be calculated, localRect, and retain only the pixels in the region that are connected to the current pixel, as shown in FIG. 5 (a). The value of N depends on the size of the character image; N is typically 3 for a character image of about 100 x 100.
2.2 Delete the points of localRect that lie within radius N-1; the remaining points are the direction points of the current pixel, as shown in FIG. 5 (b). If several direction points are adjacent to one another, only the one farthest from the current pixel is retained.
2.3 Taking the current pixel as the coordinate origin, calculate the angle θ between the x-axis and the line connecting the current pixel to a direction point, as given by formula (1):
$$\theta = \operatorname{atan2}(x - x_0,\ y - y_0) \tag{1}$$
where $(x_0, y_0)$ are the coordinates of the current pixel and $(x, y)$ are the coordinates of the current direction point. θ takes values in [-180, 180), and the value of θ is the first-order direction point angle feature of the current pixel. One pixel may have several direction points. FIG. 5 illustrates direction point counts of 1, 2, 3, and 4, which correspond to the five structural cases that occur most often: endpoint, no intersection, Y-shaped intersection, T-shaped intersection, and X-shaped intersection.
2.4 If a pixel has 2 or more direction points, the relative angle difference δ between any two of them can be calculated, as shown in FIG. 6, according to formula (2), in which the subscripts a and b denote any two direction points of the same pixel and θ is the angle of the corresponding direction point. Because δ is a relative value, it is normalized to the range [0, 180). If a pixel has m direction points, m(m-1)/2 values of δ are obtained; these form the second-order direction point angle feature of the current pixel.
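Steps 2.1 to 2.4 can be sketched in Python as follows. This is a minimal sketch under several assumptions that the text does not state: pixels are 8-connected, the "radius" of the square region is the Chebyshev distance, ties between equally distant direction points are broken by sort order, formula (1) is read as giving the x offset first (Python's `math.atan2` takes the y offset first), and the relative angle difference is the absolute difference of the two angles folded so that it never exceeds 180. All function names are hypothetical.

```python
import math
from collections import deque
from itertools import combinations

# 8-connected neighbour offsets (assumed; the text does not fix the connectivity).
NEIGHBOURS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]


def direction_points(skel, y0, x0, n=3):
    """Steps 2.1-2.2: direction points of pixel (y0, x0), as (y, x) tuples."""
    h, w = len(skel), len(skel[0])

    def ring_distance(p):
        # Chebyshev distance, because localRect is a square.
        return max(abs(p[0] - y0), abs(p[1] - x0))

    # 2.1: keep only skeleton pixels of localRect connected to the current pixel.
    seen, queue = {(y0, x0)}, deque([(y0, x0)])
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBOURS:
            p = (y + dy, x + dx)
            if (p not in seen and 0 <= p[0] < h and 0 <= p[1] < w
                    and ring_distance(p) <= n and skel[p[0]][p[1]]):
                seen.add(p)
                queue.append(p)

    # 2.2: drop everything within radius N-1; only the outer ring remains.
    ring = {p for p in seen if ring_distance(p) == n}

    # Merge mutually adjacent ring pixels, keeping the one farthest from the centre.
    points = []
    while ring:
        start = ring.pop()
        group, stack = [start], [start]
        while stack:
            y, x = stack.pop()
            for dy, dx in NEIGHBOURS:
                p = (y + dy, x + dx)
                if p in ring:
                    ring.remove(p)
                    group.append(p)
                    stack.append(p)
        points.append(max(sorted(group), key=lambda p: math.hypot(p[0] - y0, p[1] - x0)))
    return sorted(points)


def first_order_angle(y0, x0, y, x):
    """Step 2.3: angle to the x-axis in degrees, mapped into [-180, 180)."""
    theta = math.degrees(math.atan2(y - y0, x - x0))
    return theta - 360.0 if theta >= 180.0 else theta


def angle_difference(theta_a, theta_b):
    """Step 2.4 (assumed form): absolute difference folded so it never exceeds 180."""
    d = abs(theta_a - theta_b) % 360.0
    return 360.0 - d if d > 180.0 else d


def second_order_angles(thetas):
    """All m(m-1)/2 pairwise angle differences of one pixel."""
    return [angle_difference(a, b) for a, b in combinations(thetas, 2)]


# Usage: a horizontal stroke through the middle row of a 9x9 skeleton image.
skel = [[1 if y == 4 else 0 for x in range(9)] for y in range(9)]
pts = direction_points(skel, 4, 4)
thetas = [first_order_angle(4, 4, y, x) for y, x in pts]
deltas = second_order_angles(thetas)
```

For the middle pixel of a straight stroke this yields two direction points on opposite sides, so the single pairwise difference is the straight angle; whether that value should count as inside the stated range [0, 180) depends on the exact form of formula (2).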
Third, the direction point angle features of an inflection point group
The direction point angle features of skeleton pixels describe how the skeleton extends and changes direction; they are local descriptive features over a neighborhood of limited extent. The character skeleton can be split further on the basis of these features, as follows:
3.1 Inflection point and skeleton line segment extraction
3.1.1 Calculate the first-order and second-order direction point angle features of all pixels on the skeleton image Isk, and delete the pixels that have 0 first-order direction points (isolated noise) to obtain a new Isk, as shown in FIG. 7 (a).
3.1.2 Extract all inflection points in Isk. A pixel is an inflection point if it meets either of two conditions: (1) it has exactly 2 direction points and its second-order direction point angle is less than 145; (2) it has more than 2 direction points. All inflection points form the inflection point map Isk-i of the skeleton. The connected regions of Isk-i are then found, and each connected region is one inflection point group, as shown in FIG. 7 (b).
3.1.3 Subtract Isk-i from Isk to obtain the skeleton line segment map Isk-l, as shown in FIG. 7 (c), in which every pixel has fewer than 3 direction points and a second-order direction point angle greater than 135. The connected regions of Isk-l are then found, and each connected region is one skeleton line segment. The direction angle φ of each skeleton line segment is calculated as the angle between the major axis of the segment's region and the x-axis, as shown in FIG. 8.
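The splitting in steps 3.1.1 to 3.1.3 can be sketched as below. The sketch assumes that the first-order angles of every skeleton pixel have already been computed and are passed in as a dictionary, that connected regions use 8-connectivity, and that the angle difference is the folded absolute difference; these are illustrative assumptions, and the names `is_inflection`, `connected_groups`, and `split_skeleton` are hypothetical. The 145 threshold is the one given in step 3.1.2.

```python
def angle_difference(theta_a, theta_b):
    """Assumed form of the relative angle difference, folded into [0, 180]."""
    d = abs(theta_a - theta_b) % 360.0
    return 360.0 - d if d > 180.0 else d


def is_inflection(thetas, bend_threshold=145.0):
    """Step 3.1.2: more than 2 direction points, or 2 that form a sharp bend."""
    if len(thetas) > 2:
        return True
    return len(thetas) == 2 and angle_difference(thetas[0], thetas[1]) < bend_threshold


def connected_groups(pixels):
    """Group (y, x) pixels into 8-connected regions (connectivity is assumed)."""
    remaining = set(pixels)
    groups = []
    while remaining:
        start = remaining.pop()
        group, stack = {start}, [start]
        while stack:
            y, x = stack.pop()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    p = (y + dy, x + dx)
                    if p in remaining:
                        remaining.remove(p)
                        group.add(p)
                        stack.append(p)
        groups.append(group)
    return groups


def split_skeleton(features):
    """features maps each skeleton pixel to the list of its first-order angles.

    Returns (inflection point groups, skeleton line segments).
    """
    # 3.1.1: pixels without any direction point are isolated noise.
    kept = {p: thetas for p, thetas in features.items() if thetas}
    # 3.1.2: inflection point map Isk-i.
    inflections = {p for p, thetas in kept.items() if is_inflection(thetas)}
    # 3.1.3: Isk minus Isk-i gives the skeleton line segment map Isk-l.
    segments = set(kept) - inflections
    return connected_groups(inflections), connected_groups(segments)


# Usage with hand-written features: a corner pixel at (0, 2) separates two segments.
features = {
    (0, 0): [0.0],
    (0, 1): [0.0, -180.0],
    (0, 2): [-180.0, 90.0],
    (2, 2): [-90.0, 90.0],
    (3, 2): [-90.0],
    (9, 9): [],
}
groups, segments = split_skeleton(features)
```

Endpoint pixels, which have a single direction point, stay in the line segment map in this sketch because neither inflection condition applies to them.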
3.2 Direction point angle feature calculation of inflection point groups
3.2.1 For each inflection point group, calculate its geometric center and take the pixel of the group closest to the geometric center as the center of the group. Calculate the distance between every pixel in the group and the center pixel, and construct the square region to be calculated, localRect, with the farthest distance as radius N, as shown in FIG. 9 (a).
3.2.2 Delete the points of localRect within radius N-1; the remaining points are the direction points of the current inflection point group, as shown in FIG. 9 (b). If several direction points are adjacent to one another, only the one farthest from the center of the current inflection point group is retained.
3.2.3 For each direction point, calculate the angle θ of the line connecting the direction point to the center of the current inflection point group (calculated as in 2.3), and search for a skeleton line segment connected to the direction point. If no skeleton line segment is connected to the direction point, θ is taken as the first-order direction point angle feature of the current inflection point group; if a skeleton line segment is connected to the direction point, its direction angle φ is taken as the first-order direction point angle feature of the current inflection point group. Because φ takes values in [-90, 90), it must also be normalized to [-180, 180) according to the value of θ, as given by formula (3).
Note that the y-axis of the image coordinate system points downward, opposite to the y-axis of the usual coordinate system. The angle θ, calculated from pixel coordinates, and the angle φ, calculated from the shape of the connected region, therefore have opposite senses in the y direction, so a negative sign must be added to the value of φ during normalization.
3.2.4 Calculate the relative angle difference δ between any two direction points of the inflection point group according to formula (2). If an inflection point group has m direction points, m(m-1)/2 values of δ are obtained; these form the second-order direction point angle feature of the current inflection point group.
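Two parts of section 3.2 lend themselves to a short Python sketch: finding the center and radius of an inflection point group (step 3.2.1), and orienting a segment angle φ by the direction point angle θ (step 3.2.3). Formula (3) is not reproduced in this text, so the orientation rule below is only an assumed reading of the description: negate φ to account for the downward y-axis, then choose between that angle and its opposite direction whichever lies closer to θ. The Euclidean distance and all function names are likewise assumptions.

```python
import math


def group_center_and_radius(group):
    """Step 3.2.1: centre pixel of a group and its farthest pixel distance.

    group is a collection of (y, x) pixels. Euclidean distance is assumed;
    rounding the radius to a whole number of pixels is left to the caller.
    """
    cy = sum(y for y, _ in group) / len(group)
    cx = sum(x for _, x in group) / len(group)
    # Pixel of the group closest to the geometric centre (ties broken by sort order).
    center = min(sorted(group), key=lambda p: math.hypot(p[0] - cy, p[1] - cx))
    radius = max(math.hypot(y - center[0], x - center[1]) for y, x in group)
    return center, radius


def wrap(angle):
    """Map an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def angle_difference(theta_a, theta_b):
    """Absolute angular difference folded into [0, 180]."""
    d = abs(theta_a - theta_b) % 360.0
    return 360.0 - d if d > 180.0 else d


def orient_segment_angle(phi, theta):
    """Assumed reading of formula (3).

    phi is the segment orientation in [-90, 90) measured with the y-axis up;
    theta is the direction point angle in image coordinates (y-axis down).
    """
    base = wrap(-phi)               # sign flip for the downward y-axis
    flipped = wrap(base + 180.0)    # the same line, traversed the other way
    return min((base, flipped), key=lambda c: angle_difference(c, theta))


# Usage: a three-pixel group, and a segment seen from two opposite direction points.
center, radius = group_center_and_radius({(0, 0), (0, 1), (0, 2)})
toward = orient_segment_angle(30.0, -25.0)
away = orient_segment_angle(30.0, 160.0)
```

The point of the orientation step is that a line's orientation alone cannot say which way the segment leaves the inflection point group, while θ can; the two usage calls return angles 180 degrees apart for the same φ.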
Fourth, stroke extraction based on the stroke skeleton and contour
The direction point angle features of an inflection point group reflect the relative positions of the skeleton line segments that meet at the group and are global structural features. Skeleton line segments can be combined on the basis of these features and strokes extracted from the result, as follows:
4.1 For every inflection point group with more than three direction points, find the two direction points DPa and DPb with the largest second-order direction point angle, construct a new inflection point group from the original center point together with DPa and DPb, and remove these two direction points from the original inflection point group. Repeat this step until the number of direction points of the inflection point group is less than or equal to 2. After this step, all inflection point groups with many direction points have been split, and each resulting inflection point group has exactly 1 second-order direction point angle feature.
4.2 For an inflection point group with three direction points, if the variance of its three second-order direction point angles is small, the group is judged to be a Y-shaped intersection and is split into three new inflection point groups, each containing exactly one direction point of the original group. Otherwise the group is judged to be a T-shaped intersection and is split into two new inflection point groups, one containing the two direction points with the largest second-order direction point angle and the other containing the remaining direction point.
4.3 Examine the second-order direction point angle feature of each inflection point group: if it is greater than 135, merge the skeleton line segments corresponding to the two direction points of the group; otherwise leave the relation between the skeleton line segments unchanged. Repeat this step until all inflection point groups have been processed, which yields the stroke skeletons, as shown in FIG. 10.
4.4 Search Iedge for the nearest edges parallel to the stroke skeleton, fill the blank area between the edges and the stroke skeleton, and fill the corresponding skeleton intersection areas to obtain the split strokes, as shown in FIG. 11.
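Steps 4.1 to 4.3 can be sketched in Python as follows; step 4.4 is not sketched because the filling rule is not specified in enough detail. Several points are assumptions made for illustration: an inflection point group is represented only by the list of its first-order angles; the loop of step 4.1 stops when three or fewer direction points remain, so that a three-point remainder is handled by step 4.2; the variance threshold of 400 for a "small" variance is a hypothetical value, since the text gives none; and the merging of skeleton line segments uses a union-find structure, which is this sketch's choice and is not named in the text.

```python
from itertools import combinations
from statistics import pvariance


def angle_difference(theta_a, theta_b):
    """Assumed form of the relative angle difference, folded into [0, 180]."""
    d = abs(theta_a - theta_b) % 360.0
    return 360.0 - d if d > 180.0 else d


def peel_pairs(thetas):
    """Step 4.1: repeatedly split off the pair with the largest angle difference."""
    remaining = list(thetas)
    groups = []
    while len(remaining) > 3:
        i, j = max(combinations(range(len(remaining)), 2),
                   key=lambda ij: angle_difference(remaining[ij[0]], remaining[ij[1]]))
        groups.append([remaining[i], remaining[j]])
        remaining = [t for k, t in enumerate(remaining) if k not in (i, j)]
    groups.append(remaining)
    return groups


def split_three_way(thetas, variance_threshold=400.0):
    """Step 4.2: classify a three-point group as a Y or T intersection and split it.

    variance_threshold is a hypothetical value for "small variance".
    """
    pairs = list(combinations(range(3), 2))
    deltas = [angle_difference(thetas[i], thetas[j]) for i, j in pairs]
    if pvariance(deltas) < variance_threshold:
        return "Y", [[t] for t in thetas]
    i, j = pairs[deltas.index(max(deltas))]
    k = 3 - i - j  # index of the direction point left over
    return "T", [[thetas[i], thetas[j]], [thetas[k]]]


def merge_segments(num_segments, junctions, straight_threshold=135.0):
    """Step 4.3: merge the two segments of every nearly straight two-point group.

    junctions holds (segment_a, segment_b, delta) triples; union-find is an
    implementation choice of this sketch.
    """
    parent = list(range(num_segments))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for seg_a, seg_b, delta in junctions:
        if delta > straight_threshold:
            parent[find(seg_a)] = find(seg_b)

    strokes = {}
    for s in range(num_segments):
        strokes.setdefault(find(s), []).append(s)
    return sorted(strokes.values())


# Usage: an X-shaped intersection splits into its two crossing strokes.
x_groups = peel_pairs([0.0, 90.0, -180.0, -90.0])
kind, t_groups = split_three_way([0.0, -180.0, 90.0])
strokes = merge_segments(4, [(0, 1, 170.0), (1, 2, 90.0), (2, 3, 180.0)])
```

In the usage lines, the X-shaped intersection is split into the horizontal pair and the vertical pair, and the four segments are merged into two strokes because only the first and last junctions are nearly straight.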
In view of the shortcomings of the prior art, the invention mainly solves two technical problems:
1. How to describe the structural information of a character more effectively.
2. How to extract the strokes of handwritten Chinese characters accurately by fusing different types of character structure information.
The image binarization, edge detection, and skeletonization algorithms can be replaced by other algorithms of the same kind, selected and combined according to the characteristics of the actual input images.
The method can be adapted to the specific downstream task. For example, if the downstream task needs only horizontal strokes, the first-order and second-order direction point angles of the inflection point groups can be combined for screening, so that only strokes with horizontal characteristics are retained.
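One possible form of such a screening is sketched below in Python. The text does not define the screening rule, so everything here is an assumption for illustration: a stroke is treated as horizontal when all the first-order angles at its ends lie close to the x-axis (near 0 or near ±180) and all its second-order angles are close to 180, with a hypothetical tolerance of 15 degrees.

```python
def is_horizontal_stroke(first_order, second_order, tolerance=15.0):
    """Return True if the angles describe a roughly straight, horizontal stroke.

    first_order: first-order direction point angles in [-180, 180).
    second_order: second-order direction point angles in [0, 180].
    tolerance: hypothetical allowance in degrees.
    """
    # Close to the x-axis means close to 0 or close to +/-180.
    along_x = all(min(abs(t), 180.0 - abs(t)) <= tolerance for t in first_order)
    # Close to 180 means the two segments continue in a straight line.
    straight = all(180.0 - d <= tolerance for d in second_order)
    return along_x and straight


# Usage: a slightly tilted horizontal stroke passes, a vertical one does not.
keep = is_horizontal_stroke([5.0, -170.0], [175.0])
drop = is_horizontal_stroke([90.0, -90.0], [180.0])
```

The tolerance controls how much tilt is still accepted as horizontal and would need to be tuned to the handwriting being processed.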
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (1)

1. A method for extracting strokes from handwritten Chinese characters, characterized by comprising:
character image preprocessing;
computation of the direction-point angle features of pixels;
computation of the direction-point angle features of inflection point clusters;
stroke extraction based on the stroke skeleton and contour;

The character image preprocessing comprises:
denoting the original image as I, binarizing the input original image, and denoting the binarized character image as Ibw;
using the Canny algorithm to extract the edges of the characters in the binary image Ibw, and denoting the edge image as Iedge;
using the Rosenfeld algorithm to extract the skeleton of the characters in the binary image Ibw, and denoting the skeleton image as Isk;

The computation of the direction-point angle features of pixels comprises:
taking the square region centered on the current pixel with neighborhood radius N as the region to be computed, localRect, and retaining only the pixels in that region that are connected to the current pixel;
where the value of N depends on the size of the character image; for character images of about 100×100, N is usually 3;
deleting the points of localRect within radius N-1, the remaining points being the direction points of the current pixel; if several direction points are adjacent to one another, only the one farthest from the current pixel is retained;
taking the current pixel as the coordinate origin, computing the angle θ between the x-axis and the line connecting the current pixel to a direction point, θ being given by formula (1):
θ = atan2(x - x0, y - y0)    (1)
where (x0, y0) are the coordinates of the current pixel and (x, y) are the coordinates of the current direction point; θ takes values in [-180,180), and the value of θ is the first-order direction-point angle feature of the current pixel; one pixel may have several direction points;
if a pixel has two or more direction points, the relative angle difference δ between any two of them can be computed, δ being given by formula (2), where the subscripts a and b denote any two direction points of the same pixel and θ is the direction-point angle of the corresponding point; because δ is a relative value, it must be normalized to the range [0,180); if a pixel has m direction points, m(m-1)/2 values of δ can be computed, and these δ values constitute the second-order direction-point angle features of the current pixel;

The computation of the direction-point angle features of inflection point clusters comprises:
Extraction of inflection points and skeleton line segments:
computing the first-order and second-order direction-point angle features of all pixels on the skeleton image Isk, and deleting the pixels that have no first-order direction points, to obtain a new Isk;
extracting all inflection points in Isk, a pixel being judged to be an inflection point if it satisfies either of two conditions: first, the pixel has exactly 2 direction points and its second-order direction-point angle is less than 145; second, the pixel has more than 2 direction points; all inflection points form the inflection point map Isk-i of the skeleton; the connected regions of the inflection point map are found, and each connected region is one inflection point cluster;
subtracting Isk-i from Isk to obtain the skeleton line segment map Isk-l, in which every pixel has fewer than 3 direction points and a second-order direction-point angle greater than 135; the connected regions of Isk-l are found, and each connected region is one skeleton line segment; the direction angle φ of each skeleton line segment is computed as the angle between the major axis of that segment's region and the x-axis;
Computation of the direction-point angle features of an inflection point cluster:
for each inflection point cluster, computing its geometric center and taking the cluster pixel nearest to the geometric center as the cluster center; computing the distance from every pixel in the cluster to the center pixel, and constructing the square region to be computed, localRect, with the largest distance as the radius N;
deleting the points of localRect within radius N-1, the remaining points being the direction points of the current inflection point cluster; if several direction points are adjacent to one another, only the one farthest from the center of the current cluster is retained;
for each direction point, computing the angle θ of the line connecting that direction point to the center of the current cluster, and searching for a skeleton line segment connected to the direction point; if no skeleton line segment is connected to the direction point, the value of θ is taken as a first-order direction-point angle feature of the current cluster; if a skeleton line segment is connected to the direction point, the value of that segment's direction angle φ is taken as the first-order direction-point angle feature of the current cluster; because φ takes values in [-90,90), it must additionally be normalized to the range [-180,180) according to the value of θ, the normalization being given by formula (3);
computing the relative angle difference δ between any two direction points of the inflection point cluster, δ being given by formula (2) as above; if an inflection point cluster has m direction points, m(m-1)/2 values of δ can be computed, and these δ values constitute the second-order direction-point angle features of the current cluster;

The stroke extraction based on the stroke skeleton and contour comprises:
for every inflection point cluster with more than three direction points, finding the two direction points DPa and DPb whose second-order direction-point angle is largest, constructing a new inflection point cluster from the original center point together with DPa and DPb, and removing these two direction points from the original cluster; this step is iterated until the cluster has 2 or fewer direction points; after this step, all inflection point clusters with multiple direction points have been split, and every inflection point cluster has exactly one second-order direction-point angle feature;
for every inflection point cluster with exactly three direction points, if the variance of its three second-order direction-point angles is small, the cluster is judged to be a Y-shaped junction and is split into three new inflection point clusters, each containing only one direction point of the original cluster; otherwise the cluster is judged to be a T-shaped junction and is split into two new inflection point clusters, one containing the two direction points with the largest second-order direction-point angle and the other containing the single remaining direction point;
examining the second-order direction-point angle feature of each inflection point cluster: if it is greater than 135, the skeleton line segments corresponding to the two direction points of the cluster are merged; otherwise the relative relationship of the skeleton line segments is left unchanged; this step is iterated until all inflection point clusters have been processed, yielding the stroke skeleton;
in Iedge, finding the nearest edge parallel to the stroke skeleton, filling the blank region between that edge and the stroke skeleton, and filling the corresponding skeleton intersection regions, to obtain the separated strokes.
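The first-order and second-order direction-point angle features, and the inflection point test built on them, can be sketched roughly as follows. This is an illustrative sketch under stated assumptions rather than the patented implementation: the function names are hypothetical, the angle is computed with Python's conventional `math.atan2(dy, dx)` argument order so that it is measured from the x-axis (formula (1) in the claim lists the arguments as x - x0, y - y0), and because formula (2) is not shown in this text, δ is assumed to be the smallest angle between the two directions. With that assumed form, two exactly opposite direction points give 180.

```python
import math
from itertools import combinations


def first_order_angle(center, direction_point):
    """Angle in degrees, in [-180, 180), of the line center -> direction point."""
    x0, y0 = center
    x, y = direction_point
    # Assumption: conventional atan2(dy, dx), i.e. measured from the x-axis.
    deg = math.degrees(math.atan2(y - y0, x - x0))
    # math.atan2 can return +180; fold it onto -180 to match [-180, 180).
    return (deg + 180.0) % 360.0 - 180.0


def angle_difference(theta_a, theta_b):
    """Assumed form of delta: smallest angle between two directions, in degrees."""
    d = abs(theta_a - theta_b) % 360.0
    return 360.0 - d if d > 180.0 else d


def second_order_angles(thetas):
    """One delta per unordered pair of direction points: m*(m-1)/2 values."""
    return [angle_difference(a, b) for a, b in combinations(thetas, 2)]


def is_inflection(thetas, bend_threshold=145.0):
    """Inflection test from the claim: more than 2 direction points,
    or exactly 2 whose relative angle is below the threshold."""
    if len(thetas) > 2:
        return True
    return len(thetas) == 2 and second_order_angles(thetas)[0] < bend_threshold


# A pixel on a straight horizontal stroke versus a pixel at a right-angle corner.
straight = [first_order_angle((0, 0), p) for p in [(3, 0), (-3, 0)]]
corner = [first_order_angle((0, 0), p) for p in [(3, 0), (0, 3)]]
print(is_inflection(straight), is_inflection(corner))
```

In this sketch the direction points are passed in directly; finding them (the ring of localRect at radius N that is connected to the current pixel) would be a separate step on the skeleton image.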
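The splitting of multi-direction inflection point clusters and the Y-shaped versus T-shaped decision might look something like the sketch below. It works only on lists of first-order angles in degrees, and several parts are assumptions: the helper names are hypothetical, δ is again assumed to be the smallest angle between two directions, and `y_variance_max` is an invented threshold, since the claim says only that the variance should be "small".

```python
from itertools import combinations
from statistics import pvariance


def angle_difference(theta_a, theta_b):
    """Assumed form of delta: smallest angle between two directions, in degrees."""
    d = abs(theta_a - theta_b) % 360.0
    return 360.0 - d if d > 180.0 else d


def straightest_pair(thetas):
    """Indices of the two direction angles with the largest delta."""
    return max(combinations(range(len(thetas)), 2),
               key=lambda ij: angle_difference(thetas[ij[0]], thetas[ij[1]]))


def split_cluster(thetas, y_variance_max=400.0):
    """Split one cluster's direction angles into new clusters of at most two."""
    remaining = list(thetas)
    groups = []
    if len(remaining) > 3:
        # Peel off the pair with the largest delta until at most 2 remain.
        while len(remaining) > 2:
            i, j = straightest_pair(remaining)
            groups.append([remaining[i], remaining[j]])
            remaining = [t for k, t in enumerate(remaining) if k not in (i, j)]
        if remaining:
            groups.append(remaining)
    elif len(remaining) == 3:
        deltas = [angle_difference(a, b) for a, b in combinations(remaining, 2)]
        if pvariance(deltas) <= y_variance_max:
            # Y-shaped junction: every branch becomes its own cluster.
            groups = [[t] for t in remaining]
        else:
            # T-shaped junction: keep the straightest pair together.
            i, j = straightest_pair(remaining)
            k = 3 - i - j  # index of the remaining branch
            groups = [[remaining[i], remaining[j]], [remaining[k]]]
    elif remaining:
        groups.append(remaining)
    return groups


def should_merge(group, straight_min=135.0):
    """Merge the two skeleton segments only if they are nearly collinear."""
    return len(group) == 2 and angle_difference(group[0], group[1]) > straight_min


print(split_cluster([0.0, -180.0, 90.0]))         # T-shaped junction
print(split_cluster([90.0, -30.0, -150.0]))       # Y-shaped junction
print(split_cluster([0.0, 90.0, -180.0, -90.0]))  # crossing of two strokes
```

For the crossing, both resulting pairs have a relative angle above 135, so `should_merge` would join each pair of skeleton segments into one stroke passing through the intersection.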
CN202210137230.7A 2022-02-15 2022-02-15 A method for extracting strokes from handwritten Chinese characters Active CN114495117B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210137230.7A CN114495117B (en) 2022-02-15 2022-02-15 A method for extracting strokes from handwritten Chinese characters

Publications (2)

Publication Number Publication Date
CN114495117A CN114495117A (en) 2022-05-13
CN114495117B true CN114495117B (en) 2025-02-14

Family

ID=81480351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210137230.7A Active CN114495117B (en) 2022-02-15 2022-02-15 A method for extracting strokes from handwritten Chinese characters

Country Status (1)

Country Link
CN (1) CN114495117B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101604393A (en) * 2009-07-10 2009-12-16 华南理工大学 A Chinese Character Stroke Feature Extraction Method for Online Handwritten Chinese Character Recognition
CN102542264A (en) * 2011-12-22 2012-07-04 北京语言大学 Method and device for automatically evaluating right and wrong of Chinese character writing on basis of digital handwriting equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1156741C (en) * 1998-04-16 2004-07-07 国际商业机器公司 Chinese handwriting identifying method and device
CN1333366C (en) * 2005-04-01 2007-08-22 清华大学 On-line hand-written Chinese characters recognition method based on statistic structural features
KR101085699B1 (en) * 2010-02-17 2011-11-23 고려대학교 산학협력단 Apparatus and method for extracting text area using text stroke width calculation
EP3295292B1 (en) * 2015-05-15 2020-09-02 MyScript System and method for superimposed handwriting recognition technology
CN112990183B (en) * 2021-05-19 2021-08-10 中国科学院自动化研究所 Method, system and device for extracting strokes of the same name in offline handwritten Chinese characters

Also Published As

Publication number Publication date
CN114495117A (en) 2022-05-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant