Summary of the invention
The objective of the invention is to realize an online handwritten Chinese character recognition method based on statistical structural features. The method takes a single online handwritten Chinese character as its processing object. It first applies the necessary preprocessing to the character, then extracts statistical structural features that capture the characteristics of online handwritten Chinese characters well, then compresses these raw features into recognition features with linear discriminant analysis (LDA), and finally recognizes the character with a modified quadratic discriminant function (MQDF) classifier.
The present invention consists of the following components: preprocessing, statistical structural feature extraction, feature transformation, and classifier design.
1. Preprocessing
The purpose of preprocessing is to eliminate, as far as possible, noise and writing distortion in the trajectory before recognition, so that the character to be recognized has a better basis for recognition. It has two tasks. The first is to filter out noise introduced by the trajectory acquisition device and by the writer, such as isolated-point noise, sawtooth noise, and irregular pen speed; the main methods are filtering, smoothing, and resampling. The second is to reshape the character to be recognized so as to eliminate part of the writing distortion. Reshaping involves both linear and nonlinear normalization, which map the spatial region occupied by the character onto a fixed-size area and make the strokes more evenly distributed in space.
Suppose the trajectory of an online handwritten Chinese character is:
$$P(x_1, y_1), P(x_2, y_2), \ldots, P(x_i, y_i), \text{(break)}, P(x_{i+1}, y_{i+1}), \ldots, P(x_N, y_N).$$
This is the sequence of point coordinates, in temporal order, that the computer obtains through a digitizer by sampling the motion track of the pen tip in real time during writing. The (break) mark denotes a pen-up followed by a pen-down, i.e., the boundary between two natural strokes.
Removing isolated-point noise means removing from the trajectory point sequence any stroke that consists of only one or two points. Sawtooth noise is filtered by taking a weighted average of the coordinates of adjacent points, which acts as a low-pass filter. The filtering formula is:
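The two noise-removal steps above can be sketched in Python as follows. This is a minimal sketch, assuming each stroke is a list of (x, y) tuples; the patent's own filtering weights are not reproduced here, so the (1/4, 1/2, 1/4) weights and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def remove_isolated_strokes(strokes: List[Stroke]) -> List[Stroke]:
    """Drop strokes made of only one or two points (isolated-point noise)."""
    return [s for s in strokes if len(s) > 2]


def smooth_stroke(stroke: Stroke, weights=(0.25, 0.5, 0.25)) -> Stroke:
    """Weighted average of each interior point with its two neighbours.

    The (1/4, 1/2, 1/4) weights are an assumption, not the patent's formula.
    End points are kept unchanged.
    """
    if len(stroke) < 3:
        return list(stroke)
    w_prev, w_cur, w_next = weights
    out = [stroke[0]]
    for i in range(1, len(stroke) - 1):
        (x0, y0), (x1, y1), (x2, y2) = stroke[i - 1], stroke[i], stroke[i + 1]
        out.append((w_prev * x0 + w_cur * x1 + w_next * x2,
                    w_prev * y0 + w_cur * y1 + w_next * y2))
    out.append(stroke[-1])
    return out
```

Any symmetric low-pass kernel whose weights sum to 1 would fit the description; the one above only illustrates the idea.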
Irregular pen speed is eliminated by resampling the written trajectory at a fixed-length interval, so that a stroke of a given length is represented by a corresponding number of points. The formula is:
In the formula above, $L$ is the fixed sampling interval, set to the constant 1; $(x'_i, y'_i)$ are the $N$ coordinate points of the stroke to be resampled, where $i$ satisfies $1 \le i \le N$ and $s_i \le jL < s_{i+1}$; the length between two adjacent points is computed from their coordinates, and $s_i$ is the cumulative length, with $s_0 = 0$; $(x''_j, y''_j)$ are the new coordinate points obtained by resampling.
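A minimal sketch of fixed-interval resampling along the cumulative length $s_i$, assuming that each new point is placed by linear interpolation on the segment where $s_i \le jL < s_{i+1}$ (the interpolation formula itself is not reproduced in the text). The names `resample_stroke` and `step` (the interval $L$) are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample_stroke(stroke: List[Point], step: float = 1.0) -> List[Point]:
    """Resample a stroke at equal arc-length intervals `step` (L in the text).

    For each j, find the segment with s[i] <= j*step < s[i+1] on the cumulative
    length s and place the new point by linear interpolation (assumed).
    """
    if len(stroke) < 2:
        return list(stroke)
    # cumulative length: s[0] = 0, s[i] = polyline length up to point i
    s = [0.0]
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        s.append(s[-1] + math.hypot(x1 - x0, y1 - y0))
    out: List[Point] = []
    i = 0
    j = 0
    while j * step <= s[-1]:
        target = j * step
        # advance to the segment containing the target length
        while i < len(s) - 2 and s[i + 1] <= target:
            i += 1
        seg = s[i + 1] - s[i]
        t = 0.0 if seg == 0 else (target - s[i]) / seg
        (x0, y0), (x1, y1) = stroke[i], stroke[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
        j += 1
    return out
```

For example, a straight 3-unit stroke resampled with `step=1.0` yields four evenly spaced points, regardless of how densely the digitizer originally sampled it.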
Reshaping requires the new coordinates of each trajectory point after transformation, which the present invention computes by the density equalization method. First, the online character trajectory is converted into a character image $[f(x'', y'')]_{W \times H}$, where $W$ is the image width and $H$ the image height before reshaping. The pixel at the coordinates of any trajectory point $P(x''_i, y''_i)$ is black, $f(x''_i, y''_i) = 1$, and all other pixels are white, $f(x'', y'') = 0$. $U(x'')$ and $V(y'')$ denote the pixel density projections in the horizontal and vertical directions, respectively, defined for
$$x'' = 1, 2, \ldots, W, \qquad y'' = 1, 2, \ldots, H.$$
Here $\alpha_U$ and $\alpha_V$ are bias constants, set here to $\alpha_U = \alpha_V = 6$. The trajectory point with original coordinates $(x'', y'')$ then has the new coordinates $(x''', y''')$:
Here $W'$ is the maximum abscissa and $H'$ the maximum ordinate after processing. These two values are the expected ranges of the trajectory point coordinates after processing and must be preset before reshaping; both are set to 64 here.
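One common form of density equalization can be sketched as below. Because the projection and mapping formulas are not reproduced in the text, this sketch assumes that each projection is the number of black pixels in a column (row) plus the bias constant, and that a coordinate is mapped in proportion to the cumulative projection up to it, scaled to $W'$ ($H'$). The name `density_equalize` is hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def density_equalize(
    points: Sequence[Tuple[int, int]],
    width: int,
    height: int,
    alpha_u: float = 6.0,
    alpha_v: float = 6.0,
    out_w: float = 64.0,
    out_h: float = 64.0,
) -> List[Tuple[float, float]]:
    """Map trajectory pixels (x'', y''), 1 <= x'' <= W, 1 <= y'' <= H, to (x''', y''')."""
    black = set(points)  # f(x'', y'') = 1 exactly at trajectory points
    # U(x''): black pixels in column x'' plus bias alpha_U (assumed form)
    u = [sum((x, y) in black for y in range(1, height + 1)) + alpha_u
         for x in range(1, width + 1)]
    # V(y''): black pixels in row y'' plus bias alpha_V (assumed form)
    v = [sum((x, y) in black for x in range(1, width + 1)) + alpha_v
         for y in range(1, height + 1)]
    cu, cv = list(accumulate(u)), list(accumulate(v))
    # new coordinate proportional to the cumulative projection, scaled to W', H'
    return [(out_w * cu[x - 1] / cu[-1], out_h * cv[y - 1] / cv[-1])
            for x, y in points]
```

The bias constants keep empty columns and rows from collapsing to zero width, so dense regions are stretched and sparse regions shrunk without being eliminated.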
The last preprocessing step connects consecutive trajectory points within each natural stroke end to end, inserts into the trajectory sequence the points on each connecting line that do not coincide with existing trajectory points, and removes coincident points among adjacent trajectory points.
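This final preprocessing step can be sketched as follows, assuming integer (pixel) coordinates and a simple DDA-style rounding to generate the points on each connecting line. The text does not name a line-drawing algorithm, so this choice and the name `connect_and_dedupe` are assumptions.

```python
from typing import List, Sequence, Tuple

IPoint = Tuple[int, int]


def connect_and_dedupe(stroke: Sequence[IPoint]) -> List[IPoint]:
    """Join consecutive points of one stroke with straight pixel runs and
    drop consecutive duplicates, so neighbouring output points are 8-connected."""
    if not stroke:
        return []
    out: List[IPoint] = []
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        n = max(abs(x1 - x0), abs(y1 - y0))  # number of unit steps on this segment
        for t in range(n):  # includes the start point, excludes the end point
            p = (x0 + round((x1 - x0) * t / n), y0 + round((y1 - y0) * t / n))
            if not out or out[-1] != p:
                out.append(p)
    last = (stroke[-1][0], stroke[-1][1])
    if not out or out[-1] != last:
        out.append(last)
    return out
```

Points produced by reshaping would first be rounded to the integer grid before being passed in.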
2. Extraction of statistical structural features
Statistical structural features are extracted from the online handwritten character trajectory after preprocessing. By closely studying the structural characteristics of online handwritten Chinese characters, the present invention designs and extracts two kinds of statistical structural features, called the direction feature and the edge feature.
2.1 Direction feature extraction
The direction feature is itself of two kinds, called the adjacent-point direction feature and the adjacent-flex-point direction feature.
2.1.1 Adjacent-point direction feature
First, compute the direction of each trajectory point. Take any point $P_i$ in the trajectory coordinate sequence; every point except the last has at least one subsequent point $P_j$ ($j > i$). The direction value $\theta_i$ of $P_i$ is defined as the direction of the directed line segment from $P_i$ to $P_j$, with range $[0°, 360°)$. In Figure 3, (a) shows the direction from point $P_i$ to its adjacent point $P_{i+1}$, (b) shows the direction from flex point $P_i$ to the adjacent flex point $P_j$, and (c) illustrates the computation of the orientation angle of a directed line segment. When $j = i + 1$, this direction value is called the adjacent-point direction.
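$\theta_i$ is computed as follows. Let $(X_i, Y_i)$ be the coordinates of point $P_i$ and $(X_j, Y_j)$ the coordinates of point $P_j$. Since
$$\tan\theta_i = \frac{Y_j - Y_i}{X_j - X_i},$$
$\theta_i$ is the arctangent of this ratio, with the quadrant determined by the signs of $X_j - X_i$ and $Y_j - Y_i$ so that $\theta_i \in [0°, 360°)$.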
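The direction computation can be sketched with `math.atan2`, which resolves the quadrant from the signs of both coordinate differences. The sketch assumes the y-axis points upward, so a downward stroke has direction 270°; with image coordinates in which y grows downward, the y difference would be negated. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def direction_deg(p_i: Point, p_j: Point) -> float:
    """theta_i: direction of the directed segment P_i -> P_j in degrees, in [0, 360).

    atan2 picks the quadrant from the signs of both differences; the y-axis is
    assumed to point upward.
    """
    theta = math.degrees(math.atan2(p_j[1] - p_i[1], p_j[0] - p_i[0])) % 360.0
    return 0.0 if theta >= 360.0 else theta  # guard against rounding up to 360


def adjacent_point_directions(points: Sequence[Point]) -> List[Optional[float]]:
    """Adjacent-point directions (j = i + 1); the last point is marked invalid (None)."""
    if not points:
        return []
    dirs: List[Optional[float]] = [direction_deg(a, b) for a, b in zip(points, points[1:])]
    return dirs + [None]
```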
Next, compute the direction attribute coefficients of each trajectory point. The direction attribute coefficients of a trajectory point are the 4 function values obtained by using the point's direction value as the independent variable of the trapezoidal and half-trapezoidal functions shown in Figure 4:
Horizontal (héng) direction attribute coefficient function $f^{(h)}(\theta)$
Vertical (shù) direction attribute coefficient function $f^{(s)}(\theta)$
Left-falling (piě) direction attribute coefficient function $f^{(p)}(\theta)$
Right-falling (nà) direction attribute coefficient function $f^{(n)}(\theta)$
The six parameters $\alpha_1, \ldots, \alpha_6$ above are angle thresholds that determine the shapes of the direction attribute coefficient functions. In the present invention they are set to $\alpha_1 = -10°$, $\alpha_2 = 260°$, $\alpha_3 = 280°$, $\alpha_4 = 250°$, $\alpha_5 = 300°$, $\alpha_6 = 330°$.
After the direction attribute coefficients are obtained, the coordinate space of the trajectory point image is divided evenly into $K_1 \times K_1$ sub-blocks, as shown in Figure 5. For each sub-block, the sum of each of the 4 direction attribute coefficients over all trajectory points in that sub-block is accumulated, giving $K_1 \times K_1 \times 4$ features in total. Taking sub-block $(k, l)$ ($1 \le k \le K_1$, $1 \le l \le K_1$) as an example, the 4 features obtained are:
$$\sum_{P(x,y) \in (k,l)} f^{(h)}(\theta), \quad \sum_{P(x,y) \in (k,l)} f^{(s)}(\theta), \quad \sum_{P(x,y) \in (k,l)} f^{(p)}(\theta), \quad \sum_{P(x,y) \in (k,l)} f^{(n)}(\theta),$$
where each sum runs over all trajectory points $P(x, y)$ in sub-block $(k, l)$ and $\theta$ is the direction value of point $P(x, y)$.
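The accumulation of direction attribute coefficients into $K_1 \times K_1$ sub-blocks can be sketched as below. The exact shapes of the four functions in Figure 4 are not reproduced here, so `trapezoid` is only a generic trapezoidal helper (a half-trapezoid arises when two breakpoints coincide), and the per-point coefficients in the test are hypothetical one-hot values. The block indexing ($k$ along x, $l$ along y) and the function names are also assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def trapezoid(theta: float, a: float, b: float, c: float, d: float) -> float:
    """Generic trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear ramps.

    A half-trapezoid is obtained with a == b or c == d.
    """
    if b <= theta <= c:
        return 1.0
    if a < theta < b:
        return (theta - a) / (b - a)
    if c < theta < d:
        return (d - theta) / (d - c)
    return 0.0


def block_direction_features(
    points: Sequence[Tuple[float, float]],
    coeffs: Sequence[Optional[Tuple[float, float, float, float]]],
    k1: int,
    width: float = 64.0,
    height: float = 64.0,
) -> List[float]:
    """Sum the 4 coefficients (h, s, p, n) of all points in each K1 x K1 sub-block.

    Points whose coefficients are None (e.g. the last point, whose direction is
    invalid) are skipped. Returns a flat list of K1 * K1 * 4 features.
    """
    feats = [[[0.0] * 4 for _ in range(k1)] for _ in range(k1)]
    for (x, y), c in zip(points, coeffs):
        if c is None:
            continue
        k = min(int(x * k1 / width), k1 - 1)  # sub-block index along x
        l = min(int(y * k1 / height), k1 - 1)  # sub-block index along y
        for t in range(4):
            feats[k][l][t] += c[t]
    return [v for row in feats for cell in row for v in cell]
```

With $K_1 = 8$, as in the embodiment, the returned list has $8 \times 8 \times 4 = 256$ entries.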
2.1.2 Adjacent-flex-point direction feature
When the trajectory is shaky, the adjacent-point direction can deviate considerably, so we also design the adjacent-flex-point direction: $P_i$ and $P_j$ are taken to be adjacent flex points in the trajectory, and the direction of each trajectory point is recomputed. A flex point is a point where the writing direction of a stroke changes sharply; stroke end points are also treated as flex points. Flex points are determined by the basic method of polygonal approximation: first compute, for each point in the stroke, the cosine of the angle it subtends with its neighboring points. A point is judged to be a flex point when the cosine of the subtended angle γ reaches a local maximum and exceeds a preset threshold, set to -0.8, at which γ is about 2.5 radians.
The cosine of the subtended angle γ can be computed with the law of cosines. Let a, b, c be the three sides of the triangle formed by the current trajectory point and its preceding and following neighbors, where γ is the angle between sides a and b, and c is the side opposite γ. First compute the lengths of the three sides from the vertex coordinates; the law of cosines then gives
$$\cos\gamma = \frac{a^2 + b^2 - c^2}{2ab},$$
as shown in Figure 6.
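Flex-point detection by the law of cosines can be sketched as follows. The sketch assumes that a point is a flex point when its cosine is at least as large as those of its immediate neighbors (a local maximum) and exceeds the threshold -0.8; stroke end points are always included. Treating a degenerate triangle (coincident points) as a straight line is an assumption, as are the function names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def subtended_cosine(prev: Point, cur: Point, nxt: Point) -> float:
    """cos(gamma) at `cur` via the law of cosines: (a^2 + b^2 - c^2) / (2ab)."""
    a = math.dist(cur, prev)  # side from the current point to the preceding point
    b = math.dist(cur, nxt)  # side from the current point to the following point
    c = math.dist(prev, nxt)  # side opposite gamma
    if a == 0 or b == 0:
        return -1.0  # degenerate triangle: treated as a straight line (assumption)
    return (a * a + b * b - c * c) / (2 * a * b)


def flex_points(stroke: Sequence[Point], threshold: float = -0.8) -> List[int]:
    """Indices of flex points: both end points plus interior local maxima of
    cos(gamma) that exceed `threshold` (gamma below about 2.5 rad for -0.8)."""
    n = len(stroke)
    if n <= 2:
        return list(range(n))
    cos = [-math.inf] * n  # end points have no subtended angle
    for i in range(1, n - 1):
        cos[i] = subtended_cosine(stroke[i - 1], stroke[i], stroke[i + 1])
    flex = [0]
    for i in range(1, n - 1):
        if cos[i] > threshold and cos[i] >= cos[i - 1] and cos[i] >= cos[i + 1]:
            flex.append(i)
    flex.append(n - 1)
    return flex
```

A straight stroke has $\cos\gamma = -1$ at every interior point, so only its two end points are returned.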
If points $P_i$ and $P_j$ ($j > i$) are adjacent flex points in the trajectory, the direction of every trajectory point between them, including $P_i$, is set to the direction of the directed line segment from $P_i$ to $P_j$.
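Given the flex-point indices of a stroke (including both end points), the adjacent-flex-point directions can be assigned as in this sketch. The last point is marked invalid (`None`), mirroring the adjacent-point case; the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def direction_deg(p: Point, q: Point) -> float:
    """Direction of the directed segment p -> q in degrees, within [0, 360)."""
    theta = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360.0
    return 0.0 if theta >= 360.0 else theta


def flex_point_directions(stroke: Sequence[Point],
                          flex: Sequence[int]) -> List[Optional[float]]:
    """Give every point from flex point P_i up to (not including) the next flex
    point P_j the direction P_i -> P_j; `flex` must be sorted and contain
    0 and len(stroke) - 1."""
    dirs: List[Optional[float]] = [None] * len(stroke)
    for i, j in zip(flex, flex[1:]):
        theta = direction_deg(stroke[i], stroke[j])
        for t in range(i, j):
            dirs[t] = theta
    return dirs
```

Small tremors between two flex points no longer affect the assigned direction, which is the motivation given in the text.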
Using these adjacent-flex-point directions, recompute the direction attribute coefficients of each trajectory point and accumulate the sums of the 4 coefficients in each spatial sub-block, obtaining another $K_1 \times K_1 \times 4$ features.
The direction feature merges these two kinds of features, giving $K_1 \times K_1 \times 8$ features in total.
2.2 Edge feature extraction
The edge feature differs from the direction feature in that it better reflects the peripheral structural information of a Chinese character.
Taking the left-to-right direction as an example, edge features are extracted as follows. The left half of the image corresponding to the preprocessed online character trajectory is divided at equal intervals into $K_2$ horizontal subregions, as shown in Figure 7(a). Within each subregion, the image is scanned row by row in the direction of the arrow, i.e., from the left edge of the image toward the right. During the $i$-th row scan, when a coordinate point is first found to be a trajectory point, compute the 4 adjacent-point direction attribute coefficients of this point, denoted $f_{i,1}^{(h)}, f_{i,1}^{(s)}, f_{i,1}^{(p)}, f_{i,1}^{(n)}$; if no trajectory point is encountered, these 4 coefficients are 0. Scanning then continues; when another coordinate point is again found to be a trajectory point, compute its adjacent-point direction attribute coefficients, denoted $f_{i,2}^{(h)}, f_{i,2}^{(s)}, f_{i,2}^{(p)}, f_{i,2}^{(n)}$; likewise, if no second trajectory point is encountered, these 4 coefficients are 0. When all row scans are finished, the coefficients obtained from each row are summed separately, giving 8 features:
$$\sum_i f_{i,1}^{(h)},\ \sum_i f_{i,1}^{(s)},\ \sum_i f_{i,1}^{(p)},\ \sum_i f_{i,1}^{(n)},\ \sum_i f_{i,2}^{(h)},\ \sum_i f_{i,2}^{(s)},\ \sum_i f_{i,2}^{(p)},\ \sum_i f_{i,2}^{(n)},$$
where $i$ runs over all rows scanned in the subregion.
With $K_2$ subregions, this gives $K_2 \times 8$ edge features in total.
The method above is repeated from the remaining 7 arrow directions, i.e., from the other three edges (right, top, and bottom) and the four diagonal directions. As shown in Figure 7(b), the arrows indicate how the space is divided into equal parts and the scan direction. This yields $K_2 \times 8 \times 8$ edge features in total.
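A sketch of the left-to-right edge scan for one image, assuming `grid[y][x]` holds either `None` (background) or the four adjacent-point direction attribute coefficients (h, s, p, n) of the trajectory point at that pixel. The text does not define precisely what counts as the second trajectory point, so this sketch assumes it is the first trajectory pixel after a gap of background pixels following the first hit. The function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Coeffs = Tuple[float, float, float, float]


def left_edge_features(grid: Sequence[Sequence[Optional[Coeffs]]],
                       k2: int) -> List[float]:
    """Left-to-right edge features: K2 horizontal bands of the left half, 8 values each.

    For each row, the coefficients (h, s, p, n) of the first trajectory pixel
    met go to slots 0-3 and those of the second go to slots 4-7; missing hits
    contribute 0. A "second" hit is assumed to start a new run of pixels.
    """
    height, width = len(grid), len(grid[0])
    half = width // 2
    feats: List[float] = []
    for r in range(k2):
        acc = [0.0] * 8
        for y in range(r * height // k2, (r + 1) * height // k2):
            hits: List[Coeffs] = []
            prev_is_point = False
            for x in range(half):
                c = grid[y][x]
                if c is not None and not prev_is_point:
                    hits.append(c)
                    if len(hits) == 2:
                        break
                prev_is_point = c is not None
            for h, coeffs in enumerate(hits):
                for t in range(4):
                    acc[4 * h + t] += coeffs[t]
        feats.extend(acc)
    return feats
```

The right-to-left scan could reuse the same function on the mirrored grid, `left_edge_features([row[::-1] for row in grid], k2)`; the diagonal scans would need their own traversal and are not shown.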
Merging the direction feature and the edge feature yields the complete statistical structural feature vector $V$ of an online handwritten Chinese character.
3. Feature transformation
The raw features extracted above have relatively high dimensionality; when training samples are not plentiful, this increases computational complexity and degrades classifier performance. Therefore, before the raw features are fed to the classifier, a feature transformation compresses them from the high-dimensional raw feature space into a low-dimensional feature space. The present invention performs this transformation with linear discriminant analysis (LDA). Let $\{V_i^{(j)}, 1 \le i \le N_j, 1 \le j \le C\}$ be the set of raw feature vectors, where $V_i^{(j)}$ is the raw feature vector extracted from the $i$-th sample of class $j$, $N_j$ is the number of samples of class $j$, and $C$ is the number of classes. Each class represents one Chinese character in the national-standard Chinese character set. The mean of each class and the mean of all classes are computed with the following formulas:
Then compute the within-class scatter matrix $S_w$ and the between-class scatter matrix $S_b$:
We choose $|(S_b + S_w)/S_w|$ as the optimization criterion, i.e., we seek the linear transformation matrix $A$ that maximizes
$$\frac{|A^{T}(S_b + S_w)A|}{|A^{T} S_w A|}.$$
The transformation matrix $A$ is an $n \times m$ matrix, where $n$ is the raw feature dimension and $m$ is the preset feature dimension after transformation. $A$ is obtained as follows: perform an eigendecomposition of $S_w^{-1}(S_b + S_w)$, giving the eigenvalues $\{\gamma_i, i = 1, 2, \ldots, n\}$, sorted in descending order of magnitude, and the corresponding eigenvectors $\xi_i$, $i = 1, 2, \ldots, n$. The first $m$ eigenvectors form $A = [\xi_1, \xi_2, \ldots, \xi_m]$, which is the required linear transformation matrix.
The feature transformation formula is:
$$Y = A^{T} V$$
In the formula above, $V$ is the raw structural feature vector and $Y$ is the transformed feature vector.
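The LDA training steps can be sketched with NumPy as below. The text does not reproduce the exact mean and scatter formulas, so this sketch assumes that the overall mean is the average of the class means and that both scatter matrices are averaged over classes with equal weight; `lda_fit` and `lda_transform` are hypothetical names.

```python
import numpy as np


def lda_fit(X, labels, m):
    """Return the n x m LDA matrix A = [xi_1, ..., xi_m].

    X: (num_samples, n) raw feature vectors V; labels: class index per sample.
    Assumes S_w is non-singular.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n = X.shape[1]
    means = np.array([X[labels == c].mean(axis=0) for c in classes])  # mu_j
    mu = means.mean(axis=0)  # overall mean (assumed: mean of class means)
    s_w = np.zeros((n, n))
    s_b = np.zeros((n, n))
    for c, mu_j in zip(classes, means):
        d = X[labels == c] - mu_j
        s_w += d.T @ d / len(d)  # within-class scatter of class j
        diff = (mu_j - mu)[:, None]
        s_b += diff @ diff.T  # between-class scatter contribution
    s_w /= len(classes)
    s_b /= len(classes)
    # eigendecomposition of S_w^{-1} (S_b + S_w); solve() avoids forming the inverse
    evals, evecs = np.linalg.eig(np.linalg.solve(s_w, s_b + s_w))
    order = np.argsort(evals.real)[::-1]  # descending eigenvalues gamma_i
    return evecs[:, order[:m]].real


def lda_transform(A, V):
    """Y = A^T V."""
    return A.T @ np.asarray(V, dtype=float)
```

In the example used for testing, the two classes differ only along x, so the single retained eigenvector is the x-axis.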
4. Classifier design
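The present invention uses the modified quadratic discriminant function (MQDF) classifier for a Gaussian model. We first introduce the standard quadratic discriminant function (QDF), whose decision function is:
$$g_j(Y) = \sum_{i=1}^{m} \frac{\left[(\zeta_i^{(j)})^{T}(Y - \mu_j)\right]^2}{\lambda_i^{(j)}} + \sum_{i=1}^{m} \log \lambda_i^{(j)}, \quad j = 1, 2, \ldots, C.$$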
In the formula above, $Y$ is the input feature vector, $m$ is the feature dimension, $\mu_j$ is the mean vector of class $j$, $\zeta_i^{(j)}$ is the $i$-th eigenvector of the covariance matrix of class $j$, and $\lambda_i^{(j)}$ is the $i$-th eigenvalue of the covariance matrix of class $j$. An input $Y$ is classified by the following criterion:
$Y$ is assigned to class $i$ if
$$g_i(Y) = \min_{1 \le j \le C} g_j(Y),$$
where $C$ is the number of classes.
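In practical applications, the small eigenvalues are estimated inaccurately, which degrades QDF performance. To reduce the adverse effect of these inaccurate estimates on classification, we use the modified quadratic discriminant function MQDF, which replaces the overly small eigenvalues with a predetermined constant. Its discriminant function is:
$$g_j(Y) = \sum_{i=1}^{k} \frac{\left[(\zeta_i^{(j)})^{T}(Y - \mu_j)\right]^2}{\lambda_i^{(j)}} + \sum_{i=k+1}^{m} \frac{\left[(\zeta_i^{(j)})^{T}(Y - \mu_j)\right]^2}{\lambda} + \sum_{i=1}^{k} \log \lambda_i^{(j)} + (m - k)\log\lambda, \quad j = 1, 2, \ldots, C.$$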
In the formula above, $k$ is a positive integer less than $m$ and $\lambda$ is a constant; $k$ and $\lambda$ are empirical parameters determined by experiment. During classification, the input $Y$ is assigned to the class that minimizes $g_j(Y)$.
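The MQDF decision rule can be sketched with NumPy, assuming each class model is stored as its mean vector, its covariance eigenvalues in descending order, and the matching eigenvectors as matrix columns. The function names are hypothetical.

```python
import numpy as np


def mqdf_score(y, mu, eigvals, eigvecs, k, lam):
    """MQDF value g_j(Y) for one class (smaller means a better match).

    eigvals: eigenvalues lambda_i of the class covariance, descending;
    eigvecs: matching eigenvectors zeta_i as columns; k < m; lam is the constant.
    """
    d = np.asarray(y, dtype=float) - mu
    proj = eigvecs.T @ d  # zeta_i^T (Y - mu_j), i = 1..m
    m = proj.shape[0]
    major = np.sum(proj[:k] ** 2 / eigvals[:k])  # first k principal axes
    minor = np.sum(proj[k:] ** 2) / lam  # remaining axes use the constant lam
    return major + minor + np.sum(np.log(eigvals[:k])) + (m - k) * np.log(lam)


def mqdf_classify(y, models, k, lam):
    """Return the index j of the class with minimum g_j(Y).

    models: list of (mu_j, eigvals_j, eigvecs_j) tuples.
    """
    scores = [mqdf_score(y, mu, ev, vecs, k, lam) for mu, ev, vecs in models]
    return int(np.argmin(scores))
```

Only the first $k$ eigenvalues of each class enter the score, which is why the small, poorly estimated ones do not need to be stored.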
The invention is characterized by being an online handwritten Chinese character recognition method based on statistical structural features. It comprises the following steps, performed in order:
(1) Preprocess the input online handwritten Chinese character trajectory.
Suppose the trajectory of an online handwritten Chinese character is:
$$P(x_1, y_1), P(x_2, y_2), \ldots, P(x_i, y_i), \text{(break)}, P(x_{i+1}, y_{i+1}), \ldots, P(x_N, y_N).$$
The following preprocessing steps are performed in order.
(1.1) Remove isolated-point noise.
Remove from the trajectory point sequence any stroke that consists of only one or two points.
(1.2) Filter out sawtooth noise.
Take a weighted average of the coordinates of adjacent points with the following formula, which acts as a low-pass filter:
(1.3) Resample to eliminate irregular pen speed.
Resample the written trajectory at a fixed-length interval with the following formula, so that a stroke of a given length is represented by a corresponding number of points:
In the formula above, $L$ is the fixed sampling interval, set to the constant 1; $(x'_i, y'_i)$ are the $N$ coordinate points of the stroke to be resampled, where $i$ satisfies $1 \le i \le N$ and $s_i \le jL < s_{i+1}$; the length between two adjacent points is computed from their coordinates, and $s_i$ is the cumulative length, with $s_0 = 0$; $(x''_j, y''_j)$ are the new coordinate points obtained by resampling.
(1.4) Reshape with the density equalization method.
First, the online character trajectory is converted into a character image $[f(x'', y'')]_{W \times H}$ of width $W$ and height $H$. The pixel at the coordinates of any trajectory point $P(x''_i, y''_i)$ is black, $f(x''_i, y''_i) = 1$, and all other pixels are white, $f(x'', y'') = 0$. Compute the density projections $U(x'')$ and $V(y'')$ of the image in the horizontal and vertical directions, for
$$x'' = 1, 2, \ldots, W, \qquad y'' = 1, 2, \ldots, H.$$
Here $\alpha_U$ and $\alpha_V$ are bias constants, set here to $\alpha_U = \alpha_V = 6$. The trajectory point with original coordinates $(x'', y'')$ then has the new coordinates $(x''', y''')$:
Here $W'$ is the maximum abscissa and $H'$ the maximum ordinate after reshaping.
(1.5) Interpolate and delete coincident points.
Connect consecutive trajectory points within each natural stroke end to end, insert into the trajectory sequence the points on each connecting line that do not coincide with existing trajectory points, and remove coincident points among adjacent trajectory points.
(2) Extract statistical structural features
Extract the direction feature and the edge feature from the preprocessed online handwritten character trajectory and merge them into the raw statistical structural feature. The two are extracted as follows:
(2.1) Extract the direction feature
The direction feature is the merger of the adjacent-point direction feature and the adjacent-flex-point direction feature, which are extracted as follows:
(2.1.1) Extract the adjacent-point direction feature
(a) First compute the adjacent-point direction of every trajectory point except the last: the direction $\theta_i$ of the directed line segment from $P_i$ to $P_{i+1}$, with range $[0°, 360°)$. The direction of the last point is marked invalid.
(b) Then, from the direction value $\theta_i$ of each trajectory point, compute its 4 direction attribute coefficients with the following functions:
Horizontal (héng) direction attribute coefficient function $f^{(h)}(\theta)$
Vertical (shù) direction attribute coefficient function $f^{(s)}(\theta)$
Left-falling (piě) direction attribute coefficient function $f^{(p)}(\theta)$
Right-falling (nà) direction attribute coefficient function $f^{(n)}(\theta)$
The six parameters $\alpha_1, \ldots, \alpha_6$ are angle thresholds that determine the shapes of the direction attribute coefficient functions. In the present invention they are set to $\alpha_1 = -10°$, $\alpha_2 = 260°$, $\alpha_3 = 280°$, $\alpha_4 = 250°$, $\alpha_5 = 300°$, $\alpha_6 = 330°$.
(c) Divide the space occupied by the trajectory point coordinates evenly into $K_1 \times K_1$ sub-blocks and, for each sub-block, accumulate the sums of the 4 direction attribute coefficients over all trajectory points in it. Taking sub-block $(k, l)$, $1 \le k \le K_1$, $1 \le l \le K_1$, as an example, the 4 features obtained are:
$$\sum_{P(x,y) \in (k,l)} f^{(h)}(\theta), \quad \sum_{P(x,y) \in (k,l)} f^{(s)}(\theta), \quad \sum_{P(x,y) \in (k,l)} f^{(p)}(\theta), \quad \sum_{P(x,y) \in (k,l)} f^{(n)}(\theta),$$
where each sum runs over all trajectory points $P(x, y)$ in sub-block $(k, l)$ and $\theta$ is the direction value of point $P(x, y)$.
This gives $K_1 \times K_1 \times 4$ adjacent-point direction features in total.
(2.1.2) Extract the adjacent-flex-point direction feature
Determine the flex points in the trajectory by polygonal approximation. A flex point is a point where the writing direction of a stroke changes sharply. First compute, for each point in the stroke, the cosine of the angle it subtends with its neighboring points.
The cosine of the subtended angle γ can be computed with the law of cosines. Let a, b, c be the three sides of the triangle formed by the current trajectory point and its preceding and following neighbors, where γ is the angle between sides a and b, and c is the side opposite γ. First compute the lengths of the three sides from the vertex coordinates; the law of cosines then gives
$$\cos\gamma = \frac{a^2 + b^2 - c^2}{2ab}.$$
A point is judged to be a flex point when the cosine of γ reaches a local maximum and exceeds a preset threshold, set to -0.8, at which γ is about 2.5 radians. Stroke end points are also treated as flex points.
Compute the adjacent-flex-point direction of each trajectory point: if $P_i$ and $P_j$ ($j > i$) are adjacent flex points in the trajectory, the direction of every trajectory point between them, including $P_i$, is set to the direction of the directed line segment from $P_i$ to $P_j$.
Repeat steps (b) and (c) of (2.1.1) to obtain $K_1 \times K_1 \times 4$ adjacent-flex-point direction features.
(2.2) Extract the edge feature
First extract the edge feature for the left-to-right scan: the left half of the image corresponding to the preprocessed online character trajectory is divided at equal intervals into $K_2$ horizontal subregions, as shown in Figure 7(a), and each subregion is scanned row by row in the direction of the arrow (from the left edge of the image toward the right). During the $i$-th row scan, when a coordinate point is found to be a trajectory point for the first time, compute the 4 adjacent-point direction attribute coefficients of this point, denoted $f_{i,1}^{(h)}, f_{i,1}^{(s)}, f_{i,1}^{(p)}, f_{i,1}^{(n)}$; if no trajectory point is encountered, these 4 coefficients are 0. Scanning then continues; when a coordinate point is found to be a trajectory point for the second time, compute its adjacent-point direction attribute coefficients, denoted $f_{i,2}^{(h)}, f_{i,2}^{(s)}, f_{i,2}^{(p)}, f_{i,2}^{(n)}$; likewise, if no second trajectory point is encountered, these 4 coefficients are 0. When the row scans are finished, the coefficients obtained from each row are summed separately, giving 8 features:
$$\sum_i f_{i,1}^{(h)},\ \sum_i f_{i,1}^{(s)},\ \sum_i f_{i,1}^{(p)},\ \sum_i f_{i,1}^{(n)},\ \sum_i f_{i,2}^{(h)},\ \sum_i f_{i,2}^{(s)},\ \sum_i f_{i,2}^{(p)},\ \sum_i f_{i,2}^{(n)},$$
where $i$ runs over all rows scanned in the subregion.
With $K_2$ subregions, this gives $K_2 \times 8$ edge features in total.
Then repeat the steps above for the scans from the other three edges (right, top, and bottom) and the four diagonal directions, as in Figure 7(b), obtaining $K_2 \times 8 \times 8$ edge features in total.
(3) Feature transformation
Extract recognition features from the raw statistical structural features with linear discriminant analysis (LDA) to improve the feature distribution and thereby the recognition performance. This comprises the following steps, performed in order:
(3.1) Compute the mean $\mu_j$ of each class and the mean $\mu$ of all classes with the following formulas:
Here $V_i^{(j)}$ is the raw feature vector extracted from the $i$-th sample of class $j$, $N_j$ is the number of samples of class $j$, and $C$ is the number of classes.
(3.2) Compute the within-class scatter matrix $S_w$ and the between-class scatter matrix $S_b$ with the following formulas:
(3.3) Perform an eigendecomposition of $S_w^{-1}(S_b + S_w)$, obtaining the eigenvalues $\gamma_i$, $i = 1, 2, \ldots, n$, sorted in descending order of magnitude, and the corresponding eigenvectors $\xi_i$, $i = 1, 2, \ldots, n$.
(3.4) Form the linear transformation matrix $A = [\xi_1, \xi_2, \ldots, \xi_m]$ from the first $m$ eigenvectors.
(3.5) Compute the transformed feature vector $Y$ from the raw feature vector $V$ and the transformation matrix $A$:
$$Y = A^{T} V$$
(4) Recognize the online handwritten Chinese character with the MQDF classifier.
Recognition with the MQDF classifier has two parts: first, a recognition library file is generated by training on previously collected samples, using the recognition features obtained above; the recognition library is then used to recognize actual input samples.
(4.1) Training process:
(4.1.1) First, for each class $j$, compute its mean $\mu_j$ and covariance matrix $\Sigma_j$ from the $m$-dimensional recognition features obtained above, using the following formulas:
Here $Y_i^{(j)}$ is the recognition feature vector extracted from the $i$-th sample of class $j$, and $N_j$ is the number of samples of class $j$.
(4.1.2) Perform an eigendecomposition of the covariance matrix $\Sigma_j$ of each class, obtaining the eigenvalues $\lambda_i^{(j)}$, $i = 1, 2, \ldots, m$, sorted in descending order of magnitude, and the corresponding eigenvectors $\zeta_i^{(j)}$, $i = 1, 2, \ldots, m$.
(4.1.3) Compute the substitute value $\lambda$ for the small eigenvalues:
Here $k$ is a positive integer less than $m$, determined by experiment.
(4.1.4) Store the quantities obtained above, namely $\lambda_i^{(j)}$ ($j = 1, \ldots, C$; $i = 1, \ldots, k$), $\zeta_i^{(j)}$ ($j = 1, \ldots, C$; $i = 1, \ldots, m$), $\mu_j$ ($j = 1, \ldots, C$), and $\lambda$, in the recognition library file for use in subsequent recognition.
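Training the per-class MQDF parameters can be sketched as follows. The formula for the substitute value $\lambda$ is not reproduced in the text, so this sketch assumes $\lambda$ is the average of all eigenvalues beyond the $k$-th, taken over all classes; normalizing the covariance by $N_j$ is also an assumption, as is the name `mqdf_train`.

```python
import numpy as np


def mqdf_train(Y, labels, k):
    """Per-class mean, descending eigenvalues/eigenvectors, and a shared lam.

    Y: (num_samples, m) recognition features; labels: class index per sample.
    Returns ([(mu_j, eigvals_j, eigvecs_j), ...], lam).
    """
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    models, minor = [], []
    for c in np.unique(labels):
        Yc = Y[labels == c]
        mu = Yc.mean(axis=0)
        d = Yc - mu
        cov = d.T @ d / len(Yc)  # Sigma_j, normalised by N_j (assumption)
        evals, evecs = np.linalg.eigh(cov)  # eigh returns ascending order
        order = np.argsort(evals)[::-1]
        evals, evecs = evals[order], evecs[:, order]
        models.append((mu, evals, evecs))
        minor.extend(evals[k:])
    lam = float(np.mean(minor))  # hypothetical rule for the substitute value
    return models, lam
```

`np.linalg.eigh` is used because each covariance matrix is symmetric; its eigenvalues come back in ascending order and are re-sorted to match the descending order in the text.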
(4.2) Recognition process:
(4.2.1) Obtain the recognition feature vector $Y$ from the sample to be recognized, and compute the decision function $g_j(Y)$ of each class with the following formula:
Here $m$ and $k$ take the same values as in training.
(4.2.2) Assign the input sample to the class that minimizes $g_j(Y)$.
Experiments show that the average recognition rate of the present invention is 98.43%, a satisfactory result.
Embodiment
To implement the online handwritten Chinese character recognition system based on statistical structural features, a recognition library must first be obtained by training; online handwritten characters can then be recognized with that library. A practical implementation of the system therefore involves two aspects, the training process and the recognition process; the system structure is shown in Figure 1. Part of the processing is identical in the two processes.
The parts of the system are described in detail below:
A. Implementation of the training process
A.1 Preprocessing
The preprocessing flow is shown in Figure 2. Suppose the trajectory of an online handwritten Chinese character is:
$$P(x_1, y_1), P(x_2, y_2), \ldots, P(x_i, y_i), \text{(break)}, P(x_{i+1}, y_{i+1}), \ldots, P(x_N, y_N).$$
First, isolated-point noise is removed: any stroke consisting of only one or two points is removed from the trajectory point sequence.
Then the coordinates of adjacent points are weighted-averaged to filter out sawtooth noise. The filtering formula is:
The purpose of resampling is to eliminate irregular pen speed. The written trajectory is resampled at a fixed-length interval, so that a stroke of a given length is represented by a corresponding number of points. The sampling formula is:
In the formula above, $L$ is the fixed sampling interval, set to the constant 1; $(x'_i, y'_i)$ are the $N$ coordinate points of the stroke to be resampled, where $i$ satisfies $1 \le i \le N$ and $s_i \le jL < s_{i+1}$; the length between two adjacent points is computed from their coordinates, and $s_i$ is the cumulative length, with $s_0 = 0$; $(x''_j, y''_j)$ are the new coordinate points obtained by resampling.
The task of reshaping is to eliminate part of the writing distortion of the character to be recognized. It involves both linear and nonlinear normalization, which map the spatial region occupied by the character onto a fixed-size area and make the strokes more evenly distributed in space. Reshaping transforms each trajectory point to new coordinates, computed by the density equalization method. First, the online character trajectory is converted into a character image $[f(x'', y'')]_{W \times H}$ of width $W$ and height $H$. The pixel at the coordinates of any trajectory point $P(x''_i, y''_i)$ is black, $f(x''_i, y''_i) = 1$, and all other pixels are white, $f(x'', y'') = 0$. $U(x'')$ and $V(y'')$ denote the pixel density projections in the horizontal and vertical directions, respectively, defined for
$$x'' = 1, 2, \ldots, W, \qquad y'' = 1, 2, \ldots, H.$$
Here $\alpha_U$ and $\alpha_V$ are bias constants; in the present invention $\alpha_U = \alpha_V = 6$. The trajectory point with original coordinates $(x'', y'')$ then has the new coordinates $(x''', y''')$:
Here $W'$ is the maximum abscissa and $H'$ the maximum ordinate after processing. In the present invention, $W' = H' = 64$.
A.2 Extraction of statistical structural features
This step extracts, from the preprocessed online handwritten character trajectory, features suited to the structural characteristics of online handwritten Chinese characters. The present invention designs and extracts two kinds of statistical structural features, called the direction feature and the edge feature.
A.2.1 Direction feature extraction
The direction feature is formed by merging two features: the adjacent-point direction feature and the adjacent-flex-point direction feature.
The adjacent-point direction feature is extracted as follows:
1) First compute the adjacent-point direction of every trajectory point except the last: the direction $\theta_i$ of the directed line segment from the current point $P_i$ to the next point $P_{i+1}$, with range $[0°, 360°)$. The direction of the last point is marked invalid.
2) From the direction value $\theta_i$ of each trajectory point, compute its 4 direction attribute coefficients as follows:
Horizontal (héng) direction attribute coefficient function $f^{(h)}(\theta)$
Vertical (shù) direction attribute coefficient function $f^{(s)}(\theta)$
Left-falling (piě) direction attribute coefficient function $f^{(p)}(\theta)$
Right-falling (nà) direction attribute coefficient function $f^{(n)}(\theta)$
The six parameters $\alpha_1, \ldots, \alpha_6$ are angle thresholds that determine the shapes of the direction attribute coefficient functions. In the present invention they are set to $\alpha_1 = -10°$, $\alpha_2 = 260°$, $\alpha_3 = 280°$, $\alpha_4 = 250°$, $\alpha_5 = 300°$, $\alpha_6 = 330°$.
3) Divide the space occupied by the trajectory point coordinates evenly into $K_1 \times K_1$ sub-blocks and, for each sub-block, accumulate the sums of the 4 direction attribute coefficients over all trajectory points in it. Taking sub-block $(k, l)$ ($1 \le k \le K_1$, $1 \le l \le K_1$) as an example, the 4 features obtained are:
$$\sum_{P(x,y) \in (k,l)} f^{(h)}(\theta), \quad \sum_{P(x,y) \in (k,l)} f^{(s)}(\theta), \quad \sum_{P(x,y) \in (k,l)} f^{(p)}(\theta), \quad \sum_{P(x,y) \in (k,l)} f^{(n)}(\theta),$$
where each sum runs over all trajectory points $P(x, y)$ in sub-block $(k, l)$ and $\theta$ is the direction value of point $P(x, y)$.
In the present invention $K_1 = 8$, so the adjacent-point direction feature has $8 \times 8 \times 4 = 256$ dimensions.
The adjacent-flex-point direction feature is extracted as follows:
Determine the flex points in the trajectory by polygonal approximation. A flex point is a point where the writing direction of a stroke changes sharply. First compute, for each point in the stroke, the cosine of the angle it subtends with its neighboring points.
The cosine of the subtended angle γ can be computed with the law of cosines. Let a, b, c be the three sides of the triangle formed by the current trajectory point and its preceding and following neighbors, where γ is the angle between sides a and b, and c is the side opposite γ. First compute the lengths of the three sides from the vertex coordinates; the law of cosines then gives
$$\cos\gamma = \frac{a^2 + b^2 - c^2}{2ab}.$$
A point is judged to be a flex point when the cosine of γ reaches a local maximum and exceeds a preset threshold, set to -0.8, at which γ is about 2.5 radians. Stroke end points are also treated as flex points.
The adjacent turning-point direction of each stroke point is then computed. Let $P_i$ and $P_j$ ($j > i$) be adjacent turning points in the trajectory. Every stroke point between them, including $P_i$, is assigned the direction of the directed segment from $P_i$ to $P_j$.
Steps 2) and 3) of the adjacent-point direction feature extraction above are then repeated using these directions, giving the 256-dimensional adjacent turning-point direction feature.
The adjacent-point direction feature and the adjacent turning-point direction feature are concatenated into a 512-dimensional direction feature.
A.2.2 Extraction of edge features
Edge features differ from direction features in that they better reflect the peripheral structure of a Chinese character. They are extracted as follows:
First, the edge features for the left-to-right scanning direction are extracted. The left half of the image space corresponding to the preprocessed on-line character trajectory is divided at equal spacing into $K_2$ horizontal sub-regions, as shown in Fig. 7(a). Within each sub-region, scanning proceeds line by line in the direction of the arrow, i.e., from the left edge of the image toward the right.

During the $i$-th line scan, when a stroke point is encountered for the first time, its 4 adjacent-point direction attribute coefficients are computed and denoted $f_{i,1}^{(h)}, f_{i,1}^{(s)}, f_{i,1}^{(p)}, f_{i,1}^{(n)}$. If no stroke point is encountered at all, these 4 coefficients are 0. Scanning then continues. When a stroke point is encountered for the second time, the adjacent-point direction attribute coefficients of that point are denoted $f_{i,2}^{(h)}, f_{i,2}^{(s)}, f_{i,2}^{(p)}, f_{i,2}^{(n)}$. Likewise, if no second stroke point is encountered, these 4 coefficients are 0. When the line scans are finished, the coefficients obtained from all lines are summed separately, giving 8 features:
$$E_{t}^{(d)} = \sum_{i} f_{i,t}^{(d)}, \qquad t \in \{1, 2\},\ d \in \{h, s, p, n\},$$
where $i$ runs over all scan lines in the sub-region.
For this scanning direction, the $K_2$ sub-regions together yield $K_2 \times 8$ edge features.
The same procedure is then repeated for scanning from the other three edges (right, top and bottom) and along the 4 diagonal directions, giving $K_2 \times 8 \times 8$ edge features in total.
In the present invention, $K_2 = 8$, so the edge feature has $8 \times 8 \times 8 = 512$ dimensions.
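Below is a sketch of the left-to-right scan for one image. It makes two assumptions. First, the trajectory is assumed to have been rasterized into a grid whose stroke cells carry their four adjacent-point direction attribute coefficients. Second, "encountered for the second time" is read as the first point of the second separate stroke run on the line, so consecutive stroke cells of one run are not counted again. `left_edge_features` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple

Coeffs = Tuple[float, float, float, float]  # (f_h, f_s, f_p, f_n) of one stroke point

def left_edge_features(grid: Sequence[Sequence[Optional[Coeffs]]], k2: int = 8) -> List[float]:
    """Left-to-right edge features over the left half of the image.

    grid[y][x] is None for background, else the direction attribute
    coefficients of the stroke point at (x, y) (assumed representation).
    Rows are split into k2 equal horizontal bands; each band yields 8 values:
    the summed coefficients of the 1st and of the 2nd stroke hit on each row.
    """
    h, w = len(grid), len(grid[0])
    half = w // 2
    feats: List[float] = []
    for band in range(k2):
        acc = [0.0] * 8  # [1st-hit h, s, p, n, 2nd-hit h, s, p, n]
        for y in range(band * h // k2, (band + 1) * h // k2):
            hits = 0
            in_stroke = False
            for x in range(half):
                cell = grid[y][x]
                if cell is not None and not in_stroke:
                    # entering a new stroke run (assumed meaning of a new "hit")
                    for d in range(4):
                        acc[hits * 4 + d] += cell[d]
                    hits += 1
                    if hits == 2:
                        break
                in_stroke = cell is not None
        # rows with fewer than two hits simply add nothing (coefficients 0)
        feats.extend(acc)
    return feats
```

The other seven scanning directions are not shown. The right, top and bottom scans could, for example, reuse this function on a mirrored or transposed grid. The four diagonal scans need their own traversal.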
Concatenating the direction features and the edge features gives the complete 1024-dimensional statistical structural feature of an on-line handwritten Chinese character.
A.3 Feature transformation
The feature transformation flow is shown in Fig. 8. It uses linear discriminant analysis (LDA): a transformation matrix $A$ is computed and used to transform and compress the original features into the final recognition features.
The feature transformation proceeds as follows:
1) First compute the mean vector of each class and the overall mean vector of all classes:
2) Then compute the within-class scatter matrix $S_w$ and the between-class scatter matrix $S_b$:
3) Perform eigenvalue decomposition of the matrix $S_w^{-1}(S_b + S_w)$ to obtain the eigenvalues $\{\gamma_i,\ i = 1, 2, \dots, n\}$, sorted in descending order, and the corresponding eigenvectors $\xi_i,\ i = 1, 2, \dots, n$. The first $m$ eigenvectors form the matrix $A = [\xi_1, \xi_2, \dots, \xi_m]$, which is the required linear transformation matrix. In the present invention, $m = 128$.
The transformation matrix $A$ is stored in a file for use in the feature transformation during recognition.
4) Once $A$ is obtained, the final feature is computed by the transformation
$$Y = A^{T} V.$$
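Steps 1) to 4) can be sketched with NumPy as follows. The source formulas for the means and scatter matrices are not reproduced in this text, so the sketch assumes the usual definitions:

- the overall mean is taken over all samples;
- the within-class scatter is unnormalized;
- the between-class scatter is weighted by class sample counts.

Constant scale factors on $S_w$ and $S_b$ leave the eigenvectors of $S_w^{-1}(S_b + S_w)$ unchanged, but the class weighting in $S_b$ does affect them. Instead of forming $S_w^{-1}$ explicitly, the sketch whitens with a Cholesky factor of $S_w$ (assumed positive definite) and solves a symmetric eigenproblem. This yields eigenvectors of the same matrix, normalized so that $A^T S_w A = I$. That normalization is also an assumption. Function names are hypothetical.

```python
import numpy as np

def lda_matrix(X: np.ndarray, labels: np.ndarray, m: int) -> np.ndarray:
    """Return A (n x m): leading eigenvectors of S_w^{-1}(S_b + S_w).

    X: (N, n) original feature vectors V as rows; labels: (N,) class labels.
    """
    n = X.shape[1]
    mu = X.mean(axis=0)                    # overall mean (assumed: over all samples)
    s_w = np.zeros((n, n))
    s_b = np.zeros((n, n))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)             # mean of class c
        D = Xc - mu_c
        s_w += D.T @ D                     # within-class scatter
        diff = (mu_c - mu)[:, None]
        s_b += len(Xc) * (diff @ diff.T)   # between-class scatter, weighted by N_c
    # With S_w = L L^T:  L^{-1}(S_b + S_w)L^{-T} u = gamma u
    # is equivalent to    S_w^{-1}(S_b + S_w) xi = gamma xi,  with xi = L^{-T} u
    L_inv = np.linalg.inv(np.linalg.cholesky(s_w))
    gammas, U = np.linalg.eigh(L_inv @ (s_b + s_w) @ L_inv.T)
    order = np.argsort(gammas)[::-1][:m]   # indices of the m largest eigenvalues
    return L_inv.T @ U[:, order]

def lda_transform(A: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Final recognition feature Y = A^T V."""
    return A.T @ V
```

In the setting described above, `X` would have 1024 columns and `m` would be 128.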
A.4 Training the MQDF classifier
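From the resulting $m$-dimensional recognition features $Y$, the mean and covariance matrix of each class are computed as:
$$\mu_j = \frac{1}{N_j}\sum_{i=1}^{N_j} Y_i^{(j)}, \qquad \Sigma_j = \frac{1}{N_j}\sum_{i=1}^{N_j}\left(Y_i^{(j)} - \mu_j\right)\left(Y_i^{(j)} - \mu_j\right)^{T},$$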
where $Y_i^{(j)}$ is the feature vector extracted from the $i$-th training sample of class $j$, $N_j$ is the number of training samples of class $j$, $\mu_j$ is the mean of class $j$, and $\Sigma_j$ is the covariance matrix of class $j$.
The covariance matrix of each class is then eigen-decomposed. This gives the eigenvalues $\lambda_i^{(j)}$, $i = 1, 2, \dots, m$, sorted in descending order, and the eigenvectors $\zeta_i^{(j)}$, $i = 1, 2, \dots, m$. Thus $\lambda_i^{(j)}$ is the $i$-th eigenvalue of $\Sigma_j$ and $\zeta_i^{(j)}$ is the corresponding $i$-th eigenvector.
The parameter $\lambda$ of the MQDF classifier, i.e., the value substituted for the small eigenvalues, is calculated with the following formula:
In this formula, $k$ is a positive integer less than $m$ (in the present invention, $k = 32$), and $C$ denotes the number of classes.
The parameters above, namely $\lambda_i^{(j)}$ ($j = 1, \dots, C$; $i = 1, \dots, k$), $\zeta_i^{(j)}$ ($j = 1, \dots, C$; $i = 1, \dots, m$), $\mu_j$ ($j = 1, \dots, C$) and $\lambda$, are stored in the recognition library file for use during recognition. This completes the training of the MQDF classifier.
B. Implementation of the recognition process
The recognition process is shown in Fig. 1. As in training, recognition first performs preprocessing and then extracts the original statistical structural feature $V$.
For the LDA feature transformation, recognition directly uses the transformation matrix $A$ produced by training, giving the recognition feature vector $Y = A^{T} V$.
For MQDF classification, all relevant classifier parameters are read from the recognition library file produced by training. The decision function of the MQDF classifier is:
$j = 1, 2, \dots, C$
During recognition, $g_j(Y)$ is computed for every class with the above formula, and the classification rule is as follows:
$Y$ is assigned to class $i$ if
where $C$ is the number of classes.
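The decision function and classification rule are not reproduced in this text. The sketch below assumes the commonly used MQDF form. In that form, the $k$ leading eigenvalues are kept and the remaining $m - k$ are replaced by the constant $\lambda$:
$$g_j(Y) = \sum_{i=1}^{k}\frac{\left[\zeta_i^{(j)T}(Y - \mu_j)\right]^2}{\lambda_i^{(j)}} + \sum_{i=k+1}^{m}\frac{\left[\zeta_i^{(j)T}(Y - \mu_j)\right]^2}{\lambda} + \sum_{i=1}^{k}\ln\lambda_i^{(j)} + (m - k)\ln\lambda.$$
The sketch assigns $Y$ to the class with the smallest $g_j(Y)$. Both the form and the minimum rule are assumptions, chosen to be consistent with the stored parameters listed above. Function names are hypothetical.

```python
import numpy as np

def mqdf_distance(y, mu, lam_k, zeta, lam):
    """Assumed MQDF discriminant g_j(Y) for one class.

    lam_k: (k,) leading eigenvalues; zeta: (m, m) eigenvectors as columns in
    descending eigenvalue order; lam: shared substitute for minor eigenvalues.
    """
    k, m = len(lam_k), len(mu)
    p = zeta.T @ (y - mu)                      # projections onto all m eigenvectors
    return (np.sum(p[:k] ** 2 / lam_k)         # leading subspace, true eigenvalues
            + np.sum(p[k:] ** 2) / lam         # minor subspace, eigenvalues -> lam
            + np.sum(np.log(lam_k)) + (m - k) * np.log(lam))

def mqdf_classify(y, means, lam_ks, zetas, lam) -> int:
    """Index of the class with the smallest g_j(Y) (assumed decision rule)."""
    g = [mqdf_distance(y, means[j], lam_ks[j], zetas[j], lam) for j in range(len(means))]
    return int(np.argmin(g))
```

Because every eigenvector $\zeta_i^{(j)}$, $i = 1, \dots, m$, is stored, the minor-subspace term can be computed directly from the projections rather than as a residual norm.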
To verify the effectiveness of the present invention, we carried out the following experiments.
The training set consists of 1000 sets of samples of the GB2312 Level-1 and Level-2 character set and 400 sets of samples of the GBK character set. A further 60 sets of GB2312 samples and 30 sets of GBK samples are used as test samples, with the GBK character set as the recognition range. All samples are freely written on-line handwritten Chinese characters. The parameter values used in training and recognition are those given in the embodiment above.
The experimental results are as follows:
| | GB2312 Level-1/Level-2 set: 6,763 characters, 60 sets, 405,780 samples | GBK set: 14,240 characters, 30 sets, 427,200 samples | Overall average |
|---|---|---|---|
| Test recognition rate | 99.30% | 98.17% | 98.43% |
As the table shows, the on-line handwritten Chinese character recognition method based on statistical structural features achieves high recognition accuracy under both recognition ranges (99.30% and 98.17% on the two test sets). Recognition speed reaches 35.27 characters per second on a Pentium IV computer with a 1.7 GHz clock, which is sufficient for practical use.
In summary, the on-line handwritten Chinese character recognition method and system based on statistical structural features proposed by the present invention can recognize freely written on-line handwritten Chinese characters. Experiments show that they achieve high recognition accuracy and reliability, giving them broad application prospects.