Referring to Figure 1, the stroke-order-free handwritten character recognition system of the present invention recognizes characters that a user writes with a stylus on a touch-sensitive handwriting pad. Because writing speed varies from person to person, and the sampling rates of the input devices (such as digitizers) used to record the handwriting also differ, the present invention eliminates the influence of writing speed by preprocessing any two consecutive sample points (X1, Y1) and (X2, Y2) of the handwriting. Referring to Figures 2-1, 2-2, 2-3, 2-4 and 2-5, this preprocessing determines whether the distance D between the two consecutive points satisfies the following condition:
L1 < D < L2, or the second point is the last point of the stroke,
where L1 and L2 are constants bounding the normal range of the distance. If the condition is satisfied, the second point is valid; otherwise, the second point is discarded and sampling continues. In practice, for ease of computation, the distance D may be defined as:
D=|X2-X1|+|Y2-Y1|
where the constants L1 and L2 lie within a specified range.
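The distance test above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function name `filter_points` and the threshold values used in the example are hypothetical, and the sketch assumes that D is measured from the most recently accepted point, which is one reading of "discard the second point and continue sampling".

```python
def filter_points(points, l1, l2):
    """Keep a sampled point only if its distance D = |X2-X1| + |Y2-Y1|
    from the last accepted point satisfies l1 < D < l2, or if it is the
    last point of the stroke (assumed interpretation of the text)."""
    if not points:
        return []
    kept = [points[0]]  # the first point of the stroke is always kept
    last_index = len(points) - 1
    for i, (x2, y2) in enumerate(points[1:], start=1):
        x1, y1 = kept[-1]
        d = abs(x2 - x1) + abs(y2 - y1)
        if l1 < d < l2 or i == last_index:
            kept.append((x2, y2))
    return kept


# Hypothetical thresholds: points closer than or equal to 1 unit are dropped.
stroke = [(0, 0), (1, 0), (3, 0), (4, 0), (10, 0)]
print(filter_points(stroke, l1=1, l2=5))
```

With these example thresholds, the points at (1, 0) and (4, 0) are dropped because they lie too close to the previously accepted point, while the final point is kept regardless of its distance.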
While sampling the handwriting as described above, the present invention also performs smoothing to remove burrs from each stroke. Referring again to Figures 2-1, 2-2, 2-3, 2-4 and 2-5, the sample point coordinates after smoothing are obtained as:
Xn = C0*Xn + C1*X(n-1) + ...
Yn = C0*Yn + C1*Y(n-1) + ...
where C0 + C1 + ... = 1. In the preferred embodiment of the present invention, smoothing uses two adjacent consecutive points to obtain the smoothed sample point coordinates, that is:
Xn = a*Xn + b*X(n-1)
Yn = a*Yn + b*Y(n-1)
where a and b each lie in the range 0 to 1.
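The two-point smoothing can be sketched as below. The function name `smooth` and the default coefficients are hypothetical. The sketch assumes that X(n-1) refers to the previous raw sample rather than the previous smoothed value, and that a + b = 1, as the constraint on the general coefficients suggests.

```python
def smooth(points, a=0.5, b=0.5):
    """Two-point smoothing: Xn = a*Xn + b*X(n-1), Yn = a*Yn + b*Y(n-1).
    Assumes a + b = 1 and uses the previous *raw* sample for X(n-1)."""
    if abs(a + b - 1.0) > 1e-9:
        raise ValueError("coefficients must sum to 1")
    if not points:
        return []
    smoothed = [points[0]]  # the first point has no predecessor
    for (xp, yp), (xn, yn) in zip(points, points[1:]):
        smoothed.append((a * xn + b * xp, a * yn + b * yp))
    return smoothed


print(smooth([(0, 0), (2, 4), (4, 0)]))
```

A larger `b` pulls each point more strongly toward its predecessor and so removes more burr, at the cost of rounding off genuine corners.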
After the handwriting has been sampled and preprocessed, the present invention, referring again to Figures 2-1, 2-2, 2-3, 2-4 and 2-5 and to Figure 1, extracts feature values from the character and its strokes. The extracted feature values must fully express the shape of the character and its strokes, so that recognition achieves good results without reference to the order in which the strokes were written. These feature values include:
1. The number of strokes of the character;
2. The coordinates, on the X-Y coordinate plane, of the starting point and end point of each stroke of the character, and their direction codes. A direction code is obtained by dividing the plane of a rectangular coordinate system, centered on the origin, into directional sectors at a fixed angular interval and numbering the sectors one by one. As shown in Figure 3, the direction code encodes the vector formed by any two consecutive points of a stroke; for example, the direction codes of the starting point and end point of a stroke encode the vectors computed from the first two and the last two points of the stroke, respectively;
3. For each stroke of the character, the number of local minimum and maximum points of its X and Y coordinates, the X and Y coordinates of these local minimum and maximum points, and the angle change at their positions, as shown in Figure 4.
In practice, the sampled strokes can be denoted by the following symbols:
HXMAX ... the stroke whose starting point has the maximum X coordinate
HYMAX ... the stroke whose starting point has the maximum Y coordinate
HXMIN ... the stroke whose starting point has the minimum X coordinate
HYMIN ... the stroke whose starting point has the minimum Y coordinate
TXMAX ... the stroke whose end point has the maximum X coordinate
TYMAX ... the stroke whose end point has the maximum Y coordinate
UP ... the stroke whose sample points include the maximum Y coordinate
LEFT ... the stroke whose sample points include the minimum X coordinate
DOWN ... the stroke whose sample points include the minimum Y coordinate
RIGHT ... the stroke whose sample points include the maximum X coordinate
4. The X and Y coordinates of the intersection points of each stroke with itself or with other strokes, and the angle change at these intersection points.
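Two of the feature values listed above, the direction code and the count of local extrema, can be sketched as follows. The function names are hypothetical. The sketch assumes eight sectors of 45 degrees (the rule tables use codes 0 to 7), with code 0 centered on the positive X axis and codes increasing counterclockwise; the actual numbering is defined by Figure 3 and may differ. It also assumes that only interior turning points count as local extrema.

```python
import math


def direction_code(p, q, sectors=8):
    """Code of the vector p -> q. Assumed layout: sector 0 is centered on
    the +X axis and codes increase counterclockwise."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    if dx == 0 and dy == 0:
        raise ValueError("points coincide; direction undefined")
    angle = math.atan2(dy, dx) % (2 * math.pi)
    width = 2 * math.pi / sectors
    # Shift by half a sector so each code is centered on its direction.
    return int((angle + width / 2) // width) % sectors


def count_local_extrema(values):
    """Return (minima, maxima): the numbers of interior points where the
    coordinate sequence turns from falling to rising, or rising to falling.
    Flat runs are skipped."""
    minima = maxima = 0
    prev_sign = 0
    for a, b in zip(values, values[1:]):
        sign = (b > a) - (b < a)
        if sign == 0:
            continue
        if prev_sign < 0 and sign > 0:
            minima += 1
        elif prev_sign > 0 and sign < 0:
            maxima += 1
        prev_sign = sign
    return minima, maxima


print(direction_code((0, 0), (0, 1)))
print(count_local_extrema([5, 3, 1, 2, 4]))
```

Applying `count_local_extrema` separately to the X and Y coordinate sequences of a stroke yields the four counts per stroke that the feature list calls for.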
In addition, handwritten characters are subject to large deformation, so the fuzziness of handwritten character features must be considered. Conventional statistical classification and structural analysis methods apply a uniform decision procedure to all characters, which substantially degrades the discriminating power of the recognition system. The present invention therefore bases its character decision procedure on fuzzy mathematics. Because fuzzy quantities are difficult to express in a definite form, especially for the feature values above, the present invention discriminates character feature values by means of decision tables and expresses the fuzziness of each character feature as a weight, whose magnitude is determined when the dictionary is designed.
Referring again to Figure 2-1, in deciding on handwritten character features the present invention discriminates not only by the feature values above but also by the relationships among the sampled feature points:
1. Position of a feature point: this describes the relative position of the feature point within the whole character. The area occupied by the character in the X-Y plane is divided into M × N subregions, and the subregion to which the feature point belongs is then determined. In practice, given the characteristics of Western-language characters, a 6 × 4 division can be used, as shown in Figure 5.
2. Relative relationship of feature points: this comprises a relative position relationship and a relative time relationship. The relative position relationship describes whether one feature point lies to the upper left, lower left, upper right or lower right of another; the relative time relationship describes the order in which two feature points within one stroke were written.
3. Angle change at a feature point: this is the angle between the vector formed by the feature point and the point before it and the vector formed by the feature point and the point after it. It can be discriminated as being greater than or less than a given constant.
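The position and angle discriminations above can be sketched as follows. The function names and the bounding-box tuple layout are hypothetical. The sketch assumes a grid of 6 rows by 4 columns with row 0 at the minimum Y coordinate; the orientation actually used is defined by Figure 5 and may differ.

```python
import math


def subregion(point, bbox, rows=6, cols=4):
    """Return (row, col) of the subregion containing `point`.
    bbox = (xmin, ymin, xmax, ymax) is the area occupied by the character.
    Assumed orientation: row 0 at ymin, column 0 at xmin."""
    x, y = point
    xmin, ymin, xmax, ymax = bbox
    width = (xmax - xmin) or 1.0   # guard against a degenerate box
    height = (ymax - ymin) or 1.0
    col = min(int((x - xmin) / width * cols), cols - 1)
    row = min(int((y - ymin) / height * rows), rows - 1)
    return row, col


def angle_change(prev, point, nxt):
    """Angle in degrees (0 to 180) between the vector prev -> point and
    the vector point -> nxt."""
    a1 = math.atan2(point[1] - prev[1], point[0] - prev[0])
    a2 = math.atan2(nxt[1] - point[1], nxt[0] - point[0])
    diff = abs(math.degrees(a2 - a1)) % 360
    return min(diff, 360 - diff)


print(subregion((15, 25), (0, 0, 40, 60)))
print(angle_change((0, 0), (1, 0), (1, 1)))
```

The relative position relationship between two feature points reduces to comparing the signs of their X and Y differences, so it is not shown separately.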
To recognize handwriting, the present invention first determines, for each standard character, these feature values in each stroke and sorts them by the stroke to which they belong, so that the feature values of all strokes of each standard character form a stroke feature rule table. The stroke feature rule tables of all standard characters are then stored in the character rule base of a dictionary, as shown in Figure 2-1. When recognizing handwriting, the present invention takes the stroke feature rule table of one character from the character rule base and matches each feature value in that table, one by one, against the feature values of the sampled handwriting. Repeating this process achieves character recognition that is independent of stroke order.
Referring again to Figure 2-1, when recognizing the features of a handwritten character, the present invention first extracts whole-character feature values (such as the number of strokes and the area occupied) from the feature values of the sampled handwriting and matches them in a loop against the characters stored in the character rule base, finding the characters that match so that subsequent recognition and matching can proceed on the individual strokes. The present invention then uses the feature values of each stroke of each character stored in the character rule base to perform detailed recognition of the feature values of each stroke in the sampled handwriting, matching the handwritten character in a loop against the standard characters in the character rule base, as shown in Figures 2-2, 2-3 and 2-4. If the handwritten character to be recognized does not pass recognition against the character rule base of the dictionary, the handwritten input is not a character that this dictionary can recognize. Otherwise, if it passes, the present invention sums the weights obtained for all feature values during recognition. This summed weight expresses the similarity, that is, the fuzziness, between the handwritten character and the standard character stored in the dictionary: the larger the summed weight, the lower the similarity.
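The two-stage matching loop described above can be sketched as follows. Everything here is a hypothetical illustration: the `rule_base` layout, the `stroke_count` and `checks` keys, and the representation of each rule as a function returning a weight are assumptions made for the sketch, not the data structure of the invention. An infinite weight stands for a feature that cannot pass matching.

```python
import math


def recognize(features, rule_base):
    """Return (character, summed weight) for the best match, or
    (None, inf) if no character in the rule base passes."""
    best_char, best_weight = None, math.inf
    for char, rules in rule_base.items():
        # Stage 1: whole-character features (here only the stroke count).
        if rules["stroke_count"] != features["stroke_count"]:
            continue
        # Stage 2: detailed per-feature checks; weights are summed.
        total = 0
        for check in rules["checks"]:
            total += check(features)
            if math.isinf(total):
                break  # one failed feature rejects the character
        # A smaller summed weight means a higher similarity.
        if total < best_weight:
            best_char, best_weight = char, total
    return best_char, best_weight


toy_rule_base = {
    "G": {"stroke_count": 1, "checks": [lambda f: 2]},
    "a": {"stroke_count": 1, "checks": [lambda f: 0, lambda f: 1]},
    "c": {"stroke_count": 1, "checks": [lambda f: math.inf]},
}
print(recognize({"stroke_count": 1}, toy_rule_base))
```

Because no rule refers to the order in which strokes were written, the loop is independent of stroke order, which is the property the description emphasizes.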
To express the technical features and design concept of the present invention more clearly, specific embodiments are given below that describe in detail how the character rule base is represented.
In one embodiment of the present invention, taking the capital letter G as an example, as shown in Figure 6, the data structure of its stroke feature rule table in the character rule base of the dictionary is as follows:
1. Character: G
2. Stroke number: 3
3. Feature point position, feature point relative relationship, feature point angle change: 0 0 0
4. Stroke intersection points:
                  >    =    <    Count
                  10   10   10   0
5-1 HYMAX:
Count:            >    =    <    DicCount
XLocalMinCount    ..   10   ..   1
XLocalMaxCount    ..   10   ..   0
YLocalMinCount    10   10   10   0
YLocalMaxCount    10   10   10   0
SegmentCount      10   10   ..   3
DirectionCode:    0    1    2    3    4    5    6    7
Head              ..   ..   12   10   10   10   ..   ..
Tail              10   10   10   ..   ..   ..   ..   ..
5-2 TYMAX:
Count:            >    =    <    DicCount
XLocalMinCount    ..   10   ..   0
XLocalMaxCount    ..   10   ..   0
YLocalMinCount    10   10   10   0
YLocalMaxCount    10   10   10   0
SegmentCount      10   10   10   4
DirectionCode:    0    1    2    3    4    5    6    7
Head              10   10   ..   ..   ..   ..   ..   10
Tail              10   10   ..   ..   ..   ..   ..   10
5-3 HYMIN:
Count:            >    =    <    DicCount
XLocalMinCount    10   10   10   0
XLocalMaxCount    10   10   10   0
YLocalMinCount    ..   10   ..   0
YLocalMaxCount    ..   10   ..   0
SegmentCount      ..   10   10   4
DirectionCode:    0    1    2    3    4    5    6    7
Head              ..   ..   ..   ..   ..   10   10   10
Tail              ..   ..   ..   ..   ..   10   10   10
5-4 HYMAX:
Count:            >    =    <    DicCount
XLocalMinCount    ..   10   ..   1
XLocalMaxCount    ..   10   ..   0
YLocalMinCount    10   10   10   0
YLocalMaxCount    10   10   10   0
SegmentCount      10   10   ..   3
DirectionCode:    0    1    2    3    4    5    6    7
Head              ..   ..   12   10   10   10   ..   ..
Tail              10   10   10   ..   ..   ..   ..   ..
6. Position Limit: HEAD TAIL XMIN XMAX YMIN YMAX IX IY
7. Position Relation Limit
8. Angle Change Limit
The meaning of each item in the data structure of this character rule base is as follows:
1. Item 1 is the character that this rule table represents. When the handwriting written by the user on the touch-sensitive handwriting pad is recognized with the rules recorded in this stroke feature rule table of the dictionary and the match succeeds, this character is output as the recognition result.
2. Item 2 is the number of strokes of the character. If the number of strokes of the handwriting to be recognized does not agree with it, the character cannot pass recognition matching.
3. Item 3 indicates that, in the rule base of the dictionary, the feature point position, relative relationship and angle change entries of items 6, 7 and 8 are all zero.
4. Item 4 is the number of stroke intersection points of the character. If the number of stroke intersection points of the character to be recognized does not agree with it, the character cannot pass recognition matching.
5. Item 5 gives the feature values of the strokes of the character. During recognition, the present invention can, according to system requirements, first take the stroke of the character to be recognized whose starting point has the maximum Y coordinate and discriminate its attributes. When the number of local minimum points of this stroke in the X coordinate equals 1, the weight is 10-10=0; when that number is greater than or less than 1, the weight is infinite, meaning that the stroke cannot pass recognition matching against the dictionary. The number of local maximum points in the X coordinate, the number of stroke segments and the other feature values of the stroke are then discriminated in turn. Finally, the directions of the starting point (Head) and end point (Tail) of the stroke are discriminated. When the direction code of the starting point is 0, 1, 6 or 7, the weight is infinite, meaning that the stroke cannot pass recognition matching against these feature values in the stroke feature rule table of the dictionary; when it is 3, 4 or 5, the weight is 10-10=0; when it is 2, the weight is 12-10=2. The direction code of the end point is discriminated in the same way. After the stroke whose starting point has the maximum Y coordinate has been discriminated, it is numbered as the first stroke, and the process is repeated: the other strokes of the character to be recognized (TYMAX, HYMIN, TYMIN, HXMAX, TXMAX, HXMIN, TXMIN, UP, LEFT, DOWN, RIGHT) are sampled and discriminated in turn and numbered one by one.
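The weight arithmetic described for item 5 can be sketched as follows, using the 5-1 HYMAX entries of the letter G as data. The function names and the tuple layout of a table row are hypothetical. The sketch assumes that every table entry maps to a weight of entry minus 10 and that ".." maps to an infinite weight, as the worked values 10-10=0 and 12-10=2 suggest.

```python
import math


def entry_weight(entry):
    """'..' means the feature cannot match (infinite weight);
    otherwise the weight is the table entry minus 10."""
    return math.inf if entry == ".." else int(entry) - 10


def count_weight(row, observed):
    """row = (entry for >, entry for =, entry for <, DicCount).
    Selects the entry by comparing the observed count with DicCount."""
    gt, eq, lt, dic_count = row
    if observed > dic_count:
        return entry_weight(gt)
    if observed == dic_count:
        return entry_weight(eq)
    return entry_weight(lt)


def direction_weight(row, code):
    """row holds one entry per direction code 0..7."""
    return entry_weight(row[code])


# Entries copied from the 5-1 HYMAX table of the letter G.
x_local_min_row = ("..", "10", "..", 1)
head_row = ["..", "..", "12", "10", "10", "10", "..", ".."]

print(count_weight(x_local_min_row, 1))   # count equals DicCount
print(direction_weight(head_row, 2))      # starting point with code 2
```

Storing a tolerance as a small integer offset from 10 lets the dictionary designer grade how acceptable each deviation is, rather than only accepting or rejecting it.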
In another embodiment of the present invention, taking the lowercase letter a as an example, as shown in Figure 7, the data structure of the rule base of this character in the dictionary is as follows:
1. Character: a
2. Stroke number: 1
3. Feature point position, feature point relative relationship, feature point angle change: 6 1 1
4. Stroke intersection points:
                  >    =    <    Count
                  10   10   10   0
5-1 HXMAX:
Count:            >    =    <    DicCount
XLocalMinCount    ..   10   10   2
XLocalMaxCount    ..   10   10   2
YLocalMinCount    ..   10   10   2
YLocalMaxCount    ..   10   .    2
SegmentCount      10   10   10   4
DirectionCode:    0    1    2    3    4    5    6    7
Head              ..   10   10   10   10   ..   ..   ..
Tail              10   10   ..   ..   ..   ..   10   10
6. Position Limit: HEAD TAIL XMIN XMAX YMIN YMAX IX IY
Point:      stroke number   number   type   weight
            1               1        HEAD   ..
Position:   11   10   10   10
            12   10   10   10
            ..   12   11   11
            ..   ..   ..   ..
            ..   ..   ..   ..
            ..   ..   ..   ..
Point:      stroke number   number   type   weight
            1               2        HEAD   10
Position:   ..   ..   ..   ..
            ..   ..   13   13
            ..   ..   12   12
            ..   ..   11   11
            ..   ..   10   10
            ..   ..   10   10
Point:      stroke number   number   type   weight
            1               1        YMIN   ..
Position:   ..   ..   ..   ..
            ..   ..   ..   ..
            ..   ..   ..   ..
            ..   ..   ..   ..
            10   10   10   11
            10   10   10   11
Point:      stroke number   number   type   weight
            1               2        YMIN   10
Position:   ..   ..   ..   ..
            ..   ..   ..   ..
            ..   ..   ..   ..
            ..   ..   ..   ..
            10   10   10   10
            10   10   10   10
Point:      stroke number   number   type   weight
            1               1        YMAX   ..
Position:   10   10   10   10
            10   10   10   10
            ..   ..   ..   ..
            ..   ..   ..   ..
            ..   ..   ..   ..
            ..   ..   ..   ..
Point:      stroke number   number   type   weight
            1               2        YMAX   ..
Position:   ..   10   10   10
            ..   10   10   10
            ..   10   10   10
            ..   10   10   10
            ..   ..   ..   ..
            ..   ..   ..   ..
7. Feature point position relationship
Point 1:    stroke number   number   type   weight
            1               1        XMIN   ..
Point 2:    stroke number   number   type   weight
            1               2        XMIN   10
Relation:   upper left   upper right   lower left   lower right   before   after
            ..           10            ..           10            10       ..
8. Feature point angle change
Point 2:    stroke number   number   type   weight
            1               1        XMIN   ..
Condition:  >    <    value (angle)
            ..   10   90
The meaning of each item in this data structure is as follows:
1. Item 1 is the character that this rule table represents. When the handwriting written by the user on the touch-sensitive handwriting pad is recognized with the rules recorded in this stroke feature rule table of the dictionary and the match succeeds, this character is output as the recognition result.
2. Item 2 is the number of strokes of the character. If the number of strokes of the handwriting to be recognized does not agree with it, the character cannot pass recognition matching.
3. Item 3 indicates that, in the rule base of the dictionary, the numbers of test conditions for the feature point position, relative relationship and angle change in items 6, 7 and 8 are 6, 1 and 2, respectively.
4. Item 4 is the number of stroke intersection points of the character. If the number of stroke intersection points of the character to be recognized does not agree with it, the character cannot pass recognition matching.
5. Item 5 gives the feature values of the strokes of the character. During recognition, the present invention can, according to system requirements, first take the stroke of the character to be recognized whose starting point has the maximum Y coordinate and discriminate its attributes. After that stroke has been discriminated, it is numbered as the first stroke, and the process is repeated: the other strokes of the character to be recognized (TYMAX, HYMIN, TYMIN, HXMAX, TXMAX, HXMIN, TXMIN, UP, LEFT, DOWN, RIGHT) are sampled and discriminated in turn and numbered one by one.
6. Item 6 gives the positions of the feature points. The first feature point is the starting point of the first stroke of the character to be recognized. The area occupied by the character is divided 6 × 4 into 24 subregions. When the point lies in a subregion marked "..", the weight is infinite, meaning that the character cannot pass recognition matching against the dictionary; when the point lies in a subregion marked 10, the weight is 10-10=0; in a subregion marked 11, the weight is 11-10=1; in a subregion marked 12, the weight is 12-10=2. The six conditions are discriminated in turn in this way.
7. Item 7 gives the relative relationship of two feature points. The first feature point is the first minimum point of the first stroke in the X coordinate; the second feature point is the second minimum point of the first stroke in the X coordinate. When the second feature point lies to the upper left or lower left of the first feature point, the weight is infinite, meaning that the character cannot pass recognition matching against the dictionary; when it lies to the upper right or lower right, the weight is 10-10=0. When the first feature point is written before the second, the weight is 10-10=0; when it is written after the second, the weight is infinite, meaning that the character cannot pass recognition against the dictionary.
8. Item 8 gives the angle change at a feature point. The feature point is the first minimum point of the first stroke in the X coordinate. When the angle change at this point is greater than 90 degrees, the weight is infinite, meaning that the character cannot pass recognition against the dictionary; when the angle change at this point is less than 90 degrees, the weight is 10-10=0.
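The position and angle conditions of items 6 and 8 can be sketched as table lookups, using the first HEAD grid and the 90 degree condition of the letter a as data. The names `weight`, `position_weight` and `angle_weight` are hypothetical. The sketch assumes that the grid rows are listed in the same order as the subregion rows of the character, and that an angle of exactly 90 degrees is treated as "not greater", a case the description does not address.

```python
import math


def weight(entry):
    """'..' gives an infinite weight; otherwise the entry minus 10."""
    return math.inf if entry == ".." else int(entry) - 10


# First position table of the letter a (starting point of stroke 1).
HEAD_GRID = [
    ["11", "10", "10", "10"],
    ["12", "10", "10", "10"],
    ["..", "12", "11", "11"],
    ["..", "..", "..", ".."],
    ["..", "..", "..", ".."],
    ["..", "..", "..", ".."],
]


def position_weight(grid, row, col):
    """Weight for a feature point falling in subregion (row, col)."""
    return weight(grid[row][col])


def angle_weight(angle_deg, threshold=90, gt_entry="..", lt_entry="10"):
    """Weight for the angle change at a feature point, using the entries
    of the 'Condition' row. Equality is treated as 'not greater'."""
    return weight(gt_entry if angle_deg > threshold else lt_entry)


print(position_weight(HEAD_GRID, 1, 0))  # subregion marked 12
print(angle_weight(45))                  # angle change below 90 degrees
```

The relative relationship of item 7 follows the same pattern: each of the six relation columns holds one entry, and the entry for the observed relation is converted with `weight`.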
The above is the preferred embodiment of the present invention, but the scope of rights claimed by the present invention is not limited thereto. Any equivalent change that a person skilled in the art can readily derive from the disclosed technical content shall fall within the scope of protection of the present invention.