CN108764195A - Handwriting model training method, hand-written character recognizing method, device, equipment and medium - Google Patents
- Publication number
- CN108764195A CN108764195A CN201810564062.3A CN201810564062A CN108764195A CN 108764195 A CN108764195 A CN 108764195A CN 201810564062 A CN201810564062 A CN 201810564062A CN 108764195 A CN108764195 A CN 108764195A
- Authority
- CN
- China
- Prior art keywords
- character
- Chinese
- recognition model
- training sample
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/22—Character recognition characterised by the type of writing
- G06V30/226—Character recognition characterised by the type of writing of cursive writing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a handwriting model training method, a handwritten character recognition method, and corresponding devices, equipment and media. The handwriting model training method includes: obtaining standard Chinese character training samples; initializing a convolutional neural network; inputting the standard Chinese character training samples into the convolutional neural network for training, and updating the weights and biases of the convolutional neural network with a backpropagation algorithm based on stochastic gradient descent, to obtain a standard Chinese character recognition model; obtaining non-standard Chinese character training samples and training with them to obtain an adjusted Chinese handwritten character recognition model; obtaining Chinese character test samples and using them to obtain error character training samples; and, with a backpropagation algorithm based on batch gradient descent, updating the weights and biases of the adjusted Chinese handwritten character recognition model using the error character training samples, to obtain a target Chinese handwritten character recognition model. With this handwriting model training method, a target Chinese handwritten character recognition model with a high recognition rate for handwritten characters can be obtained.
Description
Technical field
The present invention relates to the field of character recognition, and more particularly to a handwriting model training method, a handwritten character recognition method, and corresponding devices, equipment and media.
Background art
Traditional handwritten character recognition methods mostly involve steps such as binarization, character segmentation, feature extraction and support vector machine classification. When such methods are used to recognize sloppier, non-standard characters (handwritten Chinese characters), their accuracy is low and their recognition results are unsatisfactory. Traditional handwritten character recognition methods can, for the most part, only recognize standard characters, so their accuracy is poor when applied to the varied handwritten characters found in real life.
Summary of the invention
Embodiments of the present invention provide a handwriting model training method, device, equipment and medium, to solve the problem that the accuracy of current handwritten character recognition is not high.
A handwriting model training method, including:

obtaining the pixel-value feature matrix of each Chinese character in to-be-processed Chinese character training samples using optical character recognition technology;

obtaining standard Chinese character training samples based on the pixel-value feature matrix of each Chinese character in the to-be-processed Chinese character training samples;

initializing a convolutional neural network;

inputting the standard Chinese character training samples into the convolutional neural network for training, and updating the weights and biases of the convolutional neural network using a backpropagation algorithm based on stochastic gradient descent, to obtain a standard Chinese character recognition model;

obtaining non-standard Chinese character training samples, inputting the non-standard Chinese character training samples into the standard Chinese character recognition model for training, and updating the weights and biases of the standard Chinese character recognition model using a backpropagation algorithm based on stochastic gradient descent, to obtain an adjusted Chinese handwritten character recognition model;

obtaining Chinese character test samples, recognizing the Chinese character test samples using the adjusted Chinese handwritten character recognition model, obtaining the misrecognized characters whose recognition results do not match the true results, and taking all the misrecognized characters as error character training samples;

inputting the error character training samples into the adjusted Chinese handwritten character recognition model for training, and updating the weights and biases of the adjusted Chinese handwritten character recognition model using a backpropagation algorithm based on batch gradient descent, to obtain a target Chinese handwritten character recognition model.
A handwriting model training device, including:

a pixel-value feature matrix acquisition module, configured to obtain the pixel-value feature matrix of each Chinese character in the to-be-processed Chinese character training samples using optical character recognition technology;

a standard Chinese character training sample acquisition module, configured to obtain standard Chinese character training samples based on the pixel-value feature matrix of each Chinese character in the to-be-processed Chinese character training samples;

an initialization module, configured to initialize a convolutional neural network;

a standard Chinese character recognition model acquisition module, configured to input the standard Chinese character training samples into the convolutional neural network for training, and to update the weights and biases of the convolutional neural network using a backpropagation algorithm based on stochastic gradient descent, to obtain a standard Chinese character recognition model;

an adjusted Chinese handwritten character recognition model acquisition module, configured to obtain non-standard Chinese character training samples, to input the non-standard Chinese character training samples into the standard Chinese character recognition model for training, and to update the weights and biases of the standard Chinese character recognition model using a backpropagation algorithm based on stochastic gradient descent, to obtain an adjusted Chinese handwritten character recognition model;

an error character training sample acquisition module, configured to obtain Chinese character test samples, to recognize the Chinese character test samples using the adjusted Chinese handwritten character recognition model, to obtain the misrecognized characters whose recognition results do not match the true results, and to take all the misrecognized characters as error character training samples;

a target Chinese handwritten character recognition model acquisition module, configured to input the error character training samples into the adjusted Chinese handwritten character recognition model for training, and to update the weights and biases of the adjusted Chinese handwritten character recognition model using a backpropagation algorithm based on batch gradient descent, to obtain a target Chinese handwritten character recognition model.
Embodiments of the present invention also provide a handwritten character recognition method, device, equipment and medium, to solve the problem that the accuracy of current handwritten character recognition is not high.

A handwritten character recognition method, including:

obtaining a to-be-recognized Chinese character, recognizing the to-be-recognized Chinese character using a target Chinese handwritten character recognition model, and obtaining the output values of the to-be-recognized Chinese character in the target Chinese handwritten character recognition model, the target Chinese handwritten character recognition model being obtained using the above handwriting model training method;

obtaining target probability output values according to the output values and a preset Chinese semantic dictionary, and obtaining the recognition result of the to-be-recognized Chinese character based on the target probability output values.
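The patent does not specify how the output values and the preset Chinese semantic dictionary are combined into target probability output values. One plausible reading, sketched below in NumPy, is that the dictionary supplies a per-character prior that reweights the model's softmax probabilities; the function name, combination rule and all numbers are illustrative assumptions, not the patented method.

```python
import numpy as np

def combine_with_dictionary(output_values, dictionary_prior):
    """Blend raw model output scores with a dictionary-based prior.

    output_values: per-character scores from the recognition model.
    dictionary_prior: per-character prior probabilities (assumed to come
    from the Chinese semantic dictionary, e.g. character frequencies).
    """
    # Softmax the raw model outputs into probabilities.
    exp = np.exp(output_values - np.max(output_values))
    probs = exp / exp.sum()
    # Weight by the prior and renormalise into target probability outputs.
    target = probs * dictionary_prior
    return target / target.sum()

# Hypothetical 3-class example: the prior breaks a near-tie between
# two visually similar candidate characters.
scores = np.array([2.0, 2.1, 0.5])
prior = np.array([0.5, 0.2, 0.3])
target_probs = combine_with_dictionary(scores, prior)
print(int(np.argmax(target_probs)))  # class 0 wins once the prior is applied
```

The point of the sketch is only that a language-level prior can overturn a narrow visual preference, which is consistent with the stated purpose of consulting the semantic dictionary.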
An embodiment of the present invention provides a handwritten character recognition device, including:

an output value acquisition module, configured to obtain a to-be-recognized Chinese character, to recognize the to-be-recognized Chinese character using a target Chinese handwritten character recognition model, and to obtain the output values of the to-be-recognized Chinese character in the target Chinese handwritten character recognition model, the target Chinese handwritten character recognition model being obtained using the above handwriting model training method;

a recognition result acquisition module, configured to obtain target probability output values according to the output values and a preset Chinese semantic dictionary, and to obtain the recognition result of the to-be-recognized Chinese character based on the target probability output values.
A computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above handwriting model training method when executing the computer program.

A computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above handwritten character recognition method when executing the computer program.

An embodiment of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program implements the steps of the above handwriting model training method when executed by a processor.

An embodiment of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program implements the steps of the above handwritten character recognition method when executed by a processor.
In the handwriting model training method, device, equipment and medium provided by the embodiments of the present invention, the pixel-value feature matrix of each Chinese character in the to-be-processed Chinese character training samples is obtained using optical character recognition technology, and standard Chinese character training samples, which a computer can read and recognize directly, are obtained based on those pixel-value feature matrices. The convolutional neural network is then initialized, which helps improve the training efficiency of the network. Next, using a backpropagation algorithm based on stochastic gradient descent, the standard Chinese character training samples are used to train and obtain a standard Chinese character recognition model, which has the ability to recognize standard Chinese characters. Then, again with a backpropagation algorithm based on stochastic gradient descent, the standard Chinese character recognition model is adjusted using the non-standard Chinese character training samples, so that the adjusted Chinese handwritten character recognition model obtained after the update learns the deep features of handwritten Chinese characters while retaining the ability to recognize standard characters, and can therefore recognize handwritten Chinese characters better. The Chinese character test samples are then recognized using the adjusted Chinese handwritten character recognition model, the misrecognized characters whose recognition results do not match the true results are obtained, and all misrecognized characters are input, as error character training samples, into the adjusted Chinese handwritten character recognition model for further training; its weights and biases are updated using a backpropagation algorithm based on batch gradient descent, yielding the target Chinese handwritten character recognition model. Training on the error character training samples further optimizes recognition accuracy and further reduces the effects of overfitting and underfitting produced during model training. Because the standard Chinese character recognition model and the adjusted Chinese handwritten character recognition model are trained with a backpropagation algorithm based on stochastic gradient descent, training remains efficient and effective even when the number of training samples is large. Because the target Chinese handwritten character recognition model is trained with a backpropagation algorithm based on batch gradient descent, the model parameters are fully updated: each parameter update comprehensively takes the errors produced into account, improving the recognition accuracy of the resulting model.
In the handwritten character recognition method, device, equipment and medium provided by the embodiments of the present invention, a to-be-recognized Chinese character is input into the target Chinese handwritten character recognition model for recognition, and a preset Chinese semantic dictionary is combined with the model output to obtain the recognition result. When handwritten Chinese characters are recognized with the target Chinese handwritten character recognition model, accurate recognition results can be obtained.
Description of the drawings
In order to explain the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an application environment diagram of the handwriting model training method in an embodiment of the present invention;
Fig. 2 is a flowchart of the handwriting model training method in an embodiment of the present invention;
Fig. 3 is a detailed flowchart of step S20 in Fig. 2;
Fig. 4 is a detailed flowchart of step S40 in Fig. 2;
Fig. 5 is a detailed flowchart of step S60 in Fig. 2;
Fig. 6 is a schematic diagram of the handwriting model training device in an embodiment of the present invention;
Fig. 7 is a flowchart of the handwritten character recognition method in an embodiment of the present invention;
Fig. 8 is a schematic diagram of the handwritten character recognition device in an embodiment of the present invention;
Fig. 9 is a schematic diagram of the computer device in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Fig. 1 shows the application environment of the handwriting model training method provided by an embodiment of the present invention. This application environment includes a server side and a client, connected through a network. The client is a device capable of human-computer interaction with a user, including but not limited to computers, smartphones and tablets; the server side may specifically be implemented as an independent server or as a server cluster composed of multiple servers. The handwriting model training method provided by the embodiment of the present invention is applied to the server side.
As shown in Fig. 2, Fig. 2 is a flowchart of the handwriting model training method in an embodiment of the present invention. The handwriting model training method includes the following steps:
S10: Obtain the pixel-value feature matrix of each Chinese character in the to-be-processed Chinese character training samples using optical character recognition technology.

Optical character recognition (OCR) technology converts the characters in an image into character content that a computer can edit. The to-be-processed Chinese character training samples are the initially collected, not-yet-processed training samples. A pixel-value feature matrix is a matrix representation of a character that takes its pixel values as features.

In this embodiment, OCR technology is used to perform operations such as locating, segmenting and feature extraction on the characters in the images, obtaining the pixel-value feature matrix of each Chinese character in the to-be-processed Chinese character training samples. A pixel-value feature matrix can be read and recognized directly by a computer; that is, the pixel-value features of the to-be-processed Chinese character training samples are extracted and represented as matrices.
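The patent describes the OCR step only at this level of detail. As an illustrative sketch (not the patented pipeline), a segmented grayscale character image can be turned into a binary pixel-value feature matrix as below; the threshold, function name and array values are assumptions.

```python
import numpy as np

def pixel_feature_matrix(gray_image, threshold=128):
    """Turn a grayscale character image into a binary pixel-value
    feature matrix that a network can read directly.

    In a full OCR pipeline the character would first be located and
    segmented; here `gray_image` is assumed to already contain a
    single character, and the threshold value is an illustrative choice.
    """
    img = np.asarray(gray_image)
    # Ink pixels (dark) become 1, background (light) becomes 0.
    return (img < threshold).astype(np.uint8)

# Toy 3x3 "image": a dark stroke down the middle column.
toy = np.array([[250, 10, 250],
                [250, 12, 250],
                [250,  8, 250]])
print(pixel_feature_matrix(toy).tolist())
# [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```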
S20: Obtain standard Chinese character training samples based on the pixel-value feature matrix of each Chinese character in the to-be-processed Chinese character training samples.

Standard Chinese character training samples are training samples collected from standard characters (characters belonging to fonts such as regular script, Song typeface or clerical script; regular script or Song typeface are generally chosen). The characters in the to-be-processed Chinese character training samples belong to such standard characters.

In this embodiment, based on the pixel-value feature matrix of each Chinese character in the to-be-processed Chinese character training samples, standard Chinese character training samples for training the convolutional neural network are obtained, improving the efficiency of network training. The standard Chinese character training samples are collected from standard characters belonging to Chinese fonts such as regular script, Song typeface or clerical script; the Song typeface is used as the example in this embodiment. It should be understood that standard characters here are characters belonging to the mainstream fonts of current Chinese typography, such as the Song typeface characters of the default font of a computer input method, or the regular script characters commonly used for copybooks. Characters in Chinese fonts that are rarely used in daily life, such as cursive script or children's rounded fonts, do not fall within the scope of standard characters.
S30: Initialize the convolutional neural network.

In one embodiment, initializing the convolutional neural network includes: making the initial weights of the convolutional neural network satisfy the formula S(W^l) = 2/n_l, ∀l, where n_l denotes the number of training samples input at layer l, S(·) denotes the variance operation, W^l denotes the weights of layer l, ∀ denotes "for any", and l denotes the l-th layer of the convolutional neural network.
A convolutional neural network (CNN) is a feedforward neural network whose artificial neurons respond to surrounding units within part of their coverage area, and which can process images and video. The main difference between a convolutional neural network and a general deep neural network (DNN) is that a convolutional neural network contains convolutional layers and pooling layers, which provide important technical support for processing and recognizing images containing characters.

A convolutional neural network contains the weights and biases of the connections between the neurons of each layer; these weights and biases determine the recognition performance of the network.
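The convolution and pooling operations mentioned above can be illustrated with a minimal NumPy sketch; the patent does not fix kernel sizes or pooling windows, so the kernel, the 2x2 window and the toy data below are all illustrative assumptions.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Minimal 2-D convolution (valid padding) - the core operation of a
    convolutional layer, applied here to a pixel-value feature matrix."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool2(x):
    """2x2 max pooling - the downsampling step of a pooling layer."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:2*h, :2*w].reshape(h, 2, w, 2).max(axis=(1, 3))

# A 4x4 binary character patch and a toy vertical-stroke kernel.
patch = np.array([[0, 1, 0, 0],
                  [0, 1, 0, 0],
                  [0, 1, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
kernel = np.array([[1.0], [1.0]])   # responds to vertical ink runs
feat = conv2d_valid(patch, kernel)  # shape (3, 4)
print(max_pool2(feat).tolist())
# [[2.0, 0.0]]
```

The pooled output keeps a strong response where the vertical stroke is, which is the sense in which convolution plus pooling extracts position-tolerant glyph features.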
In this embodiment, initializing the convolutional neural network means setting the initial values of the weights and biases in the network. Specifically, let C^l be the convolution at layer l of the network. By the properties of convolutional neural networks, C^l = W^l x^l + b^l, where W^l denotes the weights of layer l, x^l denotes the training samples input at layer l for initialization, and b^l denotes the biases of layer l. The variance of C^l can then be found as S(C^l) = n_l S(W^l x^l), where S(·) denotes the variance operation and n_l denotes the number of training samples input at layer l. When a convolutional neural network is trained, a weight mean that is too large may make the gradients too large, so that the minimum of the error function cannot be found effectively; the weights W are therefore set here to have mean 0, and the variance expression above can be further written as S(C^l) = n_l S(W^l) E((x^l)^2), where E(·) denotes the mathematical expectation.

In particular, the activation function used by the convolutional layers of the network is ReLU (Rectified Linear Unit, also called the rectified linear function), a commonly used activation function in artificial neural networks, generally meaning the nonlinear functions represented by the ramp function and its variants. From the activation function ReLU, x^l = ReLU(C^{l-1}) and E((x^l)^2) = S(C^{l-1})/2. Substituting these two formulas into the variance expression S(C^l) = n_l S(W^l) E((x^l)^2) gives S(C^l) = (n_l/2) S(W^l) S(C^{l-1}). The variance should be kept as consistent as possible across layers during training; otherwise the variance becomes larger and larger, or smaller and smaller, during training, causing the gradients to converge too quickly or too slowly, so that the minimum of the error function cannot be found effectively or training is too slow. Therefore, in order to keep the variance consistent, it follows from the formula S(C^l) = (n_l/2) S(W^l) S(C^{l-1}) that the weights should satisfy S(W^l) = 2/n_l, ∀l, and the weights of the convolutional neural network can be set accordingly. The biases can be set to small values at initialization, for example within the interval [-0.3, 0.3].
Reasonable initialization of the convolutional neural network gives the network more flexible adjustment ability in the early stage, allows the network to be adjusted effectively during training, and allows the minimum of the error function to be found quickly and effectively. This is beneficial to the updating and adjustment of the convolutional neural network, so that the model obtained by training on this network has an accurate recognition effect when recognizing handwritten Chinese characters.
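Under the condition derived above, S(W^l) = 2/n_l with small biases, one layer could be initialized as in the following sketch. The zero-mean Gaussian shape of the weight distribution and the helper name are our assumptions; the variance 2/n_l and the bias interval [-0.3, 0.3] come from the description.

```python
import numpy as np

def init_conv_layer(n_l, n_out, rng=None):
    """Initialise one layer so that S(W^l) = 2 / n_l (the ReLU-variance
    condition derived above) and the biases start small.

    n_l: number of inputs feeding the layer; n_out: number of outputs.
    """
    rng = np.random.default_rng(rng)
    std = np.sqrt(2.0 / n_l)                      # variance 2/n_l
    weights = rng.normal(0.0, std, size=(n_out, n_l))  # zero mean, as required
    biases = rng.uniform(-0.3, 0.3, size=n_out)   # small initial biases
    return weights, biases

w, b = init_conv_layer(n_l=512, n_out=256, rng=0)
print(abs(w.var() - 2.0 / 512) < 1e-3)  # sample variance close to 2/n_l
```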
S40: Input the standard Chinese character training samples into the convolutional neural network for training, and update the weights and biases of the convolutional neural network using a backpropagation algorithm based on stochastic gradient descent, to obtain a standard Chinese character recognition model.

Stochastic gradient descent (SGD) is a processing method that, when updating the network parameters, repeatedly takes the error produced during training by a single, randomly chosen training sample and uses it to update the network parameters. The backpropagation (BP) algorithm is a training and learning method in neural network learning, used to adjust the weights and biases between the nodes of a neural network. Adjusting the weights and biases of a neural network with the backpropagation algorithm requires finding the minimum of the error function; in this embodiment, the minimum of the error function is specifically found using stochastic gradient descent.

In this embodiment, the standard Chinese character training samples are input into the convolutional neural network for training, and the weights and biases of the network are updated using a backpropagation algorithm based on stochastic gradient descent, yielding a standard Chinese character recognition model. During training, this model learns the deep features of the standard Chinese character training samples, so that it can accurately recognize standard characters. It should be noted that regardless of whether the standard Chinese character training samples use the standard characters of regular script, Song typeface, clerical script or another Chinese font, the differences between these standard characters at the level of glyph recognition are small, so the standard Chinese character recognition model can accurately recognize the standard characters corresponding to fonts such as regular script, Song typeface or clerical script and obtain accurate recognition results.
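The per-sample update style of SGD-based backpropagation can be sketched as below: weights and biases change after every single randomly chosen sample. The patent's model is a full CNN; a single softmax layer and the toy data are illustrative assumptions that keep the sketch short.

```python
import numpy as np

def sgd_train(X, y, n_classes, lr=0.5, epochs=200, seed=0):
    """Train a minimal softmax classifier with stochastic gradient
    descent: the weights and biases are updated after each individual
    sample, using the error that sample produces."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_classes, X.shape[1]))
    b = np.zeros(n_classes)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):        # random sample order
            logits = W @ X[i] + b
            p = np.exp(logits - logits.max())
            p /= p.sum()
            p[y[i]] -= 1.0                       # dL/dlogits for cross-entropy
            W -= lr * np.outer(p, X[i])          # per-sample update
            b -= lr * p
    return W, b

# Toy "character" data: two linearly separable pixel patterns.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y = np.array([0, 0, 1, 1])
W, b = sgd_train(X, y, n_classes=2)
print([int(np.argmax(W @ x + b)) for x in X])  # [0, 0, 1, 1]
```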
S50: Obtain non-standard Chinese character training samples, input the non-standard Chinese character training samples into the standard Chinese character recognition model for training, and update the weights and biases of the standard Chinese character recognition model using a backpropagation algorithm based on stochastic gradient descent, to obtain an adjusted Chinese handwritten character recognition model.

Non-standard Chinese character training samples are training samples collected from handwritten Chinese characters; such handwritten characters may specifically be characters written by hand following the glyph shapes of the standard characters corresponding to fonts such as regular script, Song typeface or clerical script. It should be understood that the difference between the non-standard Chinese character training samples and the standard Chinese character training samples is that the non-standard samples are collected from handwritten Chinese characters, which, being handwritten, naturally contain a variety of different shapes.

In this embodiment, the server side obtains non-standard Chinese character training samples, which contain the features of handwritten Chinese characters, inputs the non-standard Chinese character training samples into the standard Chinese character recognition model for training and adjustment, and updates the weights and biases of the standard Chinese character recognition model using a backpropagation algorithm based on stochastic gradient descent, obtaining an adjusted Chinese handwritten character recognition model. It should be understood that the standard Chinese character recognition model has the ability to recognize standard characters, but is not highly accurate when recognizing handwritten Chinese characters. This embodiment therefore trains with the non-standard Chinese character training samples, so that the model parameters (weights and biases) are adjusted on the basis of the existing ability to recognize standard characters, obtaining the adjusted Chinese handwritten character recognition model. This model learns the deep features of handwritten Chinese characters on top of its original ability to recognize standard characters; combining the deep features of standard characters and of handwritten Chinese characters, it can effectively recognize both at the same time and obtain recognition results of higher accuracy.

When recognizing characters, a convolutional neural network judges according to the pixel distribution of the character. In real life, there are differences between a handwritten Chinese character and the corresponding standard character, but these differences are much smaller than the differences from non-corresponding standard characters. For example, the handwritten Chinese character for "I" and the standard character for "I" differ in pixel distribution, but this difference is obviously much smaller than the difference between the handwritten character for "you" and the standard character for "I". That is, even though there is a certain difference between a handwritten Chinese character and its corresponding standard character, this difference is much smaller than the difference from any non-corresponding standard character; the recognition result can therefore be determined by the principle of greatest similarity (i.e., smallest difference). The adjusted Chinese handwritten character recognition model, trained through the convolutional neural network, combines the deep features of standard characters and handwritten Chinese characters and can effectively recognize handwritten Chinese characters according to these deep features.
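The adjustment idea of step S50 — continuing training from already-learned parameters rather than re-initializing — can be sketched as follows. The toy linear model, the "pretrained" weights and the handwritten-style data are all illustrative assumptions; only the shape of the procedure (start from W, b, keep updating per sample) reflects the description.

```python
import numpy as np

def fine_tune(W, b, X_hand, y_hand, lr=0.1, epochs=100, seed=1):
    """Fine-tune an already-trained model on non-standard (handwritten)
    samples: the pretrained weights W and biases b are the starting
    point and are adjusted by further SGD steps, not re-initialised."""
    rng = np.random.default_rng(seed)
    W, b = W.copy(), b.copy()
    for _ in range(epochs):
        for i in rng.permutation(len(X_hand)):
            logits = W @ X_hand[i] + b
            p = np.exp(logits - logits.max())
            p /= p.sum()
            p[y_hand[i]] -= 1.0                  # cross-entropy gradient
            W -= lr * np.outer(p, X_hand[i])
            b -= lr * p
    return W, b

# Start from a "standard-character" model and adjust it on noisier,
# handwritten-style variants of the same two classes.
W0 = np.array([[1.0, -1.0], [-1.0, 1.0]])    # pretrained weights (toy)
b0 = np.zeros(2)
X_hand = np.array([[0.7, 0.4], [0.4, 0.7]])  # sloppier strokes
y_hand = np.array([0, 1])
W1, b1 = fine_tune(W0, b0, X_hand, y_hand)
print([int(np.argmax(W1 @ x + b1)) for x in X_hand])  # [0, 1]
```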
For steps S40 and S50, error backpropagation updates are performed using a backpropagation algorithm based on stochastic gradient descent. Model training can still proceed smoothly when the number of training samples is very large, and the efficiency and effect of network training can be improved, making training more effective.

It should be noted that the order of steps S40 and S50 in this embodiment cannot be exchanged: step S40 is performed first, and step S50 afterwards. Training the convolutional neural network with the standard Chinese character training samples first gives the resulting standard Chinese character recognition model good recognition ability, with accurate recognition results on standard characters. The fine-tuning of step S50 is then performed on the basis of that good recognition ability, so that the adjusted Chinese handwritten character recognition model obtained by training can effectively recognize handwritten Chinese characters according to the deep features of handwritten characters it has learned, giving more accurate recognition results on handwritten Chinese characters. If step S50 were performed first, or performed alone, then, because handwritten Chinese characters take many forms, the features learned directly from the handwritten character training samples would not reflect the features of handwritten Chinese characters well; the model would learn "badly" at the start, and however it was adjusted afterwards, it would be difficult to make its recognition of handwritten Chinese characters accurate. Although everyone's handwritten Chinese characters are different, the great majority are similar to the standard characters (handwritten characters imitate standard characters). Therefore, training the model on standard characters first better matches the objective situation, and works better than training the model directly on handwritten Chinese characters; the "good" model can then be adjusted accordingly, yielding an adjusted Chinese handwritten character recognition model with a high recognition rate for handwritten Chinese characters.
S60: Obtain Chinese character test samples, recognize them with the adjusted Chinese handwriting recognition model, collect the mis-recognized characters whose recognition results do not match the true results, and take all mis-recognized characters as the error character training samples.
Here, the Chinese character test samples are standard Chinese characters and handwritten Chinese characters acquired for testing. The standard characters used in this step are the same as those used for training in step S40 (because each character in a font such as regular script or Song typeface is uniquely determined). The handwritten characters used may differ from those used for training in step S50 (different people's handwriting is not identical, and each character can take many handwritten forms; to distinguish the test set from the non-standard Chinese character training samples of step S50 and avoid over-fitting during model training, this step generally uses handwritten characters different from those of step S50).
In this embodiment, the trained adjusted Chinese handwriting recognition model is used to recognize the Chinese character test samples. The test samples consist of standard characters with preset label values (i.e. true results) and handwritten characters with preset label values. Standard and handwritten characters can be input to the adjusted Chinese handwriting recognition model in a mixed fashion. When the adjusted model recognizes the test samples, it produces a corresponding recognition result for each; all mis-recognized characters whose recognition result does not match the label value (true result) become the error character training samples. The error character training samples reflect the remaining shortcomings in the recognition precision of the adjusted Chinese handwriting recognition model, and are subsequently used to update and optimize it.
Since the recognition precision of the adjusted Chinese handwriting recognition model is jointly shaped by the standard Chinese character training samples and the non-standard Chinese character training samples, updating the network parameters (weights and biases) first with the standard samples and then with the non-standard samples can cause the resulting model to over-learn the features of the non-standard samples: the adjusted model then has very high recognition precision on the non-standard training samples themselves (including their handwritten characters), but this over-learning harms its precision on handwritten characters outside the non-standard training set. Recognizing the Chinese character test samples with the adjusted model in step S60 therefore largely exposes the over-learning of the non-standard training samples used during training. The errors produced by over-learning show up concretely as mis-recognized characters, and these mis-recognized characters can then be used to further update and optimize the network parameters of the adjusted Chinese handwriting recognition model.
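The selection of error character training samples in step S60 can be sketched as follows. This is a minimal illustration: the `recognize` callable and the `(features, label)` sample format are hypothetical stand-ins for the adjusted model and the labelled test set, not the patent's actual interfaces.

```python
def collect_error_samples(test_samples, recognize):
    """Keep every test sample whose recognition result differs from its
    label value (true result); these become the error character training
    samples used to further update the model."""
    error_samples = []
    for features, label in test_samples:
        prediction = recognize(features)
        if prediction != label:   # recognition result does not match true result
            error_samples.append((features, label))
    return error_samples

# Toy stand-in model: "recognizes" a character from its first feature value.
samples = [([1, 0], "yi"), ([0, 1], "er"), ([1, 1], "san")]
toy_model = lambda f: "yi" if f[0] == 1 else "er"
errors = collect_error_samples(samples, toy_model)
# "san" is mis-recognized as "yi", so it becomes an error training sample.
```

Only the mismatches are retained, which is why the error training set in step S70 is small compared with the original training sets.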
S70: Input the error character training samples into the adjusted Chinese handwriting recognition model for training, update the weights and biases of the adjusted model with a back-propagation algorithm based on batch gradient descent, and obtain the target Chinese handwriting recognition model.
In this embodiment, the error character training samples are input into the adjusted Chinese handwriting recognition model for training. These samples reflect the inaccuracies that arise when the adjusted model, having over-learned the features of the non-standard character training samples, recognizes handwriting outside that training set. Moreover, because the standard character training samples were used before the non-standard ones, the features of the standard characters learned first are unduly weakened, which damages the "framework" for recognizing standard characters that the model initially built. Training on the error character training samples addresses both the over-learning and the undue weakening: according to the recognition-precision problems these samples reflect, the adverse effects of the over-learning and undue weakening produced in the original training process can be largely eliminated. Specifically, when training on the error character training samples, a back-propagation algorithm based on batch gradient descent is used to update the weights and biases of the adjusted Chinese handwriting recognition model, yielding the target Chinese handwriting recognition model, i.e. the finally trained model used to recognize handwritten Chinese characters. When the network parameters are updated, the number of error character training samples is small (there are few mis-recognized characters), so batch gradient descent can back-propagate the errors produced by all of them when training the convolutional neural network, ensuring that every error contributes to adjusting and updating the network. The network is thus trained comprehensively, improving the recognition accuracy of the target Chinese handwriting recognition model.
It should be noted that in this embodiment, steps S40 and S50 use a back-propagation algorithm based on stochastic gradient descent, while step S70 uses a back-propagation algorithm based on batch gradient descent.
In step S40, updating the weights and biases of the convolutional neural network with the back-propagation algorithm based on stochastic gradient descent specifically comprises the following steps: obtain the binarized pixel-value feature matrix of each training sample (each character) in the standard Chinese character training samples; input the binarized pixel-value feature matrices into the convolutional neural network in random order to obtain the forward output for each; compute the error between each forward output and the corresponding label value (true result); and each time an error is obtained, perform one gradient-descent back-propagation to update the weights and biases of the network. This process of computing each error and using it to update the weights and biases is repeated until the error falls below the stopping threshold ε1, at which point the loop ends and the updated weights and biases give the standard character recognition model.
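The per-sample update loop just described can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: a one-parameter linear "network" and a squared per-sample error stand in for the convolutional network, so only the control flow (one back-propagation per error, stopping at ε1) matches the patent's step S40.

```python
import random

def sgd_train(samples, lr=0.1, eps1=1e-4, max_iter=10000):
    """Stochastic-gradient-descent training loop in the style of step S40:
    samples are visited in random order, and each sample's error triggers
    one immediate gradient-descent update of the weight and bias."""
    w, b = 0.0, 0.0
    for _ in range(max_iter):
        random.shuffle(samples)              # random input order
        worst = 0.0
        for x, y in samples:
            err = (w * x + b) - y            # forward output minus true result
            w -= lr * err * x                # one back-propagation per error
            b -= lr * err
            worst = max(worst, 0.5 * err * err)
        if worst < eps1:                     # stop once the error falls below eps1
            break
    return w, b

# Fit y = 2x + 1 from a few samples.
w, b = sgd_train([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
```

Because each sample immediately updates the parameters, the loop makes progress even when the sample count is very large, which is the efficiency argument the text makes for steps S40 and S50.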
The process by which step S50 updates the weights and biases of the convolutional neural network with the back-propagation algorithm based on stochastic gradient descent is similar to that of step S40 and is not repeated here.
In step S70, updating the weights and biases of the convolutional neural network with the back-propagation algorithm based on batch gradient descent specifically comprises the following steps: obtain the binarized pixel-value feature matrix of one training sample in the error character training samples, input it into the adjusted Chinese handwriting recognition model (essentially a convolutional neural network) to obtain its forward output, and compute the error between that forward output and the true result; input the binarized pixel-value feature matrices of the remaining training samples in turn, compute the error between each forward output and its true result, and accumulate the errors into the total error of the adjusted model on the error character training samples; then perform one gradient-descent back-propagation with the total error to update the weights and biases of the network. This process of computing the total error and using it to update the weights and biases is repeated until the error falls below the stopping threshold ε2, at which point the loop ends and the updated weights and biases give the target Chinese handwriting recognition model.
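The batch update loop of step S70 can be sketched in the same simplified setting as before (a one-parameter linear "network" standing in for the adjusted model, an assumption made purely for illustration): errors from all samples are accumulated, then one update is made per pass.

```python
def batch_train(samples, lr=0.1, eps2=1e-6, max_iter=20000):
    """Batch-gradient-descent back-propagation in the style of step S70:
    the errors of all error character training samples are accumulated
    into a total error, and a single update is made per pass."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(max_iter):
        grad_w = grad_b = total_err = 0.0
        for x, y in samples:                 # forward output of every sample
            err = (w * x + b) - y
            total_err += 0.5 * err * err     # accumulate into the total error
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n                 # one back-propagation with the total error
        b -= lr * grad_b / n
        if total_err / n < eps2:             # stop below eps2
            break
    return w, b

w, b = batch_train([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
```

Every sample's error enters each update, which is why the text argues batch gradient descent trains the network "comprehensively" when the error training set is small.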
It is to be appreciated that for steps S40 and S50, the number of training samples used for model training is relatively large; a back-propagation algorithm based on batch gradient descent would harm the efficiency and effectiveness of network training, or even prevent training from proceeding normally, making effective training difficult. Performing the error back-propagation updates with stochastic gradient descent instead improves the efficiency and effectiveness of network training and makes the training more targeted. For step S70, where the number of error character training samples is small (there are few mis-recognized characters), batch gradient descent can back-propagate the errors produced by all of the error character training samples when training the convolutional neural network, ensuring that every error adjusts and updates the network, so the network is trained comprehensively. Compared with the back-propagation algorithm based on stochastic gradient descent, the gradient in batch gradient descent is exact, allowing comprehensive training of the network; stochastic gradient descent draws one training sample at random each time to update the parameters, so its gradient is approximate rather than exact, and its training accuracy is lower. Using batch gradient descent here improves the accuracy of model training, so that the target Chinese handwriting recognition model obtained possesses accurate recognition ability.
In steps S10-S70, optical character recognition obtains the pixel-value feature matrix of each Chinese character in the pending Chinese character training samples, and from these matrices the standard Chinese character training samples, which a computer can directly recognize and read, are obtained. The convolutional neural network is then initialized, which helps improve its training efficiency. The standard Chinese character training samples are used to train the standard character recognition model, which is then adaptively updated with the non-standard Chinese character training samples, so that the adjusted Chinese handwriting recognition model obtained after the update learns the deep features of handwritten Chinese characters while retaining the ability to recognize standard characters, and can thus recognize handwriting better. The adjusted model then recognizes the Chinese character test samples; the mis-recognized characters whose recognition results do not match the true results are collected, and all of them are input as error character training samples into the adjusted model for further training and updating, producing the target Chinese handwriting recognition model. Using the error character training samples largely eliminates the adverse effects of the over-learning and undue weakening produced in the original training process, further improving recognition accuracy. The standard character recognition model and the adjusted Chinese handwriting recognition model are trained with back-propagation based on stochastic gradient descent, which retains good training performance even when the number of training samples is large. The target Chinese handwriting recognition model is trained with back-propagation based on batch gradient descent, which ensures a thorough update of the model parameters: every error produced by the training samples during training is back-propagated, the parameters are updated comprehensively according to the errors produced, and the recognition accuracy of the obtained model improves.
In one embodiment, as shown in Fig. 3, in step S20, obtaining the standard Chinese character training samples based on the pixel-value feature matrix of each Chinese character in the pending Chinese character training samples specifically comprises the following steps:
S21: Obtain the pixel-value feature matrix of each Chinese character in the pending Chinese character training samples, normalize each pixel value in the matrix, and obtain the normalized pixel-value feature matrix of each character, where the normalization formula is y = (x - MinValue) / (MaxValue - MinValue), MaxValue is the maximum pixel value in the pixel-value feature matrix of the character, MinValue is the minimum pixel value in that matrix, x is the pixel value before normalization, and y is the pixel value after normalization.
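The min-max normalization of step S21 can be sketched directly from the formula; a plain nested list stands in for the pixel-value feature matrix.

```python
def normalize(matrix):
    """y = (x - MinValue) / (MaxValue - MinValue): map every pixel value
    of the feature matrix into [0, 1]."""
    flat = [v for row in matrix for v in row]
    min_v, max_v = min(flat), max(flat)
    return [[(v - min_v) / (max_v - min_v) for v in row] for row in matrix]

pixels = [[0, 128], [255, 64]]
norm = normalize(pixels)
# The maximum pixel (255) maps to 1.0 and the minimum (0) maps to 0.0.
```

All characters end up in the same value range, which is the acceleration argument the following paragraph makes.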
In this embodiment, the pixel-value feature matrix of each Chinese character in the pending Chinese character training samples is obtained; each matrix represents the features of the corresponding character, with pixel values standing for those features. Since a character is expressed two-dimensionally (a character is generally an m × n image), its pixel values can be expressed as a matrix, i.e. the pixel-value feature matrix. A computer device can recognize this matrix form and read the values in it. After the server obtains the pixel-value feature matrices, it applies the normalization formula to the pixel values of each character, obtaining each character's normalized pixel-value features. Normalization compresses the pixel-value feature matrices of all characters into the same range, which accelerates the computations involving those matrices and helps improve the efficiency of training the standard character recognition model.
S22: Divide the pixel values in the normalized pixel-value feature matrix of each character into two classes, establish the binarized pixel-value feature matrix of each character from the two classes, and combine the binarized pixel-value feature matrices of all characters as the standard Chinese character training samples.
In this embodiment, the pixel values in the normalized pixel-value feature matrix of each character are divided into two classes, meaning the matrix contains only pixel value A or pixel value B. Specifically, pixel values in the normalized feature matrix greater than or equal to 0.5 can be set to 1 and values less than 0.5 set to 0, establishing the binarized pixel-value feature matrix of each character, every element of which is 0 or 1. After the binarized matrices are established, the characters corresponding to them are combined as the standard Chinese character training samples. For example, an image containing a character consists of character pixels and blank pixels; the pixels on the character strokes are generally dark, so a "1" in the binarized matrix represents a character pixel and a "0" represents a blank pixel in the image. It is to be appreciated that binarization further simplifies the feature representation of a character: each character can be represented and distinguished by a matrix of only 0s and 1s, which speeds up the computer's processing of the character feature matrices and further improves the efficiency of training the standard character recognition model.
Steps S21-S22 normalize the pending Chinese character training samples and divide the values into two classes, obtaining the binarized pixel-value feature matrix of each character; taking the characters corresponding to these binarized matrices as the standard Chinese character training samples significantly shortens the time needed to train the standard character recognition model.
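The thresholding of step S22 can be sketched as follows, continuing the nested-list representation from the normalization sketch.

```python
def binarize(norm_matrix, threshold=0.5):
    """Step S22: values >= 0.5 become 1 (character pixels) and values
    < 0.5 become 0 (blank pixels), giving the binarized pixel-value
    feature matrix."""
    return [[1 if v >= threshold else 0 for v in row] for row in norm_matrix]

binary = binarize([[0.0, 0.5], [1.0, 0.25]])
# → [[0, 1], [1, 0]]
```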
In one embodiment, as shown in Fig. 4, in step S40, inputting the standard Chinese character training samples into the convolutional neural network for training, updating the weights and biases of the network with the back-propagation algorithm based on stochastic gradient descent, and obtaining the standard character recognition model specifically comprises the following steps:
S41: Input the standard Chinese character training samples into the convolutional neural network and obtain their forward outputs in the network.
A convolutional neural network is a feed-forward neural network whose artificial neurons respond to surrounding units within a partial coverage range; it can process images and video. The convolutional neural network here comprises at least two nonlinear trainable convolutional layers, at least two nonlinear pooling layers, and at least one fully-connected layer, i.e. at least five hidden layers, plus an input layer and an output layer.
In this embodiment, the standard Chinese character training samples are input into the convolutional neural network for training. The samples are processed by each layer of the network (specifically, the weights and biases respond to and process the samples), and a corresponding output value is produced after each layer's processing. Since the network contains many layers and each layer's function differs, the output of each layer differs.
Specifically, if layer l is a convolutional layer, its output can be expressed as a^l = σ(z^l) = σ(a^(l-1) * W^l + b^l), where a^l is the output of layer l, z^l is the output before the activation function is applied, a^(l-1) is the output of layer l-1 (i.e. the previous layer), σ is the ReLU activation function (for convolutional layers, ReLU performs better than other activation functions), * denotes convolution, W^l is the weight of layer l, and b^l is the bias of layer l. If layer l is a pooling layer, its output can be expressed as a^l = pool(a^(l-1)), where pool denotes a down-sampling computation; max pooling can be chosen, which simply takes the maximum over each n×n block as the sampled value. Besides max pooling, average pooling is also common, taking the average of each n×n block as the sampled value. If layer l is a fully-connected layer, its output is computed as in a conventional deep neural network, expressed as a^l = σ(z^l) = σ(W^l a^(l-1) + b^l), with the parameters meaning the same as above and not repeated here. In particular, for the output layer L, the activation function is the softmax function, and the output is a^L = softmax(z^L) = softmax(W^L a^(L-1) + b^L). From these layer-by-layer formulas, the output of every layer of the convolutional neural network can be computed, finally giving the output a^L of the output layer, which is the forward output. It is to be appreciated that the forward output obtained in step S41 reflects how the standard Chinese character training samples behave in the convolutional neural network; it can be compared with the objective facts (true results), and the network adjusted according to the error between the two.
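The layer formulas above can be sketched end to end. This is a minimal illustration under stated assumptions: one 1-D convolution (computed as cross-correlation for simplicity), one max-pooling layer with n = 2, and one fully-connected softmax output layer with a two-character "library"; the weights are toy values, not trained parameters.

```python
import math

def relu(v):
    return [max(x, 0.0) for x in v]

def conv1d(a, w, b):
    """a^l = sigma(a^{l-1} * W^l + b^l): 1-D sliding window plus bias."""
    n = len(w)
    return [sum(a[i + j] * w[j] for j in range(n)) + b
            for i in range(len(a) - n + 1)]

def max_pool(a, n=2):
    """a^l = pool(a^{l-1}): maximum over each n-wide block."""
    return [max(a[i:i + n]) for i in range(0, len(a) - n + 1, n)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def forward(x, w_conv, b_conv, w_fc, b_fc):
    a = relu(conv1d(x, w_conv, b_conv))          # convolutional layer
    a = max_pool(a)                              # pooling layer
    z = [sum(wr[j] * a[j] for j in range(len(a))) + br
         for wr, br in zip(w_fc, b_fc)]          # fully-connected output layer
    return softmax(z)                            # a^L = softmax(W^L a^{L-1} + b^L)

probs = forward([0.0, 1.0, 1.0, 0.0, 1.0], [1.0, -1.0], 0.0,
                [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0])
```

The softmax output is one probability per library character, which is exactly the output-value interpretation used later in step S61.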
S42:Error function is built to output and legitimate reading according to preceding, the expression formula of error function isWherein, n indicates training sample sum, xiIndicate i-th of training sample
Forward direction output, yiExpression and xiThe legitimate reading of corresponding i-th of training sample.
Wherein, legitimate reading, that is, objective fact, such as the word of input are " too " of regular script, then the result of forward direction output may
Be " big " etc. other as a result, and legitimate reading be exactly originally input " too ", legitimate reading can be interpreted as training sample
Label value, for calculating the error with forward direction output.
In this embodiment, because the forward outputs that the convolutional neural network produces from the standard Chinese character training samples differ from the true results, a corresponding error function can be constructed from this error and used to train the network, updating the weights and biases so that the updated weights and biases, when processing the input training samples, yield forward outputs identical or closer to the true results. A suitable error function can be constructed according to the actual situation; the error function constructed in this embodiment, E = (1/2n) Σᵢ₌₁ⁿ (yᵢ − xᵢ)², reflects the error between the forward outputs and the true results well.
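The quadratic error function of step S42 can be computed directly from the forward outputs and the label values; the scalar outputs here are an illustrative simplification of the model's vector outputs.

```python
def quadratic_error(outputs, labels):
    """E = (1/2n) * sum_i (y_i - x_i)^2, with x_i the forward output of
    the i-th training sample and y_i its true result (label value)."""
    n = len(outputs)
    return sum((y - x) ** 2 for x, y in zip(outputs, labels)) / (2.0 * n)

e = quadratic_error([0.2, 0.9], [0.0, 1.0])
# (0.04 + 0.01) / (2 * 2) = 0.0125
```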
S43: According to the error function, update the weights and biases of the convolutional neural network with the back-propagation algorithm based on stochastic gradient descent to obtain the standard character recognition model, where: in a fully-connected layer of the network, the weight-update formula is W^l' = W^l − (α/m) Σᵢ δ^(i,l) (a^(i,l-1))^T; in a convolutional layer, the weight-update formula is W^l' = W^l − (α/m) Σᵢ δ^(i,l) * rot180(a^(i,l-1)); in a fully-connected layer, the bias-update formula is b^l' = b^l − (α/m) Σᵢ δ^(i,l); and in a convolutional layer, the bias-update formula is b^l' = b^l − (α/m) Σᵢ Σ_(u,v) (δ^(i,l))_(u,v). Here W^l' is the updated weight, W^l the weight before the update, α the learning rate, m the number of standard Chinese character training samples, i the i-th input character sample, δ^(i,l) the sensitivity of the i-th input character sample at layer l, a^(i,l-1) the output of the i-th input character sample at layer l-1, T the matrix-transpose operation, * the convolution operation, rot180 the operation of rotating a matrix by 180 degrees, and (u, v) the position of a small block within each convolution feature map obtained when performing the convolution.
In this embodiment, after the suitable error function is constructed, the network parameters are updated with the back-propagation algorithm based on stochastic gradient descent, and the updated convolutional neural network serves as the standard character recognition model. Because the layers of a convolutional neural network differ considerably, back-propagation must proceed according to the actual situation of each layer to update the network parameters. In back-propagation, the updated weights and biases of the output layer are computed first: taking partial derivatives of the error function with respect to the weights W and biases b yields the common factor, i.e. the sensitivity δ^L of the output layer (L denotes the output layer). From δ^L the sensitivity δ^l of each layer l can be found in turn; from δ^l the gradient of layer l of the neural network is obtained, and the gradient is then used to update the weights and biases of the convolutional neural network. Specifically, if the current layer is fully-connected, then δ^l = (W^(l+1))^T δ^(l+1) ⊙ σ′(z^l), where W^(l+1) is the weight of layer l+1, T denotes matrix transposition, δ^(l+1) is the sensitivity of layer l+1, ⊙ denotes the element-wise product of two matrices (the Hadamard product), σ is the activation function (σ′ its derivative), and z^l is the pre-activation output computed during forward propagation. If the current layer is convolutional, then δ^l = δ^(l+1) * rot180(W^(l+1)) ⊙ σ′(z^l), where * denotes convolution and rot180 rotates a matrix by 180 degrees; the remaining parameters have the meanings explained above and are not repeated here. If the current layer is a pooling layer, then δ^l = upsample(δ^(l+1)) ⊙ σ′(z^l), where upsample denotes the up-sampling operation. The corresponding sensitivity δ^l of each layer of the convolutional neural network is found in this way, and the weights and biases of layer l are updated according to δ^l. Pooling layers have no weights or biases, so only the weights and biases of the fully-connected and convolutional layers need updating.
Specifically, in step S43, if the current layer is fully-connected, the weight-update formula is W^l' = W^l − (α/m) Σᵢ δ^(i,l) (a^(i,l-1))^T, where W^l' is the updated weight, W^l the weight before the update, α the learning rate, m the number of standard Chinese character training samples, i the i-th input character sample, δ^(i,l) the sensitivity of the i-th input character sample at layer l, a^(i,l-1) its output at layer l-1, and T the matrix-transpose operation; Σᵢ δ^(i,l) (a^(i,l-1))^T is the gradient of the layer-l weights W. The bias-update formula is b^l' = b^l − (α/m) Σᵢ δ^(i,l), where b^l' is the updated bias and b^l the bias before the update. If the current layer is convolutional, the weight-update formula is W^l' = W^l − (α/m) Σᵢ δ^(i,l) * rot180(a^(i,l-1)), and the bias-update formula is b^l' = b^l − (α/m) Σᵢ Σ_(u,v) (δ^(i,l))_(u,v), where (u, v) is the position of a small block (an element of the convolution feature map) within each convolution feature map obtained when performing the convolution. By updating the weights and biases of each layer of the convolutional neural network in this way with the back-propagation algorithm of stochastic gradient descent, the standard character recognition model is obtained.
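The fully-connected sensitivity and update formulas can be sketched for a single sample (m = 1) with a ReLU activation, whose derivative σ′(z) is 1 for z > 0 and 0 otherwise; the toy dimensions and values are illustrative assumptions.

```python
def fc_backward(w_next, delta_next, z):
    """delta^l = (W^{l+1})^T delta^{l+1} (Hadamard) sigma'(z^l) for a
    fully-connected layer, with sigma = ReLU."""
    rows, cols = len(w_next), len(w_next[0])
    delta = [sum(w_next[r][c] * delta_next[r] for r in range(rows))
             for c in range(cols)]                 # (W^{l+1})^T delta^{l+1}
    return [d * (1.0 if zv > 0 else 0.0) for d, zv in zip(delta, z)]

def fc_update(w, b, delta, a_prev, lr, m=1):
    """W^l' = W^l - (alpha/m) delta^l (a^{l-1})^T and
    b^l' = b^l - (alpha/m) delta^l, here for a single sample."""
    w_new = [[w[r][c] - lr / m * delta[r] * a_prev[c]
              for c in range(len(w[0]))] for r in range(len(w))]
    b_new = [b[r] - lr / m * delta[r] for r in range(len(b))]
    return w_new, b_new

delta_l = fc_backward([[1.0, -2.0]], [0.5], [3.0, -1.0])
# delta_l = [0.5, 0.0]: the second component is zeroed by the ReLU derivative.
w2, b2 = fc_update([[1.0, 1.0]], [0.0], [0.5], [2.0, 4.0], lr=0.1)
```

The sensitivity flows backward layer by layer exactly as the δ^l recursion above describes, and each layer's gradient is the outer product of its sensitivity with the previous layer's output.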
Steps S41-S43 construct the error function E = (1/2n) Σᵢ₌₁ⁿ (yᵢ − xᵢ)² from the forward outputs of the standard Chinese character training samples in the convolutional neural network, and back-propagate it to update the weights and biases, yielding the standard character recognition model. This model learns the deep features of the standard Chinese character training samples and can recognize standard characters accurately.
In one embodiment, as shown in Fig. 5, in step S60, recognizing the Chinese character test samples with the adjusted Chinese handwriting recognition model, obtaining the mis-recognized characters whose recognition results do not match the true results, and taking all mis-recognized characters as the error character training samples specifically comprises the following steps:
S61: Input the Chinese character test samples into the adjusted Chinese handwriting recognition model and obtain the output value of each character of the test samples in the model.
In this embodiment, the Chinese character test samples, which contain a number of Chinese characters, are recognized with the adjusted Chinese handwriting recognition model. A Chinese character library contains roughly three thousand common characters; the output layer of the adjusted model assigns to each character in the library a probability value expressing its similarity to the input test character. This probability value is the output value of each test character in the adjusted model, and can be produced by the softmax function. Put simply, when the character "I" is input, the adjusted model produces an output value (expressed as a probability) for every character in the library; for example, the output value for "I" in the library might be 99.5%, with the output values of all remaining characters summing to 0.5%. By inputting the Chinese character test samples and obtaining, after recognition by the adjusted model, the output value corresponding to each character in the library, a reasonable recognition result can be derived from these output values.
S62: Select the maximum output value among the output values corresponding to each character, and obtain the recognition result of each character from that maximum output value.
In this embodiment, the maximum of all output values corresponding to each character is selected, and the recognition result of the character is obtained from it. It should be understood that the output values directly reflect the degree of similarity between a character in the input test samples and each character in the Chinese character library; the maximum output value indicates the library character that the test character most closely resembles, so the character corresponding to the maximum output value can be taken as the recognition result. For example, when the character "I" is input, the final recognition result is "I".
S63: According to the recognition results, obtain the misrecognized characters whose recognition results do not match the true results, and take all the misrecognized characters as the error-character training samples.
In this embodiment, the obtained recognition results are compared with the true results (the objective facts), and the characters whose recognition results do not match the true results are taken as the error-character training samples. It should be understood that a recognition result is what the adjusted Chinese handwritten character recognition model recognizes from a Chinese character test sample, and it may well differ from the true result. Such differences reflect remaining deficiencies in the model's recognition accuracy, and these deficiencies can be remedied by optimizing with the error-character training samples, so as to achieve a more accurate recognition effect.
In steps S61-S63, from the output values of each character of the Chinese character test samples in the adjusted Chinese handwritten character recognition model, the maximum output value — the one reflecting the degree of similarity between characters — is selected; the recognition result is then obtained from this maximum output value, and the error-character training samples are obtained from the recognition results. This provides an important technical premise for subsequently using the error-character training samples to further optimize recognition accuracy.
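Steps S61-S63 can be illustrated with a small sketch. The model, character library, and sample data below are hypothetical stand-ins, not the patent's implementation; the point is the argmax-and-compare pattern used to collect the error-character training samples:

```python
import numpy as np

def collect_error_samples(samples, true_labels, model, library):
    """S61-S63: score each sample, take the argmax as the recognition
    result, and keep the samples whose result does not match the truth."""
    errors = []
    for sample, truth in zip(samples, true_labels):
        scores = model(sample)                    # S61: softmax output values
        result = library[int(np.argmax(scores))]  # S62: max output value -> result
        if result != truth:                       # S63: mismatch -> error sample
            errors.append((sample, truth))
    return errors

# Hypothetical 3-character library and a toy "model" returning fixed softmax rows.
library = ["我", "你", "他"]
fake_scores = {"s1": [0.995, 0.004, 0.001], "s2": [0.2, 0.7, 0.1]}
model = lambda s: np.array(fake_scores[s])

errors = collect_error_samples(["s1", "s2"], ["我", "他"], model, library)
print(errors)  # "s2" is recognized as the wrong character, so it is collected
```

The collected pairs are exactly what is later fed back into the adjusted model for the batch-gradient-descent update.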
In the handwriting model training method provided by this embodiment, optical character recognition technology is used to obtain the pixel-value feature matrix of each Chinese character in the Chinese character training samples to be processed, and the standard Chinese character training samples are obtained on the basis of these pixel-value feature matrices; such samples can be directly recognized and read by a computer. The weights of the convolutional neural network are initialized according to the formula S(W_l) = 2/n_l, and the biases are initialized with small values, e.g. drawn from the interval [-0.3, 0.3]; initializing in this way allows the minimum of the error function to be found quickly and effectively, which benefits the updating and adjustment of the convolutional neural network. The Chinese character training samples to be processed are normalized and their pixel values divided into two classes, giving binarized pixel-value feature matrices whose corresponding characters serve as the standard Chinese character training samples; this markedly shortens the time needed to train the standard Chinese character recognition model. The error function E = (1/(2n))·Σ_{i=1}^{n} ||y_i − x_i||² is constructed from the forward output of the standard Chinese character training samples in the convolutional neural network, and the weights and biases are updated by back-propagating this error, yielding the standard Chinese character recognition model, which learns the deep features of the standard training samples and can accurately recognize standard Chinese characters. The standard model is then adaptively updated with non-standard Chinese characters, so that the adjusted Chinese handwritten character recognition model obtained after the update learns, through training updates, the deep features of non-standard characters while retaining the ability to recognize standard Chinese handwriting, and can therefore better recognize non-standard Chinese handwritten characters. Next, from the output values of each character of the Chinese character test samples in the adjusted model, the maximum output value — reflecting the degree of similarity between characters — is selected; the recognition results are obtained from these maximum output values, the error-character training samples are obtained from the recognition results, and all the misrecognized characters are input as error-character training samples into the adjusted model for further training and updating, yielding the target Chinese handwritten character recognition model. Using the error-character training samples largely eliminates the adverse effects of over-learning and excessive weakening produced during training, and further optimizes recognition accuracy. In addition, in the handwriting model training method of this embodiment, the standard Chinese character recognition model and the adjusted Chinese handwritten character recognition model are trained with the back-propagation algorithm based on stochastic gradient descent, which maintains good training efficiency and effect even when the number of training samples is large; the target Chinese handwritten character recognition model is trained with the back-propagation algorithm based on batch gradient descent, which guarantees a thorough update of the model parameters: back-propagation updates are performed for the errors produced by all training samples during training, the parameters are updated comprehensively according to the errors produced, and the recognition accuracy of the resulting model is improved.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present invention.
Fig. 6 shows a schematic block diagram of a handwriting model training device in one-to-one correspondence with the handwriting model training method of the embodiment. As shown in Fig. 6, the handwriting model training device includes a pixel-value feature matrix acquisition module 10, a standard Chinese character training sample acquisition module 20, an initialization module 30, a standard Chinese character recognition model acquisition module 40, an adjusted Chinese handwritten character recognition model acquisition module 50, an error-character training sample acquisition module 60 and a target Chinese handwritten character recognition model acquisition module 70. The functions implemented by these modules correspond one-to-one with the steps of the handwriting model training method in the embodiment; to avoid repetition, this embodiment does not describe them in detail one by one.
The pixel-value feature matrix acquisition module 10 is configured to obtain, using optical character recognition technology, the pixel-value feature matrix of each Chinese character in the Chinese character training samples to be processed.
The standard Chinese character training sample acquisition module 20 is configured to obtain the standard Chinese character training samples based on the pixel-value feature matrix of each Chinese character in the Chinese character training samples to be processed.
The initialization module 30 is configured to initialize the convolutional neural network.
The standard Chinese character recognition model acquisition module 40 is configured to input the standard Chinese character training samples into the convolutional neural network for training, and to update the weights and biases of the convolutional neural network using the back-propagation algorithm based on stochastic gradient descent, obtaining the standard Chinese character recognition model.
The adjusted Chinese handwritten character recognition model acquisition module 50 is configured to obtain non-standard Chinese character training samples, input them into the standard Chinese character recognition model for training, and update the weights and biases of the standard model using the back-propagation algorithm based on stochastic gradient descent, obtaining the adjusted Chinese handwritten character recognition model.
The error-character training sample acquisition module 60 is configured to obtain the Chinese character test samples, recognize them using the adjusted Chinese handwritten character recognition model, obtain the misrecognized characters whose recognition results do not match the true results, and take all the misrecognized characters as the error-character training samples.
The target Chinese handwritten character recognition model acquisition module 70 is configured to input the error-character training samples into the adjusted Chinese handwritten character recognition model for training, and to update the weights and biases of the adjusted model using the back-propagation algorithm based on batch gradient descent, obtaining the target Chinese handwritten character recognition model.
Preferably, the standard Chinese character training sample acquisition module 20 includes a normalized pixel-value feature matrix acquisition unit 21 and a standard Chinese character training sample acquisition unit 22.
The normalized pixel-value feature matrix acquisition unit 21 is configured to obtain the pixel-value feature matrix of each Chinese character in the Chinese character training samples to be processed and to normalize each pixel value in the matrix, obtaining the normalized pixel-value feature matrix of each character, where the normalization formula is y = (x − MinValue) / (MaxValue − MinValue), MaxValue is the maximum pixel value in the pixel-value feature matrix of each character, MinValue is the minimum pixel value in that matrix, x is the pixel value before normalization, and y is the pixel value after normalization.
The standard Chinese character training sample acquisition unit 22 is configured to divide the pixel values in the normalized pixel-value feature matrix of each character into two classes, establish the binarized pixel-value feature matrix of each character based on the two classes of pixel values, and combine the binarized pixel-value feature matrices of the characters into the standard Chinese character training samples.
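A minimal sketch of units 21 and 22, assuming a grayscale pixel matrix and a 0.5 threshold for the two-class division (the patent does not fix a threshold, so that choice is an assumption):

```python
import numpy as np

def normalize(matrix):
    """Unit 21: min-max normalization, y = (x - MinValue) / (MaxValue - MinValue)."""
    lo, hi = matrix.min(), matrix.max()
    return (matrix - lo) / (hi - lo)

def binarize(matrix, threshold=0.5):
    """Unit 22: divide normalized pixel values into two classes (0 and 1)."""
    return (normalize(matrix) >= threshold).astype(np.uint8)

pixels = np.array([[0, 128], [192, 255]], dtype=np.float64)
print(binarize(pixels))  # two-class matrix: [[0 1] [1 1]]
```

The resulting 0/1 matrix is the binarized pixel-value feature matrix that the computer can read directly as a training sample.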
Preferably, the initialization module 30 is configured to initialize the convolutional neural network, where the initialized weights of the convolutional neural network satisfy the formula S(W_l) = 2/n_l, ∀l, where n_l denotes the number of training-sample inputs at layer l, S(·) denotes the variance operation, W_l denotes the weights of layer l, ∀ denotes "for all", and l denotes the l-th layer of the convolutional neural network.
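This variance condition matches a He-style initialization, in which the weights of layer l are drawn with variance 2/n_l. The sketch below assumes Gaussian sampling and the bias interval [-0.3, 0.3] mentioned earlier; the layer sizes are illustrative, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Weights with variance 2/n_in (so S(W_l) = 2/n_l); small uniform biases."""
    w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in))
    b = rng.uniform(-0.3, 0.3, size=n_out)
    return w, b

w, b = init_layer(1024, 256)
print(round(w.var(), 3))  # empirical variance, close to 2/1024 ≈ 0.002
```

Keeping the weight variance tied to the layer's input size prevents the forward signal from exploding or vanishing, which is what lets gradient descent find the error-function minimum quickly.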
Preferably, the standard Chinese character recognition model acquisition module 40 includes a forward output acquisition unit 41, an error function construction unit 42 and a standard Chinese character recognition model acquisition unit 43.
The forward output acquisition unit 41 is configured to input the standard Chinese character training samples into the convolutional neural network and obtain the forward output of the standard Chinese character training samples in the convolutional neural network.
The error function construction unit 42 is configured to build the error function from the forward output and the true results, the expression of the error function being E = (1/(2n))·Σ_{i=1}^{n} ||y_i − x_i||², where n denotes the total number of training samples, x_i denotes the forward output of the i-th training sample, and y_i denotes the true result of the i-th training sample corresponding to x_i.
The standard Chinese character recognition model acquisition unit 43 is configured to update, according to the error function, the weights and biases of the convolutional neural network using the back-propagation algorithm based on stochastic gradient descent, obtaining the standard Chinese character recognition model, where: in the fully connected layer of the convolutional neural network, the formula for updating the weights is W_l' = W_l − (α/m)·Σ_{i=1}^{m} δ^{i,l} (a^{i,l−1})^T; in the convolutional layer of the convolutional neural network, the formula for updating the weights is W_l' = W_l − (α/m)·Σ_{i=1}^{m} δ^{i,l} * rot180(a^{i,l−1}); W_l' denotes the updated weights, W_l the weights before the update, α the learning rate, m the number of standard Chinese character training samples, i the i-th input Chinese character sample, δ^{i,l} the sensitivity of the i-th input sample at layer l, a^{i,l−1} the output of the i-th input sample at layer l−1, T the matrix transpose operation, * the convolution operation, and rot180 the operation of rotating a matrix by 180 degrees. In the fully connected layer of the convolutional neural network, the formula for updating the biases is b_l' = b_l − (α/m)·Σ_{i=1}^{m} δ^{i,l}; in the convolutional layer of the convolutional neural network, the formula for updating the biases is b_l' = b_l − (α/m)·Σ_{i=1}^{m} Σ_{u,v} (δ^{i,l})_{u,v}; b_l' denotes the updated bias, b_l the bias before the update, α the learning rate, m the number of standard Chinese character training samples, i the i-th input Chinese character sample, δ^{i,l} the sensitivity of the i-th input sample at layer l, and (u, v) the position, within the convolution feature map obtained by the convolution operation, of each small block of the feature map.
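The fully connected update rule above can be checked numerically. This sketch applies one mini-batch update W_l' = W_l − (α/m)·Σ δ^{i,l}(a^{i,l−1})^T and b_l' = b_l − (α/m)·Σ δ^{i,l}; the sensitivities δ and activations a are supplied directly rather than derived from a full network, so the numbers are illustrative:

```python
import numpy as np

def update_fc_layer(W, b, deltas, acts, alpha):
    """One fully connected update over a mini-batch of m samples:
    W' = W - (alpha/m) * sum_i delta_i a_i^T, b' = b - (alpha/m) * sum_i delta_i."""
    m = len(deltas)
    grad_W = sum(np.outer(d, a) for d, a in zip(deltas, acts)) / m
    grad_b = sum(deltas) / m
    return W - alpha * grad_W, b - alpha * grad_b

W = np.zeros((2, 3))
b = np.zeros(2)
deltas = [np.array([1.0, -1.0]), np.array([1.0, 1.0])]  # delta^{i,l}
acts = [np.ones(3), np.ones(3)]                         # a^{i,l-1}
W2, b2 = update_fc_layer(W, b, deltas, acts, alpha=0.1)
print(W2, b2)  # W2 row 0 = [-0.1 -0.1 -0.1], row 1 = zeros; b2 = [-0.1 0.]
```

The convolutional-layer rule is analogous, with the outer product replaced by a convolution against the 180-degree-rotated activations and the bias gradient summed over the (u, v) positions of the feature map.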
Preferably, the error-character training sample acquisition module 60 includes a model output value acquisition unit 61, a model recognition result acquisition unit 62 and an error-character training sample acquisition unit 63.
The model output value acquisition unit 61 is configured to input the Chinese character test samples into the adjusted Chinese handwritten character recognition model and obtain the output value of each character in the test samples in the adjusted model.
The model recognition result acquisition unit 62 is configured to select the maximum output value among the output values corresponding to each character and obtain the recognition result of each character from the maximum output value.
The error-character training sample acquisition unit 63 is configured to obtain, according to the recognition results, the misrecognized characters whose recognition results do not match the true results, and to take all the misrecognized characters as the error-character training samples.
Fig. 7 shows a flow chart of the handwritten character recognition method in this embodiment. The method can be applied to computer equipment configured by institutions such as banks, investment firms and insurers, and is used to recognize handwritten Chinese characters so as to achieve the purpose of artificial intelligence. As shown in Fig. 7, the handwritten character recognition method includes the following steps:
S80: Obtain a Chinese character to be recognized, recognize it using the target Chinese handwritten character recognition model, and obtain the output values of the character in the target model, the target Chinese handwritten character recognition model being obtained using the handwriting model training method described above.
Here, the character to be recognized refers to the Chinese character on which recognition is to be performed.
In this embodiment, the character to be recognized is obtained and input into the target Chinese handwritten character recognition model for recognition, and its output values in the target model are obtained. A single character to be recognized corresponds to some 3,000 output values (the exact number is determined by the Chinese character library), and its recognition result can be determined from these output values. Specifically, the character to be recognized is represented by a binarized pixel-value feature matrix that a computer can directly recognize.
S90: Obtain target probability output values according to the output values and a preset Chinese semantic dictionary, and obtain the recognition result of the character to be recognized based on the target probability output values.
Here, the preset Chinese semantic dictionary refers to a pre-built dictionary describing the semantic relationships between Chinese characters based on word frequency. For example, among the two-character words "X sun" in the Chinese semantic dictionary, the probability of "the sun" occurring is 30.5%, the probability of "big sun" occurring is 0.5%, and the probabilities of the remaining two-character words "X sun", such as "blazing sun", sum to 69%. The target probability output values combine the output values with the preset Chinese semantic dictionary, and are the probability values used to obtain the recognition result of the character to be recognized.
Specifically, obtaining the target probability output values from the output values and the preset Chinese semantic dictionary includes the following steps: (1) Select the maximum value among the output values corresponding to each character of the text to be recognized as the first probability value, and obtain a preliminary recognition result from the first probability value. (2) Obtain, from the preliminary recognition result and the Chinese semantic dictionary, the leftward semantic probability value and the rightward semantic probability value of the character to be recognized. It should be understood that the characters of a text have an order; for example, in "red X sun", the character "X" has word probabilities in both directions, for "red X" and for "X sun", i.e. a leftward semantic probability value and a rightward semantic probability value. (3) Set weights for the output value corresponding to each character of the text, for the leftward semantic probability value, and for the rightward semantic probability value. Specifically, a weight of 0.4 may be assigned to the output value of each character, 0.3 to the leftward semantic probability value, and 0.3 to the rightward semantic probability value. (4) Multiply each probability value by its corresponding weight, and add the weighted values to obtain the target probability output values (there are multiple target probability output values, the exact number being determined by the Chinese character library); the character corresponding to the maximum target probability output value is selected as the recognition result of the character to be recognized. In practice, the 5 numerically largest output values can be selected first; these 5 probability values represent the 5 most likely characters (recognition results), and the target probability output values are computed with the Chinese semantic dictionary only for these 5 characters, so that there are only 5 target probability output values, which greatly improves recognition efficiency. By combining the output values with the preset Chinese semantic dictionary, an accurate recognition result can be obtained. It should be understood that for the recognition of a single character (not text), the recognition result can be obtained directly from the maximum output value, without Chinese-semantics-assisted recognition.
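Steps (1)-(4) amount to a weighted vote between the model's output value and the two directional semantic probabilities. The 0.4/0.3/0.3 weights follow the text; the candidate names and dictionary probabilities below are illustrative assumptions:

```python
def target_probability(candidates, w_out=0.4, w_left=0.3, w_right=0.3):
    """Combine the model output value with the leftward and rightward semantic
    probability values (steps 3-4) and return the best-scoring character."""
    scored = {
        ch: w_out * p_out + w_left * p_left + w_right * p_right
        for ch, (p_out, p_left, p_right) in candidates.items()
    }
    return max(scored, key=scored.get), scored

# Top candidates for "X" in "red X sun": (output value, leftward, rightward).
candidates = {
    "sun-char": (0.60, 0.50, 0.305),  # the character completing "the sun"
    "big":      (0.65, 0.10, 0.005),
}
best, scored = target_probability(candidates)
print(best)  # "sun-char": 0.4*0.60 + 0.3*0.50 + 0.3*0.305 = 0.4815 > 0.2915
```

Note that "big" has the larger raw output value; the semantic dictionary is what tips the decision toward the character that forms a frequent word in context.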
In steps S80-S90, the character to be recognized is identified with the target Chinese handwritten character recognition model, and its recognition result is obtained by combining the output values with the preset Chinese semantic dictionary. The target Chinese handwritten character recognition model itself possesses high recognition accuracy, and combining it with the Chinese semantic dictionary further improves the recognition accuracy of Chinese handwriting.
In the handwritten character recognition method provided by this embodiment of the present invention, the character to be recognized is input into the target Chinese handwritten character recognition model for recognition, and the recognition result is obtained in combination with the preset Chinese semantic dictionary. When the target Chinese handwritten character recognition model is used to recognize Chinese handwriting, an accurate recognition result can be obtained.
Fig. 8 shows a functional block diagram of a handwritten character recognition device in one-to-one correspondence with the handwritten character recognition method of the embodiment. As shown in Fig. 8, the handwritten character recognition device includes an output value acquisition module 80 and a recognition result acquisition module 90. The functions implemented by the output value acquisition module 80 and the recognition result acquisition module 90 correspond one-to-one with the steps of the handwritten character recognition method in the embodiment; to avoid repetition, this embodiment does not describe them in detail one by one.
The output value acquisition module 80 is configured to obtain the character to be recognized, recognize it using the target Chinese handwritten character recognition model, and obtain the output values of the character in the target model, the target Chinese handwritten character recognition model being obtained using the handwriting model training method.
The recognition result acquisition module 90 is configured to obtain the target probability output values according to the output values and the preset Chinese semantic dictionary, and to obtain the recognition result of the character to be recognized based on the target probability output values.
This embodiment provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the handwriting model training method of the embodiment, or the functions of the modules/units of the handwriting model training device of the embodiment, or the steps of the handwritten character recognition method of the embodiment, or the functions of the modules/units of the handwritten character recognition device of the embodiment; to avoid repetition, these are not described again here.
Fig. 9 is a schematic diagram of the computer equipment provided by an embodiment of the present invention. As shown in Fig. 9, the computer equipment 100 of this embodiment includes a processor 101, a memory 102, and a computer program 103 stored in the memory 102 and runnable on the processor 101. When executed by the processor 101, the computer program 103 implements the handwriting model training method of the embodiment, or the functions of the modules/units of the handwriting model training device of the embodiment, or the steps of the handwritten character recognition method of the embodiment, or the functions of the modules/units of the handwritten character recognition device of the embodiment; to avoid repetition, these are not described one by one here.
The computer equipment 100 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or other computing equipment. The computer equipment may include, but is not limited to, the processor 101 and the memory 102. Those skilled in the art will understand that Fig. 9 is only an example of the computer equipment 100 and does not limit it; the equipment may include more or fewer components than illustrated, may combine certain components, or may have different components. For example, the computer equipment may also include input/output equipment, network access equipment, a bus, and so on.
The processor 101 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or any conventional processor.
The memory 102 may be an internal storage unit of the computer equipment 100, such as a hard disk or memory of the computer equipment 100. The memory 102 may also be an external storage device of the computer equipment 100, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the computer equipment 100. Further, the memory 102 may include both an internal storage unit and an external storage device of the computer equipment 100. The memory 102 is used to store the computer program and the other programs and data needed by the computer equipment, and may also be used to temporarily store data that has been or will be output.
It will be clear to those skilled in the art that, for convenience and brevity of description, the division into the functional units and modules above is only used as an example; in practical applications, the functions may be assigned to different functional units and modules as needed, i.e. the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may exist physically alone, or two or more units may be integrated into one unit. The integrated units may be implemented in the form of hardware or in the form of software functional units.
The embodiments described above are merely illustrative of the technical solutions of the present invention and do not limit them. Although the present invention has been explained in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and replacements do not depart the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be included within the protection scope of the present invention.
Claims (10)
1. A handwriting model training method, characterized by comprising:
obtaining, using optical character recognition technology, the pixel-value feature matrix of each Chinese character in Chinese character training samples to be processed;
obtaining standard Chinese character training samples based on the pixel-value feature matrix of each Chinese character in the Chinese character training samples to be processed;
initializing a convolutional neural network;
inputting the standard Chinese character training samples into the convolutional neural network for training, and updating the weights and biases of the convolutional neural network using a back-propagation algorithm based on stochastic gradient descent, to obtain a standard Chinese character recognition model;
obtaining non-standard Chinese character training samples, inputting the non-standard Chinese character training samples into the standard Chinese character recognition model for training, and updating the weights and biases of the standard Chinese character recognition model using the back-propagation algorithm based on stochastic gradient descent, to obtain an adjusted Chinese handwritten character recognition model;
obtaining Chinese character test samples, recognizing the Chinese character test samples using the adjusted Chinese handwritten character recognition model, obtaining the misrecognized characters whose recognition results do not match the true results, and taking all the misrecognized characters as error-character training samples;
inputting the error-character training samples into the adjusted Chinese handwritten character recognition model for training, and updating the weights and biases of the adjusted Chinese handwritten character recognition model using a back-propagation algorithm based on batch gradient descent, to obtain a target Chinese handwritten character recognition model.
2. The handwriting model training method according to claim 1, characterized in that obtaining standard Chinese character training samples based on the pixel-value feature matrix of each Chinese character in the Chinese character training samples to be processed comprises:
obtaining the pixel-value feature matrix of each Chinese character in the Chinese character training samples to be processed, and normalizing each pixel value in the pixel-value feature matrix to obtain a normalized pixel-value feature matrix for each Chinese character, wherein the normalization formula is y = (x - MinValue) / (MaxValue - MinValue), where MaxValue is the maximum pixel value in the pixel-value feature matrix of each Chinese character, MinValue is the minimum pixel value in the pixel-value feature matrix of each Chinese character, x is the pixel value before normalization, and y is the pixel value after normalization;
dividing the pixel values in the normalized pixel-value feature matrix of each Chinese character into two classes, establishing a binarized pixel-value feature matrix for each Chinese character based on the two classes of pixel values, and combining the binarized pixel-value feature matrices of all the Chinese characters into the standard Chinese character training samples.
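The normalization and binarization of claim 2 can be sketched in a few lines of NumPy. The 0.5 threshold used to split the normalized pixels into two classes is an assumption, since the claim does not fix how the two classes are formed.

```python
import numpy as np

def normalize(m):
    """Min-max normalization: y = (x - MinValue) / (MaxValue - MinValue)."""
    lo, hi = m.min(), m.max()
    return (m - lo) / (hi - lo)

def binarize(m, threshold=0.5):
    """Split normalized pixels into two classes (0 and 1); the threshold is an assumed choice."""
    return (normalize(m) >= threshold).astype(np.uint8)

# a made-up 3x3 grayscale pixel-value feature matrix
gray = np.array([[0, 128, 255],
                 [64, 192, 255],
                 [0,   0, 128]], dtype=np.float64)
norm = normalize(gray)
binary = binarize(gray)
```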
3. The handwriting model training method according to claim 1, characterized in that inputting the standard Chinese character training samples into a convolutional neural network for training and updating the weights and biases of the convolutional neural network using a back-propagation algorithm based on stochastic gradient descent, to obtain a standard Chinese character recognition model, comprises:
inputting the standard Chinese character training samples into the convolutional neural network to obtain the forward output of the standard Chinese character training samples in the convolutional neural network;
constructing an error function from the forward output and the true results, the error function being E = (1/(2n)) * Σ_{i=1..n} ||y_i - x_i||^2, where n is the total number of training samples, x_i is the forward output of the i-th training sample, and y_i is the true result of the i-th training sample corresponding to x_i;
according to the error function, updating the weights and biases of the convolutional neural network using the back-propagation algorithm based on stochastic gradient descent, to obtain the standard Chinese character recognition model, wherein in a fully-connected layer of the convolutional neural network the weight-update formula is W_l' = W_l - (α/m) * Σ_{i=1..m} δ^{i,l} (a^{i,l-1})^T, and in a convolutional layer of the convolutional neural network the weight-update formula is W_l' = W_l - (α/m) * Σ_{i=1..m} δ^{i,l} * rot180(a^{i,l-1}), where W_l' is the updated weight, W_l is the weight before updating, α is the learning rate, m is the number of standard Chinese character training samples, i denotes the i-th input Chinese character sample, δ^{i,l} is the sensitivity of the i-th input Chinese character sample at layer l, a^{i,l-1} is the output of the i-th input Chinese character sample at layer l-1, T denotes the matrix transpose operation, * denotes the convolution operation, and rot180 denotes rotating a matrix by 180 degrees; in a fully-connected layer of the convolutional neural network the bias-update formula is b_l' = b_l - (α/m) * Σ_{i=1..m} δ^{i,l}, and in a convolutional layer of the convolutional neural network the bias-update formula is b_l' = b_l - (α/m) * Σ_{i=1..m} Σ_{u,v} (δ^{i,l})_{u,v}, where b_l' is the updated bias, b_l is the bias before updating, α is the learning rate, m is the number of standard Chinese character training samples, i denotes the i-th input Chinese character sample, δ^{i,l} is the sensitivity of the i-th input Chinese character sample at layer l, and (u, v) is the element position in each convolutional feature map when the convolution operation producing that feature map is performed.
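The fully-connected update rules of claim 3 translate directly into NumPy. This sketch applies one update to a layer's weights and bias from the sensitivities δ and previous-layer outputs a of m samples; the convolutional-layer variant with rot180 and the convolution operation is omitted, and the numbers are made up for illustration.

```python
import numpy as np

def fc_update(W, b, deltas, prev_acts, lr):
    """Fully-connected layer update:
    W' = W - (lr/m) * sum_i delta_i (a_i)^T,  b' = b - (lr/m) * sum_i delta_i."""
    m = len(deltas)
    grad_W = sum(np.outer(d, a) for d, a in zip(deltas, prev_acts)) / m
    grad_b = sum(deltas) / m
    return W - lr * grad_W, b - lr * grad_b

W = np.zeros((2, 3))                     # 2 output units, 3 input units
b = np.zeros(2)
deltas = [np.array([1.0, -1.0]),         # sensitivities of two samples at this layer
          np.array([1.0,  1.0])]
prev_acts = [np.ones(3), np.ones(3)]     # previous-layer outputs of the two samples
W2, b2 = fc_update(W, b, deltas, prev_acts, lr=0.5)
```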
4. The handwriting model training method according to claim 1, characterized in that recognizing the Chinese character test samples using the adjusted Chinese handwritten character recognition model, obtaining misrecognized characters whose recognition results do not match the true results, and taking all of the misrecognized characters as error character training samples, comprises:
inputting the Chinese character test samples into the adjusted Chinese handwritten character recognition model to obtain the output value, in the adjusted Chinese handwritten character recognition model, of each character in the Chinese character test samples;
selecting the maximum output value among the output values corresponding to each character, and obtaining the recognition result of each character according to the maximum output value;
according to the recognition results, obtaining the misrecognized characters whose recognition results do not match the true results, and taking all of the misrecognized characters as the error character training samples.
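Selecting the maximum output value and collecting the misrecognized characters, as recited in claim 4, amounts to an argmax followed by a comparison with the true labels. The output values and labels below are invented for illustration.

```python
import numpy as np

outputs = np.array([[0.1, 0.7, 0.2],     # per-character model output values
                    [0.6, 0.3, 0.1],
                    [0.2, 0.2, 0.6]])
true_labels = np.array([1, 2, 2])        # ground-truth class of each character

predicted = outputs.argmax(axis=1)       # class with the maximum output value
wrong = predicted != true_labels         # results that do not match the true results
error_samples = outputs[wrong]           # kept as error character training samples
```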
5. The handwriting model training method according to claim 1, characterized in that initializing the convolutional neural network comprises:
making the initialized weights of the convolutional neural network satisfy S(W_l) = 2 / n_l, ∀l, where n_l is the number of input training samples at layer l, S(·) denotes the variance operation, W_l denotes the weights at layer l, ∀ denotes "for all", and l denotes the l-th layer of the convolutional neural network.
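Claim 5's condition S(W_l) = 2/n_l only constrains the variance of the initial weights; a sketch that draws layer weights with that variance and checks it empirically. The choice of a Gaussian distribution is an assumption, since the claim does not name a distribution.

```python
import numpy as np

def init_weights(n_l, shape, rng):
    """Draw initial weights whose variance is 2 / n_l."""
    return rng.normal(loc=0.0, scale=np.sqrt(2.0 / n_l), size=shape)

rng = np.random.default_rng(42)
n_l = 400
W = init_weights(n_l, (n_l, 100), rng)
empirical_var = W.var()                  # should be close to 2 / n_l = 0.005
```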
6. A handwritten character recognition method, characterized by comprising:
obtaining a Chinese character to be recognized, recognizing the Chinese character to be recognized using a target Chinese handwritten character recognition model, and obtaining the output value of the Chinese character to be recognized in the target Chinese handwritten character recognition model, the target Chinese handwritten character recognition model being obtained using the handwriting model training method of any one of claims 1 to 5;
obtaining a target probability output value according to the output value and a preset Chinese semantic dictionary, and obtaining the recognition result of the Chinese character to be recognized based on the target probability output value.
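Claim 6 combines the model's output values with a preset Chinese semantic dictionary into target probability output values. One plausible reading, which is an assumption since the claim does not specify the combination rule, is to weight the model's class probabilities by dictionary-derived priors and renormalize; the numbers below are invented.

```python
import numpy as np

model_probs = np.array([0.45, 0.40, 0.15])   # model output values for candidate characters
dict_prior  = np.array([0.20, 0.70, 0.10])   # hypothetical semantic-dictionary weights

target = model_probs * dict_prior            # combine model output with dictionary
target /= target.sum()                       # target probability output values
result = int(target.argmax())                # index of the recognition result
```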
7. A handwriting model training apparatus, characterized by comprising:
a pixel-value feature matrix acquisition module, configured to obtain the pixel-value feature matrix of each Chinese character in Chinese character training samples to be processed using optical character recognition technology;
a standard Chinese character training sample acquisition module, configured to obtain standard Chinese character training samples based on the pixel-value feature matrix of each Chinese character in the Chinese character training samples to be processed;
an initialization module, configured to initialize a convolutional neural network;
a standard Chinese character recognition model acquisition module, configured to input the standard Chinese character training samples into the convolutional neural network for training, and to update the weights and biases of the convolutional neural network using a back-propagation algorithm based on stochastic gradient descent, to obtain a standard Chinese character recognition model;
an adjusted Chinese handwritten character recognition model acquisition module, configured to obtain non-standard Chinese character training samples, input the non-standard Chinese character training samples into the standard Chinese character recognition model for training, and update the weights and biases of the standard Chinese character recognition model using the back-propagation algorithm based on stochastic gradient descent, to obtain an adjusted Chinese handwritten character recognition model;
an error character training sample acquisition module, configured to obtain Chinese character test samples, recognize the Chinese character test samples using the adjusted Chinese handwritten character recognition model, obtain misrecognized characters whose recognition results do not match the true results, and take all of the misrecognized characters as error character training samples;
a target Chinese handwritten character recognition model acquisition module, configured to input the error character training samples into the adjusted Chinese handwritten character recognition model for training, and to update the weights and biases of the adjusted Chinese handwritten character recognition model using a back-propagation algorithm based on batch gradient descent, to obtain a target Chinese handwritten character recognition model.
8. A handwritten character recognition apparatus, characterized by comprising:
an output value acquisition module, configured to obtain a Chinese character to be recognized, recognize the Chinese character to be recognized using a target Chinese handwritten character recognition model, and obtain the output value of the Chinese character to be recognized in the target Chinese handwritten character recognition model, the target Chinese handwritten character recognition model being obtained using the handwriting model training method of any one of claims 1 to 5;
a recognition result acquisition module, configured to obtain a target probability output value according to the output value and a preset Chinese semantic dictionary, and to obtain the recognition result of the Chinese character to be recognized based on the target probability output value.
9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the handwriting model training method of any one of claims 1 to 5; or the processor, when executing the computer program, implements the steps of the handwritten character recognition method of claim 6.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the handwriting model training method of any one of claims 1 to 5; or the computer program, when executed by a processor, implements the steps of the handwritten character recognition method of claim 6.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810564062.3A CN108764195B (en) | 2018-06-04 | 2018-06-04 | Handwriting model training method, handwritten character recognition method, device, equipment and medium |
| PCT/CN2018/094193 WO2019232847A1 (en) | 2018-06-04 | 2018-07-03 | Handwriting model training method, handwritten character recognition method and apparatus, and device and medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810564062.3A CN108764195B (en) | 2018-06-04 | 2018-06-04 | Handwriting model training method, handwritten character recognition method, device, equipment and medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN108764195A true CN108764195A (en) | 2018-11-06 |
| CN108764195B CN108764195B (en) | 2023-04-18 |
Family
ID=64002667
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201810564062.3A Active CN108764195B (en) | 2018-06-04 | 2018-06-04 | Handwriting model training method, handwritten character recognition method, device, equipment and medium |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN108764195B (en) |
| WO (1) | WO2019232847A1 (en) |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111680690B (en) * | 2020-04-26 | 2023-07-11 | 泰康保险集团股份有限公司 | Character recognition method and device |
| CN111476237A (en) * | 2020-04-28 | 2020-07-31 | 有米科技股份有限公司 | Character recognition method, device, server and storage medium |
| CN111950548B (en) * | 2020-08-10 | 2023-07-28 | 河南大学 | A Chinese Character Recognition Method Using Font Image for Deep Template Matching |
| CN112308058B (en) * | 2020-10-25 | 2023-10-24 | 北京信息科技大学 | Method for recognizing handwritten characters |
| CN112766051B (en) * | 2020-12-29 | 2024-09-06 | 有米科技股份有限公司 | Attention-based image character recognition method and device |
| CN112801085A (en) * | 2021-02-09 | 2021-05-14 | 沈阳麟龙科技股份有限公司 | Method, device, medium and electronic equipment for recognizing characters in image |
| CN116012860B (en) * | 2022-12-29 | 2024-01-16 | 华南师范大学 | Teacher blackboard writing design level diagnosis method and device based on image recognition |
| CN116311543B (en) * | 2023-02-03 | 2024-03-08 | 汇金智融(深圳)科技有限公司 | Handwriting analysis method and system based on image recognition technology |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101785030A (en) * | 2007-08-10 | 2010-07-21 | 微软公司 | Hidden markov model based handwriting/calligraphy generation |
| US20150317336A1 (en) * | 2014-04-30 | 2015-11-05 | Hewlett-Packard Development Company, L.P. | Data reconstruction |
| CN107316054A (en) * | 2017-05-26 | 2017-11-03 | 昆山遥矽微电子科技有限公司 | Non-standard character recognition method based on convolutional neural networks and support vector machines |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101630368B (en) * | 2009-08-25 | 2011-09-14 | 华南理工大学 | Self-adaptive method of user writing style for recognizing handwritten Chinese characters |
| US9390370B2 (en) * | 2012-08-28 | 2016-07-12 | International Business Machines Corporation | Training deep neural network acoustic models using distributed hessian-free optimization |
| CN105184226A (en) * | 2015-08-11 | 2015-12-23 | 北京新晨阳光科技有限公司 | Digit recognition method, digit recognition device, neural network training method and neural network training device |
| CN105654135A (en) * | 2015-12-30 | 2016-06-08 | 成都数联铭品科技有限公司 | Image character sequence recognition system based on recurrent neural network |
| CN106599941A (en) * | 2016-12-12 | 2017-04-26 | 西安电子科技大学 | Method for identifying handwritten numbers based on convolutional neural network and support vector machine |
- 2018-06-04 CN CN201810564062.3A patent/CN108764195B/en active Active
- 2018-07-03 WO PCT/CN2018/094193 patent/WO2019232847A1/en not_active Ceased
Cited By (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109685968A (en) * | 2018-12-15 | 2019-04-26 | 西安建筑科技大学 | Recognition model construction and recognition method for banknote image defects based on convolutional neural networks |
| CN111723807A (en) * | 2019-03-20 | 2020-09-29 | Sap欧洲公司 | Recognizing Typed and Handwritten Characters Using End-to-End Deep Learning |
| CN111723807B (en) * | 2019-03-20 | 2023-12-26 | Sap欧洲公司 | Recognizing typed and handwritten characters using end-to-end deep learning |
| CN113785305B (en) * | 2019-05-05 | 2024-04-16 | 华为云计算技术有限公司 | Method, apparatus and device for detecting tilted text |
| CN113785305A (en) * | 2019-05-05 | 2021-12-10 | 华为技术有限公司 | Method, apparatus and device for detecting tilted text |
| CN110399912A (en) * | 2019-07-12 | 2019-11-01 | 广东浪潮大数据研究有限公司 | A kind of method of character recognition, system, equipment and computer readable storage medium |
| CN110378318A (en) * | 2019-07-30 | 2019-10-25 | 腾讯科技(深圳)有限公司 | Character recognition method, device, computer equipment and storage medium |
| CN110688997A (en) * | 2019-09-24 | 2020-01-14 | 北京猎户星空科技有限公司 | Image processing method and device |
| CN114120336B (en) * | 2020-08-25 | 2023-08-08 | 本源量子计算科技(合肥)股份有限公司 | Handwriting digital recognition method, system, equipment and computer readable storage medium |
| CN114120336A (en) * | 2020-08-25 | 2022-03-01 | 合肥本源量子计算科技有限责任公司 | Handwritten digit recognition method, system, device and computer readable storage medium |
| CN111738269A (en) * | 2020-08-25 | 2020-10-02 | 北京易真学思教育科技有限公司 | Model training method, image processing device, model training apparatus, and storage medium |
| CN113449726A (en) * | 2021-07-08 | 2021-09-28 | 中国工商银行股份有限公司 | Character comparison and identification method and device |
| CN113449726B (en) * | 2021-07-08 | 2025-03-11 | 中国工商银行股份有限公司 | Text matching and recognition method and device |
Also Published As
| Publication number | Publication date |
|---|---|
| CN108764195B (en) | 2023-04-18 |
| WO2019232847A1 (en) | 2019-12-12 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN108764195A (en) | Handwriting model training method, handwritten character recognition method, device, equipment and medium | |
| US12079696B2 (en) | Machine learning model training method and device, and expression image classification method and device | |
| Hafemann et al. | Learning features for offline handwritten signature verification using deep convolutional neural networks | |
| CN109086653A (en) | Handwriting model training method, handwritten character recognition method, device, equipment and medium | |
| CN110414498B (en) | Natural scene text recognition method based on a cross-attention mechanism | |
| CN108985442A (en) | Handwriting model training method, handwritten character recognition method, device, equipment and medium | |
| CN108491765A (en) | Classification and identification method and system for vegetable images | |
| CN109034280A (en) | Handwriting model training method, handwritten character recognition method, device, equipment and medium | |
| CN106845527A (en) | A vegetable recognition method | |
| CN112686345A (en) | Offline English handwriting recognition method based on an attention mechanism | |
| Ahmad et al. | Offline Urdu Nastaleeq optical character recognition based on stacked denoising autoencoder | |
| CN109086652A (en) | Handwritten character model training method, Chinese character recognition method, device, equipment and medium | |
| CN108090498A (en) | Fiber recognition method and device based on deep learning | |
| CN112861718A (en) | Lightweight feature-fusion crowd counting method and system | |
| CN110059677A (en) | Digital meter recognition method and equipment based on deep learning | |
| CN107704859A (en) | Character recognition method based on a deep learning training framework | |
| CN107145885A (en) | Single-character image recognition method and device based on convolutional neural networks | |
| CN112101364A (en) | Semantic segmentation method based on incremental learning of parameter importance | |
| CN114549698A (en) | Text synthesis method, device and electronic device | |
| CN118037580A (en) | Image denoising method, system, storage medium and electronic device | |
| CN109034279A (en) | Handwriting model training method, handwritten character recognition method, device, equipment and medium | |
| CN108985151A (en) | Handwriting model training method, handwritten character recognition method, device, equipment and medium | |
| CN109086651A (en) | Handwriting model training method, handwritten character recognition method, device, equipment and medium | |
| CN111310820A (en) | Ground-based meteorological cloud image classification method based on cross-validated deep CNN feature integration | |
| WO2023084759A1 (en) | Image processing device, image processing method, and program | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||