
CN114495114A - Text sequence identification model calibration method based on CTC decoder - Google Patents


Info

Publication number
CN114495114A
CN114495114A (application CN202210402975.1A; granted publication CN114495114B)
Authority
CN
China
Prior art keywords
character
label
context
predicted
sequence
Prior art date
Legal status
Granted
Application number
CN202210402975.1A
Other languages
Chinese (zh)
Other versions
CN114495114B (en)
Inventor
黄双萍
罗钰
徐可可
Current Assignee
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou
South China University of Technology SCUT
Original Assignee
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou, South China University of Technology SCUT filed Critical Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou
Priority to CN202210402975.1A priority Critical patent/CN114495114B/en
Publication of CN114495114A publication Critical patent/CN114495114A/en
Application granted granted Critical
Publication of CN114495114B publication Critical patent/CN114495114B/en
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2193 Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for calibrating a text sequence recognition model based on a CTC decoder, comprising: inputting a text image support set into the training model to be calibrated to obtain text sequence recognition results; computing a context confusion matrix from the text sequence recognition results of the support set, the matrix characterizing the contextual distribution relation between predicted characters at adjacent time steps in a sequence; according to the context confusion matrix, selectively and adaptively varying the smoothing strength in label smoothing using the context-dependent prediction distribution, so as to realize adaptive calibration of the sequence confidence; and retraining the model to be calibrated with the context-selective loss function, outputting the predicted text sequence and its calibrated confidence. The method extends label smoothing to CTC-decoder-based text sequence recognition models and introduces intra-sequence context relations to calibrate the predicted sequence adaptively, making the confidence of the model's predicted text more accurate.


Description

Text sequence identification model calibration method based on CTC decoder
Technical Field
The invention belongs to the technical field of artificial intelligence and text sequence processing, and particularly relates to a text sequence identification model calibration method based on a CTC decoder.
Background
With the development of deep learning, deep neural network models have been deployed in medicine, transportation, finance, and other fields thanks to their high prediction accuracy. For example, medical image recognition models provide auxiliary evidence for doctors diagnosing a patient's condition, object detection and recognition models give vehicles the intelligent analysis capability to control their speed or direction, and OCR (optical character recognition) models provide powerful support for digitizing financial documents. However, as deep models spread into more fields, their potential risks are gradually being exposed. Scene text images are one of the data forms that exist widely in our daily scenes, across industries and fields such as medical diagnoses, medical examination orders, and financial systems. Compared with unstructured data such as ordinary single-frame images and isolated characters, structured sequence data is harder to predict, and obtaining and judging its reliability is more complicated.
Currently, confidence is one of the most direct indicators for evaluating the reliability of a prediction; the model's prediction score is usually normalized into a probability and taken as its confidence. A reliable confidence accurately reflects the accuracy of the prediction: when the model is unsure about a prediction and outputs a relatively low confidence, a human can intervene in the decision to ensure the task is performed safely. However, it has been found that the output confidence of many existing deep neural network models is not calibrated; instead they suffer from an overconfidence problem in which the output confidence is higher than the accuracy. This miscalibration has several causes. On the one hand, as model architectures grow large and complex, the high fitting capacity brought by huge numbers of parameters leads to overfitting; for overfitted prediction label categories, the model tends to assign high confidence even to erroneous predictions. Moreover, loss functions based on one-hot encoding and the softmax confidence computation widen the gap between positive and negative prediction samples; although this makes selecting the correct sample convenient, it easily pushes the prediction confidence toward overconfidence. On the other hand, the distributions of training data and test data differ, and it is difficult for a model to give reliable confidence when, in real-world scenarios, it must deal with data never or rarely seen in the training set.
Due to the complex structure of text sequences, calibrating a scene text recognition model is also very difficult. Specifically, first, a text sequence is usually composed of multiple characters, and its confidence space grows as the number of characters increases. Second, text recognition is usually a time-sequential process in which the contextual relations between characters are important prior information, and the strength of the context dependence differs between characters. Therefore, calibration at the sequence level is hard to achieve by simply calibrating all characters uniformly.
However, most existing confidence calibration methods are extensions aimed at unstructured, simple data. They can be broadly divided into two categories: post-processing calibration and calibration during predictive model training. Post-processing approaches typically learn a confidence-related regression function on a hold-out data set and transform the output confidence. The calibration methods proposed earlier in machine learning for traditional classifiers are mostly post-processing ideas, such as Platt scaling, isotonic regression, and histogram binning. In deep learning, temperature scaling, which derives from Platt scaling, calibrates confidence by introducing a temperature parameter. Training-time calibration, by contrast, adjusts the deep model directly; it mainly addresses the overconfidence caused by overfitting and calibrates the model by mitigating overfitting through dropout, label-smoothing loss, entropy regularization, and similar techniques. In addition, on the data side, some methods apply augmentation to the training data during training, for example MixUp and GAN-based methods. However, these methods either ignore the heterogeneous distribution of different data classes in the data set, or consider only the correlation between a local single prediction and its true label, neglecting the length and the intrinsic context-dependent characteristics of sequence data; they are therefore difficult to transfer directly to confidence calibration of sequence data. A calibration scheme designed specifically around the characteristics of sequence data is needed to improve the calibration performance of sequence confidence.
Disclosure of Invention
In view of the above, it is necessary to provide a method for calibrating a CTC-decoder-based text sequence recognition model that solves the technical problem of confidence calibration for scene text recognition models. The method revisits the essence of label smoothing, whose effectiveness mainly comes from adding a Kullback-Leibler (KL) divergence term to the original loss function as a regularizer. Considering the context dependence within a sequence, the contextual relation between characters is modeled in the form of a confusion matrix and used as a linguistic-knowledge prior to guide the label probability distribution, and the smoothing strengths of different label classes are adjusted adaptively according to their contextual prediction error rates.
The invention discloses a text sequence recognition model calibration method based on a CTC decoder, which comprises the following steps:
step 1, inputting a text image support set into the training model to be calibrated to obtain text sequence recognition results;
step 2, computing a context confusion matrix from the text sequence recognition results of the text image support set, the context confusion matrix representing the context distribution relation between predicted characters at adjacent time steps in the sequence;
step 3, selectively and adaptively changing the smoothing strength in label smoothing using the context-dependent prediction distribution according to the context confusion matrix, so as to realize adaptive calibration of the sequence confidence;
and step 4, retraining the model to be calibrated with the context-selective loss function, and finally outputting the predicted text sequence and its calibrated confidence.
Specifically, the process of computing the context confusion matrix comprises the following steps:

Initially, for each of the $K$ prediction classes, set up a context confusion matrix $M_k$ whose elements are all 0, where $k$ is the corresponding prediction class index.

Align the text sequence recognition result $\hat{y}$ of the text image support set with the corresponding true label $y$, where $l_{\hat{y}}$ is the length of the recognition result and $l_y$ is the length of the true label sequence.

If the recognition result is aligned with the true label, then, given that the character predicted at the previous time step belongs to the class with index $k$, directly count the number of times the character $y_t$ at the current time step $t$ is predicted as the character $\hat{y}_t$, producing the context confusion matrix $M_k$. Each element $M_k(i,j)$ of the context confusion matrix indicates, given that the predicted character at the previous time step belongs to class $k$, the number of times a current character whose true label belongs to class $i$ is predicted as the $j$-th class label. For a character at the head of a text, the class of the preceding character is set to the space class by default.

If the recognition result is not aligned with the true label, the operation sequence that transforms the predicted sequence into the true label is computed via the edit distance to obtain the alignment relation between the sequences, and the context confusion matrix is then obtained by the same counting.
Preferably, obtaining the alignment relation between the sequences requires performing some number of the following operations until the characters are correctly predicted and aligned: deleting one character, inserting one character, or replacing one character. Deleting one character corrects a case in which a null symbol in the true label sequence was erroneously predicted as some other character; inserting one character corrects a case in which a character in the true label sequence was predicted as the null symbol; and replacing one character corrects a case in which a character in the true label sequence was predicted as some other character.
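The counting procedure for the aligned case can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation: the 37-class alphabet (26 letters, 10 digits, 1 space) follows the embodiment below, while the function name `build_context_confusion` and the tensor layout `M[k, i, j]` are assumptions.

```python
import numpy as np

# Assumed 37-class alphabet; index 0 is the space class used as the
# default context for the character at the head of a text.
CLASSES = list(" abcdefghijklmnopqrstuvwxyz0123456789")
IDX = {c: i for i, c in enumerate(CLASSES)}
K = len(CLASSES)  # 37

def build_context_confusion(pairs):
    """pairs: list of (predicted_text, true_text), already aligned 1:1.

    Returns M of shape (K, K, K): M[k, i, j] counts how often, given that
    the previous predicted character belongs to class k, a current character
    whose true label is class i was predicted as class j.
    """
    M = np.zeros((K, K, K), dtype=np.int64)
    for pred, true in pairs:
        assert len(pred) == len(true), "this sketch handles the aligned case"
        prev = IDX[" "]  # head-of-text context defaults to the space class
        for p, t in zip(pred, true):
            M[prev, IDX[t], IDX[p]] += 1
            prev = IDX[p]  # context is the *predicted* previous character
    return M
```

For the "cat"/"cat" example in the embodiment below, this adds one count at the (label "c", prediction "c") position of the space-class matrix, then at (a, a) of the "c"-class matrix, and at (t, t) of the "a"-class matrix.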
Specifically, selectively and adaptively changing the smoothing strength in label smoothing using the context-dependent prediction distribution in step 3 means that the smoothing strength is adaptively adjusted according to the context relation; adjusting the label probability yields the selective context-aware probability distribution:

$$
q_t(j) =
\begin{cases}
1, & j = y_t \ \text{and}\ y_t \notin E_k \\
1 - \varepsilon, & j = y_t \ \text{and}\ y_t \in E_k \\
\varepsilon \cdot \dfrac{M_k(y_t, j)}{\sum_{j' \neq y_t} M_k(y_t, j')}, & j \neq y_t \ \text{and}\ y_t \in E_k \\
0, & \text{otherwise}
\end{cases}
$$

where $E_k$ denotes the error-prone set when the label class $k$ of the previous-time character $\hat{y}_{t-1}$ is known; $M_k(y_t, j)$ is the number of times the current character with true label $y_t$ is predicted as class $j$ given that the previous-time character belongs to class $k$; $j$ is the class index; $\sum_{j' \neq y_t} M_k(y_t, j')$ is the total number of times that character is mispredicted; and $\varepsilon$ is the smoothing strength. For a previous character $\hat{y}_{t-1}$ of known label class, it must first be confirmed whether the current character label $y_t$ belongs to the error-prone set; if not, no label smoothing is applied to the prediction; otherwise, the smoothing strength is adaptively adjusted according to the error rate.

The error-prone set is obtained as follows. For the $K$ different prediction classes, based on the frequency with which predicted characters fall into each class in the context confusion matrix, the error-prone set $E_k$ for the case where the previous-time character $\hat{y}_{t-1}$ belongs to the $k$-th class is obtained by counting, with the division criterion:

$$
E_k = \{\, j \mid 1 - \mathrm{acc}_k(j) > \theta \,\}, \qquad \mathrm{acc}_k(j) = \frac{M_k(j, j)}{\sum_{j'} M_k(j, j')}
$$

where $\mathrm{acc}_k(j)$ is the accuracy with which current-time characters of true class $j$ are predicted when the previous-time character belongs to the $k$-th class. If the class error rate is greater than the set threshold $\theta$, the corresponding class $j$ is placed into the error-prone set; once the class of the previous character $\hat{y}_{t-1}$ is known, the corresponding error-prone set $E_k$ is obtained.
More specifically, the context-selective loss function in step 4 is:

$$
\mathcal{L}_{\mathrm{CASSLS}} = (1 - \varepsilon)\,\mathcal{L}_{\mathrm{CTC}} + \varepsilon \sum_{t=1}^{T} D_{\mathrm{KL}}\!\left(q_t \,\|\, p_t\right)
$$

where $q_t(k)$ is the selective context-aware probability distribution for the label class index $k$ at time $t$; $p_t(k)$ is the probability of the prediction class label $k$ at time $t$; $\mathcal{L}_{\mathrm{CTC}}$ is the CTC loss; $D_{\mathrm{KL}}$ denotes the KL divergence; and $p_t$ is the probability vector over all $K$ label classes at time $t$. The KL divergence is defined as:

$$
D_{\mathrm{KL}}(q_t \,\|\, p_t) = \sum_{k=1}^{K} q_t(k) \log \frac{q_t(k)}{p_t(k)}
$$

where $p_t(k)$ is the probability of the corresponding prediction class label $k$ and $q_t(k)$ is the probability of the corresponding true class label $k$.
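The loss above can be sketched numerically as follows. This is a minimal sketch, not a training implementation: the CTC term is assumed to be computed elsewhere and passed in as a scalar, the function names are illustrative, and zero-probability target entries contribute 0 to the KL term by convention.

```python
import numpy as np

def kl_divergence(q, p, floor=1e-12):
    """D_KL(q || p) over label classes; terms with q(k) = 0 contribute 0."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / np.maximum(p[mask], floor))))

def cassls_loss(ctc_loss, targets, preds, eps=0.05):
    """(1 - eps) * L_CTC + eps * sum_t KL(q_t || p_t).

    ctc_loss : scalar CTC loss value (assumed computed by the recognizer)
    targets  : per-step selective context-aware target distributions q_t
    preds    : per-step predicted probability vectors p_t
    """
    penalty = sum(kl_divergence(q, p) for q, p in zip(targets, preds))
    return (1.0 - eps) * ctc_loss + eps * penalty
```

When every $q_t$ equals $p_t$ the penalty vanishes and the loss reduces to $(1-\varepsilon)\mathcal{L}_{\mathrm{CTC}}$.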
Compared with the prior art, the invention has the beneficial effects that:
according to the method, the label is smoothly expanded to the text sequence recognition model based on the CTC decoder, the context relationship among sequences is introduced, the predicted sequence is subjected to self-adaptive calibration, the calibration performance of the text sequence recognition model can be well improved, and the confidence coefficient of the model output predicted text can be more accurate.
Drawings
FIG. 1 shows a schematic flow diagram of a method embodying the present invention;
FIG. 2 is a schematic diagram showing the operation of modules according to an embodiment of the present invention;
fig. 3 shows a schematic flow of an alignment policy in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For reference and clarity, the technical terms and abbreviations used hereinafter are summarized as follows:
CTC: Connectionist Temporal Classification
KL divergence: Kullback-Leibler divergence
NLL: Negative Log-Likelihood
LS: Label Smoothing
CASSLS: Context-Aware Selective Label Smoothing
The invention discloses a text sequence recognition model calibration method based on a CTC decoder, which aims to solve various problems in the prior art.
Fig. 1 shows a schematic flow diagram of an embodiment of the invention. A text sequence identification model calibration method based on a CTC decoder comprises the following steps:
inputting a text image support set into the training model to be calibrated to obtain text sequence recognition results;
computing a context confusion matrix from the text sequence recognition results of the text image support set, the context confusion matrix representing the context distribution relation between predicted characters at adjacent time steps in the sequence;
according to the context confusion matrix, selectively and adaptively changing the smoothing strength in label smoothing using the context-dependent prediction distribution, so as to realize adaptive calibration of the sequence confidence;
and retraining the model to be calibrated with the context-selective loss function, and finally outputting the predicted text sequence and its calibrated confidence.
Specifically, the present embodiment adopts the following steps to implement the inventive method.
Step 1, constructing a support data set, inputting the support data set into a corresponding scene text recognition pre-training model, and obtaining a recognition result, namely a corresponding text sequence.
The data distribution of the support set should be similar to that of the training set; typically the validation set or part of the training set of a benchmark data set is selected as the support set. Here, the training sets of IIIT5k, SVT, IC03, IC13, and IC15 are selected as the support set. The data to be tested are input into the corresponding scene text recognition pre-trained model, and model recognition prediction yields the corresponding predicted sequences. The confusion matrix is constructed in the next step.
Step 2: obtain the sequence context prediction distribution relation using the support set prediction results, and represent it in the form of a confusion matrix as the context modeling output.
In step 2, for the input data, the contextual relation between predictions at adjacent time steps in the sequence is obtained on the basis of the predicted text $\hat{y}$ and the corresponding true label $y$ (with $l_{\hat{y}}$ and $l_y$ the corresponding text sequence lengths); that is, given the class to which the character prediction at the previous time step belongs, the probability that the character prediction at the next time step belongs to each class is correlated with it.
If the recognition result is aligned with the true label, then, given that the character predicted at the previous time step belongs to the class with index $k$, directly count the number of times the character $y_t$ at the current time step $t$ is predicted as the character $\hat{y}_t$, producing the context confusion matrix $M_k$. Each element $M_k(i,j)$ of the context confusion matrix indicates, given that the previous-time predicted character belongs to class $k$, the number of times a current character whose true label is class $i$ is predicted as the $j$-th class label. Specifically, for a character at the head of a text, the class of the preceding character is set to the space class by default.

The concrete construction proceeds as follows. First, a confusion matrix with all elements 0 is initialized for each of the $K$ prediction classes; for scene text recognition, $K = 37$ (10 digits, 26 English letters, and 1 space class). Suppose there is a predicted sequence "cat" whose true label is "cat". For the first character "c", the character class at the preceding time step is blank (the space class), and the true character label "c" is correctly predicted as "c" at the current time step, so one is added to the element of the space-class confusion matrix at the position corresponding to label "c" and prediction "c". All samples in the support set are counted in the same way, finally producing confusion matrices representing the context prediction frequency distribution of the different prediction classes. FIG. 2 shows the context confusion matrices for previous-time characters "3", "A", and "V"; for previous-time predictions of different classes, the distribution of the class to which the current character prediction belongs differs, so a differentiated calibration operation is required.
Considering the case in which the predicted sequence is not aligned with its true sequence label because of prediction errors, the edit distance is used to compute the operation sequence between the true sequence and the predicted sequence, yielding the alignment relation between them. The alignment strategy is shown in fig. 3: to realize one-to-one correspondence between the characters of the predicted text and the true text, the operations are (1) deleting one character (d); (2) inserting one character (i); (3) replacing one character (s); positions requiring no operation are indicated by the symbol "-". Taking the predicted sequence "lapaitmen" as an example, to obtain the edit distance from the prediction to the true label "apartment", the following operations are required: delete the character "l", replace the character "i" with "r", and insert the character "t". Accordingly, in the process of counting the confusion matrix, a delete operation indicates that a null-symbol label "#" in the true sequence was mispredicted as another character ("l"), and an insert operation indicates that the corresponding label ("t") in the true sequence was predicted as the null symbol "#".
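The edit-distance alignment can be sketched with a standard Levenshtein dynamic program followed by a traceback. This is an illustrative sketch: the function name and the op encoding `('-', 's', 'd', 'i')` are assumptions chosen to match the figure's notation.

```python
def align_ops(pred, true):
    """Levenshtein alignment from predicted text to true label.

    Returns a list of (op, pred_char, true_char) with op in {'-', 's', 'd', 'i'}:
    match, substitute, delete (a spurious predicted character, i.e. a '#'
    blank in the target was predicted as a character), and insert (a target
    character was predicted as the blank '#').
    """
    m, n = len(pred), len(true)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i
    for j in range(n + 1):
        D[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == true[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,         # delete predicted char
                          D[i][j - 1] + 1,         # insert true char
                          D[i - 1][j - 1] + cost)  # match / substitute
    ops, i, j = [], m, n
    while i > 0 or j > 0:  # trace one minimal-cost path back to (0, 0)
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (pred[i - 1] != true[j - 1]):
            ops.append(('-' if pred[i - 1] == true[j - 1] else 's',
                        pred[i - 1], true[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            ops.append(('d', pred[i - 1], '#'))
            i -= 1
        else:
            ops.append(('i', '#', true[j - 1]))
            j -= 1
    return ops[::-1]
```

On the example above, `align_ops("lapaitmen", "apartment")` recovers exactly the three operations stated in the text: delete "l", substitute "i" with "r", and insert "t".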
Step 3: using the confusion matrix, adaptively vary the label smoothing according to the context relation, and introduce a penalty term to realize adaptive calibration of the sequence confidence.
In step 3, for the CTC decoder, the optimization objective is the maximum likelihood of the sequence probability, which can be defined by the following equation:

$$
p(y \mid x) = \sum_{\pi \in \mathcal{B}^{-1}(y)} \prod_{t=1}^{T} p^{\,t}_{\pi_t}
$$

where $p(y \mid x)$ is the probability of outputting $y$ for a given input $x$; $T$ is the total number of decoding prediction steps; $p^{\,t}_{\pi_t}$ denotes the confidence of the character corresponding to time step $t$ in the decoding path $\pi$; and $\mathcal{B}$ denotes the mapping rule. The probability of the decoded path is directly taken as the confidence of the predicted sequence.
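The greedy best-path version of this confidence can be sketched as follows: the per-step argmax path's probability is taken as the sequence confidence, and the mapping $\mathcal{B}$ collapses repeats and removes blanks. The alphabet, blank index, and function name are assumptions for illustration, not the patent's implementation.

```python
import numpy as np

def greedy_ctc_decode(probs, alphabet, blank=0):
    """probs: (T, K) per-step class probabilities. Returns (text, confidence)."""
    path = probs.argmax(axis=1)
    confidence = float(np.prod(probs.max(axis=1)))  # best-path probability
    chars, prev = [], blank
    for k in path:  # mapping B: collapse repeated labels, then drop blanks
        if k != blank and k != prev:
            chars.append(alphabet[k])
        prev = k
    return "".join(chars), confidence
```

For example, the path "a, a, blank, b" over four steps decodes to "ab" with confidence equal to the product of the four per-step maxima.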
The label smoothing strategy, typically used with the cross-entropy loss, is then generalized to the CTC loss, and the label-smoothing loss is derived. The label-smoothed probability distribution can be expressed as:

$$
q'(k) = (1 - \varepsilon)\, q(k) + \varepsilon\, u(k)
$$

where $q'(k)$ is the smoothed label probability and $\varepsilon$ is the smoothing factor. The label distribution $q(k)$ follows the one-hot probability distribution: it is 1 if the prediction is correct and 0 otherwise, equivalent to a Dirac function; $u(k)$ distributes the label probability uniformly over all $K$ label classes, with value $u(k) = 1/K$. Substituting the above equation into the general text sequence recognition loss function gives:

$$
\mathcal{L} = -\sum_{t=1}^{T} \sum_{k=1}^{K} q'(k) \log p_t(k)
$$

where $p_t(k)$ is the probability of predicting class $k$ at time step $t$, and $T$ is the total number of decoding prediction steps.
According to the KL divergence definition:

$$
D_{\mathrm{KL}}(u \,\|\, p_t) = \sum_{k=1}^{K} u(k) \log \frac{u(k)}{p_t(k)}
$$

the loss function can be decoupled into the sum of a standard negative log-likelihood (NLL) loss term and a KL divergence term:

$$
\mathcal{L} = (1 - \varepsilon)\, \mathcal{L}_{\mathrm{NLL}} + \varepsilon \sum_{t=1}^{T} \Big( D_{\mathrm{KL}}(u \,\|\, p_t) + H(u) \Big)
$$

where $H(u)$ is a constant term that has no effect in gradient back-propagation and can be neglected.
Since the overall loss is expected to approach zero, the KL divergence penalty term can be understood approximately as expecting the distance between the prediction probability distribution and the uniform distribution to be small, thereby preventing the prediction probability from drifting toward overconfidence. Thus, although the CTC-decoder-based text sequence recognition model does not one-hot encode the true labels, its core optimization objective is still sequence confidence maximization, so the CTC loss combined with standard label smoothing can be defined as:

$$
\mathcal{L}_{\mathrm{LS}} = (1 - \varepsilon)\, \mathcal{L}_{\mathrm{CTC}} + \varepsilon \sum_{t=1}^{T} D_{\mathrm{KL}}(u \,\|\, p_t)
$$

where $\mathcal{L}_{\mathrm{CTC}}$ is the CTC loss. The smoothing factor $\varepsilon$, acting as the weight of the penalty term, controls the strength of the calibration and is set to a fixed value in practice.
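The decoupling above can be checked numerically: cross-entropy against the smoothed target $q'$ equals $(1-\varepsilon)$ times the NLL plus $\varepsilon$ times the KL term plus the constant $H(u) = \log K$. This is a verification sketch with illustrative function names, not part of the patent's method.

```python
import numpy as np

def smoothed_ce(p, y, eps):
    """Cross-entropy against the smoothed target q' = (1-eps)*onehot + eps*u."""
    K = len(p)
    q = np.full(K, eps / K)
    q[y] += 1.0 - eps
    return -float(np.sum(q * np.log(p)))

def decoupled(p, y, eps):
    """(1-eps)*NLL + eps*(KL(u||p) + H(u)), with H(u) = log K the constant."""
    K = len(p)
    u = np.full(K, 1.0 / K)
    nll = -np.log(p[y])
    kl = float(np.sum(u * np.log(u / p)))
    return float((1.0 - eps) * nll + eps * (kl + np.log(K)))
```

The two expressions agree exactly for any probability vector, confirming that the constant $H(u)$ is the only extra term introduced by the decoupling.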
Further introducing sequence context in tag smoothing. Firstly, screening an error-prone set, and only carrying out label smoothing on error-prone categories. To pair
Figure 772140DEST_PATH_IMAGE001
Corresponding to different prediction categories, the characters at the last time can be obtained through statistics according to the frequency of prediction appearing in each category in the corresponding confusion matrix
Figure 166212DEST_PATH_IMAGE008
Belong to the first
Figure 124941DEST_PATH_IMAGE003
Error-prone set of time classes
Figure 530646DEST_PATH_IMAGE027
. The division is based on the following:
$E_i = \{\, j \mid 1 - \mathrm{acc}_i(j) > \tau \,\}, \qquad \mathrm{acc}_i(j) = \frac{M_i(j,j)}{\sum_{k=1}^{K} M_i(j,k)}$

where $\mathrm{acc}_i(j)$ denotes the prediction accuracy at the current time step for true class $j$ when the previous-time character $c_{t-1}$ belongs to class $i$. If the class error rate $1 - \mathrm{acc}_i(j)$ exceeds the set threshold $\tau$, the corresponding class is placed into the error-prone set; once the class of $c_{t-1}$ is known, the corresponding error-prone set $E_i$ is obtained.
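The screening rule can be sketched as follows; the toy confusion matrix and the threshold value are illustrative, not taken from the patent:

```python
def error_prone_set(M_i, tau=0.3):
    """Given the confusion matrix M_i for previous-character class i
    (rows: true class j, cols: predicted class k), return the set of
    classes whose error rate 1 - acc_i(j) exceeds the threshold tau.
    The tau value here is illustrative."""
    E_i = set()
    for j, row in enumerate(M_i):
        total = sum(row)
        if total == 0:
            continue  # class j never observed in this context
        acc = row[j] / total
        if 1.0 - acc > tau:
            E_i.add(j)
    return E_i

# Toy 3-class confusion matrix: class 0 is reliable in this context,
# while classes 1 and 2 are frequently confused with each other.
M_i = [
    [90, 5, 5],
    [10, 50, 40],
    [5, 45, 50],
]
print(error_prone_set(M_i, tau=0.3))  # {1, 2}
```

Only the classes in `E_i` will receive smoothed targets; reliable classes keep their one-hot labels.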
The threshold $\tau$ is likewise set to a fixed value in the experiments. The smoothing strength is then adaptively adjusted according to the context relations captured by the confusion matrices obtained in step 2. Adjusting the label probabilities yields the context-aware selective label smoothing (CASLS) probability distribution:
$\tilde{q}_t(k) = \begin{cases} \mathbb{1}[k = y_t], & y_t \notin E_i \\ 1 - \epsilon_i(y_t), & y_t \in E_i,\; k = y_t \\ \epsilon_i(y_t)\,\dfrac{M_i(y_t,k)}{\sum_{j \neq y_t} M_i(y_t,j)}, & y_t \in E_i,\; k \neq y_t \end{cases}$

That is, when the label class of the previous character $c_{t-1}$ is known, it is first checked whether the current character's label $y_t$ belongs to the error-prone set; if not, no label smoothing is needed, and otherwise the smoothing strength $\epsilon_i(y_t)$ is adaptively adjusted according to the error rate.
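A sketch of this smoothed target is below. The closed form is reconstructed from the surrounding description (one-hot outside the error-prone set; inside it, strength equal to the class error rate and mass spread in proportion to confusion counts), so these details should be treated as assumptions:

```python
def casls_target(y_t, M_i, E_i, K):
    """Context-aware selective smoothed target for true class y_t,
    given the confusion matrix M_i of the previous-character class i.
    Outside the error-prone set E_i the target stays one-hot; inside
    it, eps = class error rate (an inferred choice) and the smoothing
    mass is distributed in proportion to the confusion counts."""
    onehot = [1.0 if k == y_t else 0.0 for k in range(K)]
    if y_t not in E_i:
        return onehot
    row = M_i[y_t]
    total = sum(row)
    eps = 1.0 - row[y_t] / total          # adaptive strength = error rate
    off_mass = total - row[y_t]           # counts of wrong predictions
    q = [0.0] * K
    for k in range(K):
        if k == y_t:
            q[k] = 1.0 - eps
        else:
            q[k] = eps * row[k] / off_mass
    return q

M_i = [[90, 5, 5], [10, 50, 40], [5, 45, 50]]
E_i = {1, 2}
q = casls_target(1, M_i, E_i, K=3)
print(q)   # most mass on class 1, the rest on its frequent confusions
```

For class 0 (outside `E_i`) the target remains exactly one-hot, so well-recognized contexts are never over-smoothed.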
Substituting this probability distribution into the label-smoothed CTC loss yields the CTC decoder-based context-aware selective loss:

$\mathcal{L}_{CASLS} = \mathcal{L}_{CTC} + \sum_{t=1}^{T} D_{KL}(\tilde{q}_t \,\|\, p_t)$
When computing the $D_{KL}$ terms, note that the output probabilities are those of the predicted path, whose length may be misaligned with the length of the true label; for the predicted probabilities $p_t$, only the positions remaining after the CTC path-to-sequence mapping are retained. Then, following the edit-distance alignment strategy of step 2: for a deletion operation, a one-hot code of the space symbol is added at the blank position of the corresponding target sequence; for an insertion operation, a uniform probability vector is added at the blank position of the corresponding predicted sequence; a substitution operation brings no change to the probability distributions.
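The alignment-and-padding strategy can be sketched as follows; the operation names and the small example sequences are illustrative, and the blank/space index is assumed to be 0:

```python
def align_ops(pred, target):
    """Levenshtein alignment between a collapsed predicted sequence and
    the true label; returns a list of 'match'/'sub'/'del'/'ins' ops,
    where 'del' marks an extra predicted symbol and 'ins' a target
    symbol missing from the prediction."""
    n, m = len(pred), len(target)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if pred[i - 1] == target[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,          # extra prediction
                          D[i][j - 1] + 1,          # missing prediction
                          D[i - 1][j - 1] + cost)   # match / substitution
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (pred[i - 1] != target[j - 1]):
            ops.append('match' if pred[i - 1] == target[j - 1] else 'sub')
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            ops.append('del'); i -= 1
        else:
            ops.append('ins'); j -= 1
    return ops[::-1]

def pad_for_kl(probs, targets, ops, K, blank=0):
    """Pad per-step probabilities and one-hot targets to equal length,
    following the strategy in the text: 'del' -> one-hot blank on the
    target side; 'ins' -> uniform vector on the prediction side;
    'sub'/'match' -> unchanged."""
    onehot = lambda k: [1.0 if c == k else 0.0 for c in range(K)]
    P, Q, pi, ti = [], [], 0, 0
    for op in ops:
        if op in ('match', 'sub'):
            P.append(probs[pi]); Q.append(onehot(targets[ti]))
            pi += 1; ti += 1
        elif op == 'del':                 # extra predicted symbol
            P.append(probs[pi]); Q.append(onehot(blank)); pi += 1
        else:                             # 'ins': missing predicted symbol
            P.append([1.0 / K] * K); Q.append(onehot(targets[ti])); ti += 1
    return P, Q

probs = [[0.1, 0.8, 0.1], [0.2, 0.1, 0.7]]   # model predicts classes [1, 2]
targets = [1, 2, 2]                           # true label is one symbol longer
P, Q = pad_for_kl(probs, targets, align_ops([1, 2], targets), K=3)
print(len(P) == len(Q) == 3)                  # sequences are now aligned
```

After padding, the per-step KL terms can be summed position by position without any length mismatch.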
Step 4: retrain the target model after adjusting the loss function, and finally output the predicted sequence and its calibrated confidence.
In step 4, the original loss function is adjusted according to the context-aware selective label smoothing strategy of step 3, and the over-confident target model is retrained so that it becomes calibrated. Because training proceeds by fine-tuning the model, the learning rate is set to a small fixed value, and after 200,000 training iterations the predicted text and its calibrated confidence are finally output.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (6)

1. A CTC decoder-based text sequence recognition model calibration method, characterized by comprising the following steps:

Step 1: input a text image support set into the training model to be calibrated, and obtain text sequence recognition results;

Step 2: compute context confusion matrices from the text sequence recognition results of the text image support set, the context confusion matrices characterizing the contextual distribution relation between predicted characters at adjacent time steps within a sequence;

Step 3: according to the context confusion matrices, use the context-dependent prediction distribution to selectively and adaptively vary the smoothing strength in label smoothing, so as to achieve adaptive calibration of the sequence confidence;

Step 4: retrain the model to be calibrated based on the context-aware selective loss function, and finally output the predicted text sequence and its calibrated confidence.

2. The CTC decoder-based text sequence recognition model calibration method according to claim 1, characterized in that the process of computing the context confusion matrices comprises the following steps:
initialize, with all elements set to zero, a context confusion matrix $M_i$ for each of the $K$ prediction categories, where $i$ is the corresponding prediction-category index;
compare the text sequence recognition result $\hat{y} = (\hat{y}_1, \ldots, \hat{y}_n)$ of the text image support set with the corresponding true label $y = (y_1, \ldots, y_m)$, where $n$ is the length of the text sequence recognition result and $m$ is the length of the true label sequence;
if the recognition result is aligned with the true label, then, given that the class index of the previous-time character $c_{t-1}$ is known to be $i$, directly count into the context confusion matrix $M_i$ the event that the true character $y_t$ at the current time $t$ is predicted as character $\hat{y}_t$, where each element $M_i(j,k)$ of the context confusion matrix denotes the number of times that, with the previous-time true character known to belong to class $i$, a current-time character whose true label belongs to class $j$ is predicted as class label $k$; for the character at the first position of a text, the class of its previous-time character defaults to the space symbol;

if the recognition result and the true label are not aligned, first compute via the edit distance the operation sequence transforming the predicted sequence into the true label, obtain the alignment relation between the sequences, and then accumulate the context confusion matrices from the aligned statistics.
3. The CTC decoder-based text sequence recognition model calibration method according to claim 2, characterized in that obtaining the alignment relation between sequences requires performing the following operations some number of times until the characters are correctly predicted and aligned: deleting a character, inserting a character, or substituting a character, wherein deleting a character corrects a null symbol in the true label sequence that was mispredicted as another character, inserting a character corrects a character in the true label sequence that was predicted as a null symbol, and substituting a character corrects a character in the true label sequence that was predicted as another character.

4. The CTC decoder-based text sequence recognition model calibration method according to claim 2, characterized in that selectively and adaptively varying the smoothing strength in label smoothing using the context-dependent prediction distribution in step 3 means adaptively adjusting the smoothing strength according to the context relation and adjusting the label probabilities, giving the context-aware selective probability distribution formula:
$\tilde{q}_t(k) = \begin{cases} \mathbb{1}[k = y_t], & y_t \notin E_i \\ 1 - \epsilon, & y_t \in E_i,\; k = y_t \\ \epsilon\,\dfrac{M_i(y_t,k)}{\sum_{j \neq y_t} M_i(y_t,j)}, & y_t \in E_i,\; k \neq y_t \end{cases}$
wherein $E_i$ denotes the error-prone set corresponding to the known label class $i$ of the previous-time character $c_{t-1}$; $M_i(y_t,k)$ denotes the number of times the current character $y_t$ is predicted as class $k$ when the label class of $c_{t-1}$ is determined; $j$ denotes a class index and $M_i(y_t,j)$ the corresponding count; and $\epsilon$ denotes the smoothing strength; with the label class of the previous character $c_{t-1}$ known, it must first be confirmed whether the current character $y_t$ belongs to the error-prone set: if not, no label smoothing calibration is needed; otherwise, the smoothing strength is adaptively adjusted according to the error rate.
5. The CTC decoder-based text sequence recognition model calibration method according to claim 4, characterized in that the error-prone set is obtained by, for each of the $K$ prediction categories, counting from the corresponding context confusion matrix the frequency with which predictions fall into each class, thereby obtaining the error-prone set $E_i$ for the case where the previous-time character $c_{t-1}$ belongs to class $i$, divided according to the following:
$E_i = \{\, j \mid 1 - \mathrm{acc}_i(j) > \tau \,\}$
wherein $\mathrm{acc}_i(j)$ denotes the prediction accuracy at the current time step when the previous-time character $c_{t-1}$ belongs to class $i$ and the true label to class $j$; if the class error rate exceeds the set threshold $\tau$, the corresponding class $j$ is placed into the error-prone set, and once the label class of $c_{t-1}$ is known, the corresponding error-prone set $E_i$ can be obtained.
6. The CTC decoder-based text sequence recognition model calibration method according to claim 4, characterized in that the context-aware selective loss function in step 4 is:

$\mathcal{L}_{CASLS} = \mathcal{L}_{CTC} + \sum_{t} D_{KL}(\tilde{q}_t \,\|\, p_t)$
wherein $\tilde{q}_t(k)$ denotes the context-aware selective probability distribution for label class index $k$ at time $t$; $p_t(k)$ denotes the probability of the predicted class label $k$ at time $t$; $\mathcal{L}_{CTC}$ denotes the CTC loss; $D_{KL}$ denotes the KL divergence; and $p_t$ denotes the probability vector over all $K$ label classes at time $t$;
the $D_{KL}$ divergence being defined by the formula:

$D_{KL}(q \,\|\, p) = \sum_{k=1}^{K} q(k) \log \frac{q(k)}{p(k)}$
where $p(k)$ denotes the probability of the corresponding predicted class label and $q(k)$ denotes the probability of the corresponding true class label.
CN202210402975.1A 2022-04-18 2022-04-18 Text sequence recognition model calibration method based on CTC decoder Active CN114495114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210402975.1A CN114495114B (en) 2022-04-18 2022-04-18 Text sequence recognition model calibration method based on CTC decoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210402975.1A CN114495114B (en) 2022-04-18 2022-04-18 Text sequence recognition model calibration method based on CTC decoder

Publications (2)

Publication Number Publication Date
CN114495114A true CN114495114A (en) 2022-05-13
CN114495114B CN114495114B (en) 2022-08-05

Family

ID=81489555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210402975.1A Active CN114495114B (en) 2022-04-18 2022-04-18 Text sequence recognition model calibration method based on CTC decoder

Country Status (1)

Country Link
CN (1) CN114495114B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115482544A (en) * 2022-09-19 2022-12-16 深圳思谋信息科技有限公司 Adaptive fitting model training method and device, computer equipment and storage medium
CN117151111A (en) * 2023-08-15 2023-12-01 华南理工大学 Reliability regularization method for text recognition models based on perceptual and semantic correlation
CN120997846A (en) * 2025-08-07 2025-11-21 北京诺君安信息技术股份有限公司 A method and system for improving the accuracy of OCR recognition models in recognizing specific characters.

Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934293A (en) * 2019-03-15 2019-06-25 苏州大学 Image recognition method, device, medium and confusion-aware convolutional neural network
US10366362B1 (en) * 2012-10-18 2019-07-30 Featuremetrics, LLC Feature based modeling for forecasting and optimization
US10388272B1 (en) * 2018-12-04 2019-08-20 Sorenson Ip Holdings, Llc Training speech recognition systems using word sequences
CN110399845A (en) * 2019-07-29 2019-11-01 上海海事大学 A method for detecting and recognizing text in continuous segments in images
CN110634491A (en) * 2019-10-23 2019-12-31 大连东软信息学院 System and method for tandem feature extraction for general speech tasks in speech signals
US20200027444A1 (en) * 2018-07-20 2020-01-23 Google Llc Speech recognition with sequence-to-sequence models
US10573312B1 (en) * 2018-12-04 2020-02-25 Sorenson Ip Holdings, Llc Transcription generation from multiple speech recognition systems
US20200357388A1 (en) * 2019-05-10 2020-11-12 Google Llc Using Context Information With End-to-End Models for Speech Recognition
CN112068555A (en) * 2020-08-27 2020-12-11 江南大学 A voice-controlled mobile robot based on semantic SLAM method
US20200402500A1 (en) * 2019-09-06 2020-12-24 Beijing Dajia Internet Information Technology Co., Ltd. Method and device for generating speech recognition model and storage medium
CN112712804A (en) * 2020-12-23 2021-04-27 哈尔滨工业大学(威海) Speech recognition method, system, medium, computer device, terminal and application
WO2021081562A2 (en) * 2021-01-20 2021-04-29 Innopeak Technology, Inc. Multi-head text recognition model for multi-lingual optical character recognition
US20210150200A1 (en) * 2019-11-19 2021-05-20 Samsung Electronics Co., Ltd. Electronic device for converting handwriting input to text and method of operating the same
CN112989834A (en) * 2021-04-15 2021-06-18 杭州一知智能科技有限公司 Named entity identification method and system based on flat grid enhanced linear converter
CN113160803A (en) * 2021-06-09 2021-07-23 中国科学技术大学 End-to-end voice recognition model based on multilevel identification and modeling method
CN113283336A (en) * 2021-05-21 2021-08-20 湖南大学 Text recognition method and system
CN113516968A (en) * 2021-06-07 2021-10-19 北京邮电大学 An end-to-end long-term speech recognition method
CN113609859A (en) * 2021-08-04 2021-11-05 浙江工业大学 Special equipment Chinese named entity recognition method based on pre-training model
EP3910534A1 (en) * 2020-05-15 2021-11-17 MyScript Recognizing handwritten text by combining neural networks
CN113887480A (en) * 2021-10-19 2022-01-04 小语智能信息科技(云南)有限公司 Burma language image text recognition method and device based on multi-decoder joint learning
CN114023316A (en) * 2021-11-04 2022-02-08 匀熵科技(无锡)有限公司 TCN-Transformer-CTC-based end-to-end Chinese voice recognition method
CN114155527A (en) * 2021-11-12 2022-03-08 虹软科技股份有限公司 A scene text recognition method and device

Patent Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10366362B1 (en) * 2012-10-18 2019-07-30 Featuremetrics, LLC Feature based modeling for forecasting and optimization
US20200027444A1 (en) * 2018-07-20 2020-01-23 Google Llc Speech recognition with sequence-to-sequence models
US10388272B1 (en) * 2018-12-04 2019-08-20 Sorenson Ip Holdings, Llc Training speech recognition systems using word sequences
US10573312B1 (en) * 2018-12-04 2020-02-25 Sorenson Ip Holdings, Llc Transcription generation from multiple speech recognition systems
CN109934293A (en) * 2019-03-15 2019-06-25 苏州大学 Image recognition method, device, medium and confusion-aware convolutional neural network
US20200357388A1 (en) * 2019-05-10 2020-11-12 Google Llc Using Context Information With End-to-End Models for Speech Recognition
CN110399845A (en) * 2019-07-29 2019-11-01 上海海事大学 A method for detecting and recognizing text in continuous segments in images
US20200402500A1 (en) * 2019-09-06 2020-12-24 Beijing Dajia Internet Information Technology Co., Ltd. Method and device for generating speech recognition model and storage medium
CN110634491A (en) * 2019-10-23 2019-12-31 大连东软信息学院 System and method for tandem feature extraction for general speech tasks in speech signals
US20210150200A1 (en) * 2019-11-19 2021-05-20 Samsung Electronics Co., Ltd. Electronic device for converting handwriting input to text and method of operating the same
EP3910534A1 (en) * 2020-05-15 2021-11-17 MyScript Recognizing handwritten text by combining neural networks
CN112068555A (en) * 2020-08-27 2020-12-11 江南大学 A voice-controlled mobile robot based on semantic SLAM method
CN112712804A (en) * 2020-12-23 2021-04-27 哈尔滨工业大学(威海) Speech recognition method, system, medium, computer device, terminal and application
WO2021081562A2 (en) * 2021-01-20 2021-04-29 Innopeak Technology, Inc. Multi-head text recognition model for multi-lingual optical character recognition
CN112989834A (en) * 2021-04-15 2021-06-18 杭州一知智能科技有限公司 Named entity identification method and system based on flat grid enhanced linear converter
CN113283336A (en) * 2021-05-21 2021-08-20 湖南大学 Text recognition method and system
CN113516968A (en) * 2021-06-07 2021-10-19 北京邮电大学 An end-to-end long-term speech recognition method
CN113160803A (en) * 2021-06-09 2021-07-23 中国科学技术大学 End-to-end voice recognition model based on multilevel identification and modeling method
CN113609859A (en) * 2021-08-04 2021-11-05 浙江工业大学 Special equipment Chinese named entity recognition method based on pre-training model
CN113887480A (en) * 2021-10-19 2022-01-04 小语智能信息科技(云南)有限公司 Burma language image text recognition method and device based on multi-decoder joint learning
CN114023316A (en) * 2021-11-04 2022-02-08 匀熵科技(无锡)有限公司 TCN-Transformer-CTC-based end-to-end Chinese voice recognition method
CN114155527A (en) * 2021-11-12 2022-03-08 虹软科技股份有限公司 A scene text recognition method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHUANGPING HUANG ET AL: "Context-Aware Selective Label Smoothing for Calibrating Sequence Recognition Model", 《ACM》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115482544A (en) * 2022-09-19 2022-12-16 深圳思谋信息科技有限公司 Adaptive fitting model training method and device, computer equipment and storage medium
CN115482544B (en) * 2022-09-19 2026-01-09 深圳思谋信息科技有限公司 Adaptive fitting model training methods, devices, computer equipment, and storage media
CN117151111A (en) * 2023-08-15 2023-12-01 华南理工大学 Reliability regularization method for text recognition models based on perceptual and semantic correlation
CN120997846A (en) * 2025-08-07 2025-11-21 北京诺君安信息技术股份有限公司 A method and system for improving the accuracy of OCR recognition models in recognizing specific characters.

Also Published As

Publication number Publication date
CN114495114B (en) 2022-08-05

Similar Documents

Publication Publication Date Title
CN113159048B (en) A Weakly Supervised Semantic Segmentation Method Based on Deep Learning
CN117237733B (en) Breast cancer full-slice image classification method combining self-supervision and weak supervision learning
US12217139B2 (en) Transforming a trained artificial intelligence model into a trustworthy artificial intelligence model
US6397200B1 (en) Data reduction system for improving classifier performance
CN114495114A (en) Text sequence identification model calibration method based on CTC decoder
CN117611932B (en) Image classification method and system based on double pseudo tag refinement and sample re-weighting
CN108399428A (en) A kind of triple loss function design method based on mark than criterion
CN110490239B (en) Training methods, quality classification methods, devices and equipment for image quality control networks
CN117152606A (en) A cross-domain small sample classification method for remote sensing images based on confidence dynamic learning
CN112560948A (en) Eye fundus map classification method and imaging method under data deviation
US20240020531A1 (en) System and Method for Transforming a Trained Artificial Intelligence Model Into a Trustworthy Artificial Intelligence Model
CN118674927A (en) Uncertainty perception semi-supervised pancreas segmentation method based on evidence learning
CN112270334A (en) Few-sample image classification method and system based on abnormal point exposure
CN113392890B (en) A method for detecting out-of-distribution abnormal samples based on data enhancement
CN114419379A (en) System and method for improving fairness of deep learning model based on antagonistic disturbance
CN115393867B (en) Text recognition model generation method, text recognition method, device and storage medium
CN117033961A (en) A context-aware multi-modal image and text classification method
CN119780685B (en) Automatic test vector generation method and system based on decision tree
CN115578568A (en) Noise correction algorithm driven by small-scale reliable data set
CN116521863A (en) Tag anti-noise text classification method based on semi-supervised learning
CN119541552A (en) A complex emotion representation and perception method based on emotion vector
CN115063374B (en) Model training, face image quality scoring method, electronic equipment and storage medium
Kamassury et al. Cct: A cyclic co-teaching approach to train deep neural networks with noisy labels
CN117095232A (en) Medical image classification method, device, storage medium and terminal based on deep learning
CN117235518A (en) A communication signal data set pruning method and device based on supporting data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
OL01 Intention to license declared
OL01 Intention to license declared