
CN113920291A - Error correction method and device based on picture recognition result, electronic equipment and medium - Google Patents

Error correction method and device based on picture recognition result, electronic equipment and medium

Info

Publication number
CN113920291A
Authority
CN
China
Prior art keywords: data set, result, model, picture, sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111148610.2A
Other languages
Chinese (zh)
Inventor
卢宁
姚一鸣
陈波
徐亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202111148610.2A
Publication of CN113920291A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of artificial intelligence and provides an error correction method and device based on a picture recognition result, electronic equipment and a medium. The method comprises the steps of labeling a sample picture to obtain a first labeling result; performing image transformation processing on the sample picture to obtain a first processing result, and recognizing text information of the first processing result to obtain a first recognition result; forming a first sample data set from the sample picture and the first labeling result, and training an initial recognition model with the first sample data set to obtain a first model; forming a second sample data set from the first labeling result and the first recognition result, and training an initial error correction model with the second sample data set to obtain a second model; and sequentially inputting a picture to be recognized into the first model and the second model to obtain an error correction result. The method solves the problem of inaccurate picture recognition results in current artificial intelligence technology.

Description

Error correction method and device based on picture recognition result, electronic equipment and medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an error correction method and device based on a picture recognition result, electronic equipment and a medium.
Background
With the rapid development of optical character recognition (OCR) technology, information entry work is gradually being taken over by deep neural network models. In the financial field there are many information entry scenarios, such as entering bank card information and entering information from bills such as remittance slips. OCR technology can relieve workers from this tedious and repetitive labor.
However, the text recognition accuracy of OCR is affected by picture quality, and misrecognition may occur, which in turn affects the accuracy of information entry; the text recognition results therefore need error correction processing. Existing text error correction techniques generally combine a language model with font similarity. The language model corrects recognition results whose textual context is unreasonable and usually needs to be fine-tuned on an error correction corpus; however, building an error correction corpus for a specific scene is tedious work. The font similarity method computes the font similarity between the candidate results output by the language model and the real label, and selects the character with the highest font similarity as the output result; this requires complex font encoding of all characters in the corpus and easily runs into unknown words. In addition, the high computational complexity of combining a language model with font similarity reduces the speed of the text recognition and error correction system.
Disclosure of Invention
The invention provides an error correction method and device based on a picture recognition result, electronic equipment and a medium, and aims to solve the problem that picture recognition results are inaccurate in existing artificial intelligence technology.
The invention provides an error correction method based on a picture recognition result, which comprises the following steps:
acquiring a sample picture of a target field, labeling text information of the sample picture of the target field, and acquiring a first labeling result;
performing image transformation processing on the sample picture in the target field to obtain a first processing result, recognizing text information of the first processing result, and obtaining a first recognition result;
constructing an initial recognition model, forming a first sample data set according to the sample picture of the target field and the first labeling result, training the initial recognition model by adopting the first sample data set, and acquiring a first model for text recognition;
constructing an initial error correction model, forming a second sample data set according to the first labeling result and the first recognition result, training the initial error correction model by adopting the second sample data set, and acquiring a second model for text error correction;
and acquiring a picture to be recognized, and sequentially inputting the picture to be recognized into the first model and the second model to acquire a target error correction result.
Optionally, the forming a second sample data set according to the first labeling result and the first recognition result specifically includes:
acquiring a new word of the target field, and judging whether the first labeling result comprises the new word;
if not, acquiring a picture containing the new words, labeling the text information of the picture containing the new words, and acquiring a second labeling result;
performing image transformation processing on the picture containing the new words to obtain a second processing result, identifying text information of the second processing result, and obtaining a second identification result;
obtaining a first data set according to the first labeling result and the first identification result;
and obtaining a second data set according to the second labeling result and the second identification result, and forming the second sample data set according to the first data set and the second data set.
Optionally, the forming the second sample data set according to the first data set and the second data set specifically includes:
acquiring marked sample pictures, and carrying out clustering processing on the marked sample pictures to acquire a plurality of classification data sets;
merging the first data set and the second data set to obtain a third data set, and acquiring the domain similarity of the classified data set and the third data set;
if the domain similarity is greater than a similarity threshold, the classification data set is a target data set;
acquiring a target picture according to the target data set, performing image transformation processing on the target picture to acquire a third processing result, identifying the third processing result, and acquiring a third identification result;
and acquiring text information of the target picture to obtain a third labeling result, acquiring a migration data set according to the third labeling result and the third identification result, and merging the migration data set and the third data set to obtain the second sample data set.
Optionally, the obtaining the domain similarity between the classification dataset and the third dataset specifically includes:
obtaining synonymous parameters of the classified data set and the third data set, and obtaining synonymous evaluation parameters according to the synonymous parameters and preset synonymous weight;
acquiring antisense parameters of the classification data and the third data set, and acquiring antisense evaluation parameters according to the antisense parameters and preset antisense weight;
obtaining distance parameters of the classification data and the third data set, and obtaining distance evaluation parameters according to the distance parameters and preset distance weights;
and obtaining the domain similarity of the classification data set and the third data set according to the synonymous evaluation parameter, the antisense evaluation parameter and the distance evaluation parameter.
Optionally, the forming a first sample data set according to the sample picture of the target field and the first labeling result specifically includes:
acquiring a fourth data set according to the sample picture of the target field and the first labeling result;
and acquiring a fifth data set according to the third labeling result and the target picture, and merging the fourth data set and the fifth data set to obtain the first sample data set.
Optionally, the training the initial recognition model by using the first sample data set to obtain the first model for text recognition specifically includes:
dividing the first sample dataset into a training dataset and a testing dataset;
training the initial recognition model by adopting the training data set to obtain a first model for text recognition, wherein the initial recognition model is a model combining a cyclic neural network and a convolutional neural network;
inputting the test data set into the first model to obtain a test result;
and acquiring errors of the first labeling result and the test result according to a cross entropy loss function to obtain a first error, and updating the first model by adopting the first error back propagation.
Optionally, the training the initial error correction model by using the second sample data set to obtain a second model for text error correction includes:
training the initial error correction model by adopting the second sample data set, and acquiring a prediction result after forward propagation, wherein the initial error correction model is a language model;
and acquiring errors of the prediction result and the labeling result in the second sample data set to obtain a second error, updating the trained initial error correction model by adopting the second error, and acquiring a second model for text error correction.
The invention also provides an error correction device based on the picture identification result, which comprises:
the labeling module is used for obtaining a sample picture of a target field, labeling text information of the sample picture of the target field and obtaining a first labeling result;
the identification module is used for carrying out image transformation processing on the sample picture in the target field to obtain a first processing result, identifying text information of the first processing result and obtaining a first identification result;
the first model establishing module is used for establishing an initial recognition model, forming a first sample data set according to the sample picture of the target field and the first labeling result, training the initial recognition model by adopting the first sample data set, and acquiring a first model for text recognition;
the second model establishing module is used for establishing an initial error correction model, forming a second sample data set according to the first labeling result and the first recognition result, training the initial error correction model by adopting the second sample data set, and acquiring a second model for text error correction;
and the error correction result acquisition module is used for acquiring a picture to be recognized, sequentially inputting the picture to be recognized into the first model and the second model and acquiring a target error correction result.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor; when the processor executes the computer program, the electronic device performs the error correction method based on the picture recognition result.
The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the error correction method based on the picture recognition result as described above.
As described above, the present invention provides an error correction method, apparatus, electronic device and medium based on a picture recognition result, which have the following beneficial effects. A sample picture of a target field is obtained and labeled to obtain a first labeling result. The sample picture is then subjected to image transformation processing, which deliberately reduces how accurately the sample picture can be recognized, so that pictures with a low text information recognition rate are obtained; recognizing the text information of the transformed sample pictures therefore yields more first recognition results that do not match the first labeling result, and the second sample data set formed from these first recognition results is larger. Because the erroneous first recognition results come from the same source sample pictures as the first labeling result, the erroneous recognition results do not need to be labeled again; the method therefore obtains a large sample data set while greatly reducing labeling work. The second sample data set is used to train a language model to obtain the second model; based on the established second model, language model error correction and font similarity error correction are combined into a single text error correction module instead of using a separate font similarity module, which removes the steps of building a stroke dictionary and computing edit distances, avoids unknown words, increases the error correction speed of the second model, and makes the method better suited to practical scenes. In addition, the method obtains new words of the target field and judges whether the first labeling result contains the new words; if not, pictures containing the new words are obtained and used to form the second sample data set, and the second model for text error correction is obtained from the second sample data set, which avoids the situation in which the second model cannot correct erroneous recognition results of the new words and improves the error correction accuracy of the second model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed for the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a schematic flow chart of an error correction method based on a picture recognition result according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for obtaining a second sample data set according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a method for obtaining a second sample data set according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a method for obtaining domain similarity according to an embodiment of the present invention;
FIG. 5 is a block diagram of an error correction apparatus based on a picture recognition result according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
Fig. 1 is a flowchart illustrating an error correction method based on a picture recognition result according to an embodiment of the present invention.
As shown in fig. 1, the error correction method based on the picture recognition result includes steps S110 to S150:
s110, obtaining a sample picture of a target field, labeling text information of the sample picture of the target field, and obtaining a first labeling result;
s120, performing image transformation processing on the sample picture in the target field to obtain a first processing result, identifying text information of the first processing result, and obtaining a first identification result;
s130, constructing an initial recognition model, forming a first sample data set according to the sample picture of the target field and the first labeling result, training the initial recognition model by adopting the first sample data set, and acquiring a first model for text recognition;
s140, constructing an initial error correction model, forming a second sample data set according to the first labeling result and the first recognition result, training the initial error correction model by adopting the second sample data set, and acquiring a second model for text error correction;
s150, obtaining a picture to be recognized, sequentially inputting the picture to be recognized into the first model and the second model, and obtaining a target error correction result.
In step S110 of this embodiment, a target domain may be determined according to the home domain of the picture to be recognized, and the target domain includes, but is not limited to, a financial domain and a health domain. The sample picture of the target field is a sample picture containing text information, and the sample picture of the target field may be a labeled sample picture or an unlabeled sample picture. Specifically, after a sample picture in the target field is acquired, whether the sample picture in the target field is labeled or not is judged, and if the sample picture in the target field is an unlabeled sample picture, text information of the unlabeled sample picture is labeled to acquire a first labeling result. If the sample picture is the labeled sample picture, the labeled content includes the text information in the sample picture.
Specifically, if the amount of labeled sample pictures in the target field meets a preset data amount, a large number of labeled sample pictures can be obtained directly, which avoids a heavy labeling workload. If the amount of labeled sample pictures in the target field does not meet the preset data amount, the labeled sample pictures are obtained, the amount of unlabeled sample pictures needed is determined according to the preset data amount, and the corresponding unlabeled sample pictures are obtained. By reasonably determining the amounts of labeled and unlabeled sample pictures, the amount of sample pictures required for training the first model and the second model can be met while the labeling workload is reduced. The quantity and quality of the sample pictures thus support the recognition accuracy of the first model and the error correction accuracy of the second model; obtaining sample pictures of sufficient quantity avoids the overfitting that can occur during training of the first model and the second model when the first and second sample data sets are too small, thereby improving the recognition accuracy of the first model and the error correction accuracy of the second model.
In step S120 of this embodiment, the text information of the image in the actual scene is affected by many factors, which make the text in the actual scene look blurry, shaded and unclear, so it is very difficult to identify the text information of the low-quality image in the actual scene. In order to obtain a low-quality picture which is more similar to an actual scene, the obtained sample picture is subjected to image transformation processing, and then the picture subjected to the image transformation processing is subjected to text information identification, so that the identification accuracy of the text information in the picture is reduced, and more identification results are obtained. Specifically, the image transformation process includes, but is not limited to, adjustment of picture colors, contrast transformation process, perspective transformation process, picture blurring process, and addition of noise.
Some image transformations only need to be applied to the text information, so before the image transformation processing, the sample pictures are classified. A public image classification model can be used to separate each sample picture into a background image and a text image; such public image classification models include, but are not limited to, a support vector machine classifier and a softmax classifier. Separating the sample picture into a background image and a text image allows the subsequent image transformations to be carried out smoothly.
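A minimal sketch of this classification step is given below. The HOG features and the scikit-learn SVM are illustrative assumptions standing in for the public image classification model; the patch size and window stride are likewise arbitrary.

```python
# Sketch: classify image patches as "text" or "background" with an SVM so that
# later transforms can be applied to text regions only. Feature choice (HOG)
# and classifier (SVC) are assumptions, not the patent's specified pipeline.
from skimage.feature import hog
from sklearn.svm import SVC

def extract_patches(image, size=32, stride=32):
    """Slide a fixed-size window over a grayscale image and return patches with coordinates."""
    h, w = image.shape
    patches, coords = [], []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patches.append(image[y:y + size, x:x + size])
            coords.append((y, x))
    return patches, coords

def train_patch_classifier(patches, labels):
    """labels: 1 for text patches, 0 for background patches."""
    feats = [hog(p) for p in patches]
    clf = SVC(kernel="rbf")
    clf.fit(feats, labels)
    return clf

def split_text_and_background(clf, image):
    """Return the coordinates of patches predicted as text and as background."""
    patches, coords = extract_patches(image)
    preds = clf.predict([hog(p) for p in patches])
    text_coords = [c for c, p in zip(coords, preds) if p == 1]
    background_coords = [c for c, p in zip(coords, preds) if p == 0]
    return text_coords, background_coords
```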
The image transformation process includes color adjustment, contrast transformation, perspective transformation, picture blurring process, and noise addition. The color adjustment of the sample picture comprises the steps of respectively adjusting the color of a background image and the color of a text image in the sample picture after classification processing, obtaining a first color, filling the picture background into the corresponding first color, then obtaining a second color, and filling the text font into the corresponding second color. In order to increase the difficulty of recognizing the text information in the picture, the first color and the second color may be similar or identical.
The contrast conversion processing specifically reduces the contrast of the text information in the sample picture, which increases the difficulty of recognizing the text information in the sample picture. The contrast conversion can use a gray scale transformation method, including linear gray scale transformation, piecewise linear gray scale transformation and nonlinear transformation. After the image is processed with linear gray scale transformation, the image is locally distorted; therefore, in this embodiment, linear gray scale transformation can be adopted for the sample picture to reduce the accuracy of text information recognition in the picture.
The perspective transformation processing specifically comprises the steps of obtaining coordinates of a text image in a sample picture and coordinates of the sample picture, selecting a perspective transformation matrix, processing the coordinates of the text image by adopting the perspective transformation matrix, obtaining the coordinates of the text image after the perspective transformation, and determining a transformed data enhanced image according to the coordinates of the text image after the perspective transformation and the coordinates of the sample picture, so that the perspective transformation processing of the sample picture is realized.
The image blurring processing adds motion blur to the sample picture. It includes selecting a blur amount according to the text image and then blurring the text image according to the selected blur amount, where the blur amount includes, but is not limited to, a translational blur amount. The blur amount is selected according to the size and text content of the text image: if the text information in the text image is uncommon text, a blur amount larger than the size of the text image can be selected, that is, the entire text image is blurred, which makes it easy to subsequently obtain erroneous recognition results for all of the text information; alternatively, a blur amount smaller than the text image can be selected, the texts in the text image are blurred separately, and text information recognition is then performed on the text image to obtain more erroneous recognition results. If part of the text information in the text image is uncommon text or a professional term, the blur amount is selected according to the size of that part of the text image, and that part of the text image is then blurred.
The added noise may be gaussian noise. The image transformation processing method can process the sample picture in a combined manner, such as performing color adjustment, contrast transformation, perspective transformation, picture blurring processing and noise addition on the sample picture at the same time. When different image transformation processes are combined, different image transformation processes can be performed on the sample picture by adjusting the sequence of the image transformation processes, so that more different recognition results can be obtained.
To obtain pictures in which the text information is even harder to recognize, a shadow effect can be added near the text information in the sample picture, and the font can be distorted. The font distortion can be realized by stretching the text image in unequal proportions.
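A minimal sketch of several of the image transformations described above (a linear contrast reduction, a perspective transform, motion blur, and additive Gaussian noise), assuming OpenCV and NumPy; all parameter values are illustrative, and the shadow and font-distortion effects are omitted.

```python
import cv2
import numpy as np

def degrade(image):
    """Apply a chain of illustrative degradations to make text harder to recognize."""
    # 1. Reduce contrast with a linear gray-scale transform: g = a * f + b, a < 1.
    low_contrast = cv2.convertScaleAbs(image, alpha=0.5, beta=60)

    # 2. Perspective transform: map the image corners to slightly shifted points.
    h, w = image.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = np.float32([[10, 5], [w - 5, 10], [w - 10, h - 5], [5, h - 10]])
    M = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(low_contrast, M, (w, h))

    # 3. Motion blur: convolve with a horizontal averaging kernel.
    k = np.zeros((7, 7), dtype=np.float32)
    k[3, :] = 1.0 / 7
    blurred = cv2.filter2D(warped, -1, k)

    # 4. Additive Gaussian noise.
    noise = np.random.normal(0, 8, blurred.shape)
    noisy = np.clip(blurred.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    return noisy
```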
In one embodiment, the target field may be the financial field, and the text information in the obtained sample picture of the target field is "China Construction Bank", so the first labeling result is "China Construction Bank". After image transformation processing is performed on the sample picture containing the text information "China Construction Bank", a picture with a low text information recognition rate, namely the first processing result, is obtained; the text information of the first processing result is then recognized to obtain the first recognition result. The first recognition result includes, but is not limited to, erroneous readings in which characters of "China Construction Bank" are missing or replaced. Performing image transformation processing on the sample picture of the target field yields a picture (the first processing result) with a low text information recognition rate; recognizing the text information of the first processing result with a low-accuracy text recognition method then yields more erroneous first recognition results, and the data amount of the second sample data set formed from these first recognition results is larger. Because the first labeling result and the first recognition results come from the same source sample pictures, a large sample data set can be obtained without labeling the recognition results again, which greatly reduces labeling work.
Optionally, in this embodiment, an existing recognition model with lower recognition accuracy is used to recognize the plurality of first processing results, so as to obtain more first recognition results and form a second sample data set with a larger data amount and richer data types; the error correction accuracy of the second model trained on such a data set is higher. Recognition models with low recognition accuracy include, but are not limited to, convolutional neural network models, bidirectional long short-term memory networks and recurrent neural network models.
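A minimal sketch of forming the (erroneous recognition, label) pairs from which the second sample data set is built. Here `degrade` stands for the image transformation processing above and `weak_ocr` for the low-accuracy recognition model; keeping only mismatching pairs is an illustrative simplification.

```python
def build_error_correction_pairs(samples, weak_ocr, degrade):
    """samples: iterable of (picture, label_text) from the first labeling result.

    Returns pairs of (erroneous recognition result, correct label text) that can
    form the second sample data set without any additional labeling work.
    """
    pairs = []
    for picture, label_text in samples:
        recognized = weak_ocr(degrade(picture))   # first recognition result
        if recognized != label_text:              # keep only erroneous recognitions
            pairs.append((recognized, label_text))
    return pairs
```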
In order to improve the error correction accuracy of the second model, a specific implementation method for forming the second sample data set according to the first labeling result and the first recognition result can refer to fig. 2. Fig. 2 is a flow chart illustrating a method for obtaining a second sample data set according to an embodiment of the present invention.
As shown in fig. 2, the method for acquiring the second sample data set may include the following steps S210-S250:
s210, acquiring a new word of the target field, and judging whether the first labeling result comprises the new word;
s220, if not, acquiring a picture containing the new words, labeling the text information of the picture containing the new words, and acquiring a second labeling result;
s230, performing image transformation processing on the picture containing the new words to obtain a second processing result, identifying text information of the second processing result, and obtaining a second identification result;
s240, obtaining a first data set according to the first labeling result and the first identification result;
and S250, obtaining a second data set according to the second labeling result and the second identification result, and forming a second sample data set according to the first data set and the second data set.
Optionally, a new word is a word that newly appears or newly comes into use in the target field, and its form of expression includes, but is not limited to, Chinese and English. Ways of obtaining a picture containing the new word include, but are not limited to, directly photographing or scanning an original document containing the new word. In order to obtain more second recognition results, a recognition model with low recognition accuracy is used to recognize the text information of the second processing result, so that a second sample data set with a larger data amount and more erroneous text information is obtained. At the same time, this avoids the situation in which the second model cannot correct erroneous recognition results of the new word, and improves the error correction accuracy of the second model.
In order to expand the data amount of the second sample data set, a specific implementation method for forming the second sample data set from the first data set and the second data set may refer to fig. 3. Fig. 3 is another schematic flow chart of a method for obtaining a second sample data set according to an embodiment of the present invention.
As shown in fig. 3, the method for acquiring the second sample data set may include the following steps S310 to S350:
s310, acquiring labeled sample pictures, and clustering the labeled sample pictures to acquire a plurality of classification data sets;
s320, merging the first data set and the second data set to obtain a third data set, and acquiring the domain similarity of the classification data set and the third data set;
s330, if the domain similarity is larger than the similarity threshold, classifying the data set as a target data set;
s340, acquiring a target picture according to the target data set, performing image transformation processing on the target picture to acquire a third processing result, identifying the third processing result, and acquiring a third identification result;
and S350, acquiring text information of the target picture to obtain a third labeling result, acquiring a migration data set according to the third labeling result and the third recognition result, and merging the migration data set and the third data set to obtain a second sample data set.
In step S310 of this embodiment, the labeled sample picture is a labeled sample picture in a non-target field, a classification data set is obtained by obtaining the labeled sample picture in the non-target field and performing clustering processing on the labeled sample picture, the field similarity between the classification data set and the third data set is obtained, and a migration data set is obtained according to the field similarity, so as to expand the second sample data set.
Specifically, the labeled content of the labeled sample pictures is obtained, a labeled data set is formed from the labeled content, and the labeled data set is clustered to obtain a plurality of classification data sets; the clustering algorithm includes, but is not limited to, the K-means clustering algorithm and the DBSCAN clustering algorithm. After the plurality of classification data sets are obtained, whether the classification data sets meet a preset condition is judged; if not, the clustering parameters are adjusted, and the labeled data set is clustered again with the adjusted clustering parameters to obtain a plurality of classification data sets. Different clustering algorithms adjust different clustering parameters, such as the radius Eps and the minimum number of points MinPts of the DBSCAN algorithm, or the K value of the K-means clustering algorithm.
In an embodiment, whether a classification data set meets the preset condition can be judged by judging whether the clustering result is reasonable. The reasonableness of the clustering result can be judged by the correlation of the fields to which different labeled contents in the same classification data set belong; if the fields to which different labeled contents in the same classification data set belong are completely uncorrelated, the classification data set does not meet the preset condition. The correlation of the fields to which different labeled contents belong can be judged from the distance between the different labeled contents; specifically, the correlation can be obtained from the Euclidean distances between the different labeled contents, and the larger the Euclidean distance between different labeled contents, the smaller the correlation of the fields to which they belong. If the Euclidean distance between different labeled contents is larger than a preset Euclidean distance threshold, the fields to which the labeled contents belong are completely uncorrelated, and the clustering parameters need to be adjusted accordingly.
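A minimal sketch of the clustering step, assuming the labeled contents are vectorized with TF-IDF and clustered with scikit-learn's K-means; the reasonableness check based on the Euclidean distance threshold is omitted, and the choice of vectorizer is an assumption.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_labeled_contents(labeled_texts, k):
    """Cluster the labeled contents of non-target-field sample pictures into k classification data sets."""
    vectors = TfidfVectorizer().fit_transform(labeled_texts)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(vectors)
    clusters = {}
    for text, cluster_id in zip(labeled_texts, km.labels_):
        clusters.setdefault(cluster_id, []).append(text)
    return clusters, km   # k is the clustering parameter adjusted if the result is unreasonable
```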
In an embodiment, a specific implementation method for obtaining the domain similarity between the classification dataset and the third dataset can be seen in fig. 4. Fig. 4 is a flowchart illustrating a method for obtaining domain similarity according to an embodiment of the present invention.
As shown in fig. 4, the method for acquiring domain similarity may include the following steps S410 to S440:
s410, obtaining synonymous parameters of the classification data set and the third data set, and obtaining synonymous evaluation parameters according to the synonymous parameters and preset synonymous weight;
s420, acquiring antisense parameters of the classification data and the third data set, and acquiring synonymy evaluation parameters according to the antisense parameters and preset antisense weight;
s430, obtaining the classification data and the distance parameter of the third data set, and obtaining a distance evaluation parameter according to the distance parameter and a preset distance weight;
and S440, acquiring the field similarity of the classification data set and the third data set according to the synonymous evaluation parameter, the synonymous evaluation parameter and the distance evaluation parameter.
Specifically, a synonymous data set of the third data set is obtained, and the similarity between the classification data set and the synonymous data set is obtained as the synonymous parameter; this similarity can be obtained from the distance between the classification data set and the synonymous data set. An antisense data set of the third data set is obtained, and the similarity between the classification data set and the antisense data set is obtained as the antisense parameter; this similarity can be obtained from the distance between the classification data set and the antisense data set. The similarity between the classification data set and the third data set is obtained as the distance parameter; this similarity can be obtained from the distance between the classification data set and the third data set. The preset distance weight, preset synonymous weight and preset antisense weight can be set according to the actual situation; to improve the accuracy of the domain similarity, the preset distance weight is set larger than the preset synonymous weight, and the preset synonymous weight is set larger than the preset antisense weight. The synonymous evaluation parameter is the product of the synonymous parameter and the preset synonymous weight, the antisense evaluation parameter is the product of the antisense parameter and the preset antisense weight, and the distance evaluation parameter is the product of the distance parameter and the preset distance weight. By obtaining the synonymous evaluation parameter, the antisense evaluation parameter and the distance evaluation parameter of the classification data set and the third data set, the domain similarity of the classification data set and the third data set is obtained more accurately, the attributed domain obtained from the domain similarity better matches the target field, and the second model established on this basis corrects the pictures to be recognized in the target field more accurately.
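A minimal sketch of combining the three evaluation parameters into the domain similarity. Summing the weighted parameters and the particular weight values are assumptions; the text above only requires that the distance weight exceed the synonymous weight and the synonymous weight exceed the antisense weight.

```python
def domain_similarity(syn_param, ant_param, dist_param,
                      w_syn=0.3, w_ant=0.1, w_dist=0.6):
    """Weights satisfy w_dist > w_syn > w_ant; the values themselves are illustrative."""
    syn_eval = syn_param * w_syn       # synonymous evaluation parameter
    ant_eval = ant_param * w_ant       # antisense evaluation parameter
    dist_eval = dist_param * w_dist    # distance evaluation parameter
    return syn_eval + ant_eval + dist_eval   # combining by summation is an assumption
```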
In order to enlarge the data amount of the first sample data set, the step of forming the first sample data set from the sample picture of the target field and the first labeling result includes: obtaining a fourth data set from the sample picture of the target field and the first labeling result; obtaining a fifth data set from the third labeling result and the target picture, and merging the fourth data set and the fifth data set to obtain the first sample data set. By obtaining a target data set whose domain similarity with the third data set is high, the fifth data set is obtained and the first sample data set is formed; this enlarges the data amount of the first sample data set without increasing the labeling workload, so the recognition accuracy of the first model on the picture to be recognized is higher, the target recognition result that is then input into the second model better matches the real situation, and the finally obtained target error correction result is more accurate.
In step S130 of this embodiment, the step of training the initial recognition model with the first sample data set and obtaining the first model for text recognition includes: dividing the first sample data set into a training data set and a test data set; training the initial recognition model with the training data set to obtain the first model for text recognition; inputting the test data set into the first model to obtain a test result; and obtaining the error between the first labeling result and the test result according to a cross entropy loss function to obtain a first error, and updating the first model by back-propagating the first error. Specifically, the initial recognition model is a model combining a recurrent neural network and a convolutional neural network, where the recurrent neural network may specifically be a bidirectional long short-term memory network. Updating the first model through back propagation of the first error improves the recognition accuracy of the first model on the picture to be recognized, which lowers the probability that the obtained target recognition result is erroneous; since the target recognition result is input into the second model, this in turn lowers the probability that the obtained target error correction result is erroneous text.
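A minimal PyTorch sketch of the first model and its training step: a convolutional feature extractor followed by a bidirectional LSTM, trained with cross-entropy loss and updated by back propagation of the first error. The layer sizes, the column-wise sequence layout, and the fixed-length per-character cross-entropy (rather than, say, a CTC loss) are simplifying assumptions.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Convolutional feature extractor + bidirectional LSTM for text recognition."""
    def __init__(self, num_classes, img_height=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        feat_h = img_height // 4
        self.rnn = nn.LSTM(128 * feat_h, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 256, num_classes)

    def forward(self, x):                                     # x: (B, 1, H, W)
        f = self.cnn(x)                                       # (B, 128, H/4, W/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)        # one time step per image column
        out, _ = self.rnn(f)                                  # (B, T, 512)
        return self.fc(out)                                   # (B, T, num_classes)

def train_step(model, optimizer, images, targets):
    """targets: (B, T) character indices from the first labeling result, aligned with the time steps."""
    logits = model(images)
    loss = nn.functional.cross_entropy(                       # first error (cross entropy)
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                                           # back-propagate the first error
    optimizer.step()                                          # update the first model
    return loss.item()
```

In use, `optimizer` would typically be something like `torch.optim.Adam(model.parameters(), lr=1e-3)`; the alignment of labels to time steps is assumed for simplicity.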
In step S140 of this embodiment, the step of training the initial error correction model with the second sample data set and obtaining the second model for text error correction includes: training the initial error correction model with the second sample data set, and obtaining a prediction result after forward propagation; obtaining the error between the prediction result and the labeling result in the second sample data set to obtain a second error, updating the trained initial error correction model with the second error, and obtaining the second model for text error correction. The initial error correction model is a language model, including, but not limited to, an N-gram model, a recurrent neural network, a bidirectional long short-term memory network and a convolutional neural network. The trained initial error correction model is updated by back propagation to obtain a second model with higher error correction accuracy.
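A minimal PyTorch sketch of the second model: a character-level bidirectional LSTM that maps the recognized (possibly erroneous) character sequence to the corrected sequence, trained on the second sample data set with cross-entropy and updated with the back-propagated second error. Treating correction as same-length character tagging is a simplifying assumption; other language models listed above (an N-gram model, a convolutional network, and so on) could be substituted.

```python
import torch
import torch.nn as nn

class CharCorrector(nn.Module):
    """Character-level bidirectional LSTM used as the error correction (language) model."""
    def __init__(self, vocab_size, emb_dim=128, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, vocab_size)

    def forward(self, noisy_ids):                   # (B, T) indices of recognized characters
        h, _ = self.rnn(self.emb(noisy_ids))        # forward propagation
        return self.fc(h)                           # (B, T, vocab_size)

def train_correction_step(model, optimizer, noisy_ids, label_ids):
    """noisy_ids: erroneous recognition results; label_ids: labeling results in the second sample data set."""
    logits = model(noisy_ids)                                     # prediction result
    loss = nn.functional.cross_entropy(                           # second error
        logits.reshape(-1, logits.size(-1)), label_ids.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                                               # update with the back-propagated second error
    optimizer.step()
    return loss.item()

def correct(first_model, second_model, image, id2char):
    """Inference chain: picture -> first model -> recognized characters -> second model -> corrected text.

    Assumes the recognizer's class indices and the corrector's vocabulary coincide.
    """
    rec_ids = first_model(image.unsqueeze(0)).argmax(-1)          # target recognition result, (1, T)
    cor_ids = second_model(rec_ids).argmax(-1)                    # target error correction result, (1, T)
    return "".join(id2char[i] for i in cor_ids[0].tolist())
```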
The embodiment of the invention provides an error correction method based on a picture recognition result. A sample picture of a target field is obtained and labeled to obtain a first labeling result. The sample picture is then subjected to image transformation processing, which deliberately reduces how accurately the sample picture can be recognized, so that pictures with a low text information recognition rate are obtained; recognizing the text information of the transformed sample pictures therefore yields more first recognition results that do not match the first labeling result, and the second sample data set formed from these first recognition results is larger. Because the erroneous first recognition results come from the same source sample pictures as the first labeling result, the erroneous recognition results do not need to be labeled again; the method therefore obtains a large sample data set while greatly reducing labeling work. The second sample data set is used to train a language model to obtain the second model; based on the established second model, language model error correction and font similarity error correction are combined into a single text error correction module instead of using a separate font similarity module, which removes the steps of building a stroke dictionary and computing edit distances, avoids unknown words, increases the error correction speed of the second model, and makes the method better suited to practical scenes.
Based on the same inventive concept as the error correction method based on the picture recognition result, this embodiment correspondingly provides an error correction device based on the picture recognition result. The device labels the sample picture to obtain a first labeling result; it then performs image transformation processing on the sample picture and recognizes the text information of the transformed sample picture, so that more first recognition results that do not match the first labeling result are obtained, and the data amount of the second sample data set formed from the first recognition results is larger. Because the first labeling result and the erroneous first recognition results come from the same source sample pictures, the erroneous recognition results do not need to be labeled again; the device therefore obtains a large sample data set while greatly reducing labeling work.
In this embodiment, the error correction device based on the picture identification result executes the error correction method based on the picture identification result according to any one of the embodiments, and specific functions and technical effects are as described in the embodiments above, and are not described herein again.
Fig. 5 is a block diagram of an error correction apparatus based on a picture recognition result according to the present invention. As shown in fig. 5, the error correction device based on the picture recognition result comprises: a labeling module 51, an identification module 52, a first model establishing module 53, a second model establishing module 54 and an error correction result acquisition module 55.
The labeling module is used for obtaining a sample picture of a target field, labeling text information of the sample picture of the target field and obtaining a first labeling result;
the identification module is used for carrying out image transformation processing on the sample picture in the target field to obtain a first processing result, identifying text information of the first processing result and obtaining a first identification result;
the first model establishing module is used for establishing an initial recognition model, forming a first sample data set according to the sample picture of the target field and the first labeling result, training the initial recognition model by adopting the first sample data set, and acquiring a first model for text recognition;
the second model establishing module is used for establishing an initial error correction model, forming a second sample data set according to the first labeling result and the first recognition result, training the initial error correction model by adopting the second sample data set, and acquiring a second model for text error correction;
and the error correction result acquisition module is used for acquiring a picture to be recognized, sequentially inputting the picture to be recognized into the first model and the second model and acquiring a target error correction result.
In some exemplary embodiments, the second model building module comprises:
the judging unit is used for acquiring a new word of the target field and judging whether the first labeling result comprises the new word;
a second labeling result obtaining unit, configured to, if the first labeling result does not include the new word, obtain a picture containing the new word, label the text information of the picture containing the new word, and obtain a second labeling result;
the second recognition result acquisition unit is used for carrying out image transformation processing on the picture containing the new words to acquire a second processing result, recognizing text information of the second processing result and acquiring a second recognition result;
a first data set obtaining unit, configured to obtain a first data set according to the first labeling result and the first recognition result;
and the second sample set acquisition first unit is used for acquiring a second data set according to the second labeling result and the second identification result and forming the second sample data set according to the first data set and the second data set.
In some exemplary embodiments, the second model building module further comprises:
the classification data set acquisition unit is used for acquiring the labeled sample pictures and clustering the labeled sample pictures to acquire a plurality of classification data sets;
a domain similarity obtaining unit, configured to combine the first data set and the second data set to obtain a third data set, and obtain domain similarities of the classification data set and the third data set;
a target data set obtaining unit, configured to, if the domain similarity is greater than a similarity threshold, determine that the classified data set is a target data set;
a third identification result obtaining unit, configured to obtain a target picture according to the target data set, perform image transformation processing on the target picture, obtain a third processing result, identify the third processing result, and obtain a third identification result;
and the second sample set acquisition second unit is used for acquiring the text information of the target picture to obtain a third labeling result, acquiring a migration data set according to the third labeling result and the third identification result, and merging the migration data set and the third data set to obtain the second sample data set.
In some exemplary embodiments, the domain similarity acquiring unit includes:
a synonymy evaluation parameter obtaining subunit, configured to obtain a synonymy parameter of the classification dataset and the third dataset, and obtain a synonymy evaluation parameter according to the synonymy parameter and a preset synonymy weight;
an antisense evaluation parameter obtaining subunit, configured to obtain antisense parameters of the classification data and the third data set, and obtain antisense evaluation parameters according to the antisense parameters and preset antisense weights;
a distance evaluation parameter obtaining subunit, configured to obtain distance parameters of the classification data and the third data set, and obtain distance evaluation parameters according to the distance parameters and preset distance weights;
and the domain similarity obtaining subunit is configured to obtain the domain similarity of the classification data set and the third data set according to the synonymous evaluation parameter, the antisense evaluation parameter and the distance evaluation parameter.
In some exemplary embodiments, the first model building module comprises:
a fourth data set obtaining unit, configured to obtain a fourth data set according to the sample picture of the target field and the first labeling result;
and the first sample data set acquisition unit is used for acquiring a fifth data set according to the third labeling result and the target picture, and merging the fourth data set and the fifth data set to obtain the first sample data set.
In some exemplary embodiments, the first model building module further comprises:
a data set dividing unit, configured to divide the first sample data set into a training data set and a test data set;
a first model obtaining unit, configured to train the initial recognition model with the training data set, and obtain a first model for text recognition, where the initial recognition model is a model combining a cyclic neural network and a convolutional neural network;
the test result output unit is used for inputting the test data set into the first model to obtain a test result;
and the first model updating unit is used for acquiring errors of the first labeling result and the test result according to a cross entropy loss function to obtain a first error, and updating the first model by adopting the first error back propagation.
In some exemplary embodiments, the second model building module further comprises:
a prediction result obtaining unit, configured to train the initial error correction model by using the second sample data set, and obtain a prediction result after forward propagation, where the initial error correction model is a language model;
and the second model updating unit is used for acquiring errors between the prediction result and the labeling result in the second sample data set to obtain a second error, updating the trained initial error correction model by adopting the second error, and acquiring a second model for text error correction.
The embodiments of the present application can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results.
The present embodiment also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements any of the methods in the present embodiments.
In an embodiment, referring to fig. 6, the embodiment further provides an electronic device 600, which includes a memory 601, a processor 602, and a computer program stored on the memory and executable on the processor; when the processor 602 executes the computer program, the steps of the method according to any one of the above embodiments are implemented.
Regarding the computer-readable storage medium in the present embodiment, those skilled in the art will understand that all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer-readable storage medium. When executed, the program performs the steps comprising the method embodiments described above; the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, and magnetic or optical disks.
The electronic device provided by this embodiment comprises a processor, a memory, a transceiver, and a communication interface. The memory and the communication interface are connected with the processor and the transceiver to enable mutual communication; the memory is used for storing a computer program, the communication interface is used for communication, and the processor and the transceiver are used for running the computer program so that the electronic device executes the steps of the above method.
In this embodiment, the memory may include a Random Access Memory (RAM), and may also include a non-volatile memory, such as at least one disk memory.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In the above-described embodiments, a reference in the specification to "the present embodiment," "an embodiment," "another embodiment," "in some exemplary embodiments," or "other embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments. The various appearances of the phrases "the present embodiment," "one embodiment," or "another embodiment" are not necessarily all referring to the same embodiment.
In the embodiments described above, although the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory structures (e.g., dynamic RAM (DRAM)) may be used with the discussed embodiments. The embodiments of the invention are intended to embrace all such alternatives, modifications, and variations as fall within the broad scope of the appended claims.
The embodiments in this specification are described in a progressive manner; for the parts that are identical or similar among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant points, reference may be made to the corresponding description of the method embodiment.
The invention is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The foregoing embodiments are merely illustrative of the principles of the present invention and its efficacy, and are not intended to limit the invention. Any person skilled in the art may modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical concept disclosed by the present invention shall be covered by the claims of the present invention.

Claims (10)

1. An error correction method based on a picture recognition result is characterized by comprising the following steps:
acquiring a sample picture of a target field, labeling text information of the sample picture of the target field, and acquiring a first labeling result;
performing image transformation processing on the sample picture in the target field to obtain a first processing result, identifying text information of the first processing result, and obtaining a first identification result;
constructing an initial recognition model, forming a first sample data set according to the sample picture of the target field and the first labeling result, training the initial recognition model by adopting the first sample data set, and acquiring a first model for text recognition;
constructing an initial error correction model, forming a second sample data set according to the first labeling result and the first recognition result, training the initial error correction model by adopting the second sample data set, and acquiring a second model for text error correction;
and acquiring a picture to be recognized, and sequentially inputting the picture to be recognized into the first model and the second model to acquire a target error correction result.
2. The method according to claim 1, wherein the forming a second sample data set according to the first labeling result and the first recognition result specifically includes:
acquiring a new word of the target field, and judging whether the first labeling result comprises the new word;
if not, acquiring a picture containing the new words, labeling the text information of the picture containing the new words, and acquiring a second labeling result;
performing image transformation processing on the picture containing the new words to obtain a second processing result, identifying text information of the second processing result, and obtaining a second identification result;
obtaining a first data set according to the first labeling result and the first identification result;
and obtaining a second data set according to the second labeling result and the second identification result, and forming the second sample data set according to the first data set and the second data set.
3. The method according to claim 2, wherein the forming the second sample data set according to the first data set and the second data set specifically includes:
acquiring marked sample pictures, and carrying out clustering processing on the marked sample pictures to acquire a plurality of classification data sets;
merging the first data set and the second data set to obtain a third data set, and acquiring the domain similarity of the classified data set and the third data set;
if the domain similarity is greater than a similarity threshold, the classification data set is a target data set;
acquiring a target picture according to the target data set, performing image transformation processing on the target picture to acquire a third processing result, identifying the third processing result, and acquiring a third identification result;
and acquiring text information of the target picture to obtain a third labeling result, acquiring a migration data set according to the third labeling result and the third identification result, and merging the migration data set and the third data set to obtain the second sample data set.
4. The image recognition result-based error correction method according to claim 3, wherein the obtaining of the domain similarity between the classified data set and the third data set specifically comprises:
obtaining synonymous parameters of the classified data set and the third data set, and obtaining synonymous evaluation parameters according to the synonymous parameters and preset synonymous weight;
acquiring antisense parameters of the classification data set and the third data set, and acquiring antisense evaluation parameters according to the antisense parameters and a preset antisense weight;
obtaining distance parameters of the classification data set and the third data set, and obtaining distance evaluation parameters according to the distance parameters and preset distance weights;
and obtaining the domain similarity of the classification data set and the third data set according to the synonymous evaluation parameter, the antisense evaluation parameter, and the distance evaluation parameter.
5. The image recognition result-based error correction method according to claim 3, wherein the forming a first sample data set according to the sample image of the target field and the first labeling result specifically includes:
acquiring a fourth data set according to the sample picture of the target field and the first labeling result;
and acquiring a fifth data set according to the third labeling result and the target picture, and merging the fourth data set and the fifth data set to obtain the first sample data set.
6. The method of claim 5, wherein the training the initial recognition model with the first sample data set to obtain the first model for text recognition specifically comprises:
dividing the first sample dataset into a training dataset and a testing dataset;
training the initial recognition model by adopting the training data set to obtain a first model for text recognition, wherein the initial recognition model is a model combining a recurrent neural network and a convolutional neural network;
inputting the test data set into the first model to obtain a test result;
and acquiring the error between the first labeling result and the test result according to a cross-entropy loss function to obtain a first error, and updating the first model by back-propagating the first error.
7. The method according to claim 3, wherein the training of the initial error correction model using the second sample data set to obtain a second model for text error correction comprises:
training the initial error correction model by adopting the second sample data set, and acquiring a prediction result after forward propagation, wherein the initial error correction model is a language model;
and acquiring the error between the prediction result and the labeling result in the second sample data set to obtain a second error, updating the trained initial error correction model by adopting the second error, and acquiring a second model for text error correction.
8. An error correction device based on picture recognition results, characterized in that the error correction device based on picture recognition results comprises:
the labeling module is used for obtaining a sample picture of a target field, labeling text information of the sample picture of the target field and obtaining a first labeling result;
the identification module is used for carrying out image transformation processing on the sample picture in the target field to obtain a first processing result, identifying text information of the first processing result and obtaining a first identification result;
the first model establishing module is used for establishing an initial recognition model, forming a first sample data set according to the sample picture of the target field and the first labeling result, training the initial recognition model by adopting the first sample data set, and acquiring a first model for text recognition;
the second model establishing module is used for establishing an initial error correction model, forming a second sample data set according to the first labeling result and the first recognition result, training the initial error correction model by adopting the second sample data set, and acquiring a second model for text error correction;
and the error correction result acquisition module is used for acquiring a picture to be recognized, sequentially inputting the picture to be recognized into the first model and the second model and acquiring a target error correction result.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
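Taken together, the method of claim 1 ends by running the picture to be recognized through the first model and then the second model to obtain the target error correction result. A minimal inference sketch of that final step follows; the preprocessing of the picture, the decode_chars/encode_chars helpers, and the greedy argmax decoding are assumptions made for illustration rather than details fixed by the claims.

# Hedged sketch of the end-to-end inference step: picture -> first (recognition)
# model -> second (error-correction) model -> target error correction result.
import torch

def recognize_and_correct(picture, first_model, second_model,
                          decode_chars, encode_chars):
    """picture: preprocessed image tensor of shape (1, 1, H, W).
    decode_chars / encode_chars: assumed helpers mapping between class-index
    sequences and text for the two models."""
    first_model.eval()
    second_model.eval()
    with torch.no_grad():
        # Step 1: the recognition model turns the picture into a character sequence.
        char_logits = first_model(picture)                # (1, seq_len, num_classes)
        recognized_ids = char_logits.argmax(dim=-1)       # (1, seq_len)
        recognized_text = decode_chars(recognized_ids[0])

        # Step 2: the error-correction model refines the recognized text.
        corrected_logits = second_model(encode_chars(recognized_text).unsqueeze(0))
        corrected_ids = corrected_logits.argmax(dim=-1)
        return decode_chars(corrected_ids[0])             # target error correction result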
CN202111148610.2A 2021-09-29 2021-09-29 Error correction method and device based on picture recognition result, electronic equipment and medium Pending CN113920291A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111148610.2A CN113920291A (en) 2021-09-29 2021-09-29 Error correction method and device based on picture recognition result, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111148610.2A CN113920291A (en) 2021-09-29 2021-09-29 Error correction method and device based on picture recognition result, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN113920291A true CN113920291A (en) 2022-01-11

Family

ID=79236725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111148610.2A Pending CN113920291A (en) 2021-09-29 2021-09-29 Error correction method and device based on picture recognition result, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN113920291A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114742076A (en) * 2022-04-11 2022-07-12 网易有道信息技术(北京)有限公司 Method, training method, device and storage medium for generating training data

Similar Documents

Publication Publication Date Title
CN109993102B (en) Similar face retrieval method, device and storage medium
CN110069709B (en) Intention recognition method, device, computer readable medium and electronic equipment
CN110287328B (en) Text classification method, device and equipment and computer readable storage medium
US11288324B2 (en) Chart question answering
CN109948149B (en) Text classification method and device
Wilkinson et al. Neural Ctrl-F: segmentation-free query-by-string word spotting in handwritten manuscript collections
CN111475613A (en) Case classification method and device, computer equipment and storage medium
CN112966742A (en) Model training method, target detection method and device and electronic equipment
CN111738169A (en) A Handwritten Formula Recognition Method Based on End-to-End Network Model
CN110956038B (en) Method and device for repeatedly judging image-text content
CN114187595A (en) Method and system for document layout recognition based on fusion of visual features and semantic features
CN106127222A (en) The similarity of character string computational methods of a kind of view-based access control model and similarity determination methods
CN112883931A (en) Real-time true and false motion judgment method based on long and short term memory network
CN109241299B (en) Multimedia resource searching method, device, storage medium and equipment
CN111079374A (en) Font generation method, device and storage medium
CN111444906B (en) Image recognition method and related device based on artificial intelligence
CN114140802B (en) Text recognition method and device, electronic equipment and storage medium
CN110503090B (en) Character detection network training method based on limited attention model, character detection method and character detector
CN113920291A (en) Error correction method and device based on picture recognition result, electronic equipment and medium
CN112667771A (en) Answer sequence determination method and device
CN111091198A (en) Data processing method and device
CN113610064B (en) Handwriting recognition method and device
CN116030346A (en) Unpaired Weakly Supervised Cloud Detection Method and System Based on Markov Discriminator
CN114692715A (en) Sample labeling method and device
CN113988214A (en) Similar user recommendation method and device based on voice recognition result

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination