CN108595410B - Automatic correction method and device for handwritten composition - Google Patents

Info

Publication number
CN108595410B
CN108595410B CN201810223663.8A CN201810223663A CN108595410B CN 108595410 B CN108595410 B CN 108595410B CN 201810223663 A CN201810223663 A CN 201810223663A CN 108595410 B CN108595410 B CN 108595410B
Authority
CN
China
Prior art keywords
sentence
word
correction
handwritten
error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810223663.8A
Other languages
Chinese (zh)
Other versions
CN108595410A (en
Inventor
王岩
宋旸
张绍亮
袁景伟
黄宇飞
程童
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baige Feichi Technology Co ltd
Original Assignee
Xiaochuanchuhai Education Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaochuanchuhai Education Technology Beijing Co ltd
Priority to CN201810223663.8A
Publication of CN108595410A
Application granted
Publication of CN108595410B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/232Orthographic correction, e.g. spell checking or vowelisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides an automatic correction method and device for handwritten compositions. The method comprises the following steps: acquiring a handwritten composition image to be corrected; removing interference and background lines from the handwritten composition image by adopting a connected domain analysis algorithm and a straight line detection algorithm; performing line segmentation, word segmentation and recognition on the processed handwritten composition image to obtain the corresponding text content; performing sentence segmentation on the text content, performing part-of-speech analysis to obtain the part of speech of each word, performing shallow syntactic analysis to obtain the phrases in each sentence and their types, then selecting a rule strategy or a deep model for a specific error type and detecting specific grammatical errors; for each detected grammatical error, generating a correction suggestion by combining the recognition result of the error recognition model with the correction suggestion model, thereby obtaining the correction result of the whole composition; and constructing a convolutional neural network based on the composition text content and the correction result to realize intelligent scoring of the composition text.

Description

Automatic correction method and device for handwritten composition
Technical Field
The invention relates to the technical field of homework correction, in particular to an automatic correction method and device for handwritten compositions.
Background
Current composition correction methods mainly correct text content automatically: after the text content is obtained, sentence segmentation, syntactic analysis and other operations are performed on it to obtain a correction suggestion for each sentence, and from these the correction result for the text content. However, most compositions are handwritten, and handwritten compositions suffer from problems such as adhered words and slanted lines. If such a method is applied directly to handwritten compositions, correction accuracy is low and correction efficiency is poor.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, the first objective of the present invention is to provide an automatic correction method for handwritten compositions, which is used to solve the problems of low correction accuracy and poor correction efficiency in the prior art.
The second purpose of the invention is to provide an automatic correction device for handwritten compositions.
The third purpose of the invention is to provide another automatic correction device for handwritten compositions.
A fourth object of the invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the invention is to propose a computer program product.
In order to achieve the above objects, an embodiment of a first aspect of the present invention provides an automatic correction method for a handwritten composition, including:
acquiring a handwritten composition image to be corrected;
processing the handwritten composition image by adopting a connected domain analysis algorithm and a straight line detection algorithm, and removing interference and background lines in the handwritten composition image to obtain a processed handwritten composition image;
performing line segmentation and word segmentation on the processed handwritten composition image to obtain a plurality of word image blocks;
recognizing a plurality of word image blocks by adopting a preset word recognition model to obtain text contents corresponding to the handwritten composition images;
and carrying out sentence segmentation, syntactic analysis and sentence correction on the text content to obtain a correction result corresponding to the handwritten composition image.
Further, before performing line segmentation and word segmentation on the processed handwritten composition image to obtain a plurality of word image blocks, the method further includes:
performing word splitting on the processed handwritten composition image to disconnect the adhesion between two adjacent lines in the handwritten composition image.
Further, before the step of recognizing the plurality of word image blocks by using a preset word recognition model and obtaining the text content corresponding to the handwritten composition image, the method further includes:
and normalizing the word image blocks according to a preset format corresponding to the word recognition model to obtain the processed word image blocks.
Further, before the step of recognizing the plurality of word image blocks by using a preset word recognition model and obtaining the text content corresponding to the handwritten composition image, the method further includes:
obtaining a word recognition sample; the word recognition samples include: handwritten word image samples and corresponding text content;
normalizing the word recognition sample according to a preset format corresponding to the word recognition model to obtain a processed word recognition sample;
and training the word recognition model according to the processed word recognition sample to obtain the preset word recognition model.
Further, the training the word recognition model according to the processed word recognition sample to obtain the preset word recognition model includes:
dividing the word recognition samples according to the word length to obtain short word recognition samples, single word recognition samples and long word recognition samples;
and training the word recognition model by sequentially adopting the short word recognition sample, the single word recognition sample and the long word recognition sample to obtain the preset word recognition model.
Further, before the sentence segmentation, the syntactic analysis, and the sentence correction are performed on the text content, and the correction result corresponding to the handwritten composition image is obtained, the method further includes:
inputting the text content into a preset error correction model to obtain the text content after error correction; the error correction model is composed of a plurality of language models; each language model is an N-gram language model, where N is a positive integer.
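As a rough illustration of how an N-gram language model can drive error correction of recognized text, the following minimal sketch trains a single bigram model (N = 2) with add-one smoothing and replaces a word when a candidate raises the sentence score. The function names, the tiny corpus and the `confusions` candidate table are assumptions made for illustration; the patent itself describes a model composed of several language models and does not specify how candidates are produced.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_prob(words, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a word sequence."""
    vocab = len(unigrams)
    padded = ["<s>"] + list(words)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(padded, padded[1:])
    )

def correct(words, confusions, unigrams, bigrams):
    """Greedily swap in any candidate that improves the sentence score."""
    best = list(words)
    for i, word in enumerate(words):
        for candidate in confusions.get(word, []):  # hypothetical candidate table
            trial = best[:i] + [candidate] + best[i + 1:]
            if log_prob(trial, unigrams, bigrams) > log_prob(best, unigrams, bigrams):
                best = trial
    return best

corpus = ["i like the book", "i read the book", "the book is good"]
unigrams, bigrams = train_bigram(corpus)
confusions = {"bock": ["book", "back"]}  # assumed recognizer confusions
fixed = correct(["i", "like", "the", "bock"], confusions, unigrams, bigrams)
```

Here the misrecognized token `bock` is replaced by `book` because the bigram `the book` is frequent in the training corpus.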
Further, before inputting the text content into a preset error correction model and acquiring the text content after error correction, the method further includes:
acquiring an error correction training sample; the error correction training samples comprise: text content samples and corresponding error corrected samples;
normalizing the error correction training samples according to a preset format corresponding to the error correction model to obtain the processed error correction training samples;
and training the error correction model according to the processed error correction training sample to obtain the preset error correction model.
Further, the performing sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain a correction result corresponding to the handwritten composition image includes:
carrying out sentence segmentation on the text content to obtain a plurality of sentences in the text content;
performing syntactic analysis on each sentence of the text content to obtain an analysis result of each sentence; the analysis result comprises: words, phrases, parts of speech of the words and types of the phrases included in the sentence;
for each sentence, selecting a corresponding error recognition model to recognize the sentence according to the analysis result of the sentence, and acquiring error information in the sentence;
inputting error information in the sentence into a preset correction suggestion model to obtain a correction suggestion corresponding to the sentence;
and generating a correction result corresponding to the handwritten composition image according to the correction suggestion corresponding to each sentence.
Further, the selecting, for each sentence, a corresponding error recognition model according to the analysis result of the sentence to recognize the sentence, and acquiring error information in the sentence includes:
acquiring context information corresponding to each word in each sentence aiming at each sentence;
determining a matrix vector corresponding to each word according to the context information corresponding to each word;
selecting an error recognition model corresponding to the part of speech according to the part of speech of each word;
and inputting the matrix vector corresponding to each word in the sentence into the error recognition model corresponding to the part of speech to acquire the error information in the sentence.
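The per-word flow described above (context, matrix vector, model selection by part of speech, error detection) can be sketched as follows. This is only an illustrative skeleton: the window size, the `embeddings` lookup table and the stand-in model for determiners are hypothetical, and a real system would plug trained error recognition models into `models`.

```python
def context_window(words, index, size=2, pad="<pad>"):
    """Return the word at `index` with `size` neighbours on each side."""
    padded = [pad] * size + list(words) + [pad] * size
    return padded[index:index + 2 * size + 1]

def context_matrix(window, embeddings, dim):
    """Stack one embedding row per context word; unknown words map to zeros."""
    return [embeddings.get(word, [0.0] * dim) for word in window]

def detect_errors(tagged_sentence, models, embeddings, dim):
    """Run the error recognition model registered for each word's part of speech."""
    words = [word for word, _ in tagged_sentence]
    errors = []
    for i, (word, pos) in enumerate(tagged_sentence):
        model = models.get(pos)
        if model is None:
            continue  # no error recognition model for this part of speech
        matrix = context_matrix(context_window(words, i), embeddings, dim)
        if model(matrix):
            errors.append((i, word, pos))
    return errors

# Toy 2-dimensional embeddings (assumed): [starts with a vowel sound, is the article "a"].
embeddings = {"a": [0.0, 1.0], "apple": [1.0, 0.0], "eat": [0.0, 0.0], "i": [0.0, 0.0]}
# Stand-in model for determiners: flags "a" directly followed by a vowel-initial word.
models = {"DET": lambda m: m[2][1] == 1.0 and m[3][0] == 1.0}
tagged = [("i", "PRON"), ("eat", "VERB"), ("a", "DET"), ("apple", "NOUN")]
errors = detect_errors(tagged, models, embeddings, dim=2)
```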
Further, before inputting the matrix vector corresponding to each word in the sentence into the error recognition model corresponding to the part of speech and obtaining the error information in the sentence, the method further includes:
acquiring error recognition training samples corresponding to the parts of speech; the error recognition training sample comprises: the matrix vector corresponding to the word with the part of speech and the error information corresponding to the word;
and aiming at the error recognition models corresponding to the parts of speech, adopting the error recognition training samples corresponding to the parts of speech to train the error recognition models.
Further, performing sentence segmentation on the text content to obtain a plurality of sentences in the text content includes:
acquiring a type corresponding to the text content; the type identifies how accurate the sentence breaks in the text content are;
acquiring the feature to be extracted corresponding to the type;
according to the features to be extracted, feature extraction is carried out on the text content, and segmentation feature information in the text content is obtained;
and carrying out sentence segmentation on the text content according to the segmentation characteristic information to obtain a plurality of sentences in the text content.
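One possible reading of these steps is that the features used to find sentence boundaries depend on how reliable the punctuation of the text is. The sketch below assumes two hypothetical types, `reliable` and `unreliable`, and expresses the features for each as a regular expression; neither the type names nor the patterns come from the patent.

```python
import re

# Hypothetical mapping from text type to the boundary features used for it.
BOUNDARY_PATTERNS = {
    # Trust terminal punctuation only.
    "reliable": r"(?<=[.!?])\s+",
    # Also break before a capitalized word, a crude cue when periods are missing.
    "unreliable": r"(?<=[.!?])\s+|\s+(?=[A-Z][a-z])",
}

def split_sentences(text, text_type):
    """Split text into sentences using the boundary features of its type."""
    parts = re.split(BOUNDARY_PATTERNS[text_type], text.strip())
    return [part for part in parts if part]

text = "I like apples They are sweet. Do you"
strict = split_sentences(text, "reliable")
loose = split_sentences(text, "unreliable")
```

The capitalization cue also splits before proper nouns, so a practical system would need stronger features than this sketch uses.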
Further, before selecting a corresponding error recognition model to recognize the sentence according to the analysis result of the sentence and acquiring error information in the sentence, the method further includes:
for each sentence, comparing the sentence with a preset error pattern library to acquire error information in the sentence; the error pattern library comprises regular expressions corresponding to various error patterns.
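A regular-expression error pattern library of this kind might look like the following minimal sketch. The three patterns and their labels are invented examples, not patterns taken from the patent.

```python
import re

# Hypothetical error pattern library: (regular expression, error label).
ERROR_PATTERNS = [
    (re.compile(r"\ba\s+(?=[aeiou])", re.IGNORECASE), "article: 'a' before a vowel"),
    (re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE), "repeated word"),
    (re.compile(r"\b(he|she|it)\s+(have|do|are)\b", re.IGNORECASE), "subject-verb agreement"),
]

def match_error_patterns(sentence):
    """Return (start offset, matched text, label) for every pattern hit, in text order."""
    errors = []
    for pattern, label in ERROR_PATTERNS:
        for match in pattern.finditer(sentence):
            errors.append((match.start(), match.group(0).strip(), label))
    return sorted(errors)

found = match_error_patterns("She have a apple and and a pear.")
```

Rule matching of this sort is cheap, so it is reasonable to run it before the model-based error recognition, as the step above describes.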
Further, the generating a correction result corresponding to the handwritten composition image according to the correction suggestion corresponding to each sentence includes:
for each sentence, when a plurality of correction suggestions corresponding to the sentence are provided, inputting the plurality of correction suggestions corresponding to the sentence into a preset correction selection model, and acquiring a single correction suggestion corresponding to the sentence;
and generating a correction result corresponding to the handwritten composition image according to the single correction suggestion corresponding to each sentence.
Further, before the inputting the error information in the sentence into a preset correction suggestion model and obtaining the correction suggestion corresponding to the sentence, the method further includes:
acquiring correction training samples; each correction training sample comprises: a sentence sample, the error information in the sentence sample and the correction suggestion corresponding to the sentence sample;
and training a correction suggestion model according to the correction training sample to obtain the preset correction suggestion model.
Further, after the sentence segmentation, the syntactic analysis, and the sentence correction are performed on the text content, and the correction result corresponding to the handwritten composition image is obtained, the method further includes:
acquiring characteristic information in the text content; the characteristic information comprises any one or more of the following information: vocabulary information, grammar information, sentence information, or correction information;
when the characteristic information is vocabulary information, grammatical information or sentence information, inputting the characteristic information into a corresponding scoring model, and acquiring a score corresponding to the characteristic information;
when the characteristic information comprises vocabulary information, grammar information, sentence information and correction information, inputting the characteristic information into a preset comprehensive scoring model, and acquiring the comprehensive score corresponding to the handwritten composition image.
The automatic correction method for handwritten compositions according to the embodiment of the invention acquires a handwritten composition image to be corrected; removes interference and background lines from the handwritten composition image by adopting a connected domain analysis algorithm and a straight line detection algorithm; performs line segmentation, word segmentation and recognition on the processed handwritten composition image to obtain the corresponding text content; performs sentence segmentation on the text content, performs part-of-speech analysis to obtain the part of speech of each word, performs shallow syntactic analysis to obtain the phrases in each sentence and their types, then selects a rule strategy or a deep model for a specific error type and detects specific grammatical errors; for each detected grammatical error, generates a correction suggestion by combining the recognition result of the error recognition model with the correction suggestion model, thereby obtaining the correction result of the whole composition; and constructs a convolutional neural network based on the composition text content and the correction result to realize intelligent scoring of the composition text, thereby improving the accuracy and the efficiency of correction.
In order to achieve the above object, a second aspect of the present invention provides an automatic correcting device for handwritten compositions, comprising:
the acquisition module is used for acquiring a handwritten composition image to be corrected;
the processing module is used for processing the handwritten composition image by adopting a connected domain analysis algorithm and a straight line detection algorithm, removing interference and background lines in the handwritten composition image and obtaining a processed handwritten composition image;
the segmentation module is used for segmenting the processed handwritten composition image and segmenting words to obtain a plurality of word image blocks;
the recognition module is used for recognizing the word image blocks by adopting a preset word recognition model and acquiring text contents corresponding to the handwritten composition images;
and the correction module is used for carrying out sentence segmentation, syntactic analysis and sentence correction on the text content and obtaining a correction result corresponding to the handwritten composition image.
Further, the correction module comprises:
the segmentation unit is used for carrying out sentence segmentation on the text content to obtain a plurality of sentences in the text content;
the analysis unit is used for carrying out syntactic analysis on each sentence of the text content and acquiring an analysis result of each sentence; the analysis result comprises: the words and phrases included in the sentence, the parts of speech of the words and the types of the phrases;
the recognition unit is used for selecting a corresponding error recognition model to recognize each sentence according to the analysis result of the sentence so as to acquire error information in the sentence;
the input unit is used for inputting the error information in the sentence into a preset correction suggestion model and acquiring the correction suggestion corresponding to the sentence;
and the generating unit is used for generating a correction result corresponding to the handwritten composition image according to the correction suggestion corresponding to each sentence.
Furthermore, the recognition unit is specifically configured for:
acquiring context information corresponding to each word in each sentence aiming at each sentence;
determining a matrix vector corresponding to each word according to the context information corresponding to each word;
selecting an error recognition model corresponding to the part of speech according to the part of speech of each word;
and inputting the matrix vector corresponding to each word in the sentence into the error recognition model corresponding to the part of speech to acquire the error information in the sentence.
The automatic correction device for handwritten compositions according to the embodiment of the invention acquires a handwritten composition image to be corrected; removes interference and background lines from the handwritten composition image by adopting a connected domain analysis algorithm and a straight line detection algorithm; performs line segmentation, word segmentation and recognition on the processed handwritten composition image to obtain the corresponding text content; performs sentence segmentation on the text content, performs part-of-speech analysis to obtain the part of speech of each word, performs shallow syntactic analysis to obtain the phrases in each sentence and their types, then selects a rule strategy or a deep model for a specific error type and detects specific grammatical errors; for each detected grammatical error, generates a correction suggestion by combining the recognition result of the error recognition model with the correction suggestion model, thereby obtaining the correction result of the whole composition; and constructs a convolutional neural network based on the composition text content and the correction result to realize intelligent scoring of the composition text, thereby improving the accuracy and the efficiency of correction.
In order to achieve the above object, a third aspect of the present invention provides another automatic correction device for handwritten compositions, including: the device comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, and is characterized in that the processor executes the program to realize the automatic correction method of the handwritten composition.
In order to achieve the above object, a fourth aspect of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the automatic correction method for handwritten compositions as described above.
In order to achieve the above objects, an embodiment of a fifth aspect of the present invention provides a computer program product; when instructions in the computer program product are executed by a processor, an automatic correction method for a handwritten composition is performed, the method including:
acquiring a handwritten composition image to be corrected;
processing the handwritten composition image by adopting a connected domain analysis algorithm and a straight line detection algorithm, and removing interference and background lines in the handwritten composition image to obtain a processed handwritten composition image;
performing line segmentation and word segmentation on the processed handwritten composition image to obtain a plurality of word image blocks;
recognizing a plurality of word image blocks by adopting a preset word recognition model to obtain text contents corresponding to the handwritten composition images;
and carrying out sentence segmentation, syntactic analysis and sentence correction on the text content to obtain a correction result corresponding to the handwritten composition image.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
Fig. 1 is a schematic flow chart of an automatic handwritten composition correcting method according to an embodiment of the present invention;
Fig. 2 is a flowchart illustrating another method for automatically correcting a handwritten composition according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an automatic handwritten composition correcting device according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of another automatic handwritten composition correcting device according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of another automatic handwritten composition correcting device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the invention and are not to be construed as limiting it.
The following describes an automatic handwritten composition correcting method and device according to an embodiment of the present invention with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of an automatic handwritten composition correcting method according to an embodiment of the present invention. As shown in fig. 1, the automatic correction method for handwritten compositions includes the following steps:
S101, obtaining a handwritten composition image to be corrected.
The automatic correction method for handwritten compositions provided by the invention is executed by an automatic correction device for handwritten compositions. The device may be a hardware device, such as a terminal device or a server cluster, or software installed on a hardware device. The handwritten composition image in this embodiment may be an electronic image obtained by photographing or scanning the handwritten composition.
S102, processing the handwritten composition image by adopting a connected domain analysis algorithm and a straight line detection algorithm, and removing interference and background lines in the handwritten composition image to obtain a processed handwritten composition image.
In this embodiment, the automatic handwritten composition correction device may first binarize the handwritten composition image and then process it with a connected domain analysis algorithm and a straight line detection algorithm, so as to remove interference and background lines and obtain a processed handwritten composition image.
Binarization of the handwritten composition image means determining the gray value of each pixel from its color, comparing that gray value with a threshold gray value, and setting the pixel's gray value to 0 or 255 according to the comparison result, so that the whole image contains only black and white.
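The binarization step can be sketched in a few lines of plain Python. The luma weights used for the gray conversion and the fixed threshold of 128 are assumptions; the patent does not say how the gray value or the threshold is chosen.

```python
def to_gray(rgb):
    """Convert an (R, G, B) pixel to a gray value using common luma weights (assumed)."""
    r, g, b = rgb
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))

def binarize(image, threshold=128):
    """Set each pixel to 0 (black) or 255 (white) by comparing its gray value with the threshold."""
    return [[0 if to_gray(px) < threshold else 255 for px in row] for row in image]

image = [
    [(250, 250, 250), (20, 20, 30)],
    [(10, 10, 10), (200, 210, 190)],
]
binary = binarize(image)
```

A fixed threshold is the simplest choice; unevenly lit photographs would usually call for an adaptive threshold instead.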
In this embodiment, processing the handwritten composition image with a connected domain analysis algorithm can connect broken lines within a given area of the image, for example the area where a word is located. It can also remove interference from the image, such as local highlights produced when the handwritten composition is photographed.
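To make the idea of connected domain analysis concrete, the sketch below labels 4-connected groups of black pixels in a binarized image and erases groups that are too small to be part of a word. Treating small components as interference, and the `min_pixels` value, are assumptions made for illustration; the patent does not state which criterion it uses.

```python
from collections import deque

def connected_components(binary):
    """Return a list of 4-connected groups of black (0) pixels as (row, col) lists."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 0 or seen[y][x]:
                continue
            queue, pixels = deque([(y, x)]), []
            seen[y][x] = True
            while queue:  # breadth-first flood fill from the seed pixel
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 0 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components

def remove_small_components(binary, min_pixels=3):
    """Whiten every component with fewer than min_pixels pixels (assumed noise criterion)."""
    cleaned = [row[:] for row in binary]
    for pixels in connected_components(binary):
        if len(pixels) < min_pixels:
            for y, x in pixels:
                cleaned[y][x] = 255
    return cleaned

W, B = 255, 0
binary = [
    [B, B, W, W, W],
    [B, B, W, W, B],
    [W, W, W, W, W],
]
cleaned = remove_small_components(binary)
```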
In this embodiment, processing the handwritten composition image with a straight line detection algorithm makes it possible to detect and remove background lines in the image, such as the horizontal rules of a notebook or four-line three-grid ruling. In handwritten compositions, word strokes often adhere to a background line, or words are written into a background line, so that after word segmentation the resulting word image blocks contain parts of the background lines, which lowers word recognition accuracy. Detecting and removing the background lines with a straight line detection algorithm therefore improves word recognition accuracy and, in turn, correction accuracy.
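The patent does not name a specific straight line detection algorithm. As a deliberately simple stand-in, the sketch below detects horizontal background lines by row projection: a row is treated as a ruled line when most of its pixels are black. The `min_fill` ratio is an assumption, and this approach only handles lines that are close to horizontal.

```python
def find_horizontal_lines(binary, min_fill=0.8):
    """Return indices of rows whose share of black (0) pixels is at least min_fill."""
    width = len(binary[0])
    return [y for y, row in enumerate(binary) if row.count(0) / width >= min_fill]

def remove_horizontal_lines(binary, min_fill=0.8):
    """Whiten every detected background row and copy the other rows unchanged."""
    lines = set(find_horizontal_lines(binary, min_fill))
    return [[255] * len(row) if y in lines else row[:] for y, row in enumerate(binary)]

W, B = 255, 0
page = [
    [W, B, W, W, W],  # part of a stroke
    [B, B, B, B, B],  # ruled background line
    [W, W, B, W, W],  # part of a stroke
]
line_rows = find_horizontal_lines(page)
without_lines = remove_horizontal_lines(page)
```

Whitening a whole row also erases stroke pixels that cross the line, so a practical implementation would need to restore those pixels afterwards.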
S103, performing line segmentation and word segmentation on the processed handwritten composition image to obtain a plurality of word image blocks.
Further, because the words in two adjacent lines of a handwritten composition may adhere to each other, and in order to avoid two adhered lines being treated as a single line during line segmentation, the method may further include, before step S103: performing word splitting on the processed handwritten composition image to disconnect the adhesion between two adjacent lines, so that two adhered lines are separated into two lines during line segmentation. This improves the accuracy of line segmentation and, in turn, the accuracy of word recognition.
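One common way to split a binarized page into text lines is a horizontal projection profile, shown in the minimal sketch below. Allowing a small amount of ink in a gap row (`max_ink`, an assumed parameter) lets the cut pass through a thin adhesion between two lines; the patent does not describe its splitting procedure in this much detail.

```python
def split_lines(binary, max_ink=0):
    """Return (top, bottom) row ranges of text lines, with bottom exclusive.

    A row counts as a gap when it holds at most max_ink black pixels, so a small
    positive max_ink cuts through thin adhesions between adjacent lines.
    """
    ranges, start = [], None
    for y, row in enumerate(binary):
        is_text = row.count(0) > max_ink
        if is_text and start is None:
            start = y
        elif not is_text and start is not None:
            ranges.append((start, y))
            start = None
    if start is not None:
        ranges.append((start, len(binary)))
    return ranges

W, B = 255, 0
page = [
    [B, B, B, W],  # line 1
    [B, W, B, W],  # line 1
    [W, W, B, W],  # a descender touching the next line (adhesion)
    [B, B, B, B],  # line 2
    [B, W, W, B],  # line 2
]
merged = split_lines(page)             # adhesion makes the two lines look like one
separated = split_lines(page, max_ink=1)
```

The same profile computed over columns of a single line can serve as a starting point for word segmentation.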
S104, recognizing the plurality of word image blocks by adopting a preset word recognition model, and acquiring the text content corresponding to the handwritten composition image.
In this embodiment, the preset word recognition model may be, for example, a Long Short-Term Memory (LSTM) network. Specifically, the automatic handwritten composition correction device may execute step S104 by recognizing the plurality of word image blocks with the preset word recognition model to obtain the word corresponding to each word image block, and then combining these words to obtain the text content corresponding to the handwritten composition image.
In this embodiment, in order to improve the recognition accuracy of the word recognition model, the word image blocks and the word recognition samples may be normalized according to a preset format corresponding to the word recognition model. Therefore, on the basis of the foregoing embodiment, before step S104, the method may further include: normalizing the word image blocks according to the preset format corresponding to the word recognition model to obtain the processed word image blocks.
Correspondingly, before step S104, the training process for the word recognition model may include: obtaining word recognition samples, which include handwritten word image samples and the corresponding text content; normalizing the word recognition samples according to the preset format corresponding to the word recognition model to obtain processed word recognition samples; and training the word recognition model according to the processed word recognition samples to obtain the preset word recognition model.
The normalization process for the word image block may be, for example, to adjust the brightness, color, and the like of the word image block to a format required by the word recognition model.
In this embodiment, to improve the training effect of the word recognition model, the word recognition samples may be divided according to word length into short word recognition samples, single word recognition samples, and long word recognition samples. The word recognition model is then trained with the short word recognition samples, the single word recognition samples, and the long word recognition samples in sequence to obtain the preset word recognition model. The handwritten words in the short word recognition samples are shorter than those in the single word recognition samples, and the handwritten words in the single word recognition samples are shorter than those in the long word recognition samples.
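The staged training order described above can be illustrated with a minimal sketch. The length thresholds `short_max` and `medium_max`, the function name, and the `(image_id, text)` sample layout are assumptions made for illustration only; the patent does not specify them.

```python
def curriculum_stages(samples, short_max=3, medium_max=7):
    """Split (image_id, text) samples into three stages by word length.

    The thresholds are hypothetical; the source only states that the
    three groups are ordered from shorter to longer handwritten words.
    """
    short, medium, long_ = [], [], []
    for sample in samples:
        length = len(sample[1])
        if length <= short_max:
            short.append(sample)
        elif length <= medium_max:
            medium.append(sample)
        else:
            long_.append(sample)
    # A trainer would consume the stages in this order: short, medium, long.
    return [short, medium, long_]


stages = curriculum_stages(
    [("a.png", "in"), ("b.png", "school"), ("c.png", "interested")]
)
```

A training loop would then iterate over `stages` in order, fitting the same model on each stage in turn.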
Furthermore, over-segmentation and under-segmentation may occur during word segmentation. Over-segmentation refers to segmenting one word into two or more words; under-segmentation refers to segmenting two or more words as one word. As a result, a word recognized by the word recognition model may consist of several words or may be only part of a word. Therefore, to improve the accuracy of word recognition, in step 104, after the word corresponding to a word image block is recognized, the word may be compared with the words in a preset dictionary to obtain the dictionary word that matches it. When the word is part of a dictionary word, a second word corresponding to the word image block before or after the current word image block is obtained, and it is judged whether the combination of the word and the second word is a dictionary word. According to the judgment result, over-segmented words are merged, under-segmented words are re-segmented, and so on.
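The dictionary check above can be sketched as follows. This is a simplified illustration, not the patented procedure: it merges a fragment only with the following word (the text also allows the preceding one), it splits an under-segmented token into exactly two dictionary words, and the function names and the toy dictionary are assumptions.

```python
def merge_over_segmented(words, dictionary):
    """Merge a non-dictionary fragment with the next word when the
    concatenation is a dictionary word."""
    merged = []
    i = 0
    while i < len(words):
        word = words[i]
        if word not in dictionary and i + 1 < len(words):
            combined = word + words[i + 1]
            if combined in dictionary:
                merged.append(combined)
                i += 2
                continue
        merged.append(word)
        i += 1
    return merged


def split_under_segmented(word, dictionary):
    """Split a non-dictionary token into two dictionary words if possible."""
    if word in dictionary:
        return [word]
    for cut in range(1, len(word)):
        left, right = word[:cut], word[cut:]
        if left in dictionary and right in dictionary:
            return [left, right]
    return [word]  # leave unchanged when no split is found


toy_dictionary = {"i", "am", "interested", "in"}
merged = merge_over_segmented(["i", "am", "inter", "ested", "in"], toy_dictionary)
resplit = split_under_segmented("amin", toy_dictionary)
```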
Further, since the recognition accuracy of the word recognition model is not one hundred percent, its recognition result may contain errors. To improve the recognition accuracy for the handwritten composition image, after step 104 the method may further include: inputting the text content into a preset error correction model to obtain the error-corrected text content. The error correction model is composed of a plurality of language models; each language model is an N-gram language model, where N is a positive integer.
Correspondingly, the training process of the error correction model may specifically be: obtaining error correction training samples, which include text content samples and the corresponding error-corrected samples; normalizing the error correction training samples according to a preset format corresponding to the error correction model to obtain processed error correction training samples; and training the error correction model on the processed error correction training samples to obtain the preset error correction model.
The error correction model may process the text content as follows: acquire the candidate words in the text content, acquire similar words for each candidate word, and treat the similar words as candidate words as well; normalize the candidate words, for example with respect to upper and lower case, full-width and half-width characters, singular and plural forms, digits and punctuation, spaces inserted between candidate words, and punctuation before and after candidate words; take each candidate word in the text content together with its similar words as one column of candidate words, calculate the transition probabilities between the candidate words, and retain the candidate words whose transition probability exceeds a first transition probability threshold; then select the most appropriate word in each column with an optimal path search algorithm, and integrate the selected words to obtain the error-corrected text content.
It should be noted that, when calculating the transition probabilities between columns of candidate words, a bigram language model may first be used to calculate the transition probability between two adjacent columns of candidate words. When the transition probability is smaller than a second transition probability threshold, the corresponding candidate word is deleted; when the transition probability exceeds the second transition probability threshold, the transition probability between three or more consecutive columns of candidate words is no longer calculated with a trigram or higher-order language model. This reduces the amount of transition probability computation and speeds up the error correction model's processing of the text content.
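The candidate-column search can be sketched with a bigram table and a Viterbi-style best-path search. This is a minimal illustration under stated assumptions: the bigram probabilities are invented toy values, a single pruning `threshold` stands in for the first and second transition probability thresholds, and the `floor` value for unseen bigrams and the fallback for a fully pruned column are choices made here, not taken from the patent.

```python
import math


def best_path(columns, bigram, threshold=1e-4, floor=1e-8):
    """Pick one word per column so that the product of bigram
    transition probabilities along the path is maximal.

    columns: list of candidate-word lists, one list per text position.
    bigram:  dict mapping (previous_word, word) to a probability.
    """
    # scores maps each surviving candidate to (log-probability, path).
    scores = {word: (0.0, [word]) for word in columns[0]}
    for column in columns[1:]:
        new_scores = {}
        for current in column:
            best = None
            for previous, (log_prob, path) in scores.items():
                p = bigram.get((previous, current), floor)
                if p < threshold:
                    continue  # prune low-probability transitions
                candidate = (log_prob + math.log(p), path + [current])
                if best is None or candidate[0] > best[0]:
                    best = candidate
            if best is not None:
                new_scores[current] = best
        if not new_scores:
            # Every transition was pruned: keep the first (recognized) word.
            log_prob, path = max(scores.values(), key=lambda s: s[0])
            new_scores = {
                column[0]: (log_prob + math.log(floor), path + [column[0]])
            }
        scores = new_scores
    return max(scores.values(), key=lambda s: s[0])[1]


toy_bigram = {
    ("I", "am"): 0.2,
    ("I", "an"): 0.00001,
    ("am", "interested"): 0.05,
    ("an", "interested"): 0.01,
}
corrected = best_path([["I"], ["an", "am"], ["interested"]], toy_bigram)
```

Here the recognized word "an" is replaced by its similar word "am" because the transition from "I" to "an" falls below the threshold and is pruned.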
S105, performing sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain a correction result corresponding to the handwritten composition image.
In the automatic correction method for handwritten compositions of the embodiment of the invention, a handwritten composition image to be corrected is acquired; the image is processed with a connected domain analysis algorithm and a straight line detection algorithm to remove interference and background lines, yielding a processed handwritten composition image; line segmentation and word segmentation are performed on the processed image to obtain a plurality of word image blocks; the word image blocks are recognized with a preset word recognition model to obtain the text content corresponding to the handwritten composition image; and sentence segmentation, syntactic analysis, and sentence correction are performed on the text content to obtain the correction result corresponding to the handwritten composition image. Because interference and background lines are removed before recognition, and correction is performed on the recognized text content, both the accuracy and the efficiency of correction are improved.
Fig. 2 is a flowchart illustrating another method for automatically correcting a handwritten composition according to an embodiment of the present invention. As shown in fig. 2, based on the embodiment shown in fig. 1, step 105 may specifically include the following steps:
S1051, performing sentence segmentation on the text content to obtain a plurality of sentences in the text content.
In this embodiment, the process of executing step 1051 by the automatic handwritten composition correcting device may specifically be to obtain a type corresponding to the text content; the type is used for identifying the accuracy of text content sentence division; acquiring features to be extracted corresponding to the types; according to the features to be extracted, feature extraction is carried out on the text content, and segmentation feature information in the text content is obtained; and carrying out sentence segmentation on the text content according to the segmentation characteristic information to obtain a plurality of sentences in the text content.
In this embodiment, the automatic correction device for handwritten compositions may determine the type corresponding to the text content by using a machine learning method. The type can be a handwritten composition image or a common composition specifically; the type of the handwritten composition image means that the text content is a text content obtained by recognizing the handwritten composition image.
For example, when the type is a handwritten composition image, punctuation marks are often misused in handwriting. To improve the accuracy of sentence segmentation in this case, the feature to be extracted corresponding to the type may be, for example, a capitalized initial letter, and the text content is segmented into sentences according to such features. When the type is a common composition, the feature to be extracted corresponding to the type may be, for example, a punctuation mark.
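A minimal sketch of type-dependent sentence segmentation follows. The type labels `"handwritten"` and `"common"`, the function name, and the special handling of the pronoun "I" are assumptions for illustration; a capitalization-only rule would also split at proper nouns, which a real system would need to handle.

```python
import re


def split_sentences(text, text_type):
    """Segment text into sentences using a feature chosen by text type."""
    if text_type == "handwritten":
        # Punctuation is unreliable, so start a new sentence at each
        # capitalized word (the pronoun "I" is ignored as a boundary).
        sentences, current = [], []
        for token in text.split():
            if current and token[0].isupper() and token != "I":
                sentences.append(" ".join(current))
                current = []
            current.append(token)
        if current:
            sentences.append(" ".join(current))
        return sentences
    # Common composition: split after sentence-final punctuation.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


handwritten = split_sentences(
    "The dog is big He likes me Today we play", "handwritten"
)
common = split_sentences("It is late. We go home!", "common")
```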
Further, in order to improve the accuracy of the correction, before step 1051, the method may further include: the text content is subjected to preprocessing operations, such as encoding, verification, interference filtering, normalization, and the like.
S1052, carrying out syntactic analysis on each sentence of the text content to obtain an analysis result of each sentence; the analysis results include: words, phrases, parts of speech of words, and types of phrases included in sentences.
In this embodiment, the automatic handwritten composition correcting device may execute step 1052 as follows: for each sentence in the text content, splitting the sentence into words to obtain the words in the sentence; acquiring the part of speech of each word; and matching the sentence against preset phrase regular expressions to obtain the phrases in the sentence, and so on. In addition, the automatic handwritten composition correcting device may adjust the phrase regular expressions according to the matching results.
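One way to picture the phrase regular expressions is to match them against a part-of-speech-tagged rendering of the sentence. The sketch below is an assumption-laden illustration: the toy lexicon, the tag names (DT, JJ, NN, VB), the `word/TAG` encoding, and the single noun-phrase pattern are all hypothetical, and a real system would use a trained tagger.

```python
import re

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_LEXICON = {
    "the": "DT", "a": "DT", "big": "JJ", "red": "JJ",
    "dog": "NN", "ball": "NN", "chased": "VB",
}

# Noun phrase: optional determiner, any adjectives, then a noun.
NOUN_PHRASE = re.compile(r"(?<!\S)(?:\S+/DT )?(?:\S+/JJ )*\S+/NN(?!\S)")


def analyze(sentence):
    """Return the words, their parts of speech, and matched noun phrases."""
    words = sentence.lower().split()
    tagged = [(word, TOY_LEXICON.get(word, "UNK")) for word in words]
    tagged_text = " ".join(f"{word}/{tag}" for word, tag in tagged)
    phrases = [
        " ".join(token.split("/")[0] for token in match.group().split())
        for match in NOUN_PHRASE.finditer(tagged_text)
    ]
    return {"words": words, "pos": tagged, "phrases": phrases}


result = analyze("The big dog chased a red ball")
```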
S1053, aiming at each sentence, selecting a corresponding error recognition model to recognize the sentence according to the analysis result of the sentence, and acquiring error information in the sentence.
In this embodiment, a preset model library stores in advance an error recognition model corresponding to each word, part of speech, phrase, or phrase type. For the words, parts of speech, phrases, or phrase types found in the sentence, the model library is queried to obtain the corresponding error recognition model, and the sentence is input into that model to obtain the error information in the sentence. The error information may be, for example, a preposition use error, a phrase use error, or a word use error.
Specifically, the process of executing step 1053 by the automatic handwritten composition correcting device may specifically be to, for each sentence, obtain context information corresponding to each word in the sentence; determining a matrix vector corresponding to each word according to the context information corresponding to each word; selecting an error recognition model corresponding to the part of speech according to the part of speech of each word; and inputting the matrix vector corresponding to each word in the sentence into the error recognition model corresponding to the part of speech to acquire error information in the sentence.
The context information corresponding to each word in the sentence refers to information of other words within a preset distance of each word in the sentence and information of phrases including the word in the sentence. For example, in the sentence "I am interested in soving", other words within a preset distance of the preposition "in" may include "am", "interested", "soving", for example. The matrix vector corresponding to each Word may be, for example, a matrix vector generated by the Word2Vec model according to context information corresponding to the Word.
The part of speech is, for example, noun, verb, or preposition. In this embodiment, after the matrix vector corresponding to each word in the sentence is input into the error recognition model corresponding to its part of speech, the error recognition model may determine, from the matrix vector, the probability of each candidate word at that position. For the preposition "in" in the above example, the model may output the probability that the preposition is "in", "on", "at", "before", "after", "since", and so on. When the preposition with the highest probability differs from the preposition "in" used in the sentence, the preposition "in" is determined to be misused, and the error information in the sentence is determined from the preposition with the highest probability; when the preposition with the highest probability is "in", the preposition in the sentence is used correctly and there is no error information.
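The context window and the highest-probability comparison can be sketched as follows. The probability table is an invented stand-in for the output of a trained error recognition model, and the function names and the window distance of 2 are assumptions; the sentence is the example given in the text.

```python
def context_window(tokens, index, distance=2):
    """Return the other words within `distance` positions of tokens[index]."""
    start = max(0, index - distance)
    end = min(len(tokens), index + distance + 1)
    return [tokens[i] for i in range(start, end) if i != index]


def check_preposition(observed, probabilities):
    """Compare the written preposition with the most probable one.

    probabilities: dict mapping candidate prepositions to model scores
    (hypothetical values here; a real model would compute them from
    the word's context vector).
    """
    best = max(probabilities, key=probabilities.get)
    if best == observed:
        return None  # used correctly, no error information
    return {"type": "preposition use error", "found": observed, "suggested": best}


tokens = "I am interested in soving".split()
context = context_window(tokens, tokens.index("in"))
toy_probabilities = {"in": 0.7, "on": 0.2, "at": 0.1}
no_error = check_preposition("in", toy_probabilities)
error = check_preposition("on", toy_probabilities)
```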
Further, to improve the accuracy of the acquired error information, step 105 may further include: for each sentence, comparing the sentence with a preset error pattern library to obtain the error information in the sentence. The error pattern library includes regular expressions corresponding to a plurality of error patterns.
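A small sketch of such an error pattern library is shown below. The two patterns and their labels are illustrative assumptions, not patterns disclosed in the patent.

```python
import re

# Hypothetical error pattern library: (regular expression, error label).
ERROR_PATTERNS = [
    (re.compile(r"\ba\s+(?=[aeiou])", re.IGNORECASE), "article use error"),
    (re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE), "repeated word"),
]


def match_error_patterns(sentence):
    """Return one record per pattern match found in the sentence."""
    found = []
    for pattern, label in ERROR_PATTERNS:
        for match in pattern.finditer(sentence):
            found.append(
                {"type": label, "span": match.span(), "text": match.group()}
            )
    return found


errors = match_error_patterns("I saw a apple on the the table")
```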
S1054, inputting the error information in the sentence into a preset correction suggestion model, and obtaining the correction suggestion corresponding to the sentence.
In this embodiment, after the error information in each sentence is obtained, the error information in the sentence or the sentence including the error information may be input into the preset correction suggestion model, and the correction suggestion corresponding to the sentence is obtained.
The correction suggestion model may be trained by obtaining correction training samples and training an initial correction suggestion model on the correction training samples to obtain the preset correction suggestion model. The correction suggestion model may be a time-recurrent neural network (for example, an LSTM).
S1055, generating a correction result corresponding to the handwritten composition image according to the correction suggestion corresponding to each sentence.
In this embodiment, for each sentence, the error information obtained with the error recognition model may duplicate the error information obtained by comparison with the preset error pattern library, so that a sentence may have a plurality of correction suggestions. Therefore, the automatic handwritten composition correcting device may execute step 1055 as follows: for each sentence, when the sentence has a plurality of correction suggestions, inputting them into a preset correction selection model to obtain a single correction suggestion corresponding to the sentence; and generating the correction result corresponding to the handwritten composition image according to the single correction suggestion corresponding to each sentence.
In this embodiment, the correction selection model may score the correction suggestions corresponding to the sentence and take the correction suggestion with the highest score as the most likely correction suggestion for the sentence.
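The selection step can be sketched as de-duplication followed by picking the highest-scoring suggestion. The scoring function is passed in as a parameter because the patent does not describe how the correction selection model computes its scores; the toy score table below is purely hypothetical.

```python
def select_correction(suggestions, score):
    """Drop duplicate suggestions and return the highest-scoring one.

    score: callable mapping a suggestion to a number (a stand-in for
    the preset correction selection model).
    """
    unique = list(dict.fromkeys(suggestions))  # keeps first-seen order
    if not unique:
        return None
    return max(unique, key=score)


toy_scores = {"in": 0.9, "on": 0.4}
chosen = select_correction(["in", "on", "in"], toy_scores.get)
```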
Further, on the basis of the above embodiment, the handwritten composition image may also be scored. Therefore, after step 105, the method may further include:
S106, acquiring characteristic information in the text content; the characteristic information includes any one or more of the following: vocabulary information, grammar information, sentence information, or correction information.
S107, when the characteristic information is vocabulary information, grammar information, or sentence information, inputting the characteristic information into a corresponding scoring model, and acquiring a score corresponding to the characteristic information.
S108, when the characteristic information includes vocabulary information, grammar information, sentence information, and correction information, inputting the characteristic information into a preset comprehensive scoring model, and acquiring the comprehensive score corresponding to the handwritten composition image.
For example, when the feature information includes vocabulary information, inputting the vocabulary information into a corresponding vocabulary scoring model to obtain vocabulary scores; under the condition that the characteristic information comprises grammar information, inputting the grammar information into a corresponding grammar scoring model to obtain grammar scoring; when the characteristic information includes sentence information, such as the structure, length, etc. of a sentence, the sentence information is input into a corresponding sentence scoring model, and a sentence score is obtained.
In this embodiment, each word in the text content may be represented by a corresponding unique vector, for example a one-hot vector. The number of dimensions of the one-hot vector equals the total number of words in the vocabulary; the one-hot vector of a word has the value 1 only in the dimension corresponding to that word and the value 0 in all other dimensions. For example, when the vocabulary contains 5000 words and the first word in the text content is the 1000th word of the vocabulary, the one-hot vector corresponding to the first word has 5000 dimensions, with the value 1 in the 1000th dimension and 0 elsewhere. The vocabulary information in the text content may then be represented by the unique vectors corresponding to the words in the text content; that is, the input of the vocabulary scoring model may be the vector set corresponding to the text content, obtained by replacing each word in the text content with its unique vector.
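The one-hot example above (the 1000th word in a 5000-word vocabulary) can be reproduced with a short sketch. The function names are assumptions; note that the 1000th dimension corresponds to list index 999 in zero-based indexing.

```python
def one_hot(position, vocabulary_size):
    """Return a one-hot list for a 1-based vocabulary position."""
    vector = [0] * vocabulary_size
    vector[position - 1] = 1  # the 1000th dimension is index 999
    return vector


def text_to_vectors(words, vocabulary):
    """Replace each word with its one-hot vector (the 'vector set')."""
    index = {word: i + 1 for i, word in enumerate(vocabulary)}
    return [one_hot(index[word], len(vocabulary)) for word in words]


first_word_vector = one_hot(1000, 5000)
vector_set = text_to_vectors(["am", "i"], ["i", "am", "in"])
```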
Correspondingly, the training process of the vocabulary scoring model can be to obtain a vector set corresponding to the composition sample and vocabulary scoring corresponding to the composition sample; and inputting the vector set corresponding to the composition sample and the vocabulary scores corresponding to the composition sample into a vocabulary score model, and training the vocabulary score model.
In this embodiment, the grammar information and the sentence information in the text content may also be represented by a vector set corresponding to the text content. Correspondingly, the training process of the grammar score model specifically comprises the steps of obtaining a vector set corresponding to the composition sample and grammar scores corresponding to the composition sample; and inputting the vector set corresponding to the composition sample and the grammar score corresponding to the composition sample into a grammar score model, and training the grammar score model. The training process of the sentence scoring model specifically comprises the steps of obtaining a vector set corresponding to the composition sample and scoring sentences corresponding to the composition sample; and inputting the vector set corresponding to the composition sample and the sentence score corresponding to the composition sample into a sentence score model, and training the sentence score model.
As another example, when the characteristic information includes vocabulary information, grammar information, sentence information, and correction information, the following operations may be performed on the characteristic information in sequence: vectorizing the characteristic information; inputting the resulting vectors into a convolutional neural network (CNN); inputting the output of the CNN into a long short-term memory (LSTM) network; inputting the output of the LSTM into an attention mechanism; and taking the output of the attention mechanism as the comprehensive score corresponding to the handwritten composition image.
In this embodiment, the vocabulary scoring model, the grammar scoring model, the sentence scoring model, the CNN, the LSTM, and the attention mechanism may each be trained on corresponding training samples; the details are not repeated here.
In this embodiment, the correction information refers to the types of errors in the text content and the number of errors of each type of error. The vocabulary information, grammar information, and sentence information may be represented by a set of vectors corresponding to the text content. Correspondingly, the training process of the comprehensive scoring model specifically includes acquiring a vector set corresponding to the composition sample, error types in the composition sample, the number of errors of each error type, and comprehensive scoring corresponding to the composition sample; and inputting the vector set corresponding to the composition sample, the error types in the composition sample, the error quantity of each error type and the comprehensive score corresponding to the composition sample into a comprehensive scoring model, and training the comprehensive scoring model.
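The correction information described above (error types and the number of errors of each type) can be sketched as a fixed-order count vector. The record layout with a `"type"` key and the list of error types are assumptions chosen to match the error examples named earlier in the text.

```python
from collections import Counter


def correction_information(errors, error_types):
    """Count detected errors per type, in a fixed type order."""
    counts = Counter(error["type"] for error in errors)
    return [counts.get(error_type, 0) for error_type in error_types]


ERROR_TYPES = ["preposition use error", "phrase use error", "word use error"]
detected = [
    {"type": "preposition use error"},
    {"type": "word use error"},
    {"type": "preposition use error"},
]
count_vector = correction_information(detected, ERROR_TYPES)
```

Such a count vector could be supplied to the comprehensive scoring model alongside the vector set for the text content.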
In the automatic correction method for handwritten compositions of the embodiment of the invention, a handwritten composition image to be corrected is acquired; the image is processed with a connected domain analysis algorithm and a straight line detection algorithm to remove interference and background lines, yielding a processed handwritten composition image; line segmentation and word segmentation are performed on the processed image to obtain a plurality of word image blocks; and the word image blocks are recognized with a preset word recognition model to obtain the text content corresponding to the handwritten composition image. The text content is segmented into sentences, part-of-speech analysis is performed to obtain the part of speech of each word, and shallow syntactic analysis is performed to obtain the phrases in each sentence and their types; a rule strategy or a deep model for a specific error type is then selected to detect the specific grammatical errors. For each detected grammatical error, a correction suggestion is generated by combining the recognition result of the error recognition model with the correction suggestion model, thereby obtaining the correction result for the whole composition. Based on the composition text content and the correction result, a convolutional neural network is constructed to score the composition text intelligently, so that the accuracy and the efficiency of correction are improved.
Fig. 3 is a schematic structural diagram of an automatic handwritten composition correcting device according to an embodiment of the present invention. As shown in fig. 3, the device includes: an acquisition module 31, a processing module 32, a segmentation module 33, a recognition module 34, and a correction module 35.
The acquisition module 31 is configured to acquire a handwritten composition image to be corrected;
the processing module 32 is configured to process the handwritten composition image by using a connected domain analysis algorithm and a straight line detection algorithm, and remove interference and background lines in the handwritten composition image to obtain a processed handwritten composition image;
the segmentation module 33 is configured to perform line segmentation and word segmentation on the processed handwritten composition image to obtain a plurality of word image blocks;
the recognition module 34 is configured to recognize the plurality of word image blocks by using a preset word recognition model, and acquire the text content corresponding to the handwritten composition image;
and the correction module 35 is configured to perform sentence segmentation, syntactic analysis, and sentence correction on the text content, and obtain a correction result corresponding to the handwritten composition image.
The automatic correcting device for handwritten compositions provided by the invention can be hardware equipment, such as terminal equipment, a server cluster and the like, or software installed on the hardware equipment and the like. The handwritten composition image in this embodiment may be an electronic image obtained by photographing or scanning the handwritten composition.
In this embodiment, the automatic handwritten composition correcting device may first binarize the handwritten composition image and then process it with a connected domain analysis algorithm and a straight line detection algorithm, so as to remove interference and background lines and obtain a processed handwritten composition image.
In this embodiment, processing the handwritten composition image with the connected domain analysis algorithm can connect broken lines within a certain area of the image, for example the area where a word is located. It can also remove interference in the handwritten composition image, such as local highlights that arise when the handwritten composition is photographed.
In this embodiment, processing the handwritten composition image with the straight line detection algorithm can detect and remove background lines in the image, such as the ruled horizontal lines of a notebook or four-line, three-space writing grids. In a handwritten composition, the strokes of words often adhere to the background lines, or the words are embedded in the background lines, so that after word segmentation the word image blocks contain parts of the background lines, which reduces the accuracy of word recognition. Detecting and removing the background lines with the straight line detection algorithm therefore improves the accuracy of word recognition and, in turn, the accuracy of correction.
Further, two adjacent lines of words in a handwritten composition may adhere to each other. To prevent two adhering lines from being treated as a single line during line segmentation, the device may further include a splitting module, which is configured to perform word splitting on the processed handwritten composition image before line segmentation and to disconnect the adhering parts between two adjacent lines in the image. The two adhering lines can then be separated into two lines during line segmentation, which improves the accuracy of line segmentation and, in turn, the accuracy of word recognition.
In this embodiment, in order to improve the recognition accuracy of the word recognition model, the word image block and the word recognition sample may be normalized according to a preset format corresponding to the word recognition model, and therefore, on the basis of the above embodiment, the apparatus may further include: and the normalization processing module is used for performing normalization processing on the word image blocks according to the preset format corresponding to the word recognition model to obtain the processed word image blocks.
Correspondingly, the device can further comprise: the training module is used for acquiring a word recognition sample; the word recognition samples include: handwriting word image samples and corresponding text content; carrying out normalization processing on the word recognition sample according to a preset format corresponding to the word recognition model to obtain a processed word recognition sample; and training the word recognition model according to the processed word recognition sample to obtain a preset word recognition model.
The normalization process for the word image block may be, for example, to adjust the brightness, color, and the like of the word image block to a format required by the word recognition model.
In this embodiment, to improve the training effect of the word recognition model, the word recognition samples may be divided according to word length into short word recognition samples, single word recognition samples, and long word recognition samples. The word recognition model is then trained with the short word recognition samples, the single word recognition samples, and the long word recognition samples in sequence to obtain the preset word recognition model. The handwritten words in the short word recognition samples are shorter than those in the single word recognition samples, and the handwritten words in the single word recognition samples are shorter than those in the long word recognition samples.
Furthermore, over-segmentation and under-segmentation may occur during word segmentation. Over-segmentation refers to segmenting one word into two or more words; under-segmentation refers to segmenting two or more words as one word. As a result, a word recognized by the word recognition model may consist of several words or may be only part of a word. Therefore, to improve the accuracy of word recognition, after the automatic handwritten composition correcting device recognizes the word corresponding to a word image block, the word may be compared with the words in a preset dictionary to obtain the dictionary word that matches it. When the word is part of a dictionary word, a second word corresponding to the word image block before or after the current word image block is obtained, and it is judged whether the combination of the word and the second word is a dictionary word. According to the judgment result, over-segmented words are merged, under-segmented words are re-segmented, and so on.
Further, since the recognition accuracy of the word recognition model is not one hundred percent, its recognition result may contain errors. To improve the recognition accuracy for the handwritten composition image, the device may further include an input module, which is configured to input the text content into a preset error correction model and acquire the error-corrected text content. The error correction model is composed of a plurality of language models; each language model is an N-gram language model, where N is a positive integer.
Correspondingly, the training process of the error correction model may specifically be: obtaining error correction training samples, which include text content samples and the corresponding error-corrected samples; normalizing the error correction training samples according to a preset format corresponding to the error correction model to obtain processed error correction training samples; and training the error correction model on the processed error correction training samples to obtain the preset error correction model.
The error correction model may process the text content as follows: acquire the candidate words in the text content, acquire similar words for each candidate word, and treat the similar words as candidate words as well; normalize the candidate words, for example with respect to upper and lower case, full-width and half-width characters, singular and plural forms, digits and punctuation, spaces inserted between candidate words, and punctuation before and after candidate words; take each candidate word in the text content together with its similar words as one column of candidate words, calculate the transition probabilities between the candidate words, and retain the candidate words whose transition probability exceeds a first transition probability threshold; then select the most appropriate word in each column with an optimal path search algorithm, and integrate the selected words to obtain the error-corrected text content.
It should be noted that, when calculating the transition probabilities between the columns of candidate words, a bigram language model may first be used to calculate the transition probability between candidate words in two adjacent columns. When the transition probability is smaller than a second transition probability threshold, the corresponding candidate word is deleted; when the transition probability exceeds the second transition probability threshold, the transition probability across three or more consecutive columns of candidate words is no longer calculated with a trigram or higher-order language model. This reduces the amount of transition probability computation and improves the speed at which the error correction model processes the text content.
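The candidate-column processing described above can be illustrated with a minimal sketch. It assumes a toy bigram table and a single pruning threshold; the names `best_path`, `toy_bigram`, `TOY_TABLE`, and the probability values are hypothetical and are not taken from the embodiment, and a Viterbi-style search stands in for the unspecified optimal path search algorithm.

```python
import math

# Hypothetical bigram probabilities; a real system would estimate them from a corpus.
TOY_TABLE = {
    ("i", "am"): 0.6, ("i", "an"): 0.01,
    ("am", "happy"): 0.5, ("an", "happy"): 0.05,
    ("am", "hoppy"): 0.001, ("an", "hoppy"): 0.001,
}

def toy_bigram(prev, cur):
    """Return an assumed transition probability for the word pair (prev, cur)."""
    return TOY_TABLE.get((prev, cur), 1e-6)

def best_path(columns, bigram_prob, threshold):
    """Pick one word per column, maximizing the product of bigram probabilities.

    columns: list of candidate-word lists, one list per word position.
    Transitions with probability below `threshold` (must be > 0) are pruned.
    """
    # scores[word] = (log-probability of the best path ending in word, that path)
    scores = {w: (0.0, [w]) for w in columns[0]}
    for column in columns[1:]:
        next_scores = {}
        for cur in column:
            best = None
            for prev, (logp, path) in scores.items():
                p = bigram_prob(prev, cur)
                if p < threshold or p <= 0:
                    continue  # pruned transition
                candidate = (logp + math.log(p), path + [cur])
                if best is None or candidate[0] > best[0]:
                    best = candidate
            if best is not None:
                next_scores[cur] = best
        if not next_scores:
            raise ValueError("all candidates were pruned; lower the threshold")
        scores = next_scores
    return max(scores.values(), key=lambda item: item[0])[1]

columns = [["i"], ["am", "an"], ["happy", "hoppy"]]
corrected = best_path(columns, toy_bigram, threshold=0.005)
```

With these toy values, "hoppy" is pruned because no transition into it reaches the threshold, and the search keeps "am" over "an".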
The automatic correction device for handwritten compositions according to the embodiment of the present invention acquires a handwritten composition image to be corrected; processes the handwritten composition image with a connected domain analysis algorithm and a straight line detection algorithm to remove interference and background lines, obtaining a processed handwritten composition image; segments the processed handwritten composition image and segments words to obtain a plurality of word image blocks; recognizes the plurality of word image blocks with a preset word recognition model to obtain the text content corresponding to the handwritten composition image; and performs sentence segmentation, syntactic analysis, and sentence correction on the text content to obtain a correction result corresponding to the handwritten composition image. Because interference and background lines are removed before correction, and the handwritten composition image is first recognized into text content that is then corrected, both the accuracy and the efficiency of correction are improved.
Fig. 4 is a schematic structural diagram of another automatic handwritten composition correcting device according to an embodiment of the present invention. As shown in fig. 4, on the basis of the embodiment shown in fig. 3, the modifying module 35 may specifically include: a segmentation unit 351, an analysis unit 352, a recognition unit 353, an input unit 354, and a generation unit 355.
The segmentation unit 351 is configured to segment sentences of the text content to obtain a plurality of sentences in the text content;
an analyzing unit 352, configured to perform syntactic analysis on each sentence of the text content, and obtain an analysis result of each sentence; the analysis result comprises: words, phrases, parts of speech of the words and types of the phrases included in the sentence;
the recognition unit 353 is configured to select, for each sentence, a corresponding error recognition model according to an analysis result of the sentence to recognize the sentence, and acquire error information in the sentence;
an input unit 354, configured to input error information in the sentence into a preset correction suggestion model, and obtain a correction suggestion corresponding to the sentence;
the generating unit 355 is configured to generate a correction result corresponding to the handwritten composition image according to the correction suggestion corresponding to each sentence.
In this embodiment, the segmentation unit 351 may be specifically configured to obtain a type corresponding to the text content; the type is used for identifying the accuracy of text content sentence division; acquiring features to be extracted corresponding to the types; according to the features to be extracted, feature extraction is carried out on the text content, and segmentation feature information in the text content is obtained; and carrying out sentence segmentation on the text content according to the segmentation characteristic information to obtain a plurality of sentences in the text content.
In this embodiment, the automatic correction device for handwritten compositions may determine the type corresponding to the text content by using a machine learning method. The type may specifically be a handwritten composition image or an ordinary composition; the handwritten composition image type means that the text content was obtained by recognizing a handwritten composition image.
For example, when the type is a handwritten composition image, punctuation marks are often misused in handwriting, so in order to improve the accuracy of sentence division, the feature to be extracted for this type may be a capitalized initial letter or the like, and the text content is divided into sentences according to such features to obtain a plurality of sentences. When the type is an ordinary composition, the feature to be extracted for this type may be a punctuation mark or the like.
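A minimal sketch of type-dependent sentence division follows. The function name `split_sentences`, the type labels `"handwritten"` and `"ordinary"`, and the two regular expressions are assumptions made for illustration; the capitalization rule is deliberately naive and would also split before proper nouns.

```python
import re

def split_sentences(text, text_type):
    """Split text into sentences using a feature chosen by text type."""
    if text_type == "handwritten":
        # Punctuation is unreliable, so split before a capitalized word
        # (an uppercase letter followed by a lowercase letter).
        parts = re.split(r"\s+(?=[A-Z][a-z])", text)
    else:
        # Ordinary composition: split after sentence-ending punctuation.
        parts = re.split(r"(?<=[.!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]

handwritten = split_sentences(
    "Today is sunny My brother played football He was happy", "handwritten")
ordinary = split_sentences("It rained. We stayed home!", "ordinary")
```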
Further, in order to improve the accuracy of the modification, before the segmentation unit 351 segments the text content, the apparatus may perform a preprocessing operation on the text content, such as an encoding operation, a verification operation, an interference filtering operation, a normalization operation, and the like.
In this embodiment, the analysis unit 352 may be specifically configured to, for each sentence in the text content, split the sentence into words to obtain the words in the sentence; acquire the part of speech of each word; and match the sentence against preset phrase regular expressions to obtain the phrases in the sentence, and so on. In addition, the automatic correction device for handwritten compositions may also adjust the phrase regular expressions according to the matching results.
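One way to picture phrase matching with regular expressions is to encode each word's part of speech as a single character and run the expression over the resulting tag string. The sketch below assumes a tiny hand-written lexicon and one noun-phrase pattern; `TOY_TAGS`, `PHRASE_PATTERNS`, and `find_phrases` are hypothetical names, and a real system would use a trained part-of-speech tagger.

```python
import re

# Hypothetical one-letter tags: D determiner, J adjective, N noun, V verb, P preposition.
TOY_TAGS = {"the": "D", "a": "D", "red": "J", "big": "J", "boy": "N",
            "ball": "N", "park": "N", "kicked": "V", "in": "P"}

# Noun phrase: optional determiner, any number of adjectives, one or more nouns.
PHRASE_PATTERNS = {"NP": re.compile(r"D?J*N+")}

def find_phrases(sentence):
    """Return (phrase type, phrase text) pairs found in the sentence."""
    words = sentence.lower().split()
    # One character per word, so match offsets are also word indices.
    tag_string = "".join(TOY_TAGS.get(word, "X") for word in words)
    phrases = []
    for phrase_type, pattern in PHRASE_PATTERNS.items():
        for match in pattern.finditer(tag_string):
            phrases.append((phrase_type, " ".join(words[match.start():match.end()])))
    return phrases

phrases = find_phrases("The boy kicked a red ball")
```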
In this embodiment, a preset model library stores in advance the error recognition models corresponding to words, parts of speech of words, or phrases. The model library is queried according to the words, the parts of speech of the words, or the phrases in a sentence to obtain the corresponding error recognition model, and the sentence is input into that error recognition model to obtain the error information in the sentence. The error information may be, for example, a preposition use error, a phrase use error, a word use error, or the like.
Further, the identifying unit 353 is specifically configured to, for each sentence, obtain context information corresponding to each word in the sentence; determining a matrix vector corresponding to each word according to the context information corresponding to each word; selecting an error recognition model corresponding to the part of speech according to the part of speech of each word; and inputting the matrix vector corresponding to each word in the sentence into the error recognition model corresponding to the part of speech to acquire error information in the sentence.
The context information corresponding to each word in the sentence refers to information of other words within a preset distance of each word in the sentence and information of phrases including the word in the sentence. For example, in the sentence "I am interested in soving", other words within a preset distance of the preposition "in" may include "am", "interested", "soving", for example. The matrix vector corresponding to each Word may be, for example, a matrix vector generated by the Word2Vec model according to context information corresponding to the Word.
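The notion of "other words within a preset distance" can be sketched as a simple window over the word list. The function name `context_words` and the distance value 2 are illustrative assumptions.

```python
def context_words(words, index, distance):
    """Return the words within `distance` positions of words[index], excluding it."""
    start = max(0, index - distance)
    return words[start:index] + words[index + 1:index + 1 + distance]

words = "I am interested in soving".split()
context = context_words(words, words.index("in"), distance=2)
```

For the example sentence this yields the same three context words named above.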
The part of speech is, for example, noun, verb, or preposition. In this embodiment, after the matrix vector corresponding to each word in the sentence is input into the error recognition model corresponding to the part of speech, the error recognition model may determine, according to the matrix vector corresponding to the word, the probability of each candidate word at that position: for the preposition "in" in the above example, the probability that the preposition is "on", "at", "before", "after", "since", and so on. When the preposition with the highest probability differs from the preposition "in" in the sentence, the preposition with the highest probability is determined as the error information in the sentence; if the preposition with the highest probability is "in", the preposition in the sentence is used correctly and there is no error information.
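The decision rule described above (compare the written preposition with the most probable one) can be sketched as follows. The probability dictionary stands in for the output of a trained error recognition model; its values, and the name `detect_preposition_error`, are assumptions for illustration only.

```python
def detect_preposition_error(actual, probabilities):
    """Return error information if the most probable preposition differs from `actual`.

    probabilities: mapping from candidate preposition to model probability
    (here supplied by hand; a trained model would produce it).
    """
    predicted = max(probabilities, key=probabilities.get)
    if predicted == actual:
        return None  # the preposition is used correctly
    return {"word": actual, "suggested": predicted,
            "probability": probabilities[predicted]}

model_output = {"in": 0.8, "on": 0.1, "at": 0.1}  # assumed model output
error = detect_preposition_error("on", model_output)
no_error = detect_preposition_error("in", model_output)
```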
Further, on the basis of the above embodiment, the modifying module 35 may further include a comparison unit, configured to compare each sentence with a preset error pattern library to acquire error information in the sentence; the error pattern library includes regular expressions corresponding to various error patterns.
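A rule-based error pattern library of this kind might look like the following sketch. The two patterns and their names are invented examples, not patterns from the embodiment.

```python
import re

# Hypothetical error pattern library: (error name, regular expression).
ERROR_PATTERNS = [
    # "a" directly before a word starting with a vowel letter, e.g. "a apple".
    ("article_before_vowel", re.compile(r"\ba (?=[aeiou])", re.IGNORECASE)),
    # The same word written twice in a row, e.g. "and and".
    ("repeated_word", re.compile(r"\b(\w+) \1\b", re.IGNORECASE)),
]

def match_error_patterns(sentence):
    """Return (error name, matched text) pairs for every pattern hit."""
    errors = []
    for name, pattern in ERROR_PATTERNS:
        for match in pattern.finditer(sentence):
            errors.append((name, match.group(0).strip()))
    return errors

found = match_error_patterns("I ate a apple and and left")
```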
In this embodiment, after obtaining the error information in each sentence, the error information in the sentence, or the sentence including the error information, may be input into the preset correction suggestion model, and the correction suggestion corresponding to the sentence is obtained.
The training process of the correction suggestion model may be: obtaining correction training samples, and training an initial correction suggestion model according to the correction training samples to obtain the preset correction suggestion model. The correction suggestion model may be a time-recurrent neural network.
In this embodiment, the generating unit 355 is specifically configured to, for each sentence, when there are multiple correction suggestions corresponding to the sentence, input the multiple correction suggestions into a preset correction selection model and obtain a single correction suggestion corresponding to the sentence; and generate the correction result corresponding to the handwritten composition image according to the single correction suggestion corresponding to each sentence.
In this embodiment, the correction selection model may score the correction suggestions corresponding to the sentence and take the correction suggestion with the highest score as the most likely correction suggestion for the sentence.
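Selecting the highest-scoring suggestion reduces to an argmax over the candidates. In the sketch below the scoring function is a toy stand-in (it counts word pairs found in a small hand-written set); `KNOWN_BIGRAMS`, `toy_score`, and `select_suggestion` are hypothetical, and the embodiment does not specify how the selection model computes its score.

```python
# Hypothetical set of word pairs assumed to be well formed.
KNOWN_BIGRAMS = {("am", "interested"), ("interested", "in"), ("in", "music")}

def toy_score(sentence):
    """Score a sentence by counting its adjacent word pairs in KNOWN_BIGRAMS."""
    words = sentence.lower().split()
    return sum((a, b) in KNOWN_BIGRAMS for a, b in zip(words, words[1:]))

def select_suggestion(suggestions, score=toy_score):
    """Return the suggestion with the highest score."""
    return max(suggestions, key=score)

chosen = select_suggestion([
    "I am interested on music",
    "I am interested in music",
    "I am interested at music",
])
```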
Further, on the basis of the above embodiment, the handwritten composition image may also be scored. The apparatus may therefore further include a scoring module, configured to: acquire characteristic information in the text content, the characteristic information including any one or more of vocabulary information, grammar information, sentence information, and correction information; when the characteristic information is vocabulary information, grammar information, or sentence information, input the characteristic information into the corresponding scoring model and acquire the score corresponding to the characteristic information; and when the characteristic information includes vocabulary information, grammar information, sentence information, and correction information, input the characteristic information into a preset comprehensive scoring model and acquire a comprehensive score corresponding to the handwritten composition image.
For example, when the feature information includes vocabulary information, inputting the vocabulary information into a corresponding vocabulary scoring model to obtain vocabulary scores; under the condition that the characteristic information comprises grammar information, inputting the grammar information into a corresponding grammar scoring model to obtain grammar scoring; and when the characteristic information comprises sentence information, such as the structure, the length and the like of a sentence, inputting the sentence information into a corresponding sentence scoring model to obtain a sentence score.
In this embodiment, each word in the text content may be represented by a corresponding unique vector, for example, a one-hot vector. The number of dimensions of the one-hot vector equals the total number of words in the vocabulary; the one-hot vector corresponding to a word has the value 1 only in the dimension corresponding to that word, and the value 0 in all other dimensions. For example, when the total number of words is 5000 and the first word in the text content is the 1000th word of the vocabulary, the one-hot vector corresponding to the first word has 5000 dimensions, the value of its 1000th dimension is 1, and the values of the other dimensions are 0. In this embodiment, the vocabulary information in the text content may be specifically represented by the unique vectors corresponding to the words in the text content; that is, the input of the vocabulary scoring model may be the vector set corresponding to the text content. The vector set corresponding to the text content is the set of vectors obtained by replacing each word in the text content with its corresponding unique vector.
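The one-hot example above can be reproduced in a few lines. The function name `one_hot` is illustrative, and the index is taken as 1-based to match the wording "the 1000th word".

```python
def one_hot(index, vocabulary_size):
    """Return a one-hot list with a 1 at the 1-based position `index`."""
    if not 1 <= index <= vocabulary_size:
        raise ValueError("index must be between 1 and vocabulary_size")
    vector = [0] * vocabulary_size
    vector[index - 1] = 1  # convert the 1-based position to a 0-based list index
    return vector

first_word_vector = one_hot(1000, 5000)
```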
Correspondingly, the training process of the vocabulary scoring model may be: obtaining the vector set corresponding to a composition sample and the vocabulary score corresponding to the composition sample; and inputting the vector set and the vocabulary score corresponding to the composition sample into the vocabulary scoring model to train the vocabulary scoring model.
In this embodiment, the grammar information and the sentence information in the text content may also be represented by the vector set corresponding to the text content. Correspondingly, the training process of the grammar scoring model specifically comprises: obtaining the vector set corresponding to a composition sample and the grammar score corresponding to the composition sample; and inputting the vector set and the grammar score into the grammar scoring model to train the grammar scoring model. The training process of the sentence scoring model specifically comprises: obtaining the vector set corresponding to a composition sample and the sentence score corresponding to the composition sample; and inputting the vector set and the sentence score into the sentence scoring model to train the sentence scoring model.
For another example, when the feature information includes vocabulary information, grammar information, sentence information, and correction information, the following operations may be performed on the feature information in sequence: vectorize the feature information; input the resulting vectors into a convolutional neural network (CNN); input the output of the CNN into a time-recurrent neural network (LSTM); input the output of the LSTM into an attention mechanism; and take the output of the attention mechanism as the comprehensive score corresponding to the handwritten composition image.
In this embodiment, the vocabulary scoring model, the grammar scoring model, the sentence scoring model, the CNN, the LSTM, and the attention mechanism may each be trained on corresponding training samples, which will not be described in detail herein.
In this embodiment, the correction information refers to the types of errors in the text content and the number of errors of each type of error. The vocabulary information, grammar information, and sentence information may be represented by a set of vectors corresponding to the text content. Correspondingly, the training process of the comprehensive scoring model specifically includes acquiring a vector set corresponding to the composition sample, error types in the composition sample, the number of errors of each error type, and comprehensive scoring corresponding to the composition sample; and inputting the vector set corresponding to the composition sample, the error types in the composition sample, the error quantity of each error type and the comprehensive score corresponding to the composition sample into a comprehensive scoring model, and training the comprehensive scoring model.
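The correction information (error types and the number of errors of each type) can be turned into a fixed-length feature list with a counter. The error type names below follow the examples given earlier in this description (preposition, phrase, and word use errors); the record layout and the name `correction_features` are assumptions for illustration.

```python
from collections import Counter

# Fixed ordering of error types, so every composition yields a vector of equal length.
ERROR_TYPES = ["preposition", "phrase", "word"]

def correction_features(errors):
    """Count detected errors per type and return the counts in ERROR_TYPES order."""
    counts = Counter(error["type"] for error in errors)
    return [counts.get(error_type, 0) for error_type in ERROR_TYPES]

features = correction_features([
    {"type": "preposition"}, {"type": "word"}, {"type": "preposition"},
])
```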
The automatic correction device for handwritten compositions according to the embodiment of the present invention acquires a handwritten composition image to be corrected; processes the handwritten composition image with a connected domain analysis algorithm and a straight line detection algorithm to remove interference and background lines, obtaining a processed handwritten composition image; segments the processed handwritten composition image and segments words to obtain a plurality of word image blocks; and recognizes the plurality of word image blocks with a preset word recognition model to obtain the text content corresponding to the handwritten composition image. The device then segments the text content, performs part-of-speech analysis to obtain the part of speech of each word, and performs shallow syntactic analysis to obtain the phrases in each sentence and their types; on this basis it selects a rule strategy or a deep model for a specific error type and detects specific grammatical errors. For each detected grammatical error, it generates a correction suggestion by combining the recognition result of the error recognition model with the correction suggestion model, thereby obtaining the correction result for the whole composition. Finally, it constructs a convolutional neural network based on the composition text content and the correction result to score the composition text intelligently.
Fig. 5 is a schematic structural diagram of another automatic handwritten composition correcting device according to an embodiment of the present invention. The automatic correction device of handwritten composition includes:
memory 1001, processor 1002, and computer programs stored on memory 1001 and executable on processor 1002.
The processor 1002 implements the automatic correction method of the handwritten composition provided in the above embodiments when executing the program.
Further, the automatic correction device for handwritten compositions further comprises:
a communication interface 1003 for communicating between the memory 1001 and the processor 1002.
A memory 1001 for storing computer programs that can be run on the processor 1002.
Memory 1001 may include high-speed RAM memory and may also include non-volatile memory (e.g., at least one disk memory).
The processor 1002 is configured to implement the automatic correction method for handwritten compositions according to the above embodiments when executing the program.
If the memory 1001, the processor 1002, and the communication interface 1003 are implemented independently, the communication interface 1003, the memory 1001, and the processor 1002 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
Optionally, in a specific implementation, if the memory 1001, the processor 1002, and the communication interface 1003 are integrated on one chip, the memory 1001, the processor 1002, and the communication interface 1003 may complete communication with each other through an internal interface.
The processor 1002 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present invention.
The present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for automatic correction of handwritten compositions as described above.
The present invention also provides a computer program product; when instructions in the computer program product are executed by a processor, a method for automatically correcting a handwritten composition is performed, the method comprising:
acquiring a handwritten composition image to be corrected;
processing the handwritten text image by adopting a connected domain analysis algorithm and a straight line detection algorithm, and removing interference and background lines in the handwritten text image to obtain a processed handwritten text image;
segmenting the processed handwritten composition image and segmenting words to obtain a plurality of word image blocks;
recognizing a plurality of word image blocks by adopting a preset word recognition model to obtain text contents corresponding to the handwritten composition images;
and carrying out sentence segmentation, syntactic analysis and sentence correction on the text content to obtain a correction result corresponding to the handwritten composition image.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (13)

1. An automatic correction method of handwritten compositions is characterized by comprising the following steps:
acquiring a handwritten composition image to be corrected;
processing the handwritten text image by adopting a connected domain analysis algorithm and a straight line detection algorithm, and removing interference and background lines in the handwritten text image to obtain a processed handwritten text image;
segmenting the processed handwritten composition image and segmenting words to obtain a plurality of word image blocks;
recognizing a plurality of word image blocks by adopting a preset word recognition model to obtain text contents corresponding to the handwritten composition images;
carrying out sentence segmentation, syntactic analysis and sentence correction on the text content to obtain a correction result corresponding to the handwritten composition image;
before segmenting the processed handwritten composition image and segmenting words to obtain a plurality of word image blocks, the method further comprises the following steps:
splitting words of the processed handwritten composition image, and disconnecting the adhesion part between two adjacent lines in the handwritten composition image;
before the preset word recognition model is adopted to recognize the plurality of word image blocks and the text content corresponding to the handwritten composition image is obtained, the method further comprises the following steps:
obtaining a word recognition sample; the word recognition samples include: handwritten word image samples and corresponding text content;
normalizing the word recognition sample according to a preset format corresponding to the word recognition model to obtain a processed word recognition sample;
training the word recognition model according to the processed word recognition sample to obtain the preset word recognition model;
training the word recognition model according to the processed word recognition sample to obtain the preset word recognition model, including:
dividing the word recognition samples according to the word length to obtain short word recognition samples, single word recognition samples and long word recognition samples;
training the word recognition model by sequentially adopting the short word recognition sample, the single word recognition sample and the long word recognition sample to obtain the preset word recognition model;
after the words corresponding to the word image blocks are identified, the words can be compared with the words in a preset dictionary, the words in the dictionary matched with the words are obtained, when the words are part of the words in the dictionary, the second words corresponding to the word image blocks before or after the word image blocks are obtained, whether the combination of the words and the second words is the words in the dictionary or not is judged, the over-split words are combined according to the judgment result, and the under-split words are split again, wherein the over-splitting refers to splitting one word into two or more words, and the under-splitting refers to splitting two or more words into one word;
the sentence segmentation, syntactic analysis and sentence correction are carried out on the text content, and a correction result corresponding to the handwritten composition image is obtained, and the method comprises the following steps:
carrying out sentence segmentation on the text content to obtain a plurality of sentences in the text content;
performing syntactic analysis on each sentence of the text content to obtain an analysis result of each sentence; the analysis result comprises: words, phrases, parts of speech of the words and types of the phrases included in the sentence;
for each sentence, selecting a corresponding error recognition model to recognize the sentence according to the analysis result of the sentence, and acquiring error information in the sentence;
inputting error information in the sentence into a preset correction suggestion model to obtain a correction suggestion corresponding to the sentence;
generating a correction result corresponding to the handwritten composition image according to the correction suggestion corresponding to each sentence;
the sentence segmentation is performed on the text content to obtain a plurality of sentences in the text content, and the sentence segmentation includes:
acquiring a type corresponding to the text content; the type is used for identifying the accuracy of the text content sentence division;
acquiring the feature to be extracted corresponding to the type;
performing feature extraction on the text content according to the features to be extracted, to obtain segmentation feature information in the text content;
carrying out sentence segmentation on the text content according to the segmentation feature information to obtain a plurality of sentences in the text content;
before selecting a corresponding error recognition model to recognize the sentence according to the analysis result of the sentence and acquiring error information in the sentence, the method further comprises:
for each sentence, comparing the sentence with a preset error pattern library to acquire error information in the sentence; the error pattern library comprises: regular expressions corresponding to various error patterns;
generating a correction result corresponding to the handwritten composition image according to the correction suggestion corresponding to each sentence comprises:
for each sentence, when there are a plurality of correction suggestions corresponding to the sentence, inputting the plurality of correction suggestions into a preset correction selection model to acquire a single correction suggestion corresponding to the sentence;
generating a correction result corresponding to the handwritten composition image according to the single correction suggestion corresponding to each sentence;
the correction selection model scores the correction suggestions corresponding to the sentence, and the correction suggestion with the highest score is taken as the most likely correction suggestion for the sentence.
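As an illustration of two steps recited in claim 1, the sketch below matches a sentence against a small library of regular expressions and then keeps the highest-scoring suggestion. It is a minimal sketch under stated assumptions: the two patterns, the `Suggestion` type, and the numeric scores are hypothetical stand-ins, and the claim does not specify how the selection model computes its scores.

```python
import re
from dataclasses import dataclass

# Hypothetical error pattern library: error label -> regular expression.
# Both patterns are toy examples, not patterns taken from the patent.
ERROR_PATTERNS = {
    "repeated_word": re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE),
    # Crude on purpose: it would also flag "a university".
    "a_before_vowel": re.compile(r"\ba\s+[aeiou]\w*", re.IGNORECASE),
}


@dataclass
class Suggestion:
    text: str
    score: float  # assumed to come from some scoring model


def match_error_patterns(sentence):
    """Return (label, matched_text) for every pattern hit in the sentence."""
    hits = []
    for label, pattern in ERROR_PATTERNS.items():
        for match in pattern.finditer(sentence):
            hits.append((label, match.group(0)))
    return hits


def select_suggestion(suggestions):
    """Keep the single suggestion with the highest score (None if empty)."""
    if not suggestions:
        return None
    return max(suggestions, key=lambda s: s.score)
```

Running the pattern check before any learned model keeps cheap, unambiguous errors out of the more expensive recognition step, which is presumably why the claim orders the steps this way.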
2. The method according to claim 1, wherein before recognizing the plurality of word image blocks by using a preset word recognition model and acquiring the text content corresponding to the handwritten composition image, the method further comprises:
and normalizing the word image blocks according to a preset format corresponding to the word recognition model to obtain the processed word image blocks.
3. The method according to claim 1, wherein before performing sentence segmentation, syntactic analysis, and sentence correction on the text content and obtaining a correction result corresponding to the handwritten composition image, the method further comprises:
inputting the text content into a preset error correction model to obtain the text content after error correction; the error correction model is composed of a plurality of language models; each language model is an N-gram language model; the value of N is a positive integer.
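To make the idea of an error correction model built from several N-gram language models concrete, here is a minimal sketch. It assumes add-one smoothing, a plain sum of log-probabilities across models, and a caller-supplied candidate list; none of these choices is stated in the claim, and the class and function names are hypothetical.

```python
import math
from collections import Counter


class NGramModel:
    """Count-based N-gram model with add-one smoothing (an assumption)."""

    def __init__(self, n, sentences):
        self.n = n
        self.ngrams = Counter()
        self.contexts = Counter()
        self.vocab = set()
        for tokens in sentences:
            padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
            self.vocab.update(padded)
            for i in range(len(padded) - n + 1):
                gram = tuple(padded[i:i + n])
                self.ngrams[gram] += 1
                self.contexts[gram[:-1]] += 1

    def log_prob(self, tokens):
        """Smoothed log-probability of a whole token sequence."""
        padded = ["<s>"] * (self.n - 1) + tokens + ["</s>"]
        vocab_size = len(self.vocab)
        total = 0.0
        for i in range(len(padded) - self.n + 1):
            gram = tuple(padded[i:i + self.n])
            numerator = self.ngrams[gram] + 1
            denominator = self.contexts[gram[:-1]] + vocab_size
            total += math.log(numerator / denominator)
        return total


def best_candidate(tokens, index, candidates, models):
    """Try each candidate at position index; keep the one all models like best."""
    def score(word):
        trial = tokens[:index] + [word] + tokens[index + 1:]
        return sum(model.log_prob(trial) for model in models)
    return max(candidates, key=score)
```

With a unigram and a bigram model trained on a few sentences containing "like", `best_candidate(["i", "lake", "apples"], 1, ["lake", "like"], models)` prefers "like", because the recognized word "lake" never appears in the training counts.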
4. The method according to claim 3, wherein before inputting the text content into a preset error correction model and obtaining the corrected text content, the method further comprises:
acquiring an error correction training sample; the error correction training samples comprise: text content samples and corresponding error corrected samples;
normalizing the error correction training samples according to a preset format corresponding to the error correction model to obtain the processed error correction training samples;
and training the error correction model according to the processed error correction training sample to obtain the preset error correction model.
5. The method according to claim 1, wherein for each sentence, selecting a corresponding error recognition model according to the analysis result of the sentence to recognize the sentence, and obtaining error information in the sentence, comprises:
for each sentence, acquiring context information corresponding to each word in the sentence;
determining a matrix vector corresponding to each word according to the context information corresponding to each word;
selecting an error recognition model corresponding to the part of speech according to the part of speech of each word;
and inputting the matrix vector corresponding to each word in the sentence into the error recognition model corresponding to the part of speech to acquire the error information in the sentence.
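The four steps of claim 5 can be sketched as follows. This is an illustrative sketch only: the one-word context window, the one-hot encoding used as the "matrix vector", the part-of-speech tag names, and the rule-based stand-in for a trained model are all assumptions introduced here.

```python
PAD = "<pad>"  # hypothetical filler for positions outside the sentence


def context_window(words, index, size=1):
    """Words from index-size to index+size, padded at sentence edges."""
    return [
        words[j] if 0 <= j < len(words) else PAD
        for j in range(index - size, index + size + 1)
    ]


def one_hot_matrix(window, vocab):
    """One row per window position, one column per vocabulary entry.

    Unknown words and padding become all-zero rows.
    """
    return [
        [1 if vocab.get(word) == col else 0 for col in range(len(vocab))]
        for word in window
    ]


def make_pair_model(vocab, center, right, message):
    """Toy stand-in for a trained model: fires on one known bad word pair.

    Assumes a window of size 1, so row 1 is the word itself and row 2
    is its right neighbour.
    """
    center_col, right_col = vocab[center], vocab[right]

    def model(matrix):
        if matrix[1][center_col] == 1 and matrix[2][right_col] == 1:
            return message
        return None

    return model


def find_errors(tagged_words, vocab, models_by_pos):
    """Route each word to the model registered for its part of speech."""
    words = [word for word, _ in tagged_words]
    errors = []
    for i, (word, pos) in enumerate(tagged_words):
        model = models_by_pos.get(pos)
        if model is None:
            continue  # no model registered for this part of speech
        info = model(one_hot_matrix(context_window(words, i), vocab))
        if info:
            errors.append((i, word, info))
    return errors
```

Keeping one model per part of speech lets each model specialise in the errors typical of that word class (articles, prepositions, verb forms), at the cost of depending on the tagger's accuracy.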
6. The method of claim 5, wherein before inputting the matrix vector corresponding to each word in the sentence into the error recognition model corresponding to the part of speech and obtaining the error information in the sentence, the method further comprises:
acquiring error recognition training samples corresponding to each part of speech; the error recognition training samples comprise: the matrix vector corresponding to a word with that part of speech and the error information corresponding to the word;
and, for the error recognition model corresponding to each part of speech, training the error recognition model with the error recognition training samples corresponding to that part of speech.
7. The method according to claim 1, wherein before inputting the error information in the sentence into a preset correction suggestion model and obtaining the correction suggestion corresponding to the sentence, the method further comprises:
acquiring correction training samples; the correction training samples comprise: sentence samples, error information in the sentence samples, and correction suggestions corresponding to the sentence samples;
and training a correction suggestion model according to the correction training sample to obtain the preset correction suggestion model.
8. The method according to claim 1, wherein after performing sentence segmentation, syntactic analysis, and sentence correction on the text content and obtaining a correction result corresponding to the handwritten composition image, the method further comprises:
acquiring characteristic information in the text content; the characteristic information comprises any one or more of the following information: vocabulary information, grammar information, sentence information, or correction information;
when the characteristic information is vocabulary information, grammar information or sentence information, inputting the characteristic information into a corresponding scoring model, and acquiring a score corresponding to the characteristic information;
when the characteristic information comprises the vocabulary information, the grammar information, the sentence information and the correction information, inputting the characteristic information into a preset comprehensive scoring model to acquire the comprehensive score corresponding to the handwritten composition image.
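The two scoring paths in claim 8 can be illustrated with a small sketch: one scoring function per kind of information, plus a comprehensive score that also takes the correction information into account. Every formula, weight, and field name below is a hypothetical placeholder; the claim only says that preset scoring models exist, not how they compute a score.

```python
# Hypothetical per-feature scoring functions, each returning 0-100.
def vocabulary_score(info):
    """Share of distinct words among all words."""
    return 100.0 * info["distinct_words"] / max(info["total_words"], 1)


def grammar_score(info):
    """Lose ten points per grammar error, never below zero."""
    return max(0.0, 100.0 - 10.0 * info["grammar_errors"])


def sentence_score(info):
    """Reward longer average sentences, capped at 100."""
    return min(100.0, 10.0 * info["avg_sentence_length"])


FEATURE_MODELS = {
    "vocabulary": vocabulary_score,
    "grammar": grammar_score,
    "sentence": sentence_score,
}

# Assumed weights and penalty; a real system would learn or tune these.
WEIGHTS = {"vocabulary": 0.3, "grammar": 0.4, "sentence": 0.3}
PENALTY_PER_CORRECTION = 2.0


def feature_scores(features):
    """Score each available kind of information with its own model."""
    return {
        name: FEATURE_MODELS[name](info)
        for name, info in features.items()
        if name in FEATURE_MODELS
    }


def comprehensive_score(features):
    """Weighted mix of the feature scores minus a penalty per correction."""
    scores = feature_scores(features)
    weighted = sum(WEIGHTS[name] * value for name, value in scores.items())
    penalty = PENALTY_PER_CORRECTION * features["correction"]["count"]
    return max(0.0, weighted - penalty)
```

Reporting the per-feature scores alongside the comprehensive score tells the writer which aspect of the composition pulled the overall mark down.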
9. An automatic correction device for handwritten compositions, which implements the automatic correction method for handwritten compositions according to any one of claims 1 to 8, comprising:
the acquisition module is used for acquiring a handwritten composition image to be corrected;
the processing module is used for processing the handwritten composition image by adopting a connected domain analysis algorithm and a straight line detection algorithm, removing interference and background lines in the handwritten composition image and obtaining a processed handwritten composition image;
the segmentation module is used for segmenting the processed handwritten composition image and segmenting words to obtain a plurality of word image blocks; before segmenting the processed handwritten composition image and segmenting words to obtain a plurality of word image blocks, the segmentation module is further used for: splitting words of the processed handwritten composition image, and disconnecting the adhered parts between two adjacent lines in the handwritten composition image;
the recognition module is used for recognizing the word image blocks by adopting a preset word recognition model and acquiring text contents corresponding to the handwritten composition images;
and the correction module is used for carrying out sentence segmentation, syntactic analysis and sentence correction on the text content and obtaining a correction result corresponding to the handwritten composition image.
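The preprocessing performed by the processing module can be pictured with the following sketch on a tiny binary image (1 = ink, 0 = background). It is a simplified illustration, not the claimed algorithm: ruled lines are detected by row fill ratio rather than by a full straight line detector such as a Hough transform, only horizontal lines are handled, and the thresholds are arbitrary assumptions.

```python
from collections import deque


def connected_components(grid):
    """Group 4-connected ink pixels; return one pixel list per component."""
    height, width = len(grid), len(grid[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y in range(height):
        for x in range(width):
            if grid[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width:
                        if grid[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def remove_horizontal_lines(grid, min_fill=0.9):
    """Blank every row that is almost entirely ink (treated as a ruled line)."""
    width = len(grid[0])
    return [
        [0] * width if sum(row) >= min_fill * width else list(row)
        for row in grid
    ]


def remove_small_components(grid, min_pixels=2):
    """Erase components smaller than min_pixels (treated as interference)."""
    cleaned = [list(row) for row in grid]
    for pixels in connected_components(grid):
        if len(pixels) < min_pixels:
            for y, x in pixels:
                cleaned[y][x] = 0
    return cleaned


def clean(grid):
    """Remove ruled lines first, then specks left behind."""
    return remove_small_components(remove_horizontal_lines(grid))
```

One limitation of this simplification is that blanking a whole row also erases any handwriting strokes that cross the ruled line; a practical system would need to restore those pixels.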
10. The apparatus of claim 9, wherein the correction module comprises:
the segmentation unit is used for carrying out sentence segmentation on the text content to obtain a plurality of sentences in the text content;
the analysis unit is used for carrying out syntactic analysis on each sentence of the text content and acquiring an analysis result of each sentence; the analysis result comprises: words, phrases, parts of speech of the words, and types of the phrases included in the sentence;
the recognition unit is used for selecting, for each sentence, a corresponding error recognition model according to the analysis result of the sentence to recognize the sentence, and acquiring error information in the sentence;
the input unit is used for inputting the error information in the sentence into a preset correction suggestion model and acquiring the correction suggestion corresponding to the sentence;
and the generating unit is used for generating a correction result corresponding to the handwritten composition image according to the correction suggestion corresponding to each sentence.
11. The device according to claim 10, wherein the recognition unit is specifically configured for:
for each sentence, acquiring context information corresponding to each word in the sentence;
determining a matrix vector corresponding to each word according to the context information corresponding to each word;
selecting an error recognition model corresponding to the part of speech according to the part of speech of each word;
and inputting the matrix vector corresponding to each word in the sentence into the error recognition model corresponding to the part of speech to acquire the error information in the sentence.
12. An automatic correction device for handwritten compositions, comprising:
a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the automatic correction method for handwritten compositions according to any one of claims 1 to 8 when executing the program.
13. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the automatic correction method for handwritten compositions according to any one of claims 1 to 8.
CN201810223663.8A 2018-03-19 2018-03-19 Automatic correction method and device for handwritten composition Active CN108595410B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810223663.8A CN108595410B (en) 2018-03-19 2018-03-19 Automatic correction method and device for handwritten composition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810223663.8A CN108595410B (en) 2018-03-19 2018-03-19 Automatic correction method and device for handwritten composition

Publications (2)

Publication Number Publication Date
CN108595410A CN108595410A (en) 2018-09-28
CN108595410B true CN108595410B (en) 2023-03-24

Family

ID=63626800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810223663.8A Active CN108595410B (en) 2018-03-19 2018-03-19 Automatic correction method and device for handwritten composition

Country Status (1)

Country Link
CN (1) CN108595410B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199801B (en) * 2018-11-19 2023-08-08 零氪医疗智能科技(广州)有限公司 Construction method and application of model for identifying disease types of medical records
CN109670040B (en) * 2018-11-27 2024-04-05 平安科技(深圳)有限公司 Writing assistance method and device, storage medium and computer equipment
CN111737968A (en) * 2019-03-20 2020-10-02 小船出海教育科技(北京)有限公司 Method and terminal for automatically correcting and scoring composition
CN110188274B (en) * 2019-05-30 2021-06-08 口口相传(北京)网络技术有限公司 Search error correction method and device
CN111079500B (en) * 2019-07-11 2023-10-27 广东小天才科技有限公司 Method and system for correcting dictation content
CN110489747A (en) * 2019-07-31 2019-11-22 北京大米科技有限公司 A kind of image processing method, device, storage medium and electronic equipment
CN110765996B (en) * 2019-10-21 2022-07-29 北京百度网讯科技有限公司 Text information processing method and device
CN110851599B (en) * 2019-11-01 2023-04-28 中山大学 Automatic scoring method for Chinese composition and teaching assistance system
CN113361511B (en) * 2020-03-05 2024-10-01 顺丰科技有限公司 Correction model establishing method, device, equipment and computer readable storage medium
CN111950240B (en) * 2020-08-26 2024-12-03 北京高途云集教育科技有限公司 A data correction method, device and system
CN112036161A (en) * 2020-09-02 2020-12-04 中国平安人寿保险股份有限公司 Requirement document processing method, device, equipment and storage medium
CN112149680B (en) * 2020-09-28 2024-01-16 武汉悦学帮网络技术有限公司 Method and device for detecting and identifying wrong words, electronic equipment and storage medium
CN113536743B (en) * 2020-11-06 2024-08-06 腾讯科技(深圳)有限公司 Text processing method and related device
CN112597754B (en) * 2020-12-23 2023-11-21 北京百度网讯科技有限公司 Text error correction method, apparatus, electronic device and readable storage medium
CN112634689A (en) * 2020-12-24 2021-04-09 广州奇大教育科技有限公司 Application method of regular expression in automatic subjective question changing in computer teaching
CN112528651A (en) * 2021-02-08 2021-03-19 深圳市阿卡索资讯股份有限公司 Intelligent correction method, system, electronic equipment and storage medium
CN113836894B (en) * 2021-09-26 2023-08-15 武汉天喻信息产业股份有限公司 Multi-dimensional English composition scoring method and device and readable storage medium
CN114489439A (en) * 2022-01-20 2022-05-13 安徽淘云科技股份有限公司 Article correcting method and related equipment thereof

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW315432B (en) * 1995-08-31 1997-09-11 Nat Univ Tsing Hua The auto debugging and correcting device and method for computer document
JPH09305714A (en) * 1996-05-17 1997-11-28 N T T Data Tsushin Kk System and method for recognizing character
WO2005045786A1 (en) * 2003-10-27 2005-05-19 Educational Testing Service Automatic essay scoring system
WO2011044658A1 (en) * 2009-10-15 2011-04-21 2167959 Ontario Inc. System and method for text cleaning
WO2012039686A1 (en) * 2010-09-24 2012-03-29 National University Of Singapore Methods and systems for automated text correction
WO2016147330A1 (en) * 2015-03-18 2016-09-22 株式会社日立製作所 Text processing method and text processing system
WO2017043130A1 (en) * 2015-09-07 2017-03-16 信也 赤木 Text evaluation device, text evaluation method, and program

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6085206A (en) * 1996-06-20 2000-07-04 Microsoft Corporation Method and system for verifying accuracy of spelling and grammatical composition of a document
US6154579A (en) * 1997-08-11 2000-11-28 At&T Corp. Confusion matrix based method and system for correcting misrecognized words appearing in documents generated by an optical character recognition technique
US6424983B1 (en) * 1998-05-26 2002-07-23 Global Information Research And Technologies, Llc Spelling and grammar checking system
US6950555B2 (en) * 2001-02-16 2005-09-27 Parascript Llc Holistic-analytical recognition of handwritten text
US7031911B2 (en) * 2002-06-28 2006-04-18 Microsoft Corporation System and method for automatic detection of collocation mistakes in documents
CN103294660B (en) * 2012-02-29 2015-09-16 张跃 A kind of english composition automatic scoring method and system
US20140214401A1 (en) * 2013-01-29 2014-07-31 Tencent Technology (Shenzhen) Company Limited Method and device for error correction model training and text error correction
CN103365838B (en) * 2013-07-24 2016-04-20 桂林电子科技大学 Based on the english composition grammar mistake method for automatically correcting of diverse characteristics
IL235565B (en) * 2014-11-06 2019-06-30 Kolton Achiav Location based optical character recognition (ocr)
CN105045779A (en) * 2015-07-13 2015-11-11 北京大学 Deep neural network and multi-tag classification based wrong sentence detection method
CN105183713A (en) * 2015-08-27 2015-12-23 北京时代焦点国际教育咨询有限责任公司 English composition automatic correcting method and system
CN105279149A (en) * 2015-10-21 2016-01-27 上海应用技术学院 A Chinese Text Automatic Correction Method
CN106610930B (en) * 2015-10-22 2019-09-03 科大讯飞股份有限公司 Foreign language writing methods automatic error correction method and system
CN107403130A (en) * 2017-04-19 2017-11-28 北京粉笔未来科技有限公司 A kind of character identifying method and character recognition device
CN107357775A (en) * 2017-06-05 2017-11-17 百度在线网络技术(北京)有限公司 The text error correction method and device of Recognition with Recurrent Neural Network based on artificial intelligence
CN107239449A (en) * 2017-06-08 2017-10-10 锦州医科大学 A kind of English recognition methods and interpretation method
CN107704859A (en) * 2017-11-01 2018-02-16 哈尔滨工业大学深圳研究生院 A kind of character recognition method based on deep learning training framework

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
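English handwriting recognition based on long short-term memory recurrent neural networks; 卫晓欣 (Wei Xiaoxin); China Excellent Master's Theses Full-text Database, Information Science and Technology; 20150115 (No. 1); I138-954 *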

Also Published As

Publication number Publication date
CN108595410A (en) 2018-09-28

Similar Documents

Publication Publication Date Title
CN108595410B (en) Automatic correction method and device for handwritten composition
CN110569830B (en) Multilingual text recognition method, device, computer equipment and storage medium
CN110046350B (en) Grammar error recognition method, device, computer equipment and storage medium
CN109800414B (en) Method and system for recommending language correction
CN113435186B (en) Chinese text error correction system, method, device and computer readable storage medium
CN110276077A (en) Chinese error correction method, device and equipment
CN107341143B (en) Sentence continuity judgment method and device and electronic equipment
CN111859921A (en) Text error correction method and device, computer equipment and storage medium
RU2641225C2 (en) Method of detecting necessity of standard learning for verification of recognized text
CN114863429B (en) Text error correction method, training method and related equipment based on RPA and AI
CN113657098B (en) Text error correction method, device, equipment and storage medium
JP2000353215A (en) Character recognition device and recording medium where character recognizing program is recorded
US9286527B2 (en) Segmentation of an input by cut point classification
CN107273883B (en) Decision tree model training method, and method and device for determining data attributes in OCR (optical character recognition) result
CN106156017A (en) Information identifying method and information identification system
CN111651978A (en) Entity-based lexical examination method and device, computer equipment and storage medium
CN111368918A (en) Text error correction method, device, electronic device and storage medium
EP2138959B1 (en) Word recognizing method and word recognizing program
US9251412B2 (en) Segmentation of devanagari-script handwriting for recognition
JP5888222B2 (en) Information processing apparatus and information processing program
CN111079736A (en) Dictation content identification method and electronic equipment
CN112528980B (en) OCR recognition result correction method and terminal and system thereof
CN109086272B (en) Sentence pattern recognition method and system
CN112883740A (en) Chinese structure training method and device
JP2004046723A (en) Method for recognizing character, program and apparatus used for implementing the method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230619

Address after: 6001, 6th Floor, No.1 Kaifeng Road, Shangdi Information Industry Base, Haidian District, Beijing, 100085

Patentee after: Beijing Baige Feichi Technology Co.,Ltd.

Address before: 100085 4001, 4th floor, No.1 Kaifa Road, Shangdi Information Industry base, Haidian District, Beijing

Patentee before: XIAOCHUANCHUHAI EDUCATION TECHNOLOGY (BEIJING) CO.,LTD.
