CN111626287B - Training method and device for recognition network for recognizing Chinese in scene - Google Patents

Training method and device for recognition network for recognizing Chinese in scene

Info

Publication number
CN111626287B
Authority
CN
China
Prior art keywords
scene
corpus
chinese
recognition network
chinese characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910146791.1A
Other languages
Chinese (zh)
Other versions
CN111626287A (en
Inventor
郜业飞
董健
颜水成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd
Priority to CN201910146791.1A
Publication of CN111626287A
Application granted
Publication of CN111626287B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/28 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
    • G06V30/287 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides a training method and apparatus for a recognition network that recognizes Chinese in a scene. The method includes: randomly generating a first corpus sample from commonly used Chinese characters; synthesizing the first corpus sample with a first background image to obtain a first synthesized scene image sample containing a Chinese text region; and training a recognition network for recognizing Chinese in a scene using the first synthesized scene image sample. Because the occurrence probabilities of the commonly used Chinese characters tend toward uniform in a randomly generated corpus sample, the recognition network sees all commonly used characters at roughly the same frequency when it is trained on scene image samples synthesized from such a corpus. This alleviates the long-tail distribution problem of Chinese text to some extent and improves the recognition of Chinese text in scenes.

Description

Training method and device for recognition network for recognizing Chinese in scene
Technical Field
The invention relates to the technical field of image recognition, and in particular to a training method for a recognition network that recognizes Chinese in a scene, a training apparatus for such a recognition network, a computer storage medium, and a computing device.
Background
At present, deep learning is widely applied in the field of graphics and images. OCR (Optical Character Recognition), a key link in the interaction between electronic devices and the external environment in everyday life, is widely used in application scenarios such as license plate recognition, street view recognition, and network image/video monitoring. Thanks to deep learning, OCR recognition accuracy has improved remarkably, which has promoted commercial products based on the related technology.
The application of deep-learning-based scene text recognition models to English text has been widely studied by researchers in China and abroad, with good recognition results. However, Chinese has no special spacing between characters, a very large character set, many visually similar glyphs, and a long-tail distribution in its corpora, so directly migrating an English recognition scheme to Chinese scene text recognition rarely meets expectations.
Therefore, a method is needed that improves the recognition of Chinese text in scenes by addressing the long-tail character problem of Chinese scene text recognition.
Disclosure of Invention
In view of the foregoing, the present invention is proposed to provide a training method for a recognition network that recognizes Chinese in a scene, a training apparatus for such a recognition network, a computer storage medium, and a computing device that overcome or at least partially solve the above problems.
According to one aspect of the embodiments of the present invention, there is provided a training method for a recognition network that recognizes Chinese in a scene, including:
randomly generating a first corpus sample using commonly used Chinese characters;
synthesizing the first corpus sample with a first background image to obtain a first synthesized scene image sample containing a Chinese text region;
training a recognition network for recognizing Chinese in a scene using the first synthesized scene image sample.
Optionally, in the first corpus sample, the occurrence frequency of each Chinese character is controllable.
Optionally, in the first corpus sample, the occurrence frequencies of all Chinese characters are controlled to be equal.
Optionally, before randomly generating the first corpus sample using the commonly used Chinese characters, the method further comprises:
obtaining the commonly used Chinese characters from a codebook used for Chinese character input.
Optionally, the method further comprises:
acquiring a corpus with real semantic information;
synthesizing the corpus with real semantic information with a second background image to obtain a second synthesized scene image sample containing a Chinese text region;
training the recognition network using the second synthesized scene image sample.
Optionally, the first background image is the same as the second background image.
Optionally, acquiring the corpus with real semantic information includes:
extracting text of a specific length from text material containing natural semantics as the corpus with real semantic information.
Optionally, the method further comprises:
acquiring real scene image data;
adjusting the parameters of the recognition network using the real scene image data.
Optionally, acquiring the real scene image data includes:
labeling the real scene image and cropping out the Chinese text regions in the real scene image.
Optionally, the recognition network is used for recognizing Chinese in natural scenes.
According to another aspect of the embodiments of the present invention, there is also provided a training apparatus for a recognition network that recognizes Chinese in a scene, including:
a random corpus generation module adapted to randomly generate a first corpus sample using commonly used Chinese characters;
an image sample synthesis module adapted to synthesize the first corpus sample with a first background image to obtain a first synthesized scene image sample containing a Chinese text region; and
a recognition network training module adapted to train a recognition network for recognizing Chinese in a scene using the first synthesized scene image sample.
Optionally, in the first corpus sample, the occurrence frequency of each Chinese character is controllable.
Optionally, in the first corpus sample, the occurrence frequencies of all Chinese characters are controlled to be equal.
Optionally, the random corpus generation module is further adapted to:
obtain the commonly used Chinese characters from a codebook used for Chinese character input before randomly generating the first corpus sample using the commonly used Chinese characters.
Optionally, the apparatus further comprises:
a real corpus acquisition module adapted to acquire a corpus with real semantic information;
the image sample synthesis module is further adapted to:
synthesize the corpus with real semantic information with a second background image to obtain a second synthesized scene image sample containing a Chinese text region;
the recognition network training module is further adapted to:
train the recognition network using the second synthesized scene image sample.
Optionally, the first background image is the same as the second background image.
Optionally, the real corpus acquisition module is further adapted to:
extract text of a specific length from text material containing natural semantics as the corpus with real semantic information.
Optionally, the apparatus further comprises:
a real scene data acquisition module adapted to acquire real scene image data; and
a recognition network adjustment module adapted to adjust the parameters of the recognition network using the real scene image data.
Optionally, the real scene data acquisition module is further adapted to:
label the real scene image and crop out the Chinese text regions in the real scene image.
Optionally, the recognition network is used for recognizing Chinese in natural scenes.
According to yet another aspect of the embodiments of the present invention, there is also provided a computer storage medium storing computer program code which, when run on a computing device, causes the computing device to perform the training method for a recognition network that recognizes Chinese in a scene according to any of the above.
According to yet another aspect of the embodiments of the present invention, there is also provided a computing device including:
a processor; and
a memory storing computer program code;
the computer program code, when executed by the processor, causes the computing device to perform the training method for a recognition network that recognizes Chinese in a scene according to any of the above.
According to the training method and apparatus provided by the embodiments of the present invention, a corpus sample is randomly generated from commonly used Chinese characters, the corpus sample is synthesized with a background image to obtain a synthesized scene image sample containing a Chinese text region, and the recognition network is trained on the synthesized scene image sample. In natural corpora, only a small portion of the commonly used Chinese characters appear frequently, while the others appear rarely or not at all (the so-called long-tail distribution); a recognition network trained on natural corpus material therefore recognizes low-frequency characters poorly. In a randomly generated corpus sample, by contrast, the occurrence probabilities of the commonly used characters tend toward uniform, so a recognition network trained on scene image samples synthesized from such a corpus sees all commonly used characters at roughly the same frequency. This alleviates the long-tail distribution problem of Chinese text to some extent and improves the recognition of Chinese text in scenes.
Furthermore, controlling the occurrence frequencies of all Chinese characters in the randomly generated corpus sample to be equal addresses the long-tail distribution problem of Chinese text more effectively.
Furthermore, after first-stage training of the recognition network on scene image samples synthesized from the randomly generated corpus sample, second-stage training may be performed on scene image samples synthesized from a corpus with real semantic information, and the recognition network may finally be fine-tuned on real scene image data. This multi-stage training strategy further improves the generalization ability of the recognition network and the recognition of Chinese text in scenes. The foregoing is only an overview of the technical solution of the present invention. Specific embodiments are set out below so that the technical means of the invention can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features, and advantages of the invention become more apparent.
The above, as well as additional objectives, advantages, and features of the present invention will become apparent to those skilled in the art from the following detailed description of a specific embodiment of the present invention when read in conjunction with the accompanying drawings.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a flowchart of a training method for a recognition network that recognizes Chinese in a scene according to an embodiment of the invention;
FIG. 2 is a flowchart of a training method for a recognition network that recognizes Chinese in a scene according to another embodiment of the invention;
FIG. 3 is a schematic structural diagram of a training apparatus for a recognition network that recognizes Chinese in a scene according to an embodiment of the invention; and
FIG. 4 is a schematic structural diagram of a training apparatus for a recognition network that recognizes Chinese in a scene according to another embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The current mainstream scene text recognition scheme extracts features from an image text region using a CRNN (Convolutional Recurrent Neural Network), which combines the ability of a CNN (Convolutional Neural Network) to extract spatial feature information from the image with the ability of an RNN (Recurrent Neural Network) to encode sequential information. The encoded text region is then decoded using CTC (Connectionist Temporal Classification) to obtain the corresponding text.
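As an illustration of the decoding step, the following is a minimal sketch of best-path (greedy) CTC decoding, one common way to turn per-frame scores into text. The patent does not specify the decoding procedure; the blank index, the toy alphabet, and the function name `ctc_greedy_decode` are assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Decode per-frame class scores into text with best-path (greedy) CTC decoding."""
    # 1. Pick the highest-scoring class for every time step.
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    # 2. Collapse runs of the same label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Drop blanks and map the remaining labels to characters.
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Toy example: index 0 is the blank, indices 1-3 map to three characters.
alphabet = ["", "中", "文", "字"]
frame_scores = [
    [0.1, 0.8, 0.05, 0.05],   # 中
    [0.1, 0.7, 0.1, 0.1],     # 中 again (repeat, collapsed)
    [0.9, 0.05, 0.03, 0.02],  # blank
    [0.1, 0.1, 0.7, 0.1],     # 文
    [0.2, 0.1, 0.1, 0.6],     # 字
]
decoded = ctc_greedy_decode(frame_scores, alphabet)
```

A blank between two identical labels keeps them as two separate characters, which is why the blank is removed only after repeats have been collapsed.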
In this field, English text recognition has been widely studied by researchers in China and abroad; many recognition schemes have been proposed in succession, with good results. In English scenes, there are only 26 letters, the total number of symbols remains in the tens even when digits are included, and words are separated by spaces. In Chinese scenes, however, the text consists of square-shaped characters, the boundaries between words in a sentence are not obvious (especially when glyphs are visually similar), and there are no clear spaces. In particular, although there are about 5000-6000 commonly used Chinese characters, in natural corpora (for example, books) the few hundred most frequently used characters typically account for about 80% of the text, while the remaining thousands of characters rarely appear. This is the so-called long-tail distribution of Chinese text. In summary, Chinese scene text recognition is characterized by no special spacing between characters, a very large character set, visually similar glyphs, and a long-tail corpus distribution, so an English recognition scheme migrated directly to Chinese rarely meets expectations.
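The long-tail property described above can be measured on any corpus by computing how much of the text the k most frequent characters cover. The sketch below is an illustration only; the function name `top_k_coverage` and the toy string are assumptions, and the toy string does not reproduce real corpus statistics.

```python
from collections import Counter


def top_k_coverage(text, k):
    """Return the fraction of character occurrences covered by the k most frequent characters."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(k))
    return covered / total


# Toy text: three characters dominate, eight others appear once each.
sample = "的的的的的的一一一是是不了人我在有他这"
coverage = top_k_coverage(sample, 3)
```

On a real natural corpus, a value near 0.8 for k in the hundreds would correspond to the distribution the description refers to.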
To solve the above technical problems, an embodiment of the invention provides a training method for a recognition network that recognizes Chinese in a scene. FIG. 1 is a flowchart of this training method according to an embodiment of the invention. Referring to FIG. 1, the method may include at least the following steps S102 to S106.
Step S102, randomly generating a first corpus sample by using the common Chinese characters.
Step S104, the first corpus sample and the first background image are synthesized to obtain a first synthesized scene image sample containing Chinese character areas.
Step S106, training a recognition network for recognizing Chinese in the scene by using the first synthesized scene image sample.
In the embodiments of the invention, the recognition network is a deep learning network that adopts a combined CRNN and CTC architecture and mainly recognizes Chinese text in natural scenes.
According to the training method provided by the embodiments of the invention, a corpus sample is randomly generated from commonly used Chinese characters, the corpus sample is synthesized with a background image to obtain a synthesized scene image sample containing a Chinese text region, and the recognition network is trained on the synthesized scene image sample. Because the occurrence probabilities of the commonly used Chinese characters tend toward uniform in a randomly generated corpus sample, a recognition network trained on scene image samples synthesized from such a corpus sees all commonly used characters at roughly the same frequency. This alleviates the long-tail distribution problem of Chinese text to some extent and improves the recognition of Chinese text in scenes.
In step S102 above, the first corpus sample is generated by combining commonly used Chinese characters. So that the distribution of characters in the generated corpus sample tends toward uniform, a sufficiently large set of commonly used characters can be used, for example 5000-6000 characters.
Optionally, the commonly used Chinese characters may be obtained from a codebook used for Chinese character input (for example, the codebook of the Sogou Chinese input method). Preferably, the characters ranked highest by usage frequency in the codebook are selected.
In a preferred embodiment, to further address the long-tail distribution of the corpus, the occurrence frequency of each Chinese character in the randomly generated first corpus sample can be controlled so that the character distribution in the corpus meets the requirement.
Furthermore, the occurrence frequencies of all Chinese characters in the first corpus sample can be controlled to be equal, so that the characters in the corpus are uniformly distributed.
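One simple way to obtain exactly equal character frequencies is to repeat every character the same number of times, shuffle the pool, and cut it into samples. The sketch below illustrates this under stated assumptions: the patent does not prescribe a particular procedure, and the function name, the parameters `repeats` and `sample_len`, and the ten-character stand-in character set are hypothetical.

```python
import random


def generate_balanced_corpus(charset, repeats, sample_len, seed=None):
    """Build random strings in which every character of `charset` occurs exactly `repeats` times."""
    rng = random.Random(seed)
    pool = list(charset) * repeats  # each character appears `repeats` times
    rng.shuffle(pool)               # random order, so the text carries no semantics
    # Cut the shuffled pool into fixed-length samples (the last one may be shorter).
    return ["".join(pool[i:i + sample_len]) for i in range(0, len(pool), sample_len)]


# Stand-in for the 5000-6000 commonly used characters mentioned in the text.
charset = "的一是不了人我在有他"
samples = generate_balanced_corpus(charset, repeats=4, sample_len=8, seed=0)
```

Drawing characters independently and uniformly at random would only make the frequencies approximately equal; the shuffle-and-cut variant above makes them exactly equal over the whole corpus.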
In step S104, the first background image may be an image of a real scene that contains no text; the first corpus sample is fused into the first background image to obtain the first synthesized scene image sample.
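The fusion step can be pictured as blending a rasterized text line into the background and recording where it was placed. The sketch below is a simplified illustration, not the patent's synthesis procedure: it assumes grayscale images stored as nested lists, a text mask that has already been rendered by some font rasterizer, and plain alpha blending; the function name and parameters are hypothetical.

```python
def composite_text_mask(background, mask, top, left, ink=0):
    """Blend a glyph mask into a grayscale background and return (image, bounding box)."""
    height, width = len(mask), len(mask[0])
    if top < 0 or left < 0 or top + height > len(background) or left + width > len(background[0]):
        raise ValueError("text region does not fit inside the background")
    image = [row[:] for row in background]  # copy so the background can be reused
    for r in range(height):
        for c in range(width):
            alpha = mask[r][c]  # 0.0 = background only, 1.0 = full ink
            old = image[top + r][left + c]
            image[top + r][left + c] = round((1 - alpha) * old + alpha * ink)
    # The box (left, top, right, bottom) marks the synthesized text region.
    return image, (left, top, left + width, top + height)


background = [[200] * 6 for _ in range(4)]  # flat gray stand-in for a text-free scene photo
mask = [[1.0, 0.5], [0.0, 1.0]]             # stand-in for a rasterized text line
image, box = composite_text_mask(background, mask, top=1, left=2)
```

Because the input background is copied rather than modified, the same background can be reused for many corpus samples.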
In step S106, the recognition network is trained on the first synthesized scene image sample, so that it sees all commonly used Chinese characters at a consistent frequency during training. When the trained recognition network is then used to recognize Chinese text in scenes, it can recognize Chinese characters, especially those with low usage frequency, more accurately.
In an alternative embodiment of the present invention, after training the recognition network using the first synthesized scene image sample synthesized based on the first corpus sample generated randomly, the following steps may be further performed:
First, a corpus with real semantic information is obtained. Then, the corpus with real semantic information is synthesized with a second background image to obtain a second synthesized scene image sample containing a Chinese text region. Finally, the recognition network is trained using the second synthesized scene image sample.
Training the recognition network first on scene image samples synthesized from the randomly generated corpus sample (first-stage training) and then on scene image samples synthesized from the corpus with real semantic information (second-stage training) can further improve Chinese recognition.
Alternatively, to simplify the composition of the scene image samples and the training operation of the recognition network, the second background image may employ the same image of the real scene as the first background image.
In practical applications, there may be various ways to obtain corpus with real semantic information. For example, text of a particular length may be truncated from text material containing natural semantics as a corpus with real semantic information. The text material may be, for example, news, books, etc.
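Extracting fixed-length text from natural-language material, as described above, might look like the following sketch. The random choice of start positions, the whitespace removal, and the names `cut_segments`, `length`, and `count` are assumptions for illustration; the patent only states that text of a specific length is taken from the material.

```python
import random


def cut_segments(text, length, count, seed=None):
    """Cut `count` random substrings of exactly `length` characters from `text`."""
    cleaned = "".join(text.split())  # drop whitespace and line breaks
    if len(cleaned) < length:
        raise ValueError("text material is shorter than the requested segment length")
    rng = random.Random(seed)
    starts = [rng.randrange(len(cleaned) - length + 1) for _ in range(count)]
    return [cleaned[s:s + length] for s in starts]


# Short made-up passage standing in for news or book text.
material = "今天天气很好，我们一起去公园散步。\n明天可能会下雨，记得带伞。"
segments = cut_segments(material, length=6, count=3, seed=1)
```

Each segment keeps the original character order, so it retains the natural co-occurrence of characters that the randomly generated corpus lacks.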
In an alternative embodiment of the present invention, after training the recognition network with the first synthesized scene image samples synthesized based on the first corpus sample generated randomly or after training the recognition network with the second synthesized scene image samples synthesized based on the corpus with real semantic information, the following steps may be further performed:
Real scene image data is acquired, and the parameters of the recognition network are then adjusted using the real scene image data.
Further, the real scene image data may be obtained as follows:
the real scene image is labeled, and the Chinese text regions in the real scene image are cropped out.
Fine-tuning the parameters of the recognition network on a data set of real scene images containing Chinese improves the generalization ability of the recognition network and further improves Chinese text recognition.
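The labeling-and-cropping step described above can be sketched as follows. The annotation format (axis-aligned boxes given as left, top, right, bottom plus a transcription), the nested-list image, and the function name are assumptions for illustration; the patent does not define a labeling format.

```python
def crop_text_regions(image, annotations):
    """Return (crop, text) pairs for every labeled box given as (left, top, right, bottom, text)."""
    samples = []
    for left, top, right, bottom, text in annotations:
        # Slice the rows first, then the columns of each row.
        crop = [row[left:right] for row in image[top:bottom]]
        samples.append((crop, text))
    return samples


# 4x6 toy image whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(6)] for r in range(4)]
annotations = [(1, 0, 4, 2, "出口"), (2, 2, 6, 4, "停车场")]
dataset = crop_text_regions(image, annotations)
```

Each resulting pair can serve as one fine-tuning sample: the crop is the network input and the transcription is its label.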
Having described the various implementations of each step of the embodiment shown in FIG. 1, the training method for a recognition network that recognizes Chinese in a scene is described in detail below through a specific embodiment.
FIG. 2 is a flowchart of a training method for a recognition network that recognizes Chinese in a scene according to an embodiment of the present invention. In this embodiment, the recognition network is a deep learning network that adopts a combined CRNN and CTC architecture. Referring to FIG. 2, the method may include at least the following steps S202 to S216.
Step S202, obtaining common Chinese characters from a codebook for Chinese character input, and randomly generating a first corpus sample by using the common Chinese characters, wherein the occurrence frequencies of all Chinese characters in the first corpus sample are controlled to be equal.
Step S204, the first corpus sample and the first background image are synthesized to obtain a first synthesized scene image sample containing Chinese character areas.
Step S206, training the recognition network for recognizing Chinese in the natural scene in a first stage by using the first synthesized scene image sample.
Step S208, intercepting words with specific length from the text material containing natural semantics as corpus with real semantic information.
The text material is, for example, news material, books, or the like.
Step S210, synthesizing the corpus with the real semantic information and a second background image to obtain a second synthesized scene image sample containing Chinese text areas, wherein the second background image is identical to the first background image.
Step S212, performing second-stage training on the recognition network by using the second synthesized scene image sample.
Step S214, labeling the real scene image and cropping out the Chinese text regions in the real scene image to obtain a real scene image data set.
Step S216, fine-tuning the parameters of the recognition network by using the real scene image data set.
In this embodiment, the multi-stage training strategy effectively addresses the long-tail character problem of Chinese scene text recognition and improves the recognition of Chinese text in natural scenes.
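The order of the three training phases in steps S202 to S216 can be summarized as a small schedule runner. This is only a structural sketch: `fake_train` is a placeholder rather than a real training loop, and the learning rates and data set sizes are invented for illustration and do not come from the patent.

```python
def run_training_schedule(model_state, stages, train_one_stage):
    """Run the stages in order, feeding each stage the state produced by the previous one."""
    history = []
    for name, dataset, learning_rate in stages:
        model_state = train_one_stage(model_state, dataset, learning_rate)
        history.append(name)
    return model_state, history


def fake_train(state, dataset, learning_rate):
    # Placeholder for a real training loop: it only counts how many samples were seen.
    return state + len(dataset)


stages = [
    ("stage 1: random-corpus synthetic images", ["s1"] * 4, 1e-3),
    ("stage 2: real-semantics synthetic images", ["s2"] * 3, 1e-3),
    ("fine-tuning: cropped real scene images", ["r"] * 2, 1e-4),
]
final_state, history = run_training_schedule(0, stages, fake_train)
```

The key point is that each phase starts from the weights left by the previous one, so the balanced character coverage learned in the first stage is carried into the later stages.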
Based on the same inventive concept, an embodiment of the invention also provides a training apparatus for a recognition network that recognizes Chinese in a scene, which supports the training method provided by any one of the above embodiments or a combination thereof. FIG. 3 shows a schematic structural diagram of a training apparatus 300 for a recognition network that recognizes Chinese in a scene according to an embodiment of the present invention. Referring to FIG. 3, the apparatus 300 may include at least a random corpus generation module 310, an image sample synthesis module 320, and a recognition network training module 330.
The functions of the components of the training apparatus 300 and the connections between them are described below:
The random corpus generation module 310 is adapted to randomly generate a first corpus sample using commonly used Chinese characters.
The image sample synthesis module 320 is connected to the random corpus generation module 310 and is adapted to synthesize the first corpus sample with the first background image to obtain a first synthesized scene image sample containing a Chinese text region.
The recognition network training module 330 is connected to the image sample synthesis module 320 and is adapted to train a recognition network for recognizing Chinese in a scene using the first synthesized scene image sample.
In an alternative embodiment of the present invention, the occurrence frequency of each Chinese character in the first corpus sample is controllable.
Further, in the first corpus sample, the occurrence frequencies of all Chinese characters are controlled to be equal.
In an alternative embodiment of the invention, the random corpus generation module 310 is further adapted to:
obtain the commonly used Chinese characters from a codebook used for Chinese character input before randomly generating the first corpus sample using the commonly used Chinese characters.
In an alternative embodiment of the present invention, as shown in FIG. 4, the training apparatus 300 illustrated in FIG. 3 may further include a real corpus acquisition module 340. The real corpus acquisition module 340 may be connected to the image sample synthesis module 320 and is adapted to acquire a corpus with real semantic information. Accordingly, the image sample synthesis module 320 is further adapted to synthesize the corpus with real semantic information with a second background image to obtain a second synthesized scene image sample containing a Chinese text region. The recognition network training module 330 is further adapted to train the recognition network using the second synthesized scene image sample.
In an alternative embodiment of the invention, the first background image is identical to the second background image.
In an alternative embodiment of the present invention, the real corpus acquisition module 340 is further adapted to:
extract text of a specific length from text material containing natural semantics as the corpus with real semantic information.
In an alternative embodiment of the present invention, still referring to FIG. 4, the training apparatus 300 may further include a real scene data acquisition module 350 and a recognition network adjustment module 360. The real scene data acquisition module 350 is adapted to acquire real scene image data. The recognition network adjustment module 360 may be connected to the real scene data acquisition module 350 and the recognition network training module 330, respectively, and is adapted to adjust the parameters of the recognition network using the real scene image data.
In an alternative embodiment of the invention, the real scene data acquisition module 350 is further adapted to:
label the real scene image and crop out the Chinese text regions in the real scene image.
In an alternative embodiment of the invention, the recognition network is used to recognize Chinese in natural scenes.
Based on the same inventive concept, the embodiment of the invention also provides a computer storage medium. The computer storage medium stores computer program code which, when run on a computing device, causes the computing device to perform a training method for identifying a recognition network of chinese within a scene according to any one or a combination of the above embodiments.
Based on the same inventive concept, the embodiment of the invention also provides a computing device. The computing device may include:
a processor; and
a memory storing computer program code;
wherein the computer program code, when executed by the processor, causes the computing device to perform the training method for a recognition network for recognizing Chinese in a scene according to any one or a combination of the embodiments described above.
Any one of the above optional embodiments, or a combination of several of them, can achieve the following beneficial effects:
According to the training method and device for a recognition network for recognizing Chinese in a scene, a corpus sample is randomly generated from commonly used Chinese characters, the corpus sample is synthesized with a background image to obtain a synthesized scene image sample containing a Chinese text region, and the recognition network is trained with the synthesized scene image sample. In natural corpus material, only a small portion of the commonly used Chinese characters appear frequently, while the others appear rarely or not at all (the so-called long-tail distribution). A recognition network trained on such material therefore recognizes low-frequency characters poorly. In a randomly generated corpus sample, by contrast, the occurrence probabilities of the commonly used Chinese characters tend to be uniform, so a recognition network trained on scene image samples synthesized from such a corpus sees every commonly used character at roughly the same frequency. This alleviates the long-tail distribution problem of Chinese characters to a certain extent and improves the recognition of Chinese characters in a scene.
Furthermore, the occurrence frequency of each Chinese character in the randomly generated corpus sample is controlled to be equal, which addresses the long-tail distribution problem of Chinese characters more effectively.
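One possible way to make the character frequencies exactly equal is sketched below: every character is placed in a pool the same number of times, the pool is shuffled, and it is then cut into fixed-length strings. This shuffled-pool approach, the function name `generate_balanced_corpus`, and the ten-character stand-in for a common-character list are assumptions; the embodiment states only that the frequencies are controlled to be equal.

```python
import random
from collections import Counter

def generate_balanced_corpus(charset, sample_length, repeats, seed=0):
    """Build random strings in which every character occurs exactly `repeats` times."""
    pool = list(charset) * repeats
    if len(pool) % sample_length != 0:
        raise ValueError("len(charset) * repeats must be divisible by sample_length")
    rng = random.Random(seed)  # seeded for reproducibility
    rng.shuffle(pool)
    # Cut the shuffled pool into consecutive strings of equal length.
    return ["".join(pool[i:i + sample_length])
            for i in range(0, len(pool), sample_length)]

# Hypothetical stand-in for a list of commonly used characters.
common_chars = "的一是在不了有和人这"
corpus = generate_balanced_corpus(common_chars, sample_length=5, repeats=4)
counts = Counter("".join(corpus))  # every character maps to the same count
```

Independent uniform sampling would only make the frequencies equal in expectation, whereas the shuffled pool makes them equal by construction.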
In addition, after first-stage training is performed on the recognition network using scene image samples synthesized from the randomly generated corpus samples, second-stage training may be performed on the recognition network using scene image samples synthesized from the corpus with real semantic information, and the recognition network is finally fine-tuned using real scene image data. This multi-stage training strategy further improves the generalization capability of the recognition network and, in turn, the recognition of Chinese characters in a scene.
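The ordering of the three stages can be sketched as a simple schedule. Everything in the sketch is hypothetical: the `Stage` fields, the learning-rate values, and the placeholder `toy_train` function, which only counts samples where a real implementation would update network weights.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str
    dataset: List[str]     # placeholder for image samples
    learning_rate: float   # assumed values, not taken from the patent

def run_schedule(stages: List[Stage],
                 train_one_stage: Callable[[dict, Stage], dict],
                 state: dict):
    """Run the stages in order, handing the model state from one stage to the next."""
    history = []
    for stage in stages:
        state = train_one_stage(state, stage)
        history.append(stage.name)
    return state, history

def toy_train(state: dict, stage: Stage) -> dict:
    # Placeholder: a real implementation would update network weights here.
    return {**state, "samples_seen": state["samples_seen"] + len(stage.dataset)}

stages = [
    Stage("random-corpus synthetic", ["s1", "s2", "s3", "s4"], 1e-3),
    Stage("real-corpus synthetic", ["r1", "r2"], 1e-3),
    Stage("real-scene fine-tuning", ["f1"], 1e-4),
]
final_state, history = run_schedule(stages, toy_train, {"samples_seen": 0})
```

Passing the state through each stage reflects the point of the strategy: later stages start from the weights learned earlier rather than from scratch.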
It will be clear to those skilled in the art that the specific working procedures of the above-described systems, devices and units may refer to the corresponding procedures in the foregoing method embodiments, and are not repeated herein for brevity.
In addition, each functional unit in the embodiments of the present invention may be physically independent, two or more functional units may be integrated together, or all functional units may be integrated in one processing unit. The integrated functional units may be implemented in hardware or in software or firmware.
Those skilled in the art will appreciate that the integrated functional units, if implemented in software and sold or used as a stand-alone product, may be stored on a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied, in whole or in part, as a software product stored in a storage medium. The software product comprises instructions which, when executed, cause a computing device (e.g., a personal computer, a server, or a network device) to perform all or part of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Alternatively, all or part of the steps of the foregoing method embodiments may be carried out by hardware (such as a personal computer, a server, or a network device) under the control of program instructions. The program instructions may be stored in a computer-readable storage medium and, when executed by a processor of the computing device, cause the computing device to perform all or part of the steps of the methods of the embodiments of the present invention.
It should be noted that the above embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that modifications may be made to the technical solution described in the above embodiments or equivalents may be substituted for some or all of the technical features thereof without departing from the scope of the present invention.

Claims (12)

1. A training method for a recognition network for recognizing Chinese in a scene, comprising:
randomly generating a first corpus sample using commonly used Chinese characters, wherein the occurrence frequency of each Chinese character in the first corpus sample is controlled to be equal;
synthesizing the first corpus sample with a first background image to obtain a first synthesized scene image sample containing a Chinese text region;
performing first-stage training on a recognition network for recognizing Chinese in a scene using the first synthesized scene image sample;
the method further comprising:
acquiring a corpus with real semantic information;
synthesizing the corpus with real semantic information with a second background image to obtain a second synthesized scene image sample containing a Chinese text region;
performing second-stage training on the recognition network using the second synthesized scene image sample;
the method further comprising:
labeling a real scene image and cropping out the Chinese text region in the real scene image;
fine-tuning the parameters of the recognition network after the second-stage training using a data set of real scene images containing Chinese.

2. The method according to claim 1, further comprising, before randomly generating the first corpus sample using commonly used Chinese characters:
obtaining the commonly used Chinese characters from a codebook used for Chinese character input.

3. The method according to claim 1, wherein the first background image is the same as the second background image.

4. The method according to claim 1, wherein acquiring a corpus with real semantic information comprises:
intercepting text of a specific length from text material containing natural semantics as the corpus with real semantic information.

5. The method according to any one of claims 1-4, wherein the recognition network is used to recognize Chinese in a natural scene.

6. A training device for a recognition network for recognizing Chinese in a scene, comprising:
a random corpus generation module, adapted to randomly generate a first corpus sample using commonly used Chinese characters, wherein the occurrence frequency of each Chinese character in the first corpus sample is controlled to be equal;
an image sample synthesis module, adapted to synthesize the first corpus sample with a first background image to obtain a first synthesized scene image sample containing a Chinese text region; and
a recognition network training module, adapted to perform first-stage training on a recognition network for recognizing Chinese in a scene using the first synthesized scene image sample;
the device further comprising:
a real corpus acquisition module, adapted to acquire a corpus with real semantic information;
wherein the image sample synthesis module is further adapted to:
synthesize the corpus with real semantic information with a second background image to obtain a second synthesized scene image sample containing a Chinese text region;
the recognition network training module is further adapted to:
perform second-stage training on the recognition network using the second synthesized scene image sample;
and the recognition network training module is further adapted to: label a real scene image and crop out the Chinese text region in the real scene image; and fine-tune the parameters of the recognition network after the second-stage training using a data set of real scene images containing Chinese.

7. The device according to claim 6, wherein the random corpus generation module is further adapted to:
obtain the commonly used Chinese characters from a codebook used for Chinese character input before randomly generating the first corpus sample using the commonly used Chinese characters.

8. The device according to claim 6, wherein the first background image is the same as the second background image.

9. The device according to claim 6, wherein the real corpus acquisition module is further adapted to:
intercept text of a specific length from text material containing natural semantics as the corpus with real semantic information.

10. The device according to any one of claims 6-9, wherein the recognition network is used to recognize Chinese in a natural scene.

11. A computer storage medium storing computer program code which, when run on a computing device, causes the computing device to perform the training method for a recognition network for recognizing Chinese in a scene according to any one of claims 1-5.

12. A computing device, comprising:
a processor; and
a memory storing computer program code;
wherein the computer program code, when run by the processor, causes the computing device to perform the training method for a recognition network for recognizing Chinese in a scene according to any one of claims 1-5.
CN201910146791.1A 2019-02-27 2019-02-27 Training method and device for recognition network for recognizing Chinese in scene Active CN111626287B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910146791.1A CN111626287B (en) 2019-02-27 2019-02-27 Training method and device for recognition network for recognizing Chinese in scene

Publications (2)

Publication Number Publication Date
CN111626287A CN111626287A (en) 2020-09-04
CN111626287B true CN111626287B (en) 2025-11-21

Family

ID=72271718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910146791.1A Active CN111626287B (en) 2019-02-27 2019-02-27 Training method and device for recognition network for recognizing Chinese in scene

Country Status (1)

Country Link
CN (1) CN111626287B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508108B (en) * 2020-12-10 2024-01-26 西北工业大学 A zero-sample Chinese character recognition method based on radicals
CN114612912A (en) * 2022-03-09 2022-06-10 中译语通科技股份有限公司 Image character recognition method, system and equipment based on intelligent corpus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022363A (en) * 2016-05-12 2016-10-12 南京大学 Method for recognizing Chinese characters in natural scene
CN109034050A (en) * 2018-07-23 2018-12-18 顺丰科技有限公司 ID Card Image text recognition method and device based on deep learning
CN109214386A (en) * 2018-09-14 2019-01-15 北京京东金融科技控股有限公司 Method and apparatus for generating image recognition model
CN111950548A (en) * 2020-08-10 2020-11-17 河南大学 A Chinese Character Recognition Method Using Font Library Character Images for Deep Template Matching
CN114998909A (en) * 2022-06-08 2022-09-02 北京云上曲率科技有限公司 Image character language identification method and system

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07129713A (en) * 1993-11-01 1995-05-19 Matsushita Electric Ind Co Ltd Character recognition device
US8509537B2 (en) * 2010-08-05 2013-08-13 Xerox Corporation Learning weights of fonts for typed samples in handwritten keyword spotting
CN108090400B (en) * 2016-11-23 2021-08-24 中移(杭州)信息技术有限公司 A method and device for image text recognition
CN106919920B (en) * 2017-03-06 2020-09-22 重庆邮电大学 Scene Recognition Method Based on Convolutional Features and Spatial Vision Bag of Words Model
CN108288078B (en) * 2017-12-07 2020-09-29 腾讯科技(深圳)有限公司 Method, device and medium for recognizing characters in image
CN108764226B (en) * 2018-04-13 2022-05-03 顺丰科技有限公司 Image text recognition method, device, equipment and storage medium thereof
CN109272043B (en) * 2018-09-21 2021-03-30 京东数字科技控股有限公司 Training data generation method and system for optical character recognition and electronic equipment
US20210124972A1 (en) * 2019-10-29 2021-04-29 Prescient Technologies Inc. Optical character recognition system and method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
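An efficient artificial intelligence method for automatic recognition of text on bank bills; Zhang Zhenyu; Jiang Heyun; Fan Mingyu; Journal of Wenzhou University (Natural Science Edition); 2020-08-25 (No. 03); full text *
Application of deep learning in the field of text recognition; Li Xinwei; Electronic Technology & Software Engineering; 2019-01-02 (No. 24); p. 40 *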

Also Published As

Publication number Publication date
CN111626287A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
CN114596566B (en) Text recognition method and related device
US11288324B2 (en) Chart question answering
Deng et al. Challenges in end-to-end neural scientific table recognition
US10896357B1 (en) Automatic key/value pair extraction from document images using deep learning
Jaderberg et al. Reading text in the wild with convolutional neural networks
CN112633431B (en) A Tibetan-Chinese Bilingual Scene Text Recognition Method Based on CRNN and CTC
CN104735468B (en) A kind of method and system that image is synthesized to new video based on semantic analysis
Tang et al. FontRNN: Generating Large‐scale Chinese Fonts via Recurrent Neural Network
Wilkinson et al. Neural Ctrl-F: segmentation-free query-by-string word spotting in handwritten manuscript collections
CN109753653B (en) Entity name recognition method, entity name recognition device, computer equipment and storage medium
Wang et al. Semantic-guided relation propagation network for few-shot action recognition
Bilkhu et al. Attention is all you need for videos: Self-attention based video summarization using universal transformers
CN114998909B (en) Image text language recognition method and system
Belay et al. Amharic text image recognition: Database, algorithm, and analysis
Valy et al. Data augmentation and text recognition on Khmer historical manuscripts
CN111626287B (en) Training method and device for recognition network for recognizing Chinese in scene
CN114241495B (en) Data enhancement method for off-line handwritten text recognition
Ahn et al. Story visualization by online text augmentation with context memory
CN109168006A (en) The video coding-decoding method that a kind of figure and image coexist
CN110825874A (en) Chinese text classification method and device and computer readable storage medium
CN119066037A (en) Document segmentation processing method, device, computer equipment and readable storage medium
CN111652256B (en) Method and system for acquiring multidimensional data
CN117912005A (en) Text recognition method, system, device and medium using single mark decoding
Sukh Ocr-free document understanding using vision-language models
Housen Lecture2Notes: Summarizing Lecture Videos by Classifying Slides and Analyzing Text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant