CN113537223B - Training sample generation, model training and image processing method and device
- Publication number
- CN113537223B (application CN202010309634.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- seal
- data
- deep learning
- segmentation
- Prior art date: 2020-04-20
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
The disclosure provides a training sample generation method, a model training method, an image processing method, and corresponding devices, relating to the technical field of image processing. The model training method comprises the following steps: inputting seal-bearing synthesized data into a first deep learning segmentation network for binarization segmentation to obtain a binary segmentation image; supervising and training the first deep learning segmentation network through a first loss function, according to the binary segmentation image and the seal-free image, until the loss converges; performing style recovery processing on the binary segmentation image and the corresponding seal-bearing synthesized data through a second deep learning segmentation network to obtain a seal-free prediction image; and supervising and training the second deep learning segmentation network through a second loss function, according to the seal-free prediction image and the seal-free image, until the loss converges. This two-stage training separately improves the accuracy of image binarization segmentation and of style recovery, thereby improving the accuracy of the image processing model in removing seals from images.
Description
Technical Field
The disclosure relates to the technical field of image processing, and in particular to a training sample generation method, a model training method, an image processing method, and corresponding devices.
Background
OCR (Optical Character Recognition) technology converts characters in image form into corresponding character codes and is widely used to recognize and transcribe all kinds of paper documents. As a common type of bill, an invoice often requires OCR technology to extract important information such as the date and the amount. Mainstream OCR technology is based on a deep learning framework: a deep neural network is trained with a large amount of data. The input to the network is typically an RGB (red, green, blue) image, and the output of the network is the sequence corresponding to the characters in the image.
However, practical applications present certain problems. A valid invoice bears an official seal, and the seal is usually stamped near the 'amount' field, so it occludes the field to be recognized to varying degrees and causes the recognition model to err. Observation shows that the recognition model is sensitive to images bearing an official seal: even when the seal does not occlude the target field, the model can produce unreasonable predictions. Further cleaning the image and removing the official seal region therefore corrects many potential recognition errors.
Disclosure of Invention
It is an object of the present disclosure to improve the accuracy of image processing.
According to an aspect of some embodiments of the present disclosure, a training sample generation method is provided, comprising: acquiring seal image sample data; acquiring seal-bearing synthesized data from the seal image sample data and a seal-free image; and associating the seal-bearing synthesized data with the seal-free image on which it is based, to serve as sample data for training an image processing model.

In some embodiments, acquiring seal image sample data comprises: extracting a seal image from seal-bearing image data as seal image sample data.

In some embodiments, acquiring the seal-bearing synthesized data from the seal image sample data and the seal-free image comprises: performing enhancement processing on the seal image sample data to obtain enhanced seal sample data, wherein the enhancement processing comprises one or more of image translation, rotation, scaling, or toning; and synthesizing the enhanced seal sample data with the seal-free image to obtain the seal-bearing synthesized data.

By this method, seal-bearing synthesized data can be generated from seal image samples and seal-free images, yielding a large number of associated seal-free images and corresponding seal-bearing images. This reduces the difficulty of acquiring training samples for the image processing model, increases the number of training samples, and improves the accuracy of the image model, thereby improving the accuracy of image processing.
According to an aspect of some embodiments of the present disclosure, a model training method is provided, comprising: inputting seal-bearing synthesized data into a first deep learning segmentation network for binarization segmentation to obtain a binary segmentation image; supervising and training the first deep learning segmentation network through a first loss function, according to the binary segmentation image and the seal-free image that is associated with the seal-bearing synthesized data and processed based on a predetermined first strategy, until the loss converges; inputting the binary segmentation image output by the first deep learning segmentation network and the corresponding seal-bearing synthesized data into a second deep learning segmentation network for style recovery processing of the binary segmentation image, to obtain a seal-free prediction image; and supervising and training the second deep learning segmentation network through a second loss function, according to the seal-free prediction image and the seal-free image associated with the seal-bearing synthesized data, until the loss converges.

In some embodiments, the model training method further comprises: generating the seal-bearing synthesized data, and its association with the seal-free image, through any of the training sample generation methods mentioned above.

In some embodiments, the seal-bearing composite image is a 3-channel RGB image, the binary segmentation image is a single-channel image, and the seal-free prediction image is a 3-channel RGB image.

In some embodiments, the first deep learning segmentation network binarizes the seal-bearing synthesized data as follows: the pixels occupied by characters are defined as foreground, the remaining pixels as background, and the seal-bearing synthesized data is segmented accordingly.

In some embodiments, the predetermined first strategy comprises binarizing and skeletonizing the seal-free image.

In some embodiments, the model training method satisfies at least one of the following: the first loss function comprises a BCE (binary cross-entropy) loss function and a GAN (generative adversarial network) loss function; or the second loss function comprises an L1 (least absolute deviation) loss function, a perceptual loss function, and a GAN loss function.

In some embodiments, inputting the binary segmentation image output by the first deep learning segmentation network and the corresponding seal-bearing synthesized data into the second deep learning segmentation network for style recovery processing of the binary segmentation image comprises: after training of the first deep learning segmentation network converges, fixing its parameters, inputting the seal-bearing synthesized data into the first deep learning segmentation network for binarization segmentation, and obtaining the binary segmentation image; and inputting the binary segmentation image and the corresponding seal-bearing synthesized data into the second deep learning segmentation network for style recovery processing of the binary segmentation image, to obtain the seal-free prediction image.

By this method, two-stage training can be performed on the deep learning segmentation networks using the seal-bearing synthesized data and the corresponding seal-free images, separately improving the accuracy of image binarization segmentation and of style recovery; this realizes the training of the image processing model and improves its accuracy in image processing.
According to an aspect of some embodiments of the present disclosure, an image processing method is provided, comprising: inputting a seal-bearing image to be processed into a first deep learning segmentation network for binarization segmentation to obtain a binary segmentation image; and inputting the binary segmentation image output by the first deep learning segmentation network and the corresponding seal-bearing image into a second deep learning segmentation network for style recovery processing of the binary segmentation image, to obtain a seal-free prediction image.

In some embodiments, the image processing method further comprises: training and generating the first and second deep learning segmentation networks through any of the model training methods mentioned above.

By this method, the image to be processed is first projected into a binary segmentation space and handled as an image segmentation problem, and the image details are then recovered; the character structure information is thus separated from the image style information, so that the seal on the image is handled in a targeted manner and the accuracy of image processing is improved.
According to an aspect of some embodiments of the present disclosure, a training sample generation apparatus is provided, comprising: a seal sample acquisition unit configured to acquire seal image sample data; a seal-bearing-synthesized-data acquisition unit configured to acquire seal-bearing synthesized data from the seal image sample data and a seal-free image; and a sample data generation unit configured to associate the seal-bearing synthesized data with the seal-free image on which it is based, to serve as sample data for training an image processing model.

In some embodiments, the seal sample acquisition unit is configured to extract a seal image from seal-bearing image data as seal image sample data.

In some embodiments, the seal-bearing-synthesized-data acquisition unit is configured to: perform enhancement processing on the seal image sample data to obtain enhanced seal sample data, wherein the enhancement processing comprises one or more of image translation, rotation, scaling, or toning; and synthesize the enhanced seal sample data with the seal-free image to obtain the seal-bearing synthesized data.

This apparatus can generate seal-bearing synthesized data from seal image samples and seal-free images, yielding a large number of associated seal-free images and corresponding seal-bearing images. This reduces the difficulty of acquiring training samples for the image processing model, increases the number of training samples, and improves the accuracy of the image model, thereby improving the accuracy of image processing.
According to an aspect of some embodiments of the present disclosure, a model training apparatus is provided, comprising: a first network training unit configured to: input seal-bearing synthesized data into a first deep learning segmentation network for binarization segmentation to obtain a binary segmentation image, and supervise and train the first deep learning segmentation network through a first loss function, according to the binary segmentation image and the seal-free image that is associated with the seal-bearing synthesized data and processed based on a predetermined first strategy, until the loss converges; and a second network training unit configured to: input the binary segmentation image output by the first deep learning segmentation network and the corresponding seal-bearing synthesized data into a second deep learning segmentation network for style recovery processing of the binary segmentation image and obtain a seal-free prediction image, and supervise and train the second deep learning segmentation network through a second loss function, according to the seal-free prediction image and the seal-free image associated with the seal-bearing synthesized data, until the loss converges.

In some embodiments, the model training apparatus further comprises any of the training sample generation apparatuses mentioned above.

In some embodiments, the seal-bearing composite image is a 3-channel RGB image, the binary segmentation image is a single-channel image, and the seal-free prediction image is a 3-channel RGB image.

In some embodiments, the first deep learning segmentation network binarizes the seal-bearing synthesized data as follows: the pixels occupied by characters are defined as foreground, the remaining pixels as background, and the seal-bearing synthesized data is segmented accordingly.

In some embodiments, the predetermined first strategy comprises binarizing and skeletonizing the seal-free image.

In some embodiments, the model training apparatus satisfies at least one of the following: the first loss function comprises a BCE (binary cross-entropy) loss function and a GAN (generative adversarial network) loss function; or the second loss function comprises an L1 (least absolute deviation) loss function, a perceptual loss function, and a GAN loss function.

In some embodiments, inputting the binary segmentation image output by the first deep learning segmentation network and the corresponding seal-bearing synthesized data into the second deep learning segmentation network for style recovery processing of the binary segmentation image comprises: after training of the first deep learning segmentation network converges, fixing its parameters, inputting the seal-bearing synthesized data into the first deep learning segmentation network for binarization segmentation, and obtaining the binary segmentation image; and inputting the binary segmentation image and the corresponding seal-bearing synthesized data into the second deep learning segmentation network for style recovery processing of the binary segmentation image, to obtain the seal-free prediction image.

This apparatus can perform two-stage training on the deep learning segmentation networks using the seal-bearing synthesized data and the corresponding seal-free images, separately improving the accuracy of image binarization segmentation and of style recovery; this realizes the training of the image processing model and improves its accuracy in image processing.
According to an aspect of some embodiments of the present disclosure, an image processing apparatus is provided, comprising: a first deep learning segmentation network configured to perform binarization segmentation on a seal-bearing image to be processed to obtain a binary segmentation image; and a second deep learning segmentation network configured to perform style recovery processing on the binary segmentation image, according to the binary segmentation image output by the first deep learning segmentation network and the corresponding seal-bearing image, and acquire a seal-free prediction image.

In some embodiments, the image processing apparatus further comprises any of the model training apparatuses mentioned above.

This apparatus can first project the image to be processed into a binary segmentation space and handle it as an image segmentation problem, then recover the image details; the character structure information is thus separated from the image style information, so that the seal on the image is handled in a targeted manner and the accuracy of image processing is improved.
According to an aspect of some embodiments of the present disclosure, there is provided an image processing apparatus including: a memory; and a processor coupled to the memory, the processor configured to perform any of the methods mentioned above based on instructions stored in the memory.
According to an aspect of some embodiments of the present disclosure, a computer-readable storage medium is presented, on which computer program instructions are stored, which instructions, when executed by a processor, implement the steps of any one of the methods mentioned above.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and, together with the description, serve to explain it. In the drawings:
fig. 1A is a flow chart of some embodiments of a training sample generation method of the present disclosure.
Fig. 1B is a schematic diagram of some embodiments of a training sample generation method of the present disclosure.
Fig. 2A is a flow chart of some embodiments of the model training method of the present disclosure.
Fig. 2B is a schematic diagram of some embodiments of the model training method of the present disclosure.
Fig. 3 is a flow chart of some embodiments of the image processing method of the present disclosure.
Fig. 4 is a schematic diagram of some embodiments of a training sample generation apparatus of the present disclosure.
Fig. 5 is a schematic diagram of some embodiments of a model training apparatus of the present disclosure.
Fig. 6 is a schematic diagram of some embodiments of an image processing apparatus of the present disclosure.
Fig. 7 is a schematic diagram of other embodiments of an image processing apparatus of the present disclosure.
Fig. 8 is a schematic diagram of still other embodiments of an image processing apparatus of the present disclosure.
Detailed Description
The technical scheme of the present disclosure is described in further detail below through the accompanying drawings and examples.
For the problem of official seal removal, the related art exploits the fact that most official seals are red: the seal region is selected from the component of a specific channel extracted from RGB, and then removed. Alternatively, a data-driven method is adopted: a neural network is trained with a large amount of paired input and output data, and a seal-free image is generated from the seal-bearing image.

The inventors have found that data-driven methods often require a large amount of paired data, i.e., a seal-bearing image must be accompanied by an image without the red seal as the supervisory signal. Because capturing both a stamped and an unstamped image of the same ticket is extremely difficult, and it is hard to guarantee that the two images align exactly, training sample data is difficult to obtain and its quality hard to ensure.
A flowchart of some embodiments of the training sample generation method of the present disclosure is shown in fig. 1A.
In step 101, seal image sample data is acquired. In some embodiments, the seal image sample data may be standalone seal images, or seal image samples extracted from images that contain seals. In some embodiments, a seal image may be extracted from seal-bearing image data as seal image sample data, thereby enriching the sample library.
In step 102, seal-bearing synthesized data is acquired from the seal image sample data and a seal-free image. In some embodiments, the seal image may be superimposed on the seal-free image and fused with it to obtain the seal-bearing synthesized data. In some embodiments, the seal image sample data may first be enhanced, for example by one or more of image translation, rotation, scaling, or toning, to obtain enhanced seal sample data; the seal-bearing synthesized data is then synthesized from the enhanced seal sample data and the seal-free image, as in the sketch below.
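The disclosure does not fix an implementation for this step; the following Python sketch illustrates one plausible way to augment an extracted seal crop and composite it onto a seal-free document image. All function names, parameter ranges, and the use of an RGBA alpha mask are assumptions for illustration.

```python
import numpy as np
from PIL import Image, ImageEnhance

def augment_seal(seal: Image.Image) -> Image.Image:
    """Randomly rotate, scale, and re-tone an RGBA seal crop (assumed ranges)."""
    seal = seal.rotate(np.random.uniform(-30, 30), expand=True,
                       resample=Image.BICUBIC)
    scale = np.random.uniform(0.8, 1.2)
    seal = seal.resize((int(seal.width * scale), int(seal.height * scale)),
                       Image.BICUBIC)
    return ImageEnhance.Color(seal).enhance(np.random.uniform(0.8, 1.2))

def composite(seal: Image.Image, clean: Image.Image) -> Image.Image:
    """Paste the seal at a random position (translation), alpha channel as mask."""
    out = clean.copy()
    x = np.random.randint(0, max(1, out.width - seal.width))
    y = np.random.randint(0, max(1, out.height - seal.height))
    out.paste(seal, (x, y), mask=seal.split()[-1])
    return out
```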
In step 103, the seal-bearing synthesized data is associated with the seal-free image on which it is based, to serve as sample data for training the image processing model. In some embodiments, if one seal-free image is synthesized with n (n a positive integer) seal image samples, n seal-bearing composite images are obtained, and the data of each of the n composites is then associated with that seal-free image, for example as follows.
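Continuing the sketch above (names still illustrative), the pairing in step 103 is then just bookkeeping: every composite produced from the same seal-free image keeps that image as its supervision target.

```python
# n augmented seals applied to one clean image yield n associated training pairs.
pairs = [(composite(augment_seal(seal), clean_image), clean_image)
         for seal in seal_samples]
```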
By this method, seal-bearing synthesized data can be generated from seal image samples and seal-free images, yielding a large number of associated seal-free images and corresponding seal-bearing images. This reduces the difficulty of acquiring training samples for the image processing model, increases the number of training samples, and improves the accuracy of the image model, thereby improving the accuracy of image processing.

In some embodiments, starting from seal-bearing images and seal-free images that are each easy to obtain separately and have no prior association, a large number of associations between seal-free images and seal-bearing synthesized data can be obtained through the flow shown in fig. 1B:

Seal image sample data is obtained from seal-bearing data by the seal sample extraction 111 operation; the seal image sample data undergoes seal sample enhancement 112 to obtain enhanced seal sample data; the enhanced seal sample data and a seal-free image undergo image fusion 113 to obtain seal-bearing synthesized data; the seal-bearing synthesized data and the seal-free image on which the fusion was based then undergo the image association 114 operation, yielding the association between the seal-bearing synthesized data and the seal-free image.

By this method, a large amount of associated seal-bearing and seal-free image data can be obtained with good alignment between the two, which facilitates model training and improves the accuracy of image processing for seal removal.
The inventors have found that in image seal removal, directly using the matched data pairs (i.e., seal-bearing image data and the corresponding seal-free image data) does not fit the network well. In particular, where occlusion occurs, the model cannot complete the missing information well. A flowchart of some embodiments of the model training method of the present disclosure is shown in fig. 2A.
In step 201, the seal-bearing synthesized data is input into the first deep learning segmentation network for binarization segmentation to obtain a binary segmentation image. In some embodiments, the seal-bearing composite image is a 3-channel RGB image and the binary segmentation image is a single-channel image. In some embodiments, the first deep learning segmentation network defines the pixels occupied by characters as foreground and the remaining pixels as background, and segments the seal-bearing synthesized data accordingly, as the sketch below illustrates.
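The channel layout of this step can be shown with a short PyTorch sketch; the small convolutional stack below merely stands in for the first deep learning segmentation network, whose exact architecture the disclosure does not prescribe.

```python
import torch
import torch.nn as nn

# Stand-in for the first segmentation network: 3-channel RGB composite in,
# single-channel character-foreground probability map out.
unet1 = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),
)

x = torch.rand(8, 3, 256, 256)     # a batch of seal-bearing composites
prob = torch.sigmoid(unet1(x))     # (8, 1, 256, 256) foreground probabilities
binary = (prob > 0.5).float()      # the binary segmentation image
```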
In step 202, the first deep learning segmentation network is supervised and trained through a first loss function, according to the binary segmentation image and the seal-free image associated with the seal-bearing synthesized data, until the loss converges.
In some embodiments, the structural information of characters is difficult to unify because fonts and scales differ, which can make network learning difficult. The seal-free image may therefore be processed based on a predetermined first strategy, for example binarized and skeletonized. After this processing, characters of different fonts and sizes are all represented by very thin skeletons, so the learning target of the network becomes clearer. The first deep learning segmentation network is then supervised and trained through the first loss function, according to the binary segmentation image and the processed data of the associated seal-free image, until the loss converges. One possible realization of this strategy is sketched below.
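The following sketch uses OpenCV and scikit-image; the Otsu threshold and the file path are assumptions, not requirements of the disclosure.

```python
import cv2
import numpy as np
from skimage.morphology import skeletonize

clean = cv2.imread("clean_invoice.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path
# Dark text on light paper: invert so character pixels become foreground (1).
_, binary = cv2.threshold(clean, 0, 1, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
skeleton = skeletonize(binary.astype(bool))    # thin characters to 1-px skeletons
target = skeleton.astype(np.float32)           # supervision target for the first net
```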
In some embodiments, the first loss function may include a BCE loss function and a GAN loss function. The BCE loss makes each pixel conform to the segmentation target, while the GAN loss acts as a regularization term that makes the segmentation result more plausible. This first loss function helps the network learn the structural information of characters better, for example by completing characters occluded by the seal, so that the generated characters look more reasonable; a sketch follows.
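A minimal sketch of the combined first loss; the discriminator `d1` and the weight `lambda_gan` are assumptions, since the patent names the loss terms but not their weights or GAN formulation.

```python
import torch
import torch.nn.functional as F

def first_loss(pred_logits, skeleton_target, d1, lambda_gan=0.1):
    # Pixel-wise BCE: each pixel should match the skeletonized target.
    bce = F.binary_cross_entropy_with_logits(pred_logits, skeleton_target)
    # GAN regularization: discriminator d1 should score the prediction as real.
    d_out = d1(torch.sigmoid(pred_logits))
    gan = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    return bce + lambda_gan * gan
```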
Training of the first deep learning segmentation network is achieved through steps 201 and 202 over a large amount of sample data.

In step 203, the binary segmentation image output by the first deep learning segmentation network and the corresponding seal-bearing synthesized data are input into the second deep learning segmentation network for style recovery processing of the binary segmentation image, to obtain a seal-free prediction image. In some embodiments, the seal-free prediction image is a 3-channel RGB image, so the binarized segmentation result is mapped back into RGB space to yield a seal-free image; here the binary segmentation image provides the structural information of the text, while the original input image mainly provides the image style information.
In some embodiments, after training of the first deep learning segmentation network converges through repetitions of steps 201 and 202, its parameters may be fixed; the seal-bearing synthesized data is input into the first deep learning segmentation network, and the binary segmentation image it outputs is input into the second deep learning segmentation network. The second deep learning segmentation network is thus trained on top of the already-trained first network, which improves convergence efficiency. A sketch follows.
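In code, fixing the first network and driving the second stage could look like the following; concatenating the binary map with the seal-bearing composite as a 4-channel input is one plausible way to combine the two inputs, not one mandated by the disclosure, and `unet1`, `unet2`, and `composite_batch` are assumed to exist from the surrounding training loop.

```python
import torch

# Freeze the converged first-stage network.
for p in unet1.parameters():
    p.requires_grad = False
unet1.eval()

with torch.no_grad():
    binary = torch.sigmoid(unet1(composite_batch))   # (N, 1, H, W) structure
# Second-stage input combines structure (binary map) and style (composite).
pred_clean = unet2(torch.cat([binary, composite_batch], dim=1))  # seal-free RGB
```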
In step 204, the second deep learning segmentation network is supervised and trained through a second loss function, according to the seal-free prediction image and the seal-free image associated with the seal-bearing synthesized data, until the loss converges.
In some embodiments, the second loss function includes an L1 loss function, a perceptual loss function, and a GAN loss function. The L1 loss is mainly responsible for imposing constraints at the pixel level. The perceptual loss is based on VGG features; whereas the L1 loss focuses on pixel differences, it focuses more on similarity at the feature level. The GAN loss makes the generated image more realistic, conforming to the real data distribution. A sketch of the combined loss follows.
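The sketch below uses a truncated VGG16 from torchvision as the perceptual feature extractor; the discriminator `d2`, the chosen feature depth, and all weights are assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

vgg_features = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad = False

def second_loss(pred, target, d2, w_l1=1.0, w_perc=0.1, w_gan=0.01):
    l1 = F.l1_loss(pred, target)                                # pixel-level term
    perc = F.l1_loss(vgg_features(pred), vgg_features(target))  # feature similarity
    d_out = d2(pred)                                            # realism term
    gan = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    return w_l1 * l1 + w_perc * perc + w_gan * gan
```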
By this method, two-stage training can be performed on the deep learning segmentation networks using the seal-bearing synthesized data and the corresponding seal-free images, separately improving the accuracy of image binarization segmentation and of style recovery; this realizes the training of the image processing model and improves its accuracy in image processing.

In some embodiments, the model training method may further include generating the seal-bearing synthesized data, and its association with the seal-free image, through any of the training sample generation methods mentioned above. In some embodiments, the training sample generation operation may be performed first, and only after sufficient sample data has been obtained are the samples fed into the first deep learning segmentation network, which reduces the computational pressure on the device. In other embodiments, training sample generation may run in parallel with the training of the deep learning segmentation networks, which improves execution efficiency.

By this method, dependence on real paired data is reduced and a well-aligned sample data set is used for model training; improving the quality of the sample data improves the accuracy and efficiency of model training, and thereby the accuracy of image processing.

In some embodiments, the two-stage image processing model separates the structural information of characters from the style information of images: the first deep learning segmentation network focuses on mapping and completing character structure, while the second deep learning segmentation network focuses on restoring image style. This optimizes the completion of the image regions covered by the seal and improves completion accuracy.
In some embodiments, the image processing model may be trained by a flow as shown in FIG. 2B:
The seal-bearing composite image is input into Unet1 211, which outputs a binary segmentation image; the seal-free image associated with the seal-bearing image is binarized and skeletonized 213 and then fed, together with the binary segmentation image, into the first loss function 212. Unet1 211 is supervised and trained until the loss converges, after which the parameters of Unet1 211 are fixed.

The seal-bearing composite image is input into Unet1 211, and the binary segmentation image it outputs is input into Unet2 221; the prediction output by Unet2 221 and the seal-free image corresponding to the seal-bearing composite input into Unet1 211 are fed into the second loss function 222. Unet2 221 is supervised and trained until the loss converges, after which the parameters of Unet2 221 are fixed.

In this way, training that separates character structure information from image style information is realized through the data processing and the linked training of the two models, improving training efficiency and model accuracy.
A flowchart of some embodiments of the image processing method of the present disclosure is shown in fig. 3.
In step 301, the seal-bearing image to be processed is input into the first deep learning segmentation network for binarization segmentation to obtain a binary segmentation image. In some embodiments, the first deep learning segmentation network is generated by the training method for the first deep learning segmentation network in any of the embodiments above.

In step 302, the binary segmentation image output by the first deep learning segmentation network and the corresponding seal-bearing image are input into the second deep learning segmentation network for style recovery processing of the binary segmentation image, and a seal-free prediction image is obtained. In some embodiments, the second deep learning segmentation network is generated by the training method for the second deep learning segmentation network in any of the embodiments above. The two steps reduce to two forward passes, as sketched below.
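Here `load_image` is a hypothetical loader, and the 4-channel concatenation carries over the assumption from the training sketches.

```python
import torch

unet1.eval()
unet2.eval()
with torch.no_grad():
    x = load_image("stamped_invoice.png")            # hypothetical: (1, 3, H, W)
    binary = torch.sigmoid(unet1(x))                 # structure: character map
    restored = unet2(torch.cat([binary, x], dim=1))  # style recovery: seal-free RGB
```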
By this method, the image to be processed is first projected into a binary segmentation space and handled as an image segmentation problem, and the image details are then recovered; the character structure information is thus separated from the image style information, so that the seal marks on the image are handled in a targeted manner and the accuracy of image processing is improved.

In some embodiments, the image processing method may further comprise training and generating the first and second deep learning segmentation networks through any of the model training methods above. After model training is completed, the parameters are fixed, and the trained model is then used to remove the seal from the image to be processed.

By this method, training that separates character structure information from image style information can be realized through the data processing and the linked training of the two models, improving training efficiency and model accuracy; the seals on an image are handled in a targeted manner, improving the accuracy of image processing.
A schematic diagram of some embodiments of the training sample generation apparatus of the present disclosure is shown in fig. 4.
The seal sample acquisition unit 41 acquires seal image sample data. In some embodiments, the seal sample acquisition unit 41 may extract seal images from seal-bearing image data as seal image sample data, thereby enriching the sample library.

The seal-bearing-synthesized-data acquisition unit 42 acquires seal-bearing synthesized data from the seal image sample data and a seal-free image. In some embodiments, the seal image may be superimposed on the seal-free image and fused with it to obtain the seal-bearing synthesized data. In some embodiments, the seal image sample data may first be enhanced, for example by one or more of image translation, rotation, scaling, or toning, to obtain enhanced seal sample data; the seal-bearing synthesized data is then synthesized from the enhanced seal sample data and the seal-free image.

The sample data generation unit 43 associates the seal-bearing synthesized data with the seal-free image on which it is based, to serve as sample data for training the image processing model. In some embodiments, if one seal-free image is synthesized with n (n a positive integer) seal image samples, n seal-bearing composite images are obtained, and the data of each of the n composites is then associated with that seal-free image.

This apparatus can generate seal-bearing synthesized data from seal image samples and seal-free images, yielding a large number of associated seal-free images and corresponding seal-bearing images. This reduces the difficulty of acquiring training samples for the image processing model, increases the number of training samples, and improves the accuracy of the image model, thereby improving the accuracy of image processing.
A schematic diagram of some embodiments of the model training apparatus of the present disclosure is shown in fig. 5.
The first network training unit 51 inputs the seal-bearing synthesized data into the first deep learning segmentation network for binarization segmentation and acquires a binary segmentation image. In some embodiments, the seal-bearing composite image is a 3-channel RGB image and the binary segmentation image is a single-channel image. In some embodiments, the first deep learning segmentation network defines the pixels occupied by characters as foreground and the remaining pixels as background, and segments the seal-bearing synthesized data accordingly. The first deep learning segmentation network is then supervised and trained through a first loss function, according to the binary segmentation image and the seal-free image associated with the seal-bearing synthesized data, until the loss converges.

The second network training unit 52 inputs the binary segmentation image output by the first deep learning segmentation network and the corresponding seal-bearing synthesized data into the second deep learning segmentation network for style recovery processing of the binary segmentation image, and acquires a seal-free prediction image. In some embodiments, the seal-free prediction image is a 3-channel RGB image, so the binarized segmentation result is mapped back into RGB space to yield a seal-free image; here the binary segmentation image provides the structural information of the text, while the original input image mainly provides the image style information. In some embodiments, after the first network training unit 51 has completed its operation and training of the first deep learning segmentation network has converged, the parameters of the first deep learning segmentation network may be fixed; the seal-bearing synthesized data is input into the first deep learning segmentation network, and the binary segmentation image it outputs is input into the second deep learning segmentation network, so that the second network is trained on top of the already-trained first network, improving convergence efficiency. The second deep learning segmentation network is then supervised and trained through a second loss function, according to the seal-free prediction image and the seal-free image associated with the seal-bearing synthesized data, until the loss converges.

This apparatus can perform two-stage training on the deep learning segmentation networks using the seal-bearing synthesized data and the corresponding seal-free images, separately improving the accuracy of image binarization segmentation and of style recovery; this realizes the training of the image processing model and improves its accuracy in image processing.

In some embodiments, the model training apparatus may include the training sample generation apparatus mentioned above, which reduces the difficulty of collecting samples for model training, improves sample quality, and improves the effectiveness and efficiency of model training.
In some embodiments, the first network training unit may include: the first deep learning segmentation network, a first supervised data acquisition subunit, and a first loss function subunit.

The first deep learning segmentation network performs binarization segmentation on the input seal-bearing synthesized data to obtain a binary segmentation image. The first supervised data acquisition subunit processes the seal-free image associated with the seal-bearing synthesized data based on the predetermined first strategy to acquire the first supervision data. The first loss function subunit performs supervised training of the first deep learning segmentation network according to the binary segmentation image and the first supervision data, obtaining the loss.

In some embodiments, the second network training unit may include: the second deep learning segmentation network, a second supervised data acquisition subunit, and a second loss function subunit.

The second deep learning segmentation network takes as input the binary segmentation image output by the first deep learning segmentation network and the corresponding seal-bearing synthesized data, performs style recovery processing of the binary segmentation image, and obtains a seal-free prediction image. The second supervised data acquisition subunit acquires the seal-free image associated with the seal-bearing synthesized data as the second supervision data. The second loss function subunit trains the second deep learning segmentation network according to the seal-free prediction image and the second supervision data, obtaining the loss.

This model training apparatus can be designed around relatively independent functional blocks, which makes it convenient to implement; it also cleanly separates the first and second deep learning segmentation networks, facilitating the processing of the seal-bearing image to be processed.
A schematic diagram of some embodiments of an image processing apparatus of the present disclosure is shown in fig. 6.
The first deep learning segmentation network 61 performs binarization segmentation on the seal-bearing image to be processed to obtain a binary segmentation image. The second deep learning segmentation network 62 performs style recovery processing on the binary segmentation image, according to the binary segmentation image output by the first deep learning segmentation network and the corresponding seal-bearing image, and acquires a seal-free prediction image. In some embodiments, the first and second deep learning segmentation networks 61, 62 are generated by training through the model training apparatus mentioned above.

This apparatus can first project the image to be processed into a binary segmentation space and handle it as an image segmentation problem, then recover the image details; the character structure information is thus separated from the image style information, so that the seal on the image is handled in a targeted manner and the accuracy of image processing is improved.

In some embodiments, the image processing apparatus may include any of the model training apparatuses mentioned above, so that the parameters are fixed after model training is completed and the trained model is then used to remove the seal from the seal-bearing image to be processed. This improves training efficiency and model accuracy, and the seals on the image are handled in a targeted manner, improving the accuracy of image processing.
A schematic structural diagram of one embodiment of an image processing apparatus of the present disclosure is shown in fig. 7. The image processing apparatus includes a memory 701 and a processor 702. The memory 701 may be a magnetic disk, flash memory, or any other non-volatile storage medium, and is used to store the instructions of the corresponding method embodiments above. The processor 702 is coupled to the memory 701 and may be implemented as one or more integrated circuits, such as a microprocessor or microcontroller. The processor 702 executes the instructions stored in the memory and can process the seals on an image in a targeted manner, improving the accuracy of image processing.
In one embodiment, as also shown in fig. 8, the image processing apparatus 800 includes a memory 801 and a processor 802. The processor 802 is coupled to the memory 801 by a BUS 803. The image processing apparatus 800 may also be connected to an external storage device 805 via a storage interface 804 for invoking external data, and may also be connected to a network or another computer system (not shown) via a network interface 806. And will not be described in detail herein.
In this embodiment, data instructions are stored in the memory and then processed by the processor, so that the seals on an image can be processed in a targeted manner, improving the accuracy of image processing.
In another embodiment, a computer readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the steps of the method of any of the method embodiments above. It will be apparent to those skilled in the art that embodiments of the present disclosure may be provided as a method, apparatus, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Thus far, the present disclosure has been described in detail. In order to avoid obscuring the concepts of the present disclosure, some details known in the art are not described. How to implement the solutions disclosed herein will be fully apparent to those skilled in the art from the above description.
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present disclosure and do not limit them. Although the present disclosure has been described in detail with reference to preferred embodiments, those of ordinary skill in the art will appreciate that modifications may be made to the specific embodiments, or equivalents substituted for some of the technical features, without departing from the spirit of the technical solutions of the present disclosure, and such changes should be covered by the scope of the technical solutions claimed in the present disclosure.
Claims (14)
1. A model training method, comprising:
inputting seal-bearing synthesized data into a first deep learning segmentation network for binarization segmentation to obtain a binary segmentation image, wherein the seal-bearing synthesized data is obtained by synthesis from seal image sample data and a seal-free image;

supervising and training the first deep learning segmentation network through a first loss function, according to the binary segmentation image and a seal-free image that is associated with the seal-bearing synthesized data and processed based on a predetermined first strategy, until the loss converges, wherein the predetermined first strategy comprises binarizing and skeletonizing the seal-free image;

inputting the binary segmentation image output by the first deep learning segmentation network and the corresponding seal-bearing synthesized data into a second deep learning segmentation network for style recovery processing of the binary segmentation image, and obtaining a seal-free prediction image; and

supervising and training the second deep learning segmentation network through a second loss function, according to the seal-free prediction image and the seal-free image associated with the seal-bearing synthesized data, until the loss converges.
2. The method of claim 1, further comprising generating the seal-bearing synthesized data and the association between the seal-bearing synthesized data and the seal-free image, comprising:

acquiring seal image sample data;

acquiring seal-bearing synthesized data from the seal image sample data and the seal-free image; and

associating the seal-bearing synthesized data with the seal-free image on which it is based, to serve as sample data for training an image processing model.
3. The method of claim 2, wherein the acquiring seal image sample data comprises: extracting a seal image from seal-bearing image data as the seal image sample data.
4. The method according to claim 2 or 3, wherein the acquiring the seal-bearing synthesized data from the seal image sample data and the seal-free image comprises:

performing enhancement processing on the seal image sample data to obtain enhanced seal sample data, wherein the enhancement processing comprises one or more of image translation, rotation, scaling, or toning; and

synthesizing the enhanced seal sample data with the seal-free image to acquire the seal-bearing synthesized data.
5. The method according to any one of claims 1 to 3, wherein:

the seal-bearing composite image is a 3-channel RGB image;

the binary segmentation image is a single-channel image; and

the seal-free prediction image is a 3-channel RGB image.
6. The method according to any one of claims 1 to 3, wherein the first deep learning segmentation network binarizes the seal-bearing synthesized data by: defining the pixels occupied by characters as foreground and the remaining pixels as background, and performing binarization segmentation on the seal-bearing synthesized data.
7. The method according to any one of claims 1 to 3, wherein at least one of the following is met:

the first loss function comprises a binary cross-entropy (BCE) loss function and a generative adversarial network (GAN) loss function; or

the second loss function comprises a least-absolute-deviation (L1) loss function, a perceptual loss function, and a GAN loss function.
8. The method according to any one of claims 1 to 3, wherein the inputting the binary segmentation image output by the first deep learning segmentation network and the corresponding seal-bearing synthesized data into a second deep learning segmentation network for style recovery processing of the binary segmentation image comprises:

after training of the first deep learning segmentation network converges, fixing the parameters of the first deep learning segmentation network, inputting the seal-bearing synthesized data into the first deep learning segmentation network for binarization segmentation, and obtaining the binary segmentation image; and

inputting the binary segmentation image and the corresponding seal-bearing synthesized data into the second deep learning segmentation network for style recovery processing of the binary segmentation image, and obtaining the seal-free prediction image.
9. An image processing method, comprising:

training and generating a first deep learning segmentation network and a second deep learning segmentation network by the model training method according to any one of claims 1 to 8;

inputting a seal-bearing image to be processed into the first deep learning segmentation network for binarization segmentation to obtain a binary segmentation image; and

inputting the binary segmentation image output by the first deep learning segmentation network and the corresponding seal-bearing image into the second deep learning segmentation network for style recovery processing of the binary segmentation image, and obtaining a seal-free prediction image.
10. A model training apparatus, comprising:

a first network training unit configured to:

input seal-bearing synthesized data into a first deep learning segmentation network for binarization segmentation to obtain a binary segmentation image, wherein the seal-bearing synthesized data is obtained by synthesis from seal image sample data and a seal-free image; and

supervise and train the first deep learning segmentation network through a first loss function, according to the binary segmentation image and a seal-free image that is associated with the seal-bearing synthesized data and processed based on a predetermined first strategy, until the loss converges, wherein the predetermined first strategy comprises binarizing and skeletonizing the seal-free image; and

a second network training unit configured to:

input the binary segmentation image output by the first deep learning segmentation network and the corresponding seal-bearing synthesized data into a second deep learning segmentation network for style recovery processing of the binary segmentation image, and obtain a seal-free prediction image; and

supervise and train the second deep learning segmentation network through a second loss function, according to the seal-free prediction image and the seal-free image associated with the seal-bearing synthesized data, until the loss converges.
11. The apparatus of claim 10, further comprising:

a seal sample acquisition unit configured to acquire seal image sample data;

a seal-bearing-synthesized-data acquisition unit configured to acquire seal-bearing synthesized data from the seal image sample data and a seal-free image; and

a sample data generation unit configured to associate the seal-bearing synthesized data with the seal-free image on which it is based, to serve as sample data for training an image processing model.
12. An image processing apparatus, comprising:

the model training apparatus according to claim 10 or 11;

a first deep learning segmentation network configured to perform binarization segmentation on a seal-bearing image to be processed to obtain a binary segmentation image; and

a second deep learning segmentation network configured to acquire a seal-free prediction image according to the binary segmentation image output by the first deep learning segmentation network and the corresponding seal-bearing image.
13. An image processing apparatus, comprising:

a memory; and

a processor coupled to the memory, the processor being configured to perform the method of any one of claims 1 to 9 based on instructions stored in the memory.
14. A computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method of any of claims 1 to 9.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010309634.0A | 2020-04-20 | 2020-04-20 | Training sample generation, model training and image processing method and device
Publications (2)

Publication Number | Publication Date
---|---
CN113537223A | 2021-10-22
CN113537223B | 2024-07-19
Family: ID=78093536

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202010309634.0A | Training sample generation, model training and image processing method and device | 2020-04-20 | 2020-04-20

Country Status (1)

Country | Link
---|---
CN (1) | CN113537223B (en)
Citations (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN109886974A | 2019-01-28 | 2019-06-14 | 北京易道博识科技有限公司 | Seal removal method
CN110189336A | 2019-05-30 | 2019-08-30 | 上海极链网络科技有限公司 | Image generation method, system, server and storage medium
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11003995B2 (en) * | 2017-05-19 | 2021-05-11 | Huawei Technologies Co., Ltd. | Semi-supervised regression with generative adversarial networks |
US10624558B2 (en) * | 2017-08-10 | 2020-04-21 | Siemens Healthcare Gmbh | Protocol independent image processing with adversarial networks |
CN109658418A (en) * | 2018-10-31 | 2019-04-19 | 百度在线网络技术(北京)有限公司 | Learning method, device and the electronic equipment of scene structure |
CN110176006B (en) * | 2019-05-15 | 2021-08-27 | 北京航空航天大学 | Image foreground object segmentation method and device |
CN110322495B (en) * | 2019-06-27 | 2021-11-02 | 电子科技大学 | A scene text segmentation method based on weakly supervised deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN113537223A (en) | 2021-10-22 |
Similar Documents

Publication | Title
---|---
CN105981051B | Hierarchically interconnected multiscale convolutional network for image parsing
CN106254933B | Subtitle extraction method and device
US10134441B2 | Method and system for overlaying image in video stream
US9275030B1 | Horizontal and vertical line detection and removal for document images
CN112669515B | Bill image recognition method and device, electronic equipment and storage medium
KR20210043681A | Binarization and normalization-based inpainting for text removal
JP7026165B2 | Text recognition method and text recognition device, electronic equipment, storage medium
KR102142567B1 | Image composition apparatus using virtual chroma-key background, method and computer program
JPH055146B2 | (untitled)
Sun et al. | SAFL-Net: Semantic-agnostic feature learning network with auxiliary plugins for image manipulation detection
CN112749696B | Text detection method and device
CN106980857B | Chinese calligraphy segmentation and recognition method based on copybook
US11113519B2 | Character recognition apparatus, character recognition program, and character recognition method
Janani et al. | Recognition and analysis of Tamil inscriptions and mapping using image processing techniques
KR20110131949A | Image processing apparatus and method
CN107977359B | Method for extracting scene information of movie and television scenario
KR20200077037A | Method and apparatus of extracting multiple cuts from a comic book
CN108877030B | Image processing method, device, terminal and computer readable storage medium
CN113537223B | Training sample generation, model training and image processing method and device
CN115270184A | Video desensitization method, vehicle video desensitization method and vehicle-mounted processing system
EP4047547B1 | Method and system for removing scene text from images
Rahman et al. | Text information extraction from digital image documents using optical character recognition
KR20190017635A | Apparatus and method for acquiring foreground image
JP2012003480A | Telop character area detector and program
JP2012222581A | Image processing device, image processing method, program, and storage medium
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant