
CN110097059B - Document image binarization method, system and device based on generative adversarial network - Google Patents

Publication number
CN110097059B
Authority
CN
China
Prior art keywords
image
neural network
binary
convolutional neural
original document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910222323.8A
Other languages
Chinese (zh)
Other versions
CN110097059A (en
Inventor
肖柏华
赵晋媛
贾馥溪
王春恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201910222323.8A priority Critical patent/CN110097059B/en
Publication of CN110097059A publication Critical patent/CN110097059A/en
Application granted granted Critical
Publication of CN110097059B publication Critical patent/CN110097059B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/28Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of image processing and specifically relates to a document image binarization method, system and device based on a generative adversarial network. It aims to solve the problem that existing binarization methods have unstable accuracy and poor robustness when the image quality of document pictures is uneven. The method includes: segmenting the original document image; binarizing the segmented image blocks and the normalized original document image, respectively, with a first convolutional neural network; restoring the resulting binarized images to the size of the original document image by stitching and scaling, respectively, and merging them with the grayscale image of the original document image; segmenting the merged image and binarizing the blocks with a second convolutional neural network; and merging the resulting binarized image blocks to obtain the final binarized image. For photographed document images of many document types, the invention obtains binarized images with high accuracy, high stability and strong robustness.

Description

Document image binarization method, system and device based on a generative adversarial network
Technical Field
The invention belongs to the field of image processing, and particularly relates to a document image binarization method, system and device based on a generative adversarial network.
Background
In recent years, with the rapid development of network technology, society has entered the information era. Traditional information carriers such as books, newspapers and periodicals are inconvenient to carry, require a large amount of storage space, and are hard to edit, organize and transmit. Storage on electronic media such as magnetic disks is increasingly preferred, so entering the text of paper materials into computers quickly has become important, which gave rise to OCR (Optical Character Recognition) technology. OCR enables high-speed, automatic input of text information, saves a large amount of human labor, and is now widely applied.
The success of OCR depends on the preprocessing of the text image. Good binarization of the image can greatly improve OCR recognition accuracy, so binarization has great research value. In practice, the quality of text images varies, with problems such as unclear printing or noise, and existing binarization methods show unstable accuracy and poor robustness when the image quality of document pictures is uneven.
Disclosure of Invention
In order to solve the above problem in the prior art, namely that existing binarization methods have unstable accuracy and poor robustness when the image quality of document pictures is uneven, a first aspect of the present invention provides a document image binarization method based on a generative adversarial network, the method comprising:
step S10, acquiring a plurality of image blocks with a preset first size from an input original document image according to a set step length to serve as a first image block set;
step S20, for the first image block set, obtaining a binary image of each image block through a first convolutional neural network to obtain a second image block set; normalizing the original document image to the first size, and acquiring a binary image of the original document image through the first convolution neural network to serve as a first binary image;
step S30, splicing the image blocks in the second image block set to obtain a second binary image; scaling the first binary image to the size of the original document image as a third binary image; acquiring a gray scale image of the original document image; combining the second binary image, the third binary image and the gray image of the original document image to obtain a three-channel image;
step S40, the three-channel image is segmented by the method of step S10 to obtain a third image block set, and a binary image of each image block is obtained through a second convolutional neural network and is used as a fourth image block set;
step S50, splicing the image blocks in the fourth image block set to obtain a final binary image of the original document image;
wherein the first convolutional neural network and the second convolutional neural network are cascaded to form the generator of a generative adversarial network, and their parameters are optimized through training.
In some preferred embodiments, the discriminator of the generative adversarial network is a patch-based fully convolutional neural network;
the first convolutional neural network and the second convolutional neural network are two semantic segmentation networks with the same structure; the first convolutional neural network is used for generating a binarized image according to the context information of the local area; and the second convolutional neural network is used for correcting the output of the first convolutional neural network according to the difference between the context information of the text and that of the background.
In some preferred embodiments, the loss function $L_{loss}$ used when training the generative adversarial network is built from the following terms:
[Equation image BDA0002004026890000031: combined loss $L_{loss}$; not recoverable from the text]
$$L_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x}[\log(1 - D(x, G(x, z)))]$$
$$L_{L1}(G) = \mathbb{E}_{x,y}[\lVert y - G(x, z) \rVert_1]$$
where G and D denote the generator and the discriminator in the generative adversarial network; $L_{cGAN}(G, D)$ is the adversarial loss for training the generator and the discriminator; $L_{L1}(G)$ is the L1 loss between the image generated by the generator and the true binary image; x is the input picture; z is the random noise in the generator; G(x, z) is the binarization result image generated by the generator from the input image x and the random noise z; y is the true binary image; γ is the weight coefficient balancing the two losses; and D(x, y) is the discriminator output for the input image paired with the true binarized sample.
In some preferred embodiments, the first convolutional neural network and the second convolutional neural network each comprise five convolutional layers and five deconvolution layers.
In some preferred embodiments, each image block in the first set of image blocks has a second-size area in the center of the image that does not overlap with other image blocks in the first set of image blocks.
In some preferred embodiments, the first size is A × A, and the second size is B × B;
the upper-left points of the four image blocks adjacent to an image block are determined from the upper-left point [a, b] of that image block as follows:
the upper-left point of the adjacent image block on the left is [a - A + (B/2), b];
the upper-left point of the adjacent image block on the right is [a + A - (B/2), b];
the upper-left point of the upper adjacent image block is [a, b - A + (B/2)];
and the upper-left point of the lower adjacent image block is [a, b + A - (B/2)].
In some preferred embodiments, the first size is 256 × 256 and the second size is 128 × 128.
A second aspect of the invention provides a document image binarization system based on a generative adversarial network, comprising a segmentation module, a first convolutional neural network processing module, a three-channel image acquisition module, a second convolutional neural network processing module and a final binarized image acquisition module;
the segmentation module is configured to acquire a plurality of image blocks with a preset first size from an input text image according to a set step length and construct an image block set;
the first convolution neural network processing module is configured to acquire a first image block set from an original document image through the segmentation module, and acquire a binary image of each image block through a first convolution neural network to obtain a second image block set; normalizing the original document image to the first size, and acquiring a binary image of the original document image through the first convolution neural network to serve as a first binary image;
the three-channel image acquisition module is configured to splice all image blocks in the second image block set to obtain a second binary image; scaling the first binary image to the size of the original document image as a third binary image; acquiring a gray scale image of the original document image; combining the second binary image, the third binary image and the gray image of the original document image to obtain a three-channel image;
the second convolutional neural network processing module is configured to acquire a third image block set from the three-channel image through the segmentation module, and acquire a binary image of each image block through a second convolutional neural network to serve as a fourth image block set;
the final binary image acquisition module is configured to splice the image blocks in the fourth image block set to obtain a final binary image of the original document image;
wherein the first convolutional neural network and the second convolutional neural network are cascaded to form the generator of a generative adversarial network, and their parameters are optimized through training.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above document image binarization method based on a generative adversarial network.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; and the programs are adapted to be loaded and executed by the processor to implement the above document image binarization method based on a generative adversarial network.
The invention has the beneficial effects that:
for photographed document images of various document types, the invention obtains binarized images with high accuracy, high stability and strong robustness; in addition, by using two cascaded convolutional neural networks, it adapts well to text extraction from document images and can overcome interference from non-text noise.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic flow chart of a document image binarization method based on a generative adversarial network according to an embodiment of the invention;
FIG. 2 is a diagram illustrating segmentation of an original document image according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the generator part of the generative adversarial network structure in accordance with an embodiment of the invention;
FIG. 4 is a schematic diagram of the discriminator in the generative adversarial network structure in accordance with an embodiment of the present invention;
FIG. 5 is an example of results obtained via a first convolutional neural network in one embodiment of the present invention;
FIG. 6 is an example of an input image for a second convolutional neural network in one embodiment of the present invention;
FIG. 7 is an example of obtaining a final binarized map of an original document image according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
For a clearer explanation of the present invention, the following description will discuss parts of an embodiment of the present invention with reference to fig. 1 to 7.
In the invention, two convolutional neural networks are cascaded to perform binarization. To explain the invention more clearly, the construction and training of the two convolutional neural networks are described first, and the document image binarization method based on the generative adversarial network is then described on the basis of the two trained networks.
1. Construction and training of two convolutional neural networks
The first convolutional neural network and the second convolutional neural network are cascaded to form the generator of the generative adversarial network, and the generative adversarial network is built on this generator.
(1) Generator
In the designed generative adversarial network, the generator is formed by cascading two semantic segmentation networks (U-Nets) with the same structure, which serve as the first and second convolutional neural networks. Each U-Net comprises five convolutional layers and five deconvolution layers, so that the input and output pictures have the same size. The two U-Nets play different roles: the first mainly generates a binarized image from the context information of a local area and preserves text details as far as possible; the second corrects the result image generated by the first, based on the difference between the context information of the text and of the background at different scales, so as to further eliminate background noise. The generator structure is shown in the middle block of FIG. 3, in which G1 is the first convolutional neural network and G2 is the second convolutional neural network.
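The statement that five convolutional layers followed by five deconvolution layers preserve the picture size can be checked with a little arithmetic. The sketch below is illustrative only: the kernel size 4, stride 2 and padding 1 are assumptions borrowed from common U-Net style generators and are not stated in this document, and the function names are hypothetical.

```python
def conv_out(n, k=4, s=2, p=1):
    """Side length after a strided convolution (k, s, p are assumed values)."""
    return (n + 2 * p - k) // s + 1


def deconv_out(n, k=4, s=2, p=1):
    """Side length after a transposed convolution without output padding."""
    return (n - 1) * s - 2 * p + k


def unet_sizes(n, depth=5):
    """Trace the side length through `depth` conv layers, then `depth` deconv layers."""
    sizes = [n]
    for _ in range(depth):
        sizes.append(conv_out(sizes[-1]))
    for _ in range(depth):
        sizes.append(deconv_out(sizes[-1]))
    return sizes


print(unet_sizes(256))
# [256, 128, 64, 32, 16, 8, 16, 32, 64, 128, 256]
```

Under these assumed hyperparameters a 256 × 256 block shrinks to 8 × 8 at the bottleneck and returns to 256 × 256 at the output, which is consistent with the requirement that input and output sizes match.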
(2) Discriminator
The discriminator is a patch-based fully convolutional neural network. Its purpose is to distinguish the binarized image generated by the generator from the ground-truth binarized image. The specific network structure is shown in FIG. 4. The discriminator judges the generated binarized picture and the ground-truth binarized sample, each paired with the input image: the pair consisting of the generated binary image and the input image should be judged false, and the pair consisting of the standard binary image of the original image and the input image should be judged true.
(3) Loss function
The loss function $L_{loss}$ used when training the generative adversarial network is built from the following terms:
[Equation image BDA0002004026890000071: combined loss $L_{loss}$; not recoverable from the text]
$$L_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x}[\log(1 - D(x, G(x, z)))]$$
$$L_{L1}(G) = \mathbb{E}_{x,y}[\lVert y - G(x, z) \rVert_1]$$
where G and D denote the generator and the discriminator in the generative adversarial network; $L_{cGAN}(G, D)$ is the adversarial loss for training the generator and the discriminator; $L_{L1}(G)$ is the L1 loss between the image generated by the generator and the true binary image; x is the input picture; z is the random noise in the generator; G(x, z) is the binarization result image generated by the generator from the input image x and the random noise z; y is the true binary image; γ is the weight coefficient balancing the two losses (γ is 1 in some embodiments); and D(x, y) is the discriminator output for the input image paired with the true binarized sample.
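To make the two loss terms concrete, here is a minimal sketch that estimates them from a small batch, replacing each expectation with a batch average. Discriminator outputs are assumed to be probabilities strictly between 0 and 1, and images are flattened to lists of pixel values. The function `combined_loss` is an assumption: the combined expression is only an image in the source, so the additive form with weight `gamma` follows the common conditional-GAN convention rather than anything confirmed here.

```python
import math


def cgan_loss(d_real, d_fake):
    """Batch estimate of L_cGAN: mean log D(x, y) + mean log(1 - D(x, G(x, z)))."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def l1_loss(targets, outputs):
    """Batch estimate of L_L1: mean over samples of ||y - G(x, z)||_1."""
    norms = [sum(abs(t - o) for t, o in zip(y, g)) for y, g in zip(targets, outputs)]
    return sum(norms) / len(norms)


def combined_loss(d_real, d_fake, targets, outputs, gamma=1.0):
    # ASSUMPTION: additive combination weighted by gamma; not confirmed by the source.
    return cgan_loss(d_real, d_fake) + gamma * l1_loss(targets, outputs)


# Hypothetical discriminator scores and 3-pixel images for two samples.
d_real = [0.9, 0.8]              # D(x, y) on true pairs
d_fake = [0.2, 0.1]              # D(x, G(x, z)) on generated pairs
targets = [[1, 0, 1], [0, 0, 1]]
outputs = [[1, 1, 0], [0, 0, 1]]
print(cgan_loss(d_real, d_fake), l1_loss(targets, outputs))
```

In the adversarial term as written, the discriminator is trained to increase the value while the generator is trained to decrease it; the L1 term only affects the generator.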
2. The method of the invention
The document image binarization method based on a generative adversarial network according to one embodiment of the invention, shown in FIG. 1, comprises the following steps:
step S10, acquiring a plurality of image blocks of a preset first size from the input original document image as a first image block set according to a set step size.
In this embodiment, each image block in the first image block set has a second-size area at the center of the image that is not overlapped with other image blocks in the first image block set.
For example, let the first size be A × A (for example, 256 × 256) and the second size be B × B (for example, 128 × 128). The original photographed document image is cut into image blocks of size A × A according to a certain step size such that the B × B areas at the centers of the image blocks do not overlap. To achieve this, the positions of adjacent image blocks may be determined as follows:
the upper-left points of the four image blocks adjacent to an image block are determined from the upper-left point [a, b] of that image block:
the upper-left point of the adjacent image block on the left is [a - A + (B/2), b];
the upper-left point of the adjacent image block on the right is [a + A - (B/2), b];
the upper-left point of the upper adjacent image block is [a, b - A + (B/2)];
and the upper-left point of the lower adjacent image block is [a, b + A - (B/2)].
For example, in one embodiment the first size is 256 × 256 and the second size is 128 × 128. If the upper-left point of an image block is [a, b], then the upper-left point of the left adjacent image block is [a - 256 + 64, b]; that of the right adjacent image block is [a + 256 - 64, b]; that of the upper adjacent image block is [a, b - 256 + 64]; and that of the lower adjacent image block is [a, b + 256 - 64].
FIG. 2 is a schematic diagram illustrating an example of segmenting an original document image, in which bars indicate the correspondence between the positions of the original document image and the segmented image.
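The neighbor rule in step S10 amounts to a fixed stride of A - B/2 between adjacent blocks (192 pixels for A = 256, B = 128). The sketch below computes the four neighbor origins and then lays out block origins over a whole image. Two points are assumptions not stated in the source: a final block is clamped to the image border when the stride does not divide evenly, and images smaller than A are expected to be padded by the caller. The function names are hypothetical.

```python
def neighbor_origins(a, b, A=256, B=128):
    """Upper-left points of the four blocks adjacent to the block at (a, b)."""
    step = A - B // 2  # a - A + B/2 == a - step, and so on
    return {
        "left": (a - step, b),
        "right": (a + step, b),
        "up": (a, b - step),
        "down": (a, b + step),
    }


def tile_origins(width, height, A=256, B=128):
    """Upper-left points of A x A blocks covering a width x height image."""
    step = A - B // 2

    def axis(n):
        last = max(n - A, 0)
        pts = list(range(0, last + 1, step))
        if pts[-1] != last:
            pts.append(last)  # ASSUMPTION: clamp one extra block to the border
        return pts

    return [(a, b) for b in axis(height) for a in axis(width)]


print(neighbor_origins(256, 256))
print(tile_origins(700, 256))
```

For a 700 × 256 image this yields block origins at x = 0, 192, 384 and a clamped final block at x = 444, so every pixel is covered by at least one block.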
Step S20, for the first image block set, obtaining a binary image of each image block through a first convolutional neural network to obtain a second image block set; normalizing the original document image to the first size, and acquiring a binary image of the original document image through the first convolution neural network to serve as a first binary image.
In this embodiment, each image block in the first image block set is input into the trained first convolutional neural network to obtain the initial binarization result image for that block, yielding the second image block set. In parallel, the whole original document image is normalized to A × A (for example, 256 × 256), and its binarization result is obtained through the first convolutional neural network as the first binary image.
Fig. 5 is an example of the result of stitching the binarized image blocks generated by the first convolutional neural network into the original image size in an embodiment of the present invention, where five examples (a), (b), (c), (d), and (e) are given.
Step S30, splicing the image blocks in the second image block set to obtain a second binary image; scaling the first binary image to the size of the original document image as a third binary image; acquiring a gray scale image of the original document image; and combining the second binary image, the third binary image and the gray image of the original document image to obtain a three-channel image.
In this embodiment, the input image of the second convolutional neural network consists of three channels, so the three-channel image must be obtained in this step. The method includes:
combining and stitching the image blocks in the second image block set obtained in step S20, using the segmentation information from step S10, to recover a preliminary binarization result of the original document image as the second binary image, which is the first channel of the input image of the second convolutional neural network;
scaling the first binary image obtained in step S20 to the size of the original document image as the third binary image, which is the second channel of the input image of the second convolutional neural network;
acquiring a grayscale image of the original document image as the third channel of the input image of the second convolutional neural network;
and combining the second binary image, the third binary image and the grayscale image of the original document image to obtain the three-channel image.
Two examples of three-channel images are shown in FIG. 6.
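The channel assembly in step S30 can be sketched with plain nested lists. The source does not say which interpolation is used when the first binary image is scaled back to the original size, so nearest-neighbor scaling is an assumption here, and the function names are hypothetical.

```python
def resize_nearest(img, out_h, out_w):
    """Scale a 2-D image (list of rows) with nearest-neighbor sampling (assumed method)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def merge_channels(second_binary, third_binary, gray):
    """Stack three H x W images into one H x W x 3 image, in the order given in step S30."""
    h, w = len(gray), len(gray[0])
    for ch in (second_binary, third_binary):
        if len(ch) != h or len(ch[0]) != w:
            raise ValueError("all channels must match the original image size")
    return [
        [(second_binary[r][c], third_binary[r][c], gray[r][c]) for c in range(w)]
        for r in range(h)
    ]


# Toy 4 x 4 "original" image with a 2 x 2 normalized binarization result.
first_binary = [[0, 255], [255, 0]]
third_binary = resize_nearest(first_binary, 4, 4)
second_binary = [[255] * 4 for _ in range(4)]                  # stand-in for the stitched blocks
gray = [[10 * (r + c) for c in range(4)] for r in range(4)]    # stand-in grayscale image
three_channel = merge_channels(second_binary, third_binary, gray)
print(three_channel[0][3])  # (255, 255, 30)
```

Each pixel of the merged image then carries the block-level result, the whole-image result and the original gray value, which is the input the second network sees.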
And step S40, the three-channel image is segmented by the method of step S10 to obtain a third image block set, and a binary image of the image blocks is obtained through a second convolutional neural network and is used as a fourth image block set.
And step S50, splicing the image blocks in the fourth image block set to obtain a final binary image of the original document image.
In this embodiment, each image block in the fourth image block set obtained in step S40 is combined and spliced with the information segmented in step S10, and is restored to a binarization result image corresponding to the original document image, and the image is used as a final binarization image of the original document image.
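Stitching the blocks back together needs only the block origins recorded during segmentation. The source does not specify how pixels covered by more than one block are resolved, so the sketch below makes the simplest assumption: blocks are pasted in order and later blocks overwrite earlier ones. Origins are taken as (a, b) with a horizontal and b vertical, matching the neighbor rule of step S10; `stitch_blocks` is a hypothetical name.

```python
def stitch_blocks(blocks, origins, width, height):
    """Paste blocks onto a width x height canvas at their upper-left points."""
    canvas = [[0] * width for _ in range(height)]
    for block, (a, b) in zip(blocks, origins):
        for r, row in enumerate(block):
            for c, value in enumerate(row):
                # Skip pixels that fall outside the image (e.g. padded borders).
                if 0 <= b + r < height and 0 <= a + c < width:
                    # ASSUMPTION: in overlapping areas the later block wins.
                    canvas[b + r][a + c] = value
    return canvas


# Two 2 x 2 blocks overlapping by one column on a 3 x 2 image.
blocks = [[[1, 1], [1, 1]], [[2, 2], [2, 2]]]
origins = [(0, 0), (1, 0)]
print(stitch_blocks(blocks, origins, width=3, height=2))
# [[1, 2, 2], [1, 2, 2]]
```

Other overlap rules, such as keeping only each block's central area or averaging and re-thresholding, would fit the same interface.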
The image binarization process of the present invention is also shown in FIG. 3. The input image (original document image) is segmented to obtain a set of image blocks, scaled to obtain a normalized original image, and converted to grayscale to obtain a grayscale image of the original document image. The image blocks are binarized by G1 and the resulting binary blocks are merged to obtain image (1); the normalized original image is binarized by G1 to obtain image (2). Image (1), image (2) and the grayscale image of the original document image are merged and segmented again, the resulting blocks are binarized by G2, and the binarized blocks are merged to obtain the final binarized image.
FIG. 7 is an example of the final binary image of the original document image obtained by one embodiment of the present invention, which includes five result examples (a), (b), (c), (d), and (e), which correspond to the respective graphs in FIG. 5.
The document image binarization system based on a generative adversarial network according to one embodiment of the invention comprises a segmentation module, a first convolutional neural network processing module, a three-channel image acquisition module, a second convolutional neural network processing module and a final binarized image acquisition module.
The segmentation module is configured to acquire a plurality of image blocks with a preset first size from an input text image according to a set step length, and construct an image block set.
The first convolution neural network processing module is configured to acquire a first image block set from an original text image through the segmentation module, and acquire a binary image of each image block through a first convolution neural network to obtain a second image block set; normalizing the original text image to the first size, and acquiring a binary image of the original text image through the first convolution neural network to serve as the first binary image.
The three-channel image acquisition module is configured to splice all image blocks in the second image block set to obtain a second binary image; scaling the first binary image to the size of the original text image to be used as a third binary image; acquiring a gray scale image of the original text image; and combining the second binary image, the third binary image and the gray image of the original text image to obtain a three-channel image.
The second convolutional neural network processing module is configured to acquire a third image block set from the three-channel image through the segmentation module, and acquire a binary image of the image blocks through the second convolutional neural network as a fourth image block set.
And the final binary image acquisition module is configured to splice the image blocks in the fourth image block set to obtain a final binary image of the original text image.
The first convolutional neural network and the second convolutional neural network are cascaded to form the generator of a generative adversarial network, and their parameters are optimized through training.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that the document image binarization system based on a generative adversarial network provided in the foregoing embodiment is illustrated only by the above division of functional modules. In practical applications, the functions may be allocated to different functional modules as needed; that is, the modules or steps in the embodiment of the present invention may be further decomposed or combined. For example, the modules in the foregoing embodiment may be combined into one module or further split into multiple sub-modules to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention serve only to distinguish them and are not to be construed as unduly limiting the present invention.
The storage device of an embodiment of the present invention stores a plurality of programs, and the programs are adapted to be loaded and executed by a processor to implement the document image binarization method based on a generative adversarial network.
The processing device of one embodiment of the invention comprises a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; and the programs are adapted to be loaded and executed by the processor to implement the document image binarization method based on a generative adversarial network.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art would appreciate that the various illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules and method steps may be located in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but those skilled in the art will readily understand that the scope of the present invention is not limited to these specific embodiments. Those skilled in the art can make equivalent changes or substitutions to the related technical features without departing from the principle of the invention, and the technical solutions resulting from such changes or substitutions fall within the protection scope of the invention.

Claims (8)

1. A document image binarization method based on a generative adversarial network, characterized by comprising the following steps:
step S10, acquiring a plurality of image blocks with a preset first size from an input original document image according to a set step length to serve as a first image block set;
step S20, for the first image block set, obtaining a binary image of each image block through a first convolutional neural network to obtain a second image block set; normalizing the original document image to the first size, and obtaining a binary image of the normalized original document image through the first convolutional neural network to serve as a first binary image;
step S30, splicing the image blocks in the second image block set to obtain a second binary image; scaling the first binary image to the size of the original document image as a third binary image; acquiring a gray scale image of the original document image; combining the second binary image, the third binary image and the gray image of the original document image to obtain a three-channel image;
step S40, the three-channel image is segmented by the method of step S10 to obtain a third image block set, and a binary image of each image block in the third image block set is obtained through a second convolutional neural network and is used as a fourth image block set;
step S50, splicing the image blocks in the fourth image block set to obtain a final binary image of the original document image;
the first convolutional neural network and the second convolutional neural network are cascaded to form the generator of a generative adversarial network, a patch-based fully convolutional neural network serves as the discriminator of the generative adversarial network, and the parameters are optimized through training;
the first convolutional neural network and the second convolutional neural network are two semantic segmentation networks with the same structure, and each semantic segmentation network comprises five convolutional layers and five deconvolution layers.
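The data flow of steps S10 to S50 can be sketched as follows. This is a minimal illustration, not the patented implementation: the two networks are replaced by hypothetical stand-in functions (`threshold_net`, `vote_net`), the tile size and stride are toy values, resizing uses nearest-neighbour sampling, overlapping tiles are stitched by simple overwriting, and the image is assumed to be at least one tile in each dimension. None of these choices is stated in the claims.

```python
from typing import Callable, List

Grid = List[list]  # an image stored as a list of rows


def tile_origins(length: int, size: int, stride: int) -> List[int]:
    """Offsets along one axis so that tiles of `size` cover `length` (length >= size)."""
    origins = list(range(0, length - size, stride))
    origins.append(length - size)  # last tile is aligned with the border
    return origins


def crop(img: Grid, top: int, left: int, size: int) -> Grid:
    return [row[left:left + size] for row in img[top:top + size]]


def resize_nearest(img: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbour resize (an assumption; the claims only say 'normalize'/'scale')."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def run_tiled(img: Grid, net: Callable[[Grid], Grid], size: int, stride: int) -> Grid:
    """Cut `img` into tiles, run `net` on each tile, and stitch the outputs back."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for top in tile_origins(h, size, stride):
        for left in tile_origins(w, size, stride):
            patch = net(crop(img, top, left, size))
            for r in range(size):
                # Assumption: later tiles simply overwrite earlier ones where they overlap.
                out[top + r][left:left + size] = patch[r]
    return out


def binarize(gray: Grid, net1, net2, size: int = 4, stride: int = 2) -> Grid:
    h, w = len(gray), len(gray[0])
    # S10, S20 (tile branch) and the stitching in S30: the "second binary image".
    local = run_tiled(gray, net1, size, stride)
    # S20 (whole-page branch) and the scaling in S30: the "third binary image".
    global_full = resize_nearest(net1(resize_nearest(gray, size, size)), h, w)
    # S30: stack both binary images with the grayscale image into three channels.
    merged = [[(local[r][c], global_full[r][c], gray[r][c]) for c in range(w)]
              for r in range(h)]
    # S40 and S50: tile the three-channel image, refine each tile, stitch the result.
    return run_tiled(merged, net2, size, stride)


# Hypothetical stand-ins for the two trained networks.
def threshold_net(patch: Grid) -> Grid:
    return [[1 if v > 0.5 else 0 for v in row] for row in patch]


def vote_net(patch: Grid) -> Grid:
    # Each pixel is (tile-branch result, page-branch result, gray value); majority vote.
    return [[1 if loc + glob + (g > 0.5) >= 2 else 0 for (loc, glob, g) in row]
            for row in patch]
```

Calling `binarize(gray, threshold_net, vote_net)` on a small grayscale grid exercises every step; in a real system the stand-ins would be replaced by the trained first and second convolutional neural networks.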
2. The document image binarization method based on a generative adversarial network as claimed in claim 1, wherein the first convolutional neural network is used for generating a binary image according to the context information of a local area; and the second convolutional neural network is used for correcting the output of the first convolutional neural network according to the difference between the context information of the text and that of the background.
3. The document image binarization method based on a generative adversarial network as claimed in claim 2, characterized in that the loss function L_{loss} used in training the generative adversarial network is
[Formula image FDA0002944369850000021]
L_{cGAN}(G, D) = E_{x,y}[\log D(x, y)] + E_{x}[\log(1 - D(x, G(x, z)))]
L_{L1}(G) = E_{x,y}[\| y - G(x, z) \|_1]
where G and D denote the generator and the discriminator of the generative adversarial network, respectively; L_{cGAN}(G, D) is the adversarial loss used in training the generator and the discriminator; L_{L1}(G) is the L1 loss between the image generated by the generator and the real binary image; x is the input image; z is the random noise in the generator; G(x, z) is the binarization result image generated by the generator from the input image x and the random noise z; y is the real binary image; \gamma is the weight coefficient balancing the two losses; and D(x, y) is the discriminator output for the input image paired with the real binary sample.
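The two loss terms defined in claim 3 can be estimated from a batch of samples as in the sketch below. The function names, the flattened-list image representation, and the batch averaging are assumptions made for illustration. The final weighted sum in `combined_loss` is also an assumption: it follows the form commonly used for conditional GANs, whereas the patent gives its combined formula only as an image, so the exact form there may differ.

```python
import math
from typing import Sequence


def cgan_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Batch estimate of E[log D(x, y)] + E[log(1 - D(x, G(x, z)))].

    d_real: discriminator outputs in (0, 1) for (input, real binary image) pairs.
    d_fake: discriminator outputs in (0, 1) for (input, generated image) pairs.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def l1_loss(y: Sequence[Sequence[float]], g: Sequence[Sequence[float]]) -> float:
    """Batch estimate of E[||y - G(x, z)||_1]; each image is a flattened pixel list."""
    per_image = [sum(abs(a - b) for a, b in zip(yi, gi)) for yi, gi in zip(y, g)]
    return sum(per_image) / len(per_image)


def combined_loss(d_real, d_fake, y, g, gamma: float) -> float:
    # ASSUMPTION: adversarial term plus gamma times the L1 term. The patent's own
    # combined formula is an image that is not reproduced in the text.
    return cgan_loss(d_real, d_fake) + gamma * l1_loss(y, g)
```

In an actual training loop the discriminator would be updated to increase the adversarial term and the generator to decrease the combined value; here the functions only evaluate the quantities for given discriminator outputs and images.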
4. The document image binarization method based on a generative adversarial network as claimed in any one of claims 1-3, wherein each image block in the first image block set has, at the center of the image block, a region of a second size that does not overlap any other image block in the first image block set.
5. The document image binarization method based on a generative adversarial network as claimed in claim 4, wherein the first size is A x A, and the second size is B x B;
the upper-left points of the four image blocks adjacent to an image block are determined from the upper-left point [a, b] of that image block as follows:
the coordinates of the upper-left point of the adjacent image block on the left are [a - A + (B/2), b];
the coordinates of the upper-left point of the adjacent image block on the right are [a + A - (B/2), b];
the coordinates of the upper-left point of the adjacent image block above are [a, b - A + (B/2)];
and the coordinates of the upper-left point of the adjacent image block below are [a, b + A - (B/2)].
6. The document image binarization method based on a generative adversarial network as claimed in claim 5, wherein the first size is 256 x 256, and the second size is 128 x 128.
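The neighbour offsets of claim 5, with the sizes of claim 6, can be checked with a short calculation. The sketch below treats [a, b] as the upper-left point of a block and applies the offset A - B/2 along each coordinate; the function names and the half-open rectangle convention (x0, y0, x1, y1) are illustrative assumptions, and B is assumed to be even.

```python
from typing import Dict, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open on the far edges


def neighbor_origins(a: int, b: int, A: int = 256, B: int = 128) -> Dict[str, Tuple[int, int]]:
    """Upper-left points of the four blocks adjacent to the block at [a, b]."""
    step = A - B // 2  # 192 for A = 256, B = 128
    return {
        "left": (a - step, b),
        "right": (a + step, b),
        "up": (a, b - step),
        "down": (a, b + step),
    }


def block_rect(a: int, b: int, A: int = 256) -> Rect:
    """Full A x A extent of the block whose upper-left point is [a, b]."""
    return (a, b, a + A, b + A)


def central_region(a: int, b: int, A: int = 256, B: int = 128) -> Rect:
    """B x B region centred inside the A x A block at [a, b]."""
    margin = (A - B) // 2
    return (a + margin, b + margin, a + margin + B, b + margin + B)


def overlaps(r1: Rect, r2: Rect) -> bool:
    """True if two half-open rectangles share at least one pixel."""
    return r1[0] < r2[2] and r2[0] < r1[2] and r1[1] < r2[3] and r2[1] < r1[3]


def center_is_exclusive(a: int, b: int, A: int = 256, B: int = 128) -> bool:
    """Check that no adjacent block touches this block's central region."""
    center = central_region(a, b, A, B)
    return not any(overlaps(center, block_rect(na, nb, A))
                   for na, nb in neighbor_origins(a, b, A, B).values())
```

With A = 256 and B = 128 the step is 192, so adjacent blocks share a 64-pixel band while the central 128 x 128 region of each block, which spans offsets 64 to 192 inside the block, lies outside every adjacent block.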
7. A document image binarization system based on a generative adversarial network, characterized by comprising a segmentation module, a first convolutional neural network processing module, a three-channel image acquisition module, a second convolutional neural network processing module and a final binary image acquisition module;
the segmentation module is configured to acquire a plurality of image blocks with a preset first size from an input text image according to a set step length and construct an image block set;
the first convolutional neural network processing module is configured to acquire a first image block set from an original document image through the segmentation module, and obtain a binary image of each image block through a first convolutional neural network to obtain a second image block set; and to normalize the original document image to the first size and obtain a binary image of the normalized original document image through the first convolutional neural network to serve as a first binary image;
the three-channel image acquisition module is configured to splice all image blocks in the second image block set to obtain a second binary image; scaling the first binary image to the size of the original document image as a third binary image; acquiring a gray scale image of the original document image; combining the second binary image, the third binary image and the gray image of the original document image to obtain a three-channel image;
the second convolutional neural network processing module is configured to acquire a third image block set from the three-channel image through the segmentation module, and acquire a binary image of each image block in the third image block set through a second convolutional neural network as a fourth image block set;
the final binary image acquisition module is configured to splice the image blocks in the fourth image block set to obtain a final binary image of the original document image;
the first convolutional neural network and the second convolutional neural network are cascaded to form the generator of a generative adversarial network, a patch-based fully convolutional neural network serves as the discriminator of the generative adversarial network, and the parameters are optimized through training;
the first convolutional neural network and the second convolutional neural network are two semantic segmentation networks with the same structure, and each semantic segmentation network comprises five convolutional layers and five deconvolution layers.
8. A storage device having stored therein a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the document image binarization method based on a generative adversarial network according to any one of claims 1 to 6.
CN201910222323.8A 2019-03-22 2019-03-22 Document image binarization method, system and device based on generative adversarial network Active CN110097059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910222323.8A CN110097059B (en) 2019-03-22 2019-03-22 Document image binarization method, system and device based on generative adversarial network

Publications (2)

Publication Number Publication Date
CN110097059A CN110097059A (en) 2019-08-06
CN110097059B true CN110097059B (en) 2021-04-02

Family

ID=67443030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910222323.8A Active CN110097059B (en) 2019-03-22 2019-03-22 Document image binarization method, system and device based on generative adversarial network

Country Status (1)

Country Link
CN (1) CN110097059B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516202B (en) * 2019-08-20 2023-05-30 Oppo广东移动通信有限公司 Document generator acquisition method, document generation device and electronic equipment
CN110717523A (en) * 2019-09-20 2020-01-21 湖北工业大学 A low-quality document image binarization method based on D-LinkNet
CN110895828B (en) * 2019-12-03 2023-04-18 武汉纺织大学 Model and method for generating MR (magnetic resonance) image simulating heterogeneous flexible biological tissue
CN111695596A (en) * 2020-04-30 2020-09-22 华为技术有限公司 Neural network for image processing and related equipment
CN112949646B (en) * 2021-02-26 2023-12-19 平安科技(深圳)有限公司 Semantic segmentation method, device, equipment and medium for electron microscopic fault data
CN112837329B (en) * 2021-03-01 2022-07-19 西北民族大学 A method and system for image binarization of Tibetan ancient book documents

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021905A (en) * 2006-02-15 2007-08-22 中国科学院自动化研究所 File image binaryzation method
CN106203434A (en) * 2016-07-08 2016-12-07 中国科学院自动化研究所 Based on the symmetric file and picture binary coding method of stroke structure
CN109190722A (en) * 2018-08-06 2019-01-11 大连民族大学 Font style based on language of the Manchus character picture migrates transform method
CN109460735A (en) * 2018-11-09 2019-03-12 中国科学院自动化研究所 Document binary processing method, system, device based on figure semi-supervised learning

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2659745C1 (en) * 2017-08-28 2018-07-03 Общество с ограниченной ответственностью "Аби Продакшн" Reconstruction of the document from document image series
CN108986067B (en) * 2018-05-25 2020-08-14 上海交通大学 A cross-modality-based method for lung nodule detection
CN109190684B (en) * 2018-08-15 2022-03-04 西安电子科技大学 SAR image sample generation method based on sketch and structure generative adversarial network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"A hybrid approach for document image binarization";A. Sakila等;《2017 International Conference on Inventive Computing and Informatics (ICICI)》;20180528;第645-650页 *
"An Effective Binarization Method for Disturbed Camera-Captured Document Images";Jinyuan Zhao等;《2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR)》;20181220;第339-344页 *
"文档图像二值化算法VFCM";童立靖等;《文档图像二值化算法VFCM》;20090731;第30卷(第13期);第3216-3218、3243页 *

Also Published As

Publication number Publication date
CN110097059A (en) 2019-08-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant