CN101567049B

CN101567049B - Method for processing noise of half tone document image

Info

Publication number: CN101567049B
Application number: CN2009100226986A
Authority: CN
Inventors: 宋永红; 肖桂林; 孟高峰; 张元林; 雷冬冬
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2009-05-26
Filing date: 2009-05-26
Publication date: 2011-11-16
Anticipated expiration: 2029-05-26
Also published as: CN101567049A

Abstract

The invention relates to the fields of computer vision, graphics and image processing, and discloses a noise processing method for halftone document images. The method comprises the following steps: first, halftone document images are divided into four categories; second, for halftone document images with a light background and dark text, background noise is removed by a method based on connected region labeling; for halftone document images with a light background and light text, background noise is removed by a method based on Gaussian smoothing filtering; for halftone document images with a dark background and dark text, or with a dark background and light text, background noise is removed by a method based on Wiener filtering; finally, the halftone document image with background noise removed is examined, text burr removal templates and defect repair templates are constructed, and text burrs are removed and text defects repaired through template matching.

Description

A Noise Processing Method for Halftone Document Image

Technical Field

The present invention relates to the fields of computer vision, graphics and image processing, and in particular to a noise processing method for halftone document images. It can be applied to text denoising and text extraction for various halftone document images, such as scanned document images and fax images, and can further be applied to text recognition and full-text retrieval of halftone image documents.

Background Art

With the spread of printers, scanners and fax machines, halftone document images are widely used in all areas of social life. To enable text recognition and full-text retrieval of halftone document images, text denoising and text extraction techniques for such images are therefore essential. One traditional approach is inverse halftoning, which converts the halftone image into a corresponding grayscale image. However, this approach usually introduces severe blurring, so that the details of the text are lost and the text recognition rate drops. Another traditional approach processes the halftone dots in the image directly, which effectively avoids blurring the text. The key problem for this approach is how to remove the picture and graphic parts of a halftone dot image while losing as little detail of the text as possible.

Summary of the Invention

The object of the present invention is to provide a noise processing method for halftone document images that removes background noise from the image, eliminates burr noise on characters, and preserves the integrity and smoothness of character strokes.

To achieve this object, the present invention adopts the following technical solution.

A noise processing method for a halftone document image, characterized in that it comprises the following steps:

First, halftone document images are divided into four categories: light background and dark text, light background and light text, dark background and dark text, and dark background and light text;

Second, the background noise of the halftone document image is removed: for halftone document images with a light background and dark text, a method based on connected region labeling is used; for halftone document images with a light background and light text, a method based on Gaussian smoothing filtering is used; for halftone document images with a dark background and dark text, or with a dark background and light text, a method based on Wiener filtering is used;

Finally, the halftone document image with background noise removed is examined, text burr removal templates and defect repair templates are constructed, and text burrs are removed and text defects repaired through template matching.

Further improvements and features of the present invention are as follows:

(1) The division of halftone document images into four categories is specifically: Gaussian smoothing filtering is applied to the halftone document image; the Otsu threshold and the grayscale histogram of the smoothed image are computed; the halftone dot density of the background is estimated from the Otsu threshold, and the halftone dot density of the text from the grayscale histogram; and, based on these two densities, the halftone document image is assigned to one of four categories: light background and dark text, light background and light text, dark background and dark text, or dark background and light text.

(2) The text burr removal templates are represented as tables, including:

Here, each cell represents a pixel, and the positional relationship between cells represents the positional relationship between pixels. In a cell, 1 denotes a text pixel, 0 denotes a background pixel, and ① denotes a text pixel to be removed, i.e., text pixel 1 must be changed to background pixel 0; an empty cell may be either background pixel 0 or text pixel 1.

(3) The defect repair templates are represented as tables, including the following tables:

Here, each cell represents a pixel, and the positional relationship between cells represents the positional relationship between pixels. In a cell, 1 denotes a text pixel, 0 denotes a background pixel, and the repair-marked cell denotes a text pixel to be repaired, i.e., background pixel 0 must be changed to text pixel 1; an empty cell may be either background pixel 0 or text pixel 1.

By dividing halftone document images into four categories and adaptively selecting the most suitable background noise removal method for each category, the present invention preserves as much text detail as possible during preliminary denoising. A series of text burr removal templates and defect repair templates is then applied through template matching to remove burrs from, and repair defects in, the text of the preliminarily denoised image, which makes subsequent OCR of the text easier.

The present invention is particularly effective for halftone document images produced by error diffusion. On the one hand, error-diffusion halftoning can be approximated by a linear model, so Wiener filtering works better on such images. On the other hand, error-diffusion halftoning converts a grayscale image into a halftone image by varying the density of halftone dots rather than their size, so the method based on connected region labeling works better on such images.

Description of Drawings

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Fig. 1 is an overall flowchart of the noise processing method for a halftone document image.

Fig. 2 is the grayscale histogram of a halftone document image after Gaussian smoothing; the abscissa is the gray level, ranging from 0 to 255, and the ordinate is the pixel count at each gray level.

Fig. 3 is a flowchart of background denoising of a halftone document image based on connected region labeling.

Fig. 4 is a flowchart of background denoising of a halftone document image based on Gaussian smoothing filtering.

Fig. 5 is a flowchart of background denoising of a halftone document image based on Wiener filtering.

Fig. 6 shows combinations of background and text color depth in halftone document images: (a) dark background and dark text, (b) dark background and light text, (c) light background and dark text, (d) light background and light text.

Fig. 7 shows the background denoising results corresponding to Fig. 6.

Fig. 8(a) is the inverted image of a halftone document image after background noise removal (containing mainly burrs); Fig. 8(b) is an enlarged (800%) view of region A of Fig. 8(a); Fig. 8(c) is an enlarged (800%) view of region B of Fig. 8(a).

Figs. 9(a), 9(b) and 9(c) show the results of text noise removal corresponding to Figs. 8(a), 8(b) and 8(c), respectively.

Fig. 10(a) is an inverted, enlarged (300%) image of a halftone document image after background noise removal (containing both burrs and defects); Fig. 10(b) shows the result of removing text noise from Fig. 10(a).

Detailed Description of Embodiments

Referring to Fig. 1, the noise processing method for a halftone document image according to the present invention mainly comprises the following steps.

(1) Gaussian smoothing filtering is applied to the halftone document image, and the Otsu threshold and grayscale histogram of the smoothed image are computed. The halftone dot density of the background is estimated from the Otsu threshold, and the halftone dot density of the text from the grayscale histogram. Based on these two densities, the halftone document image is assigned to one of four categories: light background and dark text, light background and light text, dark background and dark text, or dark background and light text.

Specifically, the halftone image is smoothed with a 5×5 (pixel) Gaussian template, and the Otsu threshold and grayscale histogram of the smoothed image are computed.
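As a rough illustration of step (1), the feature computation can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the inventors' implementation: the image is taken to be a list of rows of 8-bit gray values (0 for a black halftone dot, 255 for white), and the 5×5 Gaussian template is approximated by the separable binomial kernel [1, 4, 6, 4, 1]/16, because the patent does not give the template weights. All function names are hypothetical.

```python
def gaussian_smooth(img, size=5):
    """Separable binomial approximation of a Gaussian (size 3 or 5); borders are clamped."""
    weights = {3: [1, 2, 1], 5: [1, 4, 6, 4, 1]}[size]
    total, r = sum(weights), size // 2
    h, w = len(img), len(img[0])
    # Horizontal pass.
    tmp = [[sum(weights[k] * img[y][min(max(x + k - r, 0), w - 1)] for k in range(size)) / total
            for x in range(w)] for y in range(h)]
    # Vertical pass, rounded back to integer gray levels.
    return [[round(sum(weights[k] * tmp[min(max(y + k - r, 0), h - 1)][x] for k in range(size)) / total)
             for x in range(w)] for y in range(h)]


def histogram(img):
    """256-bin gray-level histogram."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    return hist


def otsu_threshold(hist):
    """Gray level t that best separates {<= t} from {> t} (Otsu, 1979)."""
    total = sum(hist)
    best_t, best_var = 0, -1.0
    for t in range(255):
        w0 = sum(hist[: t + 1])
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum(i * hist[i] for i in range(t + 1)) / w0
        m1 = sum(i * hist[i] for i in range(t + 1, 256)) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to the between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def classification_features(img):
    """Smooth with the 5x5 template, then return (Otsu threshold, histogram) of the result."""
    hist = histogram(gaussian_smooth(img, 5))
    return otsu_threshold(hist), hist
```

On a 6×6 test image whose left three columns are black and right three columns white, smoothing produces the gray levels 0, 16, 80, 175, 239 and 255 across each row, and `classification_features` returns an Otsu threshold of 80.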

The halftone dot density of the background is estimated from the Otsu threshold, and halftone document images are divided by background color depth into two classes: dark background and light background. For the Otsu threshold and its use in binarizing grayscale images, see N. Otsu, "A Threshold Selection Method from Gray-Level Histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-9, no. 1, January 1979. Through extensive experiments, the inventors obtained the relationship between background color depth and the binarization Otsu threshold shown in Table 1:

Table 1

Background gray value (0-255) | 128 | 160 | 192 | 225 | 240 | 250
Otsu threshold                |  65 |  86 | 107 | 134 | 145 | 153

The preferred Otsu decision threshold is 105.

The halftone dot density of the text is estimated from the grayscale histogram, and halftone document images are divided by text color depth into two classes: dark text and light text. Extensive experiments showed that, on a 0-255 scale, a preferred gray-level decision value of 20 suffices to classify most halftone document images. The grayscale histogram of a Gaussian-smoothed halftone document image is shown in Fig. 2. It has two prominent peaks: the peak at lower gray values corresponds to the text (the text peak), and the peak at higher gray values corresponds to the background (the background peak). Locating the text peak on the gray-level axis of the smoothed image's histogram gives the approximate color depth of the text in the halftone image.

In summary, for a given halftone document image, the background color depth is judged from the Otsu threshold: if the Otsu threshold is greater than 105, the background is considered light; otherwise it is dark. The text color depth is judged from the position of the text peak in the grayscale histogram: if the gray value of that peak is less than 20, the text is considered dark; otherwise it is light. Combining the two criteria divides halftone document images into four categories: light background and dark text, light background and light text, dark background and dark text, and dark background and light text.
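The decision rule above, with the preferred values 105 and 20, can be sketched as follows. Locating the text peak as the highest histogram bin at or below the Otsu threshold is an assumption of this sketch (the text only says that the lower of the two peaks is the text peak), and `text_peak`, `classify` and `METHOD` are hypothetical names. The inputs are the Otsu threshold and 256-bin histogram of the smoothed image.

```python
def text_peak(hist, otsu_t):
    """Gray level of the highest histogram bin in [0, otsu_t].

    Assumption: the Otsu threshold separates the text mode (low gray levels)
    from the background mode (high gray levels).
    """
    return max(range(otsu_t + 1), key=lambda g: hist[g])


def classify(otsu_t, peak, bg_thresh=105, text_thresh=20):
    """Return (background, text) color depth using the preferred thresholds."""
    background = "light" if otsu_t > bg_thresh else "dark"
    text = "dark" if peak < text_thresh else "light"
    return background, text


# Background-denoising method selected for each category in step (2).
METHOD = {
    ("light", "dark"): "connected region labeling",
    ("light", "light"): "Gaussian smoothing filtering",
    ("dark", "dark"): "Wiener filtering",
    ("dark", "light"): "Wiener filtering",
}
```

For example, a histogram whose two largest bins sit at gray levels 10 and 200, combined with an Otsu threshold of 134, is classified as light background and dark text and routed to connected region labeling.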

(2) The background noise of the halftone document image is removed: for halftone document images with a light background and dark text, a method based on connected region labeling is used; for halftone document images with a light background and light text, a method based on Gaussian smoothing filtering is used; for halftone document images with a dark background and dark text, or with a dark background and light text, a method based on Wiener filtering is used.

Referring to Fig. 3, background noise in halftone document images with a light background and dark text is removed by a method based on connected region labeling. The halftone document image is first inverted and its connected regions are labeled; the area of every connected region is then computed in turn. A connected region whose area exceeds a specified threshold (set to 3 pixels) is kept; otherwise it is removed. Finally, the image is inverted again, which removes the background noise from the halftone document image with a light background and dark text.
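The invert, label, filter and invert-back procedure can be sketched with a breadth-first flood fill. This minimal sketch assumes a binary halftone image stored as rows with 0 for a black dot and 255 for white paper, and uses 8-connectivity, which the patent does not specify; the function name `remove_small_regions` is hypothetical.

```python
from collections import deque


def remove_small_regions(halftone, min_area=3):
    """Light background, dark text: drop dark connected regions of <= min_area pixels.

    halftone: list of rows, 0 = black halftone dot, 255 = white paper.
    Assumption: 8-connectivity.
    """
    h, w = len(halftone), len(halftone[0])
    # Invert so that black dots become foreground (1).
    fg = [[1 if v == 0 else 0 for v in row] for row in halftone]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not fg[y][x] or seen[y][x]:
                continue
            # Label one connected region with a breadth-first flood fill.
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and fg[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Keep the region only if its area exceeds the threshold.
            if len(region) <= min_area:
                for ry, rx in region:
                    fg[ry][rx] = 0
    # Invert back to the original polarity.
    return [[0 if v else 255 for v in row] for row in fg]
```

With the 3-pixel threshold, a single isolated black dot (area 1) is erased, while a 2×2 block of dots (area 4) survives.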

Referring to Fig. 4, background noise in halftone document images with a light background and light text is removed by a method based on Gaussian smoothing filtering. The halftone document image is first smoothed with a 5×5 (pixel) Gaussian template; the Otsu threshold is then computed and the image is binarized with the Otsu method, which removes the background noise from the halftone document image with a light background and light text.
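The light-background, light-text branch can be sketched as below, assuming 8-bit rows with 0 for black and 255 for white, and using the binomial kernel [1, 4, 6, 4, 1]/16 as a stand-in for the unspecified 5×5 Gaussian template; the function names are hypothetical. Smoothing turns the local density of halftone dots into a gray level, which Otsu's method then splits into text and background.

```python
def smooth5(img):
    """5x5 Gaussian approximated by the separable binomial kernel [1, 4, 6, 4, 1] / 16."""
    k = [1, 4, 6, 4, 1]
    h, w = len(img), len(img[0])

    def clamp(v, hi):
        return min(max(v, 0), hi)

    # Horizontal pass, then vertical pass; borders are replicated.
    tmp = [[sum(k[i] * img[y][clamp(x + i - 2, w - 1)] for i in range(5)) / 16
            for x in range(w)] for y in range(h)]
    return [[round(sum(k[i] * tmp[clamp(y + i - 2, h - 1)][x] for i in range(5)) / 16)
             for x in range(w)] for y in range(h)]


def otsu(img):
    """Otsu threshold t of an 8-bit image; pixels <= t form the dark class."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    best_t, best_var = 0, -1.0
    for t in range(255):
        w0 = sum(hist[: t + 1])
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum(i * hist[i] for i in range(t + 1)) / w0
        m1 = sum(i * hist[i] for i in range(t + 1, 256)) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to the between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def denoise_gaussian_otsu(halftone):
    """Light background, light text: 5x5 smoothing followed by Otsu binarization."""
    smoothed = smooth5(halftone)
    t = otsu(smoothed)
    return [[255 if v > t else 0 for v in row] for row in smoothed]
```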

Referring to Fig. 5, background noise in halftone document images with a dark background and dark text, or with a dark background and light text, is removed by a method based on Wiener filtering. The halftone document image is first smoothed with a 3×3 (pixel) Gaussian template; using this smoothed image as the reference image, Wiener filtering is applied to the original halftone document image; the Wiener-filtered image is smoothed again with a 3×3 (pixel) Gaussian template; finally, the Otsu threshold is computed and the image is binarized with the Otsu method. This removes the background noise from halftone document images with a dark background and either dark or light text.
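The Wiener step is the least specified part of this procedure. One plausible reading, sketched below purely as an assumption, is a pixel-wise adaptive (local) Wiener filter in which the 3×3 Gaussian-smoothed reference supplies the local mean, the local variance is measured around that mean over a 3×3 window of the original, and the noise variance defaults to the average local variance (the same default that `scipy.signal.wiener` uses). The 3×3 smoothing before and after this step and the final Otsu binarization proceed as described above. The name `wiener_with_reference` is hypothetical.

```python
def wiener_with_reference(original, reference, radius=1, noise=None):
    """Local Wiener-style shrinkage of `original` toward `reference`.

    Assumptions: reference[y][x] serves as the local mean; the local variance is
    measured around that mean over a (2*radius+1)^2 window of the original;
    noise defaults to the mean local variance. Borders are replicated.
    """
    h, w = len(original), len(original[0])
    n = (2 * radius + 1) ** 2
    local_var = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            m = reference[y][x]
            acc = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += (original[yy][xx] - m) ** 2
            local_var[y][x] = acc / n
    if noise is None:
        noise = sum(map(sum, local_var)) / (h * w)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            m, v = reference[y][x], local_var[y][x]
            # Wiener gain: keep only the part of the deviation that exceeds the noise.
            gain = max(v - noise, 0.0) / v if v > 0 else 0.0
            row.append(min(max(round(m + gain * (original[y][x] - m)), 0), 255))
        out.append(row)
    return out
```

A larger `noise` value pulls the output toward the smoothed reference and suppresses the dot texture; a smaller value keeps more of the original halftone pattern.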

Extensive experiments showed that the Wiener-filtering method works better for halftone images with a dark background and dark text or a dark background and light text; the connected-region-labeling method is more suitable for halftone images with a light background and dark text; and the Gaussian-smoothing method is more suitable for halftone images with a light background and light text.

Referring to Figs. 6 and 7, selecting the background denoising method adaptively according to the color depth of the background and text of the halftone document image removes essentially all of the halftone background noise.

(3) The halftone document image with background noise removed is examined, text burr removal templates and defect repair templates are constructed, and text burrs are removed and text defects repaired through template matching.

a. Based on careful observation and extensive experimental verification, template matching with the following preferred text burr removal templates removes text burrs with good denoising results. The preferred templates are represented as tables, including:

Here, each cell represents a pixel, and the positional relationship between cells represents the positional relationship between pixels. In a cell, 1 denotes a text pixel, 0 denotes a background pixel, and ① denotes a text pixel to be removed, i.e., the pixel must be changed from 1 to 0; an empty cell means the pixel may be either 0 or 1, i.e., either a text pixel or a background pixel.
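Because the preferred template tables are not reproduced in this text, the matcher below uses a hypothetical spur-shaped template purely for illustration. Assumptions of this sketch: the denoised image is a list of rows with 1 = text and 0 = background (the templates' convention); each template is a list of strings in which '1' and '0' must match exactly, 'X' stands for the circled ① cell (a text pixel to be removed), and '.' stands for an empty (don't-care) cell. Generating the three 90-degree rotations of each template is a convenience of this sketch, not something the patent states. All names are hypothetical.

```python
# Hypothetical example template (NOT one of the patent's templates):
# a one-pixel spur sticking up from a horizontal stroke edge.
SPUR = ["000",
        "0X0",
        "111"]


def rotate(template):
    """Rotate a template 90 degrees clockwise."""
    return ["".join(row[c] for row in reversed(template)) for c in range(len(template[0]))]


def with_rotations(template):
    """The template plus its three 90-degree rotations."""
    out = [template]
    for _ in range(3):
        out.append(rotate(out[-1]))
    return out


def remove_burrs(img, templates):
    """Set every 'X' cell of each matching template position to 0.

    Matches are tested on the input image and written to a copy, so the
    result does not depend on the order in which templates are tried.
    """
    want = {"1": 1, "0": 0, "X": 1}
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for t in templates:
        cells = [(i, j, c) for i, row in enumerate(t) for j, c in enumerate(row) if c != "."]
        for y in range(h - len(t) + 1):
            for x in range(w - len(t[0]) + 1):
                if all(img[y + i][x + j] == want[c] for i, j, c in cells):
                    for i, j, c in cells:
                        if c == "X":
                            out[y + i][x + j] = 0
    return out
```

Applied to a thick horizontal stroke with a single pixel protruding from its top edge, `remove_burrs(img, with_rotations(SPUR))` clears only that protruding pixel.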

b. Based on careful observation and extensive experimental verification, template matching with the following preferred text defect repair templates repairs text defects with good results. The preferred templates are represented as tables, including the following tables:

Here, each cell represents a pixel, and the positional relationship between cells represents the positional relationship between pixels. In a cell, 1 denotes a text pixel, 0 denotes a background pixel, and the repair-marked cell denotes a text pixel to be repaired, i.e., the pixel must be changed from 0 to 1; an empty cell means the pixel may be either 0 or 1, i.e., either a text pixel or a background pixel.
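Defect repair is the mirror image of burr removal. In the minimal sketch below, 'R' marks the cell to be repaired (it must currently be background, 0, and is set to text, 1), '1' and '0' must match exactly, and '.' is don't-care; the image uses 1 = text and 0 = background. The pinhole template and the function name `repair_defects` are hypothetical illustrations, not one of the patent's preferred templates.

```python
# Hypothetical example template (NOT one of the patent's templates):
# a one-pixel hole fully enclosed by text.
PINHOLE = ["111",
           "1R1",
           "111"]


def repair_defects(img, templates):
    """Set every 'R' cell of each matching template position to 1.

    Matches are tested on the input image and written to a copy.
    """
    want = {"1": 1, "0": 0, "R": 0}
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for t in templates:
        cells = [(i, j, c) for i, row in enumerate(t) for j, c in enumerate(row) if c != "."]
        for y in range(h - len(t) + 1):
            for x in range(w - len(t[0]) + 1):
                if all(img[y + i][x + j] == want[c] for i, j, c in cells):
                    for i, j, c in cells:
                        if c == "R":
                            out[y + i][x + j] = 1
    return out
```

Only the enclosed background pixel is filled; a background pixel at the image corner does not match the template and is left unchanged.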

Referring to Figs. 8 and 9, the halftone document image with background noise removed has many burrs around the text, and the text strokes in this image are rather thin (the thinnest stroke is 2 pixels wide). Processing it with the burr removal templates yields the deburred text shown in Fig. 9, while preserving as much text detail as possible.

Referring to Figs. 10(a) and 10(b), the halftone document image with background noise removed has not only many burrs around the text but also many defects. Processing it with the burr removal templates and the defect repair templates removes the burrs and fills in the defective regions, so that the text strokes become smooth.

Claims

1. A noise processing method for a halftone document image, characterized in that it comprises the following steps:

First, divide halftone document images into four categories: light background and dark text, light background and light text, dark background and dark text, and dark background and light text;

Second, remove the background noise of the halftone document image: for halftone document images with a light background and dark text, use a method based on connected region labeling; for halftone document images with a light background and light text, use a method based on Gaussian smoothing filtering; for halftone document images with a dark background and dark text, or with a dark background and light text, use a method based on Wiener filtering;

Finally, observe the halftone document image with background noise removed, construct a text burr removal template and a defect repair template, and remove text burrs and repair text defects through template matching;

The halftone document image is divided into four categories specifically by: performing Gaussian smoothing filtering on the halftone document image; calculating the Otsu threshold and grayscale histogram of the smoothed image; estimating the halftone dot density of the background from the Otsu threshold and the halftone dot density of the text from the grayscale histogram; and dividing the halftone document image, according to these two densities, into four categories, namely light background and dark text, light background and light text, dark background and dark text, and dark background and light text.

2. The noise processing method for a halftone document image according to claim 1, characterized in that the text burr removal templates are represented as tables, comprising:

Here, each cell represents a pixel, and the positional relationship between cells represents the positional relationship between pixels; in a cell, 1 represents a text pixel, 0 represents a background pixel, and ① represents a text pixel to be removed, i.e., text pixel 1 must be changed to background pixel 0; an empty cell may be either background pixel 0 or text pixel 1.

3. The noise processing method for a halftone document image according to claim 1, characterized in that the defect repair templates are represented as tables, comprising the following tables:

Here, each cell represents a pixel, and the positional relationship between cells represents the positional relationship between pixels; in a cell, 1 represents a text pixel, 0 represents a background pixel,