CN107133929B

CN107133929B - Low-quality document image binarization method based on background estimation and energy minimization

Info

Publication number: CN107133929B
Application number: CN201710289747.7A
Authority: CN
Inventors: 熊炜; 徐晶晶; 李敏; 熊子婕; 王改华; 刘敏; 赵楠; 王鑫睿; 冯川
Original assignee: Hubei University of Technology
Current assignee: Hubei University of Technology
Priority date: 2017-04-27
Filing date: 2017-04-27
Publication date: 2019-06-11
Anticipated expiration: 2037-04-27
Also published as: CN107133929A

Abstract

The invention discloses a low-quality document image binarization method based on background estimation and energy minimization. First, the color document image is converted to grayscale and denoised with bilateral filtering. Next, the image background is estimated, and background subtraction and image enhancement are performed. Finally, an energy function and the corresponding network graph are constructed, and the energy function is minimized with an augmenting-path-based graph cut algorithm. The invention significantly improves the binarization of document images with complex backgrounds and is applicable to documents with multicolored writing, stroke gradients, ink bleed, stained or textured pages, uneven lighting and low contrast.

Description

Low-quality document image binarization method based on background estimation and energy minimization

Technical field

The invention belongs to the technical fields of digital image processing, pattern recognition and machine learning, and in particular relates to a low-quality document image binarization method based on background estimation and energy minimization.

Background art

Document analysis and recognition (DAR) technology is widely used in ancient-book digitization, layout analysis and text recognition, video caption extraction, text information retrieval and other fields. A DAR pipeline mainly comprises image acquisition, binarization, skew correction, and character segmentation and recognition. Binarization is a key preprocessing step: it converts a grayscale image into a binary image, thereby separating the character foreground from the document background. Because the quality of binarization directly affects the performance of the whole DAR system, many researchers have studied it in recent years and proposed many algorithms. However, factors such as poor contrast, ink bleed, page stains and uneven lighting still make the binarization of low-quality document images a challenge.

Binarization algorithms can be roughly divided into global thresholding methods and local thresholding methods. A global thresholding method uses a single threshold to divide the document image into two classes, characters (foreground) and background. For example, the Otsu algorithm uses the grayscale histogram of the image to select the threshold that maximizes the between-class variance of the resulting foreground and background pixels. Global thresholding works well when the foreground and background differ strongly, i.e., when the histogram is clearly bimodal, but on low-quality document images it loses some or even all foreground detail.
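As an illustration of the global thresholding described above, the Otsu criterion can be sketched in a few lines of NumPy. This is a minimal sketch assuming an 8-bit grayscale input; the function name `otsu_threshold` is hypothetical, and pixels at or below the returned threshold are treated as the dark class.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the gray level k that maximizes the between-class variance.

    Assumes `gray` holds non-negative integers in [0, 255] (e.g. uint8).
    Pixels <= k form class 0 (dark), pixels > k form class 1 (bright).
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of class 0 for each k
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to k
    mu_t = mu[-1]                            # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    # empty classes give 0/0; treat them as "no separation"
    sigma_b2 = np.nan_to_num(sigma_b2, nan=0.0, posinf=0.0)
    return int(np.argmax(sigma_b2))


gray = np.array([10] * 50 + [200] * 50, dtype=np.uint8)
t = otsu_threshold(gray)
binary = gray > t  # True = bright (background), False = dark (ink)
```

On a clearly bimodal histogram like this toy example any threshold between the two modes separates them; on degraded pages with no clean valley the single global threshold is exactly what fails.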

A local thresholding method (also called adaptive thresholding) slides a window over the document image so that different parts of the image receive different thresholds. For example, the Niblack, Sauvola and Wolf algorithms build a threshold surface from the gray-level mean and variance in each pixel's neighborhood. Their performance depends on the size of the sliding window and on the thickness of the character strokes: the window size must be tuned for document images of different quality to obtain the best result, and when the contrast is low these methods produce many noise points or misclassifications.
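To make the local-statistics idea concrete, the sketch below computes a Sauvola-style threshold surface T = m·(1 + k·(s/R − 1)) from the windowed mean m and standard deviation s, using integral images. It is a minimal sketch, not taken from this patent: the window radius `r`, `k = 0.5` and `R = 128` are assumed illustrative values, and the helper names are hypothetical. Niblack's variant would instead use T = m + k·s with a negative k.

```python
import numpy as np


def box_mean(a: np.ndarray, r: int) -> np.ndarray:
    """Mean over a (2r+1)x(2r+1) window via an integral image (edge-padded)."""
    p = np.pad(a.astype(np.float64), r, mode="edge")
    ii = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))  # ii[y, x] = sum p[:y, :x]
    k = 2 * r + 1
    h, w = a.shape
    s = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    return s / (k * k)


def sauvola_threshold(gray: np.ndarray, r: int = 7, k: float = 0.5,
                      R: float = 128.0) -> np.ndarray:
    """Per-pixel threshold surface built from local mean and standard deviation."""
    m = box_mean(gray, r)
    var = box_mean(gray.astype(np.float64) ** 2, r) - m ** 2
    s = np.sqrt(np.maximum(var, 0.0))  # clamp tiny negative rounding errors
    return m * (1.0 + k * (s / R - 1.0))


gray = np.full((20, 30), 200.0)
gray[:, 10:12] = 40.0  # a dark vertical stroke
T = sauvola_threshold(gray, r=3)
binary = gray > T      # True = background, False = ink
```

The window radius plays the role described in the text: it must be large enough to contain both ink and background around each stroke, which is why these methods need per-document tuning.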

In addition, researchers in China and abroad have proposed many more complex algorithms, such as local contrast methods, background estimation with stroke edge detection, Laplacian energy methods and convolutional neural network methods. However, none of these methods handles binarization well for complex document backgrounds with low contrast, ink bleed, gradient lighting, smudges or textures.

Summary of the invention

To solve the above technical problems, the invention proposes a low-quality document image binarization method based on background estimation and energy minimization. It significantly improves the binarization of document images with complex backgrounds and is applicable to documents with multicolored writing, stroke gradients, ink bleed, stained or textured pages, uneven lighting, low contrast and similar conditions.

The technical solution adopted by the invention is a low-quality document image binarization method based on background estimation and energy minimization, characterized in that it comprises the following steps:

Step 1: perform grayscale preprocessing on the color document image;

Step 2: denoise the image with bilateral filtering;

Step 3: estimate the image background, comprising the following sub-steps:

Step 3.1: apply the stroke width transform to the image produced in step 2;

Step 3.2: calculate the simulated observation distance and the imaging height;

Step 3.3: weaken the dark features of the image produced in step 2 with two morphological closing operations;

Step 3.4: using the results of steps 3.2 and 3.3, downsample and then upsample the image;

Step 4: background subtraction and image enhancement, comprising the following sub-steps:

Step 4.1: background subtraction;

Compute the absolute difference between the bilateral-filtered image from step 2 and the estimated background image from step 3; pixels whose value in the difference image is zero are high-confidence background pixels, and their gray value is set to 255;

Step 4.2: histogram equalization;

Invert the non-zero pixels of the background-subtracted image to obtain their gray values, then apply histogram equalization to the whole image to increase the contrast between foreground and background;

Step 5: construct the energy function;

Step 6: construct the network graph;

Step 7: minimize the energy function with an augmenting-path-based graph cut algorithm.

Compared with existing algorithms, the invention has the following significant advantages:

(1) The invention converts the color document image to grayscale with the minimum mean method. The resulting grayscale image is color-independent: it increases the contrast between foreground and background pixels while reducing the gray-level variance among foreground pixels;

(2) The invention denoises the image with a nonlinear bilateral filter, which considers both spatial proximity and gray-level similarity and therefore removes noise while preserving edges;

(3) The invention estimates the stroke width of the document image with the stroke width transform. Its advantage is that strokes are essentially a feature unique to text (although some degradations may also produce stroke-like responses, which later steps must remove), so the approach applies to text in different languages;

(4) Based on a visual sensitivity test model, the invention estimates the image background with morphological closing and applies histogram equalization to the background-subtracted image, which effectively suppresses the effect of degradations while enhancing the local contrast of the image;

(5) The invention binarizes document images with a max-flow/min-cut combinatorial optimization algorithm. This graph cut algorithm is general, practical and fast (close to real-time), and suits low-quality document images with many types of degradation.

Description of drawings

Fig. 1 is a flowchart of an embodiment of the invention;

Fig. 2 is a schematic diagram of the angular resolution in the vision test model of an embodiment of the invention.

Detailed description

To help those of ordinary skill in the art understand and implement the invention, it is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described here only illustrate and explain the invention and do not limit it.

The main idea of the invention is as follows: as the observer moves farther from the target image, less and less of its detail (stroke) information remains visible, while the perceived background gray level and depth are unaffected by distance. The approximate background of the image can therefore be estimated by simulating observation from a long distance; an energy function is then constructed for the image with the estimated background removed, and the image is binarized with a graph cut algorithm.

Referring to Fig. 1, the low-quality document image binarization method based on background estimation and energy minimization provided by the invention comprises the following steps:

Step 1: minimum mean grayscale conversion;

The invention converts the color document image f(x,y) to grayscale with the minimum mean method, using the following formula:

where f_i(x,y) are the R, G and B component images, respectively, and f_gray(x,y) is the resulting grayscale image.

The resulting grayscale image is color-independent: the contrast between foreground and background pixels is large, while the gray-level differences among foreground pixels are small.

Step 2: bilateral filtering for denoising;

The invention denoises the image with a nonlinear bilateral filter. The output pixel value g(i,j) is a weighted combination of the pixel values f(k,l) in the neighborhood S:

$$g(i,j)=\frac{\sum_{(k,l)\in S} f(k,l)\,w(i,j,k,l)}{\sum_{(k,l)\in S} w(i,j,k,l)}$$

where the weight w(i,j,k,l) is the product of the domain kernel d(i,j,k,l) and the range kernel r(i,j,k,l), i.e., w(i,j,k,l) = d(i,j,k,l)·r(i,j,k,l), with

$$d(i,j,k,l)=\exp\left(-\frac{(i-k)^2+(j-l)^2}{2\sigma_d^2}\right),\qquad r(i,j,k,l)=\exp\left(-\frac{\lVert f(i,j)-f(k,l)\rVert^2}{2\sigma_r^2}\right)$$

and σ_d² and σ_r² are the Gaussian distance variance and the Gaussian gray-level variance, respectively.

Because the bilateral filter considers both spatial proximity and gray-level similarity, it removes noise while preserving edges.
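The bilateral filter of step 2 can be sketched directly from its definition: every output pixel is a normalized sum of neighbors weighted by a spatial Gaussian times a gray-level Gaussian. This is a minimal brute-force NumPy sketch; the square window of radius `radius` and the default `sigma_d`, `sigma_r` values are assumptions, since the text does not give parameter values.

```python
import numpy as np


def bilateral_filter(img: np.ndarray, radius: int = 2,
                     sigma_d: float = 2.0, sigma_r: float = 20.0) -> np.ndarray:
    """Brute-force bilateral filter over a (2*radius+1)^2 square neighbourhood S.

    g(i,j) = sum f(k,l) w / sum w, with w = domain kernel * range kernel.
    """
    f = img.astype(np.float64)
    h, w = f.shape
    padded = np.pad(f, radius, mode="reflect")
    num = np.zeros_like(f)
    den = np.zeros_like(f)
    for dk in range(-radius, radius + 1):
        for dl in range(-radius, radius + 1):
            # f(k, l) for every (i, j), where (k, l) = (i + dk, j + dl)
            shifted = padded[radius + dk: radius + dk + h,
                             radius + dl: radius + dl + w]
            domain = np.exp(-(dk * dk + dl * dl) / (2.0 * sigma_d ** 2))
            rng = np.exp(-((f - shifted) ** 2) / (2.0 * sigma_r ** 2))
            weight = domain * rng
            num += weight * shifted
            den += weight  # the centre weight is 1, so den > 0
    return num / den


step = np.zeros((8, 8))
step[:, 4:] = 200.0
smoothed = bilateral_filter(step)  # the 0/200 edge survives almost unchanged
```

The loop is O(window size) NumPy passes, fine for a sketch; production code would normally call an optimized library routine instead.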

Step 3: image background estimation;

Step 3.1 Stroke width transform (SWT): detect edges in the bilateral-filtered grayscale image with the Canny operator. For each edge pixel p, search along its gradient direction for the corresponding edge pixel q. The Euclidean distance ||p−q|| between the two points is taken as the stroke width estimate of every pixel on the path [p,q], unless a pixel has already been assigned a smaller width. The stroke width SWE of the image is the mathematical expectation (mean) of the stroke width estimates over all non-zero pixels:

$$\mathrm{SWE}=\frac{1}{n}\sum_{s(x,y)\neq 0} s(x,y)$$

where n is the total number of non-zero pixels in the SWT output image s(x,y).
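The width-assignment rule and the SWE average of step 3.1 can be illustrated as follows. This minimal sketch assumes the rays from p to q have already been traced (Canny edge detection and gradient-direction ray casting are not reimplemented); the helper names `assign_ray_widths` and `stroke_width_estimate` are hypothetical.

```python
import numpy as np


def assign_ray_widths(shape, rays):
    """Build the SWT map s(x, y) from pre-traced rays.

    Each ray is a list of (row, col) pixels from edge pixel p to edge pixel q.
    Every pixel on the ray gets width ||p - q||, unless it already holds a
    smaller width.
    """
    s = np.zeros(shape)
    for ray in rays:
        (y0, x0), (y1, x1) = ray[0], ray[-1]
        width = float(np.hypot(y1 - y0, x1 - x0))  # Euclidean ||p - q||
        for y, x in ray:
            if s[y, x] == 0 or width < s[y, x]:
                s[y, x] = width
    return s


def stroke_width_estimate(s: np.ndarray) -> float:
    """SWE = mean of the non-zero entries of the SWT map."""
    values = s[s > 0]
    if values.size == 0:
        raise ValueError("SWT map contains no stroke pixels")
    return float(values.sum() / values.size)  # (1/n) * sum of s(x, y) != 0


rays = [[(0, 0), (0, 1), (0, 2), (0, 3)],  # horizontal ray, width 3
        [(0, 1), (1, 1), (2, 1)]]          # vertical ray, width 2
s = assign_ray_widths((3, 4), rays)
swe = stroke_width_estimate(s)
```

Pixel (0, 1) lies on both rays and keeps the smaller width 2, so the six stroke pixels hold 3, 2, 3, 3, 2, 2 and SWE is their mean.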

Step 3.2 Calculate the simulated distance and imaging height: in the visual sensitivity test model, the smallest image the human eye can perceive is the one that subtends the minimum angle of resolution (1′), as shown in Fig. 2. Because the contrast of a low-quality document image is usually lower than that of the binary optotypes on an eye chart, the minimum visual angle for resolving it is usually larger than in a vision test; moreover, the thicker the strokes, the farther the observer must be before the stroke details can no longer be perceived. The invention therefore assumes that the stroke width of the document image corresponds to a resolution angle of 3′ and determines the simulated observation distance d₀ from the stroke width estimated in step 3.1:

$$d_0=\mathrm{SWE}\times\cot\theta$$

where θ is the observation resolution angle, here 3′.

Because the crystalline lens of the human eye behaves like a convex lens, the lens imaging law and the focal length equation give the height h_i of the image formed on the retina at distance d₀ from the target image:

where f is the distance between the lens and the retina, i.e., the focal length (about 17 mm), and h₀ is the original height of the target image.
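The simulated distance d₀ = SWE × cot θ is a one-line computation once θ = 3′ is converted from arc-minutes to radians. This sketch covers only d₀; the retinal image height h_i formula is not reproduced in this text, so it is deliberately not computed here. The function name `simulated_distance` is hypothetical, and d₀ comes out in the same unit as SWE (pixels).

```python
import math


def simulated_distance(swe: float, theta_arcmin: float = 3.0) -> float:
    """d0 = SWE * cot(theta), with theta given in arc-minutes."""
    theta = math.radians(theta_arcmin / 60.0)  # 3' = 0.05 degree
    return swe / math.tan(theta)               # cot(theta) = 1 / tan(theta)


d0 = simulated_distance(4.0)
```

Because cot(3′) is roughly 1146, a stroke width of 4 px places the simulated observer about 4584 px away; the distance scales linearly with SWE, which is what makes thicker strokes need a more distant viewpoint.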

Step 3.3 Morphological closing: the dark features (character strokes) of the document image are weakened by two morphological closing operations, both using disk-shaped structuring elements. The diameter of the first structuring element is set to the stroke width of the image, and the diameter of the second is 12 pixels larger than the stroke width.
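Grayscale closing is dilation (local maximum) followed by erosion (local minimum), so dark structures narrower than the structuring element are filled in with the surrounding background level. The sketch below applies the two closings of step 3.3 in plain NumPy. The discrete disk (offsets with dy² + dx² ≤ r², r = diameter // 2) and edge padding are assumptions; the text only says "circular structuring element".

```python
import numpy as np


def disk_offsets(diameter: int):
    """Offsets of a discrete disk; assumes radius = diameter // 2."""
    r = max(int(diameter) // 2, 0)
    offsets = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
               if dy * dy + dx * dx <= r * r]
    return offsets, r


def _rank_filter(img: np.ndarray, diameter: int, reduce_fn) -> np.ndarray:
    offsets, r = disk_offsets(diameter)
    padded = np.pad(img.astype(np.float64), r, mode="edge")
    h, w = img.shape
    stack = [padded[r + dy: r + dy + h, r + dx: r + dx + w] for dy, dx in offsets]
    return reduce_fn(np.stack(stack), axis=0)


def closing(img: np.ndarray, diameter: int) -> np.ndarray:
    dilated = _rank_filter(img, diameter, np.max)  # dilation: local max
    return _rank_filter(dilated, diameter, np.min)  # erosion: local min


def weaken_dark_features(img: np.ndarray, swe: float) -> np.ndarray:
    """First closing with diameter SWE, second with diameter SWE + 12 px."""
    d = int(round(swe))
    return closing(closing(img, d), d + 12)


page = np.full((20, 20), 255.0)
page[:, 5:7] = 0.0  # a dark stroke 2 px wide
closed = weaken_dark_features(page, 2.0)
```

Closing never darkens a pixel, so bright background is kept while strokes disappear; the second, larger element removes thicker dark residue left by the first.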

Step 3.4 Image downsampling and upsampling: at distance d₀ from the target image, the observed image height is h_i. The image produced by the morphological closing is therefore downsampled with bilinear interpolation to height h_i and then restored to its original size with bilinear interpolation; the result is the estimated background image. The aspect ratio is preserved during scaling.
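The down-then-up resampling of step 3.4 acts as a strong low-pass filter that keeps only the coarse background. In this minimal NumPy sketch the target height `h_i` is taken as an input (its retinal-height formula is not reproduced in this text), the pixel-centre coordinate mapping is an assumption, and no anti-aliasing prefilter is applied, matching plain bilinear resizing. Function names are hypothetical.

```python
import numpy as np


def bilinear_resize(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinear resize using pixel-centre alignment."""
    in_h, in_w = img.shape
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    f = img.astype(np.float64)
    top = f[y0][:, x0] * (1 - wx) + f[y0][:, x1] * wx
    bottom = f[y1][:, x0] * (1 - wx) + f[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def estimate_background(closed: np.ndarray, h_i: float) -> np.ndarray:
    """Downsample to height h_i (aspect ratio kept), then upsample back."""
    h, w = closed.shape
    small_h = max(1, int(round(h_i)))
    small_w = max(1, int(round(w * small_h / h)))  # preserve aspect ratio
    small = bilinear_resize(closed, small_h, small_w)
    return bilinear_resize(small, h, w)


ramp = np.tile(np.linspace(0.0, 255.0, 40), (30, 1))
bg = estimate_background(ramp, 6.0)
```

Each output value is a convex combination of input values, so the estimate never overshoots the range of the closed image.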

Step 4: background subtraction and image enhancement;

Step 4.1 Background subtraction: compute the absolute difference between the bilateral-filtered image and the estimated background image. Pixels whose value in the difference image is zero are high-confidence background pixels, and their gray value is set to 255 (white).
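Step 4.1 reduces to an absolute difference plus a mask of exact zeros. Because the estimated background is real-valued, this sketch assumes both inputs have already been rounded to 8-bit (e.g. `np.clip(np.rint(bg), 0, 255).astype(np.uint8)`); otherwise exact zeros would almost never occur. The function name and the returned mask are illustrative choices.

```python
import numpy as np


def subtract_background(filtered: np.ndarray, background: np.ndarray):
    """Return (background-subtracted image, high-confidence background mask).

    Both inputs are assumed to be uint8 images of the same shape.
    """
    diff = np.abs(filtered.astype(np.int16) - background.astype(np.int16))
    bg_mask = diff == 0          # zero difference: high-confidence background
    sub = diff.astype(np.uint8)  # |a - b| <= 255 for 8-bit inputs
    sub[bg_mask] = 255           # mark confident background as white
    return sub, bg_mask


filtered = np.array([[200, 50], [120, 120]], dtype=np.uint8)
background = np.array([[200, 200], [120, 100]], dtype=np.uint8)
sub, bg_mask = subtract_background(filtered, background)
```

Returning the mask keeps the distinction between "set to 255 because background" and a genuine difference value for the next step.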

Step 4.2 Histogram equalization: invert the non-zero pixels of the background-subtracted image to obtain their gray values, then apply histogram equalization to the whole image to increase the contrast between foreground and background.
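One reading of step 4.2 is sketched below: pixels that are not confident background are inverted (255 − v, so a large difference from the background becomes dark, ink-like), and the whole image is then equalized with the standard CDF mapping round((cdf(v) − cdf_min) / (N − cdf_min) · 255). Treating "non-zero pixels" as the non-background mask from step 4.1 is an assumption, as is the function name.

```python
import numpy as np


def invert_and_equalize(sub: np.ndarray, bg_mask: np.ndarray) -> np.ndarray:
    """Invert non-background pixels, then apply global histogram equalization."""
    img = sub.astype(np.int32).copy()
    img[~bg_mask] = 255 - img[~bg_mask]  # large difference -> dark value
    img = img.astype(np.uint8)

    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0][0]            # CDF at the darkest occupied level
    total = img.size
    if total == cdf_min:                 # a single gray level: nothing to stretch
        return img
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    return lut.clip(0, 255).astype(np.uint8)[img]


sub = np.array([[255, 150], [255, 20]], dtype=np.uint8)
bg_mask = np.array([[True, False], [True, False]])
enhanced = invert_and_equalize(sub, bg_mask)
```

Here the inverted values 105 and 235 are stretched to 0 and 85 while the background stays at 255, widening the foreground/background gap before the graph cut.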

Step 5: construct the energy function;

The Laplacian energy function has the following form:

where the data term is the cost of assigning a given label to a pixel, e.g., the cost of assigning label 0 (or 1) to pixel p_ij, and the boundary term is the cost of a discontinuity between adjacent pixels, i.e., the cost of assigning different labels to two adjacent pixels.

The Laplacian of an image responds to abrupt gray-level changes. Where the Laplacian at a pixel is positive, the pixel generally lies in a valley (dark region) of the gray-level image; where it is negative, the pixel lies on a peak (bright region). The invention therefore defines the data term of the Laplacian energy function as follows:

where ∇²I_ij denotes the Laplacian value at pixel p_ij;

The boundary term consists of a horizontal boundary term and a vertical boundary term. The invention determines the boundary term with the Canny edge detector: pixels near an edge are more likely to be discontinuous, so the discontinuity cost between pixels on the two sides of an edge is set directly to zero, as follows:

where E_ij is the edge detection result at pixel p_ij, I_ij is the gray value at pixel p_ij, and c is an arbitrary constant (c > 0).
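Because the exact data and boundary terms are not reproduced in this text, the sketch below uses assumed forms that are consistent with the description: label 1 = foreground, cost(label 0) = ∇²I and cost(label 1) = −∇²I (so dark valleys with a positive Laplacian prefer foreground), and a simplified boundary cost of 0 when either neighbor is an edge pixel, c otherwise. The edge map is taken as an input rather than recomputed with Canny. All names are hypothetical.

```python
import numpy as np


def laplacian(I: np.ndarray) -> np.ndarray:
    """4-neighbour discrete Laplacian with edge (replicate) padding."""
    f = np.pad(I.astype(np.float64), 1, mode="edge")
    return (f[:-2, 1:-1] + f[2:, 1:-1] + f[1:-1, :-2] + f[1:-1, 2:]
            - 4.0 * f[1:-1, 1:-1])


def energy_terms(I: np.ndarray, edges: np.ndarray, c: float = 1.0):
    lap = laplacian(I)
    cost0 = lap   # assumed: label 0 (background) is costly in dark valleys
    cost1 = -lap  # assumed: label 1 (foreground) is cheap in dark valleys
    # simplified boundary costs: zero across edges, c elsewhere
    c_i = np.where(edges[:-1, :] | edges[1:, :], 0.0, c)  # (p_ij, p_i+1,j)
    c_j = np.where(edges[:, :-1] | edges[:, 1:], 0.0, c)  # (p_ij, p_i,j+1)
    return cost0, cost1, c_i, c_j


def energy(labels, cost0, cost1, c_i, c_j) -> float:
    data = np.where(labels == 1, cost1, cost0).sum()
    smooth = ((c_i * (labels[:-1, :] != labels[1:, :])).sum()
              + (c_j * (labels[:, :-1] != labels[:, 1:])).sum())
    return float(data + smooth)


I = np.full((5, 5), 200.0)
I[2, 2] = 0.0  # one dark pixel
terms = energy_terms(I, np.zeros((5, 5), dtype=bool), c=1.0)
all_bg = np.zeros((5, 5), dtype=int)
one_fg = all_bg.copy()
one_fg[2, 2] = 1
```

In this toy image, labeling the dark pixel as foreground lowers the energy from 0 to −1596 despite paying four discontinuity costs, which is the behavior the data term is meant to produce.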

Step 6: construct the network graph;

Each pixel p_ij of the image forms an intermediate node of the network graph, and two terminal nodes, s and t, are added. Edges between intermediate nodes are called n-links, and their weights are determined by the boundary term of the energy function; edges between an intermediate node and a terminal node are called t-links, and their weights are determined by the data term of the energy function. Accordingly, the weights of the edges (p_ij, s) and (p_ij, t) are given by the data term, and the weights of the edges (p_ij, p_{i+1,j}) and (p_ij, p_{i,j+1}) are given by the boundary term.
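A common way to turn per-pixel label costs into t-link capacities is sketched below, under assumed conventions: nodes ending on the source side take label 1, so cutting (s, p) costs cost0 and cutting (p, t) costs cost1. Since max-flow needs non-negative capacities, each pixel's two costs are shifted by their minimum, which adds a constant to the energy and does not change its minimizer. The dict-of-dicts layout and `build_network` are hypothetical.

```python
import numpy as np


def build_network(cost0, cost1, c_i, c_j):
    """Capacities as {u: {v: capacity}}; nodes are 's', 't' and (i, j)."""
    h, w = cost0.shape
    cap = {"s": {}, "t": {}}
    for i in range(h):
        for j in range(w):
            cap[(i, j)] = {}
    for i in range(h):
        for j in range(w):
            p = (i, j)
            shift = min(cost0[i, j], cost1[i, j])       # keeps t-links >= 0
            cap["s"][p] = float(cost0[i, j] - shift)    # cut if p gets label 0
            cap[p]["t"] = float(cost1[i, j] - shift)    # cut if p gets label 1
    for i in range(h - 1):                              # n-links (p_ij, p_i+1,j)
        for j in range(w):
            p, q = (i, j), (i + 1, j)
            cap[p][q] = cap[q][p] = float(c_i[i, j])
    for i in range(h):                                  # n-links (p_ij, p_i,j+1)
        for j in range(w - 1):
            p, q = (i, j), (i, j + 1)
            cap[p][q] = cap[q][p] = float(c_j[i, j])
    return cap


cap = build_network(np.array([[3.0, -1.0]]), np.array([[-3.0, 1.0]]),
                    np.zeros((0, 2)), np.array([[1.0]]))
```

A Python dict graph keeps the sketch readable; real image-sized graphs use flat arrays or a dedicated max-flow library.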

Step 7: minimize the energy function with an augmenting-path-based graph cut algorithm;

Two search trees, S and T, are built on the network graph, rooted at the source s and the sink t, respectively. The nodes of the search trees are classified as active or passive; an active node grows its tree by acquiring free nodes along non-saturated edges, which then become active nodes.

Step 7.1 Growth stage: the two trees keep growing until active nodes of the two trees meet, which yields a path from the source to the sink;

Step 7.2 Augmentation stage: the path found in step 7.1 is augmented. Augmentation saturates at least one edge; the child node attached through a saturated edge becomes an orphan (isolated) node, and trees S and T split into several subtrees;

Step 7.3 Adoption stage: a new parent is sought for each orphan node; an orphan for which no valid parent exists becomes a free node. This continues until all orphan nodes have been processed.

The three stages above are repeated until the two trees can no longer grow and are separated by saturated edges. This yields the minimum cut of the graph, i.e., the minimum of the energy function, and hence the final binarization of the image.
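The growth/augmentation/adoption scheme above is the two-tree Boykov-Kolmogorov max-flow algorithm. For brevity, the sketch below swaps in a simpler augmenting-path method, Edmonds-Karp (shortest augmenting paths found by BFS from s), which reaches the same maximum flow and minimum cut value but is much slower on image-sized graphs. It expects capacities as a dict of dicts; the function name is hypothetical. After the last augmentation, the nodes still reachable from s in the residual graph form the source side of the cut.

```python
from collections import deque


def min_cut(cap, s="s", t="t"):
    """Edmonds-Karp max-flow; returns (flow value, source-side node set)."""
    res = {u: dict(vs) for u, vs in cap.items()}  # residual capacities
    for u, vs in cap.items():
        for v in vs:
            res.setdefault(v, {}).setdefault(u, 0.0)  # reverse residual edges
    flow = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:       # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:                    # no path left: flow is maximal
            break
        bottleneck, v = float("inf"), t
        while parent[v] is not None:           # smallest residual on the path
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:           # push flow, update residuals
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck
    return flow, set(parent)                   # nodes reachable from s


cap = {"s": {"a": 6.0, "b": 0.0},
       "a": {"t": 0.0, "b": 1.0},
       "b": {"t": 2.0, "a": 1.0},
       "t": {}}
flow, source_side = min_cut(cap)
```

With the source side read as label 1 and the sink side as label 0, node a is labeled foreground and b background, and the cut value 1.0 is the (shifted) minimum energy of this toy graph.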

It should be understood that parts not described in detail in this specification belong to the prior art.

It should be understood that the above description of the preferred embodiments is relatively detailed and should not therefore be regarded as limiting the scope of patent protection of the invention. Under the teaching of the invention, those of ordinary skill in the art may make substitutions or variations without departing from the scope protected by the claims, and all such substitutions and variations fall within the protection scope of the invention. The claimed scope of protection of the invention is defined by the appended claims.

Claims

1. A low-quality document image binarization method based on background estimation and energy minimization, characterized in that it comprises the following steps:

Step 1: perform grayscale preprocessing on the color document image;

Step 2: denoise the image with bilateral filtering;

Step 3: estimate the image background, comprising the following sub-steps:

Step 3.1: apply the stroke width transform to the image produced in step 2;

Step 3.2: calculate the simulated observation distance and the imaging height;

Step 3.3: weaken the dark features of the image produced in step 2 with two morphological closing operations;

Step 3.4: using the results of steps 3.2 and 3.3, downsample and then upsample the image;

Step 4: background subtraction and image enhancement, comprising the following sub-steps:

Step 4.1: background subtraction;

Compute the absolute difference between the bilateral-filtered image from step 2 and the estimated background image from step 3; pixels whose value in the difference image is zero are high-confidence background pixels, and their gray value is set to 255;

Step 4.2: histogram equalization;

Invert the non-zero pixels of the background-subtracted image to obtain their gray values, then apply histogram equalization to the whole image to increase the contrast between foreground and background;

Step 5: construct the energy function;

The Laplacian energy function has the following form:

where the data term is the cost of assigning a given label to a pixel, e.g., the cost of assigning label 0/1 to pixel p_ij; ∇²I_ij denotes the Laplacian value at pixel p_ij; the boundary term is the cost of a discontinuity between adjacent pixels, i.e., the cost incurred when two adjacent pixels are assigned different labels; the boundary term consists of a horizontal boundary term and a vertical boundary term; E_ij denotes the edge detection result at pixel p_ij, I_ij denotes the gray value at pixel p_ij, and c is an arbitrary constant with c > 0;

Step 6: construct the network graph;

Each pixel p_ij of the image forms an intermediate node of the network graph, and two terminal nodes s and t are added; edges between intermediate nodes are called n-links, and their weights are determined by the boundary term of the energy function; edges between an intermediate node and a terminal node are called t-links, and their weights are determined by the data term of the energy function; accordingly, the weights of the edges (p_ij, s) and (p_ij, t) are given by the data term, and the weights of the edges (p_ij, p_{i+1,j}) and (p_ij, p_{i,j+1}) are given by the boundary term;

Step 7: minimize the energy function with an augmenting-path-based graph cut algorithm;

Two search trees, S and T, are built on the network graph, rooted at the source s and the sink t, respectively; the nodes of the search trees are classified as active or passive, and an active node grows its tree by acquiring free nodes along non-saturated edges, which then become active nodes;

Step 7.1: growth stage;

The two trees keep growing until active nodes of the two trees meet, which yields a path from the source to the sink;

Step 7.2: augmentation stage;

The path obtained in step 7.1 is augmented; augmentation saturates at least one edge, the child node attached through a saturated edge becomes an orphan (isolated) node, and trees S and T split into several subtrees;

Step 7.3: adoption stage;

A new parent is sought for each orphan node; an orphan for which no valid parent exists becomes a free node; this continues until all orphan nodes have been processed;

Step 7.4: the above three stages are repeated until the two trees can no longer grow and are separated by saturated edges; the minimum cut of the graph, i.e., the minimum of the energy function, is then obtained, which yields the final binarization of the image.

2. The low-quality document image binarization method based on background estimation and energy minimization according to claim 1, characterized in that: in step 1, the color document image f(x,y) is converted to grayscale with the minimum mean method, using the following formula:

where f_i(x,y) are the R, G and B component images, respectively, and f_gray(x,y) is the resulting grayscale image.

3. The low-quality document image binarization method based on background estimation and energy minimization according to claim 1, characterized in that: in step 2, the image is denoised with a nonlinear bilateral filter, whose output pixel value g(i,j) is a weighted combination of the pixel values f(k,l) in the neighborhood S:

$$g(i,j)=\frac{\sum_{(k,l)\in S} f(k,l)\,w(i,j,k,l)}{\sum_{(k,l)\in S} w(i,j,k,l)}$$

where the weight w(i,j,k,l) is the product of the domain kernel d(i,j,k,l) and the range kernel r(i,j,k,l), i.e., w(i,j,k,l) = d(i,j,k,l)·r(i,j,k,l), with

$$d(i,j,k,l)=\exp\left(-\frac{(i-k)^2+(j-l)^2}{2\sigma_d^2}\right),\qquad r(i,j,k,l)=\exp\left(-\frac{\lVert f(i,j)-f(k,l)\rVert^2}{2\sigma_r^2}\right)$$

and σ_d² and σ_r² are the Gaussian distance variance and the Gaussian gray-level variance, respectively.

4. The low-quality document image binarization method based on background estimation and energy minimization according to claim 1, characterized in that: in step 3.1, edges in the bilateral-filtered grayscale image are detected with the Canny operator; for each edge pixel p, the corresponding edge pixel q is found along the gradient direction of p, and the Euclidean distance ||p−q|| between the two points is taken as the stroke width estimate of every pixel on the path [p,q], unless a pixel has already been assigned a smaller width; the stroke width SWE of the image is the mathematical expectation (mean) of the stroke width estimates over all non-zero pixels:

$$\mathrm{SWE}=\frac{1}{n}\sum_{s(x,y)\neq 0} s(x,y)$$

where n is the total number of non-zero pixels in the SWT output image s(x,y).

5. The low-quality document image binarization method based on background estimation and energy minimization according to claim 1, characterized in that: in step 3.2, the simulated observation distance d₀ is determined from the stroke width SWE estimated in step 3.1 as:

$$d_0=\mathrm{SWE}\times\cot\theta$$

where θ is the observation resolution angle;

the imaging height h_i on the retina at distance d₀ from the target image is obtained from the lens imaging law and the focal length equation as:

where f is the distance between the lens of the human eye and the retina, i.e., the focal length of the lens, and h₀ is the original height of the target image.

6. The low-quality document image binarization method based on background estimation and energy minimization according to claim 1, characterized in that: in step 3.3, both closing operations use disk-shaped structuring elements; the diameter of the first structuring element is set to the stroke width of the image, and the diameter of the second structuring element is 12 pixels larger than the stroke width of the image.

7. The low-quality document image binarization method based on background estimation and energy minimization according to claim 1, characterized in that: in step 3.4, the observed image height at distance d₀ from the target image is h_i; therefore, the image produced by the morphological closing is downsampled with bilinear interpolation to height h_i, and the scaled image is then restored to its original size with bilinear interpolation, the result being the estimated background image; the aspect ratio of the image is preserved during scaling.