CN102496021A - Wavelet transform-based thresholding method of image - Google Patents


Info

Publication number
CN102496021A
CN102496021A
Authority
CN
China
Prior art keywords
image
foreground
background
distribution
threshold
Prior art date
Legal status
Pending
Application number
CN2011103763913A
Other languages
Chinese (zh)
Inventor
王恺
杨巨峰
李娇凤
焦姣
Current Assignee
Nankai University
Original Assignee
Nankai University
Priority date
Filing date
Publication date
Application filed by Nankai University filed Critical Nankai University
Priority to CN2011103763913A
Publication of CN102496021A
Pending legal-status Critical Current

Landscapes

  • Image Processing (AREA)

Abstract

The invention provides a wavelet-transform-based binarization (thresholding) method for images, belonging to the field of image processing. Exploiting the good denoising properties of wavelets, the method applies wavelet decomposition to the grayscale version of a complete natural scene image and removes the foreground text as if it were noise by low-pass filtering, yielding approximate background and foreground distributions. A global threshold is computed from the foreground distribution; superimposing the global threshold on the background distribution produces a local threshold, which is finally used to binarize the image. The method quickly and effectively separates the text as foreground and suppresses interference from complex backgrounds, which benefits subsequent character segmentation and recognition. It effectively addresses the OCR (optical character recognition) problem for natural scene images.

Description

Image binarization method based on wavelet transform

Technical field

The invention belongs to the technical field of image processing, and in particular relates to an image binarization method based on wavelet transform.

Background art

For images containing text, the purpose of binarization is usually to separate the text as foreground. The quality of the binarization directly affects subsequent text segmentation and recognition. Compared with document-image binarization, binarizing natural scene images demands greater adaptability from the method and the ability to handle several kinds of complexity at once.

Research on image binarization has deepened in recent years, but work devoted specifically to natural scene images is still scarce; only the binarization of single-character natural scene images has been attempted. For a complete natural scene image, however, locating the text region is itself a hard problem, so studying how to binarize the complete image is of more general significance.

Wavelet theory is considered a major breakthrough of recent years in mathematical analysis and methods; through the joint efforts of scientists across disciplines it now rests on a solid mathematical foundation and has a wide range of applications. In mathematics, wavelet analysis is regarded as a milestone in the history of Fourier analysis: it combines functional analysis, Fourier analysis, spline analysis, harmonic analysis, and numerical analysis. Its advantage over the Fourier transform is good localization in both the time and frequency domains; by changing the sampling step it can focus on any detail of the object, letting one see both the "forest" and the "trees", hence the name "mathematical microscope". The wavelet transform applies progressively sharper time resolution to the high-frequency components of a signal, zooming in on its fast-varying parts, and progressively sharper frequency resolution to the low-frequency components, zooming out to its slowly varying parts (the overall trend). This "trees and forest" representation is very effective for analyzing non-stationary signals. Wavelet transforms are now widely used across signal processing (speech, digital image, digital video, and nonlinear signal processing, among others) and have become a powerful tool in scientific research and practical applications.

Wavelet filtering is an important application of the wavelet transform in image processing. Its basic idea is that after a multi-level wavelet transform of the original image, the wavelet coefficients representing the image content have relatively large absolute values, while those representing noise have relatively small ones. By setting a threshold and filtering out the coefficients whose absolute value lies below or above it, a filtering effect is achieved. This work exploits exactly this denoising property, removing the foreground text in the image as if it were noise. Text recognition in natural scene images can rely on existing OCR technology, but unlike in documents the text is embedded in a complex background, and how best to suppress that background interference is a key problem that natural scene image binarization must solve.

Summary of the invention

The purpose of the present invention is to study binarization specifically for natural scene images, to further explore how to binarize complete images, and to propose an image binarization method based on the wavelet transform.

The present invention uses the wavelet transform to remove the foreground text in the image as noise, thereby obtaining approximate background and foreground distributions; it then computes a global binarization threshold from the foreground distribution and superimposes that global threshold on the background distribution to form a local threshold, which is finally used to binarize the image.

The method of the invention comprises the following steps:

Step 1. Read a color natural scene image and convert it into a grayscale image.

Step 2. Background distribution approximation. First perform an L-level wavelet decomposition of the grayscale image to obtain the level-L approximation coefficients LL and the detail coefficients in three directions: the horizontal detail coefficients HL, the vertical detail coefficients LH, and the diagonal detail coefficients HH. According to extensive experiments, L = 6 decomposition levels works best. Then smooth away the text portion of the image by low-pass filtering and perform a 1-level wavelet reconstruction to obtain a thumbnail of the background distribution. Finally, enlarge the background thumbnail to the original image size by image interpolation, yielding the approximate background distribution map.

Step 3. Foreground distribution approximation. The difference image obtained by subtracting the original grayscale image of step 1 from the background distribution map is the foreground distribution map.

Step 4. Select a global binarization method and compute a threshold on the difference image; this is the global threshold.

Step 5. Superimpose the global threshold on the background distribution map obtained in step 2 to obtain a binarization local threshold for every pixel of the original grayscale image.

Step 6. Convert the original grayscale image of step 1 into a binary image according to the thresholds obtained in step 5.

In step 1, the color image I is converted into a grayscale image GRAY using the following formula:

GRAY(x, y) = 0.2989·R(I(x, y)) + 0.5870·G(I(x, y)) + 0.1140·B(I(x, y))

where R(·), G(·), and B(·) take the red, green, and blue components, respectively.

For the background distribution approximation of step 2, the grayscale image is decomposed by a multi-level wavelet decomposition using an antisymmetric biorthogonal wavelet; a threshold is then set according to the VisuShrink method proposed by Donoho, and only the text information in the top-level decomposition result is processed. The top-level decomposition result is then reconstructed, excluding the interference of noise and background factors, and finally the reconstructed background thumbnail is enlarged to the original image size by bicubic interpolation, yielding the approximate background distribution map BG.

For the foreground distribution approximation of step 3, note that wavelet denoising is essentially a smoothing of the image, so some pixels (especially the background around the text) become darker after filtering (i.e., their values grow), and the difference may therefore be negative at some pixels; such negative values are simply set to 0. The foreground distribution FG is therefore computed as:

FG(x,y) = \begin{cases} BG(x,y) - GRAY(x,y), & \text{if } GRAY(x,y) < BG(x,y) \\ 0, & \text{otherwise} \end{cases}

The global threshold of step 4 is computed with a fairly simple method: first compute the mean μ and standard deviation σ of all pixels of the foreground distribution; then, in the interval [|μ − σ|, |μ + σ|], find the gray value with the smallest histogram count as the global threshold GT of the foreground distribution, as follows:

\mu = \sum_{i=0}^{255} i \cdot H(i) / (M \cdot N)

\sigma = \sqrt{ \frac{1}{M \cdot N - 1} \sum_{i=0}^{255} (i - \mu)^2 \cdot H(i) }

GT = \arg\min_{i=|\mu-\sigma|}^{\mu+\sigma} H(i)

where M and N are the numbers of rows and columns of the foreground distribution map, and H(i) is the gray-level histogram of the foreground distribution, computed as:

H(i) = \sum_{y=1}^{N} \sum_{x=1}^{M} A(i, FG(x,y)), \quad i \in [0, 255], \qquad A(m,n) = \begin{cases} 1, & \text{if } m = n \\ 0, & \text{otherwise} \end{cases}

The local threshold of step 5 is obtained by superimposing the global threshold on the background distribution according to the following formula, giving the binarization local threshold LT(x, y) for every pixel (x, y) of the original grayscale image:

LT(x, y) = GT + BG(x, y).

For the binarization of the grayscale image in step 6, using the local thresholds computed in step 5: for each pixel (x, y) of the original grayscale image, if its gray value is smaller than the local threshold LT(x, y) of that point, the pixel is binarized as foreground; otherwise it is binarized as background.
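Steps 3, 5, and 6 can be sketched in NumPy as follows. This is a minimal illustrative sketch, not the patent's code: the toy arrays and the threshold value are made up, and the test is written directly on the foreground distribution (a pixel is foreground where FG exceeds GT), which under the definitions above amounts to comparing the gray value against a per-pixel local threshold derived from GT and BG.

```python
import numpy as np

def binarize(gray, bg, gt):
    """Sketch of steps 3, 5, 6: foreground distribution FG = BG - GRAY
    (clipped at 0), then a pixel is foreground when FG exceeds the
    global threshold GT; 1 marks foreground (text), 0 background."""
    fg = np.where(gray < bg, bg - gray, 0.0)   # step 3: clipped difference
    return (fg > gt).astype(np.uint8)          # threshold on the difference

# toy example: dark text pixels (~50-60) on a bright background (~200)
gray = np.array([[200.0, 50.0, 200.0],
                 [198.0, 60.0, 201.0]])
bg = np.full_like(gray, 200.0)   # toy approximate background distribution
binary = binarize(gray, bg, gt=45.0)
```

Only the two text pixels survive the threshold; the background pixels, whose difference from BG is small, are labeled 0.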

Advantages and positive effects of the present invention:

The proposed method performs a wavelet decomposition of the grayscale version of the complete natural scene image and filters out the text to obtain approximate background and foreground distributions; the global binarization threshold is then computed from the foreground distribution and superimposed on the background distribution to form the local thresholds finally used for binarization. To verify its effectiveness, nine important binarization methods (Otsu, Pavlidis, Liu, Bernsen, Niblack, Sauvola, Gatos, Multi-Scale, and block) were selected for comparative experiments. These cover both global and local methods, and include both early methods widely recognized in the field and recently proposed ones, so the selection is comprehensive. Experiments on the ICDAR2003 sample set show that the proposed method outperforms all nine methods in recall, precision, and F-measure, and that the processing time for a single image is about 1 second, which satisfies practical application requirements. The wavelet-transform-based binarization method proposed here is thus better suited to scene images with complex backgrounds.

Brief description of the drawings

Figure 1 is a schematic diagram of the wavelet-transform-based binarization process;

Figure 2 shows the row-extension matrix;

Figure 3 shows the interlaced sampling matrix.

Figure 4 illustrates the binarization process, where (a) is the converted grayscale image, (b) the single-level wavelet decomposition result, (c) the single-level wavelet decomposition map, (d) the background distribution map, (e) the foreground distribution map, (f) the gray-level histogram of the foreground distribution, and (g) the binary image.

Detailed description of the embodiments

Figure 1 shows the overall flow of the invention, which is now described in further detail with reference to an embodiment:

1. Converting the color image to grayscale

Take a color natural scene image I of width W = 388 and height H = 543 and convert it to a grayscale image GRAY by the following formula. For x ∈ [1, W] and y ∈ [1, H]:

GRAY(x, y) = 0.2989·R(I(x, y)) + 0.5870·G(I(x, y)) + 0.1140·B(I(x, y))

where R(·), G(·), and B(·) take the red, green, and blue components, respectively.

The converted grayscale image is shown in Figure 4(a).
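The conversion above uses the standard 0.2989/0.5870/0.1140 luma weights. A minimal NumPy sketch (the array names are illustrative):

```python
import numpy as np

def to_gray(img):
    """Convert an H x W x 3 RGB array to grayscale with the
    0.2989/0.5870/0.1140 weights of the patent's formula."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 0.2989 * r + 0.5870 * g + 0.1140 * b

# a single pure-red pixel maps to 0.2989 * 255
img = np.zeros((1, 1, 3), dtype=float)
img[0, 0] = [255.0, 0.0, 0.0]
gray = to_gray(img)
```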

2. Background distribution approximation

First perform an L-level wavelet decomposition of the grayscale image GRAY to obtain the level-L approximation coefficients LL and the detail coefficients in three directions: the horizontal detail coefficients HL, the vertical detail coefficients LH, and the diagonal detail coefficients HH, as shown in Figure 4(b). According to extensive experiments, L = 6 decomposition levels works best; this embodiment uses an antisymmetric biorthogonal wavelet with L = 6. The text portion of the image is then smoothed by low-pass filtering and a 1-level wavelet reconstruction is performed, giving a thumbnail of the background distribution. Finally, image interpolation enlarges the background thumbnail to the original image size, yielding the approximate background distribution map BG, as shown in Figure 4(d). The individual steps are as follows:

1) Wavelet decomposition

A 1-level wavelet decomposition splits the grayscale image into approximation coefficients and detail coefficients in three directions, as shown in Figure 4(b); an L-level decomposition simply decomposes the approximation coefficients LL of the level-(L−1) result again, giving new approximation and detail coefficients. The invention uses an antisymmetric biorthogonal wavelet, whose corresponding low-pass and high-pass filter coefficient matrices \tilde{H} and \tilde{G} appear only as images in the original document and are not reproduced here. Let the matrix f to be decomposed in this embodiment have size M×N, where M and N correspond to the image height and width: M = H = 543, N = W = 388. The wavelet decomposition proceeds as follows:

Figure 2 shows the row-extension matrix E_r^N and Figure 3 the interlaced sampling matrix S_r^N (both given as images in the original document). The matrix f (with f = GRAY for the 1-level decomposition) is extended by one column on each side according to the following formula, giving the M×(N+2) matrix f_{ar}:

f_{ar} = f \cdot (E_r^N)^T

For concreteness we track a small sub-block of pixels f* extracted from the matrix f being decomposed; below, a superscript "*" always denotes the processing of this extracted sub-block. The column extension acts on f* as follows:

f^* =
[ 97  96  95  94
  97  96  95  94
  96  96  95  94 ]

f_{ar}^* =
[ 97  97  96  95  94  94
  97  97  96  95  94  94
  96  96  96  95  94  94 ]

Convolving f_{ar} with \tilde{H}^T and \tilde{G}^T according to the following formulas gives the M×(N+1) matrices f_H and f_G:

f_H = f_{ar} \otimes \tilde{H}^T, \quad i.e. \quad f_H(x-2, y-1) = \sum_{k=x-2}^{x-1} f_{ar}(k, y-1)\,\tilde{H}(x-k, 1)

f_G = f_{ar} \otimes \tilde{G}^T, \quad i.e. \quad f_G(x-2, y-1) = \sum_{k=x-2}^{x-1} f_{ar}(k, y-1)\,\tilde{G}(x-k, 1)

f_H^* =
[ 137.2  136.5  135.1  133.6  132.9
  137.2  136.5  135.1  133.6  132.9
  135.8  135.8  135.1  133.6  132.9 ]

f_G^* =
[ 0  0.7  0.7  0.7  0
  0  0.7  0.7  0.7  0
  0  0    0.7  0.7  0 ]

The even-numbered columns of f_H and f_G are then taken and row-extended according to the following formulas, giving the (M+2)×((N+1)/2) matrices f_{EH} and f_{EG}:

f_{EH} = E_r^M \cdot f_H \cdot (S_r^{N+1})^T

f_{EH}^* =
[ 136.5  133.6
  136.5  133.6
  136.5  133.6
  135.8  133.6
  135.8  133.6 ]

f_{EG} = E_r^M \cdot f_G \cdot (S_r^{N+1})^T

f_{EG}^* =
[ 0.7  0.7
  0.7  0.7
  0.7  0.7
  0    0.7
  0    0.7 ]

Convolving f_{EH} and f_{EG} with \tilde{H} and \tilde{G} according to the following formulas yields four (M+1)×((N+1)/2) matrices:

MA = f_{EH} \otimes \tilde{H}, \quad i.e. \quad MA(x, y) = \sum_{k=x-2}^{x-1} f_{EH}(k, y-1)\,\tilde{H}(x-k, 1)

MA^* =
[ 193.0  189
  193.0  189
  192.5  189
  192.0  189 ]

MD_H = f_{EH} \otimes \tilde{G}, \quad i.e. \quad MD_H(x, y) = \sum_{k=x-2}^{x-1} f_{EH}(k, y-1)\,\tilde{G}(x-k, 1)

MD_H^* =
[ 0    0
  0    0
  0.5  0
  0    0 ]

MD_V = f_{EG} \otimes \tilde{H}, \quad i.e. \quad MD_V(x, y) = \sum_{k=x-2}^{x-1} f_{EG}(k, y-1)\,\tilde{H}(x-k, 1)

MD_V^* =
[ 1.0  1.0
  1.0  1.0
  0.5  1.0
  0    1.0 ]

MD_D = f_{EG} \otimes \tilde{G}, \quad i.e. \quad MD_D(x, y) = \sum_{k=x-2}^{x-1} f_{EG}(k, y-1)\,\tilde{G}(x-k, 1)

MD_D^* =
[ 0    0
  0    0
  0.5  0
  0    0 ]

where k ranges over the filter window.

Sampling every other column of the matrices MA, MD_H, MD_V, and MD_D according to the following formulas gives the 1-level wavelet decomposition result, as shown in Figure 4(b): LL denotes the level-1 approximation coefficients, HL the horizontal, LH the vertical, and HH the diagonal detail coefficients. Each result matrix has ((M+1)/2) rows and ((N+1)/2) columns; for the extracted sub-block they are A, D_H, D_V, and D_D:

A = MA \cdot (S_r^{N+1})^T, \quad D_H = MD_H \cdot (S_r^{N+1})^T, \quad D_V = MD_V \cdot (S_r^{N+1})^T, \quad D_D = MD_D \cdot (S_r^{N+1})^T

A, D_H, D_V, and D_D correspond to the upper-left, upper-right, lower-left, and lower-right parts of Figure 4(c), i.e. the LL, HL, LH, and HH shown in the 1-level decomposition result of Figure 4(b).

A^* =
[ 193.0  189
  192.0  189 ]

D_H^* =
[ 0  0
  0  0 ]

D_V^* =
[ 1.0  1.0
  0    1.0 ]

D_D^* =
[ 0  0
  0  0 ]

Decomposing A by the same steps gives the next level's result, and so on; the 6-level wavelet decomposition is computed in this way. In this embodiment the text of the grayscale image is lighter than the background, so the image is inverted before the wavelet decomposition.
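The extend-convolve-downsample steps above can be sketched as a one-level 2-D decomposition in NumPy. As a stand-in for the patent's antisymmetric biorthogonal filters (whose coefficient matrices are given only as images in the original), the sketch uses the 2-tap Haar pair; a constant block then yields zero detail sub-bands and an approximation equal to twice the value:

```python
import numpy as np

LO = np.array([1.0, 1.0]) / np.sqrt(2.0)   # stand-in low-pass filter
HI = np.array([1.0, -1.0]) / np.sqrt(2.0)  # stand-in high-pass filter

def analyze_rows(f, filt):
    """Filter each row, then keep every other sample (downsample by 2)."""
    out = np.apply_along_axis(lambda r: np.convolve(r, filt, mode="full"), 1, f)
    return out[:, 1::2]

def dwt2_level(f):
    """One decomposition level: returns (LL, HL, LH, HH) sub-bands."""
    lo = analyze_rows(f, LO)           # row-wise low-pass
    hi = analyze_rows(f, HI)           # row-wise high-pass
    ll = analyze_rows(lo.T, LO).T      # approximation (LL)
    lh = analyze_rows(lo.T, HI).T      # detail sub-band (LH)
    hl = analyze_rows(hi.T, LO).T      # detail sub-band (HL)
    hh = analyze_rows(hi.T, HI).T      # diagonal detail sub-band (HH)
    return ll, hl, lh, hh

# a constant (pure background) block: all details vanish, LL = 2 * value
f = np.full((4, 4), 95.0)
ll, hl, lh, hh = dwt2_level(f)
```

Iterating `dwt2_level` on the returned LL sub-band gives the multi-level decomposition described in the text.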

2) Wavelet filtering

The wavelet filtering step sets the detail coefficients of the decomposition result that are greater than the threshold T_n to 0, keeping the values smaller than T_n, i.e.:

D_H(x,y) = \begin{cases} D_H(x,y), & \text{if } D_H(x,y) < T_n \\ 0, & \text{otherwise} \end{cases}

D_V(x,y) = \begin{cases} D_V(x,y), & \text{if } D_V(x,y) < T_n \\ 0, & \text{otherwise} \end{cases}

D_D(x,y) = \begin{cases} D_D(x,y), & \text{if } D_D(x,y) < T_n \\ 0, & \text{otherwise} \end{cases}

The threshold is the one most commonly used in wavelet filtering, i.e. the threshold set by the VisuShrink method proposed by Donoho; it is applied to the text information in the top-level decomposition result. The threshold is computed as:

T_n = \sigma_n \sqrt{2 \ln N}

where σ_n = c/0.6745, c is the median of the absolute values of the wavelet detail coefficients, and N = 3·H_L·W_L, with H_L and W_L the numbers of rows and columns of the coefficient matrices after the L-level decomposition. The threshold computed in this embodiment is 204.6. The detail coefficients of the sub-block after the 6-level decomposition are shown below; filtering with the threshold shows that, because the extracted pixels belong to the background, their detail coefficients are smaller than the threshold and are kept, while the text portions are filtered out.

D_H^* = [2.1], \quad D_V^* = [7.9], \quad D_D^* = [-1.2]
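The VisuShrink threshold and the keep-small/zero-large filtering rule can be sketched as follows (a minimal sketch; the sample coefficients are made up for illustration):

```python
import math
import numpy as np

def visu_threshold(details):
    """VisuShrink: T = sigma_n * sqrt(2 ln N), where the noise scale
    sigma_n is median(|d|) / 0.6745 over all detail coefficients."""
    d = np.concatenate([m.ravel() for m in details])
    sigma_n = np.median(np.abs(d)) / 0.6745
    return sigma_n * math.sqrt(2.0 * math.log(d.size))

def keep_small(m, t):
    """Patent's rule: keep coefficients below the threshold, zero the rest
    (large coefficients correspond to the text being removed)."""
    return np.where(m < t, m, 0.0)

d_h = np.array([[2.1, 300.0], [1.0, -0.5]])  # illustrative detail coefficients
t = visu_threshold([d_h])
filtered = keep_small(d_h, t)                # the 300.0 outlier is zeroed
```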

3) Wavelet reconstruction

Wavelet reconstruction is the inverse operation of wavelet decomposition. From the top-level decomposition coefficients A_L and the corresponding detail coefficients D_H, D_V, D_D, the decomposition result is reconstructed as a sum of convolutions of the coefficients with H and G, the conjugate transpose matrices of \tilde{H} and \tilde{G} (these matrices are given only as images in the original document), as shown in the following formula:

f_r = F_{H_L} \cdot ((E_{H_L} \cdot A \otimes H) \cdot (E_{W_L})^T \otimes H^T) \cdot (F_{W_L})^T
    + F_{H_L} \cdot ((E_{H_L} \cdot D_H \otimes G) \cdot (E_{W_L})^T \otimes H^T) \cdot (F_{W_L})^T
    + F_{H_L} \cdot ((E_{H_L} \cdot D_V \otimes H) \cdot (E_{W_L})^T \otimes G^T) \cdot (F_{W_L})^T
    + F_{H_L} \cdot ((E_{H_L} \cdot D_D \otimes G) \cdot (E_{W_L})^T \otimes G^T) \cdot (F_{W_L})^T

f_r^* =
[ 95  95
  95  95 ]

where E and F denote the corresponding extension and sampling matrices (shown as images in the original document).
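As a self-contained illustration that synthesis inverts analysis, the sketch below runs one analysis/synthesis round trip on a 2×2 block. Haar is used as a stand-in here, since the patent's antisymmetric biorthogonal filters are given only as images:

```python
import numpy as np

def haar2_forward(f):
    """One-level 2-D Haar transform on non-overlapping 2x2 blocks."""
    a, b = f[0::2, 0::2], f[0::2, 1::2]
    c, d = f[1::2, 0::2], f[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # approximation
    hl = (a - b + c - d) / 2.0   # horizontal details
    lh = (a + b - c - d) / 2.0   # vertical details
    hh = (a - b - c + d) / 2.0   # diagonal details
    return ll, hl, lh, hh

def haar2_inverse(ll, hl, lh, hh):
    """Exact inverse of haar2_forward."""
    m, n = ll.shape
    f = np.empty((2 * m, 2 * n))
    f[0::2, 0::2] = (ll + hl + lh + hh) / 2.0
    f[0::2, 1::2] = (ll - hl + lh - hh) / 2.0
    f[1::2, 0::2] = (ll + hl - lh - hh) / 2.0
    f[1::2, 1::2] = (ll - hl - lh + hh) / 2.0
    return f

f = np.array([[97.0, 96.0], [95.0, 94.0]])
rec = haar2_inverse(*haar2_forward(f))   # round trip reproduces f
```

In the patent's pipeline the inverse is applied after the filtering of step 2), so the reconstructed thumbnail contains the smoothed background rather than the original image.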

4) Image interpolation

The invention uses bicubic interpolation: the gray values of the 16 points surrounding the point to be sampled are used for cubic interpolation, which accounts not only for the gray values of the 4 directly adjacent points but also for the rate of change of the gray values between neighboring points. Bicubic interpolation requires an interpolation basis function to fit the data, of the following form:

S(w) = \begin{cases} 1 - 2|w|^2 + |w|^3, & |w| < 1 \\ 4 - 8|w| + 5|w|^2 - |w|^3, & 1 \le |w| < 2 \\ 0, & |w| \ge 2 \end{cases}

Bicubic interpolation according to the following formula gives the interpolated image matrix f_R, i.e. the approximate background distribution map BG, as shown in Figure 4(d); f_R has the same size as the original image matrix:

f_R(i+u, j+v) = A \cdot B \cdot C

f_R^* =
[ 95  95  95  95
  95  95  95  95
  95  95  95  95 ]

where A, B, and C are the matrices:

A = [ S(1+u)  S(u)  S(1-u)  S(2-u) ]

B =
[ f(i-1, j-2)  f(i, j-2)  f(i+1, j-2)  f(i+2, j-2)
  f(i-1, j-1)  f(i, j-1)  f(i+1, j-1)  f(i+2, j-1)
  f(i-1, j)    f(i, j)    f(i+1, j)    f(i+2, j)
  f(i-1, j+1)  f(i, j+1)  f(i+1, j+1)  f(i+2, j+1) ]

C = [ S(1+v)  S(v)  S(1-v)  S(2-v) ]^T

3. Foreground distribution approximation

Subtracting the original grayscale image GRAY from the background distribution map BG gives the foreground distribution map FG. Clearly, in the foreground distribution the values of the original background pixels should approach 0 while those of the foreground pixels should lie far from 0. Because wavelet denoising is essentially a smoothing of the image, some pixels (especially the background around the text) become darker after filtering (i.e., their values grow), so the difference may be negative at some pixels; such negative values are simply set to 0. The foreground distribution is therefore finally computed as:

FG(x, y) = BG(x, y) - GRAY(x, y),  if GRAY(x, y) < BG(x, y)
FG(x, y) = 0,                      otherwise

The foreground distribution map computed in this embodiment is shown in FIG. 4(e); it is obtained by applying the above formula to the original grayscale image in FIG. 4(a) and the background distribution map in FIG. 4(d).
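The clamped difference of step 3 is straightforward to express directly; a minimal sketch operating on 2-D lists of gray levels (function name illustrative):

```python
def foreground_distribution(gray, bg):
    # FG(x, y) = BG(x, y) - GRAY(x, y) where GRAY < BG, else 0, so that
    # denoising-induced negative differences are clamped to zero.
    return [[b - g if g < b else 0
             for g, b in zip(g_row, b_row)]
            for g_row, b_row in zip(gray, bg)]
```

For example, foreground_distribution([[10, 200]], [[50, 100]]) gives [[40, 0]]: the first pixel is much darker than the estimated background (likely text), the second is not.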

4. Global Threshold Calculation

In the foreground distribution, a large number of background pixels take values of 0 or close to 0, while foreground pixel values are far from 0. Since foreground and background differ markedly in the foreground distribution map, a relatively simple global threshold computation is used here: first compute the mean μ and standard deviation σ of all pixels in the foreground distribution; then, on the interval [|μ-σ|, |μ+σ|], find the gray level with the smallest histogram count and take it as the global threshold GT of the foreground distribution. The computation is given by the following formulas.

μ = ( Σ_{i=0..255} i·H(i) ) / (M·N)

σ = sqrt( ( Σ_{i=0..255} (i - μ)²·H(i) ) / (M·N - 1) )

GT = argmin_{i ∈ [|μ-σ|, |μ+σ|]} H(i)

H(i) denotes the gray-level histogram of the foreground distribution, computed as follows:

H(i) = Σ_{y=1..N} Σ_{x=1..M} A(i, FG(x, y)),  i ∈ [0, 255]

where A(m, n) = 1 if m = n, and 0 otherwise.

In this embodiment the foreground-distribution histogram computed by the above formula is shown in FIG. 4(f). The mean of all pixels in the foreground distribution map is μ = 15 and the standard deviation is σ = 30; the global threshold found on the interval [|μ-σ|, |μ+σ|] is GT = 45, as marked in FIG. 4(f).
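The histogram-based global threshold of step 4 can be sketched as follows. The σ line assumes the sample standard deviation of the gray levels weighted by the histogram, since the printed formula is garbled in this excerpt; the function name is illustrative:

```python
def global_threshold(fg):
    """GT = the gray level with the fewest pixels on [|mu-sigma|, |mu+sigma|]
    of the foreground distribution fg (2-D list of ints in 0..255)."""
    M, N = len(fg), len(fg[0])
    H = [0] * 256
    for row in fg:              # gray-level histogram H(i)
        for v in row:
            H[v] += 1
    n = M * N
    mu = sum(i * H[i] for i in range(256)) / n
    # Assumed reading: histogram-weighted sample standard deviation.
    var = sum(H[i] * (i - mu) ** 2 for i in range(256)) / (n - 1)
    sigma = var ** 0.5
    lo = int(abs(mu - sigma))
    hi = min(int(abs(mu + sigma)), 255)
    # Gray level with the minimum histogram count on the interval.
    return min(range(lo, hi + 1), key=lambda i: H[i])
```

For a 4x4 foreground map with twelve 0-valued pixels and four 100-valued pixels, μ = 25, σ ≈ 44.7, and the search interval is [19, 69], where every bin is empty, so the first such level (19) is returned.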

5. Local Threshold Calculation

Superimposing the global threshold GT on the background distribution BG according to the following formula gives the binarization local threshold LT(x, y) for every pixel (x, y) of the original grayscale image GRAY. Where the original grayscale image contains text, the local threshold is larger, so the text is well preserved in the binary image; where the original grayscale image is background, the local threshold is smaller, so most of the background remains background in the binary image.

LT(x, y) = GT + BG(x, y)

6. Binarization

The grayscale image GRAY is converted to a binary image BW by the following formula. For each pixel (x, y) of the original grayscale image GRAY, if its gray value is less than the local binarization threshold LT(x, y) at that point, the pixel is binarized as a foreground point; otherwise it is binarized as a background point. FIG. 4(g) is the binary image converted from FIG. 4(a).

BW(x, y) = 1,  if GRAY(x, y) < LT(x, y)
BW(x, y) = 0,  otherwise
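Steps 5 and 6 combine into a single pass: each pixel is compared against GT plus its local background estimate. A minimal sketch (function name illustrative):

```python
def binarize(gray, bg, gt):
    # LT(x, y) = GT + BG(x, y); a pixel is foreground (1) when darker
    # than its local threshold, background (0) otherwise.
    return [[1 if g < gt + b else 0
             for g, b in zip(g_row, b_row)]
            for g_row, b_row in zip(gray, bg)]
```

For example, binarize([[10, 200]], [[50, 50]], 45) uses a local threshold of 95 everywhere and returns [[1, 0]].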

Claims (7)

1. A wavelet transform-based image binarization method, characterized in that the method comprises the following steps:
Step 1: read in a natural-scene color image and convert it into a grayscale image;
Step 2: background distribution approximation — first apply an L-level wavelet decomposition to the grayscale image to obtain the level-L approximation coefficients LL and the detail coefficients in three directions, namely the horizontal detail coefficients HL, the vertical detail coefficients LH, and the diagonal detail coefficients HH; then smooth the text regions of the image by low-pass filtering and perform a one-level wavelet reconstruction to obtain a thumbnail of the background distribution; finally, enlarge the background-distribution thumbnail to the original image size by image interpolation, which gives the approximate background distribution map;
Step 3: foreground distribution approximation — the difference image obtained by subtracting the original grayscale image of step 1 from the background distribution map is the foreground distribution map;
Step 4: select a global binarization method to compute a threshold on the difference image, i.e. the global threshold;
Step 5: superimpose the global threshold on the background distribution map obtained in step 2 to obtain the binarization local threshold for every pixel of the original grayscale image;
Step 6: convert the original grayscale image of step 1 into a binary image according to the binarization thresholds obtained in step 5.

2. The method according to claim 1, characterized in that the conversion of the color image to the grayscale image GRAY in step 1 uses the following formula:
GRAY(x, y) = 0.2989*R(I(x, y)) + 0.5870*G(I(x, y)) + 0.1140*B(I(x, y))
where R(.), G(.), and B(.) denote the red, green, and blue components respectively.

3. The method according to claim 1, characterized in that the background distribution approximation of step 2 applies a multi-level wavelet decomposition to the grayscale image with an antisymmetric biorthogonal wavelet; sets the threshold according to the VisuShrink method proposed by Donoho, processing only the text information in the highest-level decomposition result; then reconstructs the highest-level decomposition result to remove the interference of noise and background factors; and finally enlarges the reconstructed background thumbnail to the original image size by bicubic interpolation, which gives the approximate background distribution map BG.

4. The method according to claim 1, characterized in that the foreground distribution approximation of step 3 takes into account that wavelet denoising essentially smooths the image, so some pixels, especially background pixels around text, become darker during denoising (i.e., their values increase), and the difference may be negative at those pixels; such negative values are directly set to 0; the foreground distribution map FG is therefore computed by the following formula:
FG(x, y) = BG(x, y) - GRAY(x, y), if GRAY(x, y) < BG(x, y); 0, otherwise.

5. The method according to claim 1, characterized in that the global threshold calculation of step 4 uses a relatively simple method: first compute the mean μ and standard deviation σ of all pixels in the foreground distribution; then, on the interval [|μ-σ|, |μ+σ|], find the gray level with the smallest histogram count as the global threshold GT of the foreground distribution, computed as follows:
μ = ( Σ_{i=0..255} i·H(i) ) / (M·N)
σ = sqrt( ( Σ_{i=0..255} (i - μ)²·H(i) ) / (M·N - 1) )
GT = argmin_{i ∈ [|μ-σ|, |μ+σ|]} H(i)
where M and N are the numbers of rows and columns of the foreground distribution map, and H(i) is the gray-level histogram of the foreground distribution, computed as:
H(i) = Σ_{y=1..N} Σ_{x=1..M} A(i, FG(x, y)), i ∈ [0, 255], with A(m, n) = 1 if m = n and 0 otherwise.

6. The method according to claim 1, characterized in that the local threshold calculation of step 5 superimposes the global threshold on the background distribution according to the following formula to obtain the binarization local threshold LT(x, y) for every pixel (x, y) of the original grayscale image:
LT(x, y) = GT + BG(x, y).

7. The method according to claim 1, characterized in that the binarization of the grayscale image in step 6 uses the local binarization thresholds computed in step 5: for each pixel (x, y) of the original grayscale image, if its gray value is less than the local binarization threshold LT(x, y) at that point, the pixel is binarized as a foreground point; otherwise it is binarized as a background point.
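The decompose–suppress–reconstruct idea behind step 2 of claim 1 can be illustrated with a one-level 2-D transform. A Haar wavelet is used below purely for brevity (claim 3 actually specifies an antisymmetric biorthogonal wavelet with a VisuShrink threshold, and the function name is illustrative); hard-thresholding the detail sub-bands before synthesis suppresses thin, high-contrast structures such as text strokes:

```python
def haar2_smooth(img, thr):
    """One-level 2-D Haar analysis, hard-threshold the three detail
    sub-bands (HL, LH, HH), then exact synthesis."""
    h, w = len(img) // 2, len(img[0]) // 2
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for c in range(w):
            a, b = img[2 * r][2 * c], img[2 * r][2 * c + 1]
            d, e = img[2 * r + 1][2 * c], img[2 * r + 1][2 * c + 1]
            ll = (a + b + d + e) / 4.0   # approximation (background)
            hl = (a - b + d - e) / 4.0   # horizontal detail
            lh = (a + b - d - e) / 4.0   # vertical detail
            hh = (a - b - d + e) / 4.0   # diagonal detail
            # Hard threshold: suppress small details (strokes, noise).
            hl = hl if abs(hl) > thr else 0.0
            lh = lh if abs(lh) > thr else 0.0
            hh = hh if abs(hh) > thr else 0.0
            # Inverse transform of the 2x2 block.
            out[2 * r][2 * c] = ll + hl + lh + hh
            out[2 * r][2 * c + 1] = ll - hl + lh - hh
            out[2 * r + 1][2 * c] = ll + hl - lh - hh
            out[2 * r + 1][2 * c + 1] = ll - hl - lh + hh
    return out
```

With thr = 0 the round trip is exact; with a large thr each 2x2 block collapses to its average, which is the smoothing that turns text into background estimate.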
CN2011103763913A 2011-11-23 2011-11-23 Wavelet transform-based thresholding method of image Pending CN102496021A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011103763913A CN102496021A (en) 2011-11-23 2011-11-23 Wavelet transform-based thresholding method of image


Publications (1)

Publication Number Publication Date
CN102496021A true CN102496021A (en) 2012-06-13

Family

ID=46187846


Country Status (1)

Country Link
CN (1) CN102496021A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6292583B1 (en) * 1997-09-30 2001-09-18 Advantest Corporation Image information processing apparatus
JP3520166B2 (en) * 1996-11-19 2004-04-19 株式会社リコー Image processing device
CN1941838A (en) * 2005-09-29 2007-04-04 株式会社理光 File and picture binary coding method
CN101093538A (en) * 2006-06-19 2007-12-26 电子科技大学 Method for identifying iris based on zero crossing indication of wavelet transforms


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王蓉晖 et al.: "Multi-target detection method in natural texture backgrounds based on wavelet energy", Journal of Changchun University of Science and Technology (长春理工大学学报), vol. 28, no. 3, 30 September 2005 *
许礼武 et al.: "License plate location algorithm based on wavelet decomposition", Computer Engineering (计算机工程), vol. 32, no. 21, 30 November 2006 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447491A (en) * 2014-07-25 2016-03-30 北京大学深圳研究生院 Signboard image binaryzation method and device
WO2017020723A1 (en) * 2015-08-04 2017-02-09 阿里巴巴集团控股有限公司 Character segmentation method and device and electronic device
US10552705B2 (en) 2015-08-04 2020-02-04 Alibaba Group Holding Limited Character segmentation method, apparatus and electronic device
CN105335972A (en) * 2015-10-20 2016-02-17 江南大学 Warp knitting fabric defect detection method based on wavelet contourlet transformation and visual saliency
CN105335972B (en) * 2015-10-20 2018-11-30 江南大学 Knitted fabric defect detection method based on small echo contourlet transform and vision significance
WO2017128603A1 (en) * 2016-01-26 2017-08-03 上海葡萄纬度科技有限公司 Educational toy kit and mirror position detection method thereof
CN106447904A (en) * 2016-09-09 2017-02-22 深圳怡化电脑股份有限公司 Paper money identifying method and apparatus
CN106778761A (en) * 2016-12-23 2017-05-31 潘敏 A kind of processing method of vehicle transaction invoice
CN108734669A (en) * 2017-04-24 2018-11-02 南京理工大学 Image denoising method based on wavelet transformation Wiener filtering and edge detection
US11302286B2 (en) 2018-07-20 2022-04-12 Huawei Technologies Co., Ltd. Picture obtaining method and apparatus and picture processing method and apparatus
RU2743224C1 (en) * 2020-04-07 2021-02-16 Федеральное государственное казенное военное образовательное учреждение высшего образования "Военный учебно-научный центр Военно-воздушных сил "Военно-воздушная академия имени профессора Н.Е. Жуковского и Ю.А. Гагарина" (г. Воронеж) Министерства обороны Российской Федерации Method of detecting small objects on images
CN113365082A (en) * 2021-07-12 2021-09-07 北京爱芯科技有限公司 Haar wavelet transform hardware device


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120613