CN100487722C

CN100487722C - Method for determining connection sequence of cascade classifiers with different features and specific threshold

Info

Publication number: CN100487722C
Application number: CNB2005100994395A
Authority: CN
Inventors: 戚飞虎; 朱凯华; 蒋人杰; 徐立; 相泽知祯
Original assignee: Shanghai Jiao Tong University; Omron Corp
Current assignee: Shanghai Jiao Tong University; Omron Corp
Priority date: 2005-08-26
Filing date: 2005-08-26
Publication date: 2009-05-13
Anticipated expiration: 2025-08-26
Also published as: CN1920852A

Abstract

The present invention proposes a method for determining the connection order and feature thresholds of a group of cascaded classifiers, one for each of a set of different features. The cascaded classifier group is used to extract the connected components to be selected from the candidate connected components obtained by decomposing an image. The method includes the following steps: first, a plurality of connected components are obtained by decomposing at least one image and used as the current samples; the current samples are then fed in parallel to the current cascade classifier of each feature for iterative training, thereby determining the connection order and the feature threshold interval of the cascade classifier of each feature. The invention also proposes a method for obtaining a selected image from an image. Using a classifier group cascaded according to the above method, non-selected connected components can be removed quickly and more time is spent on the connected components that may be the selected ones, which improves both the image processing speed and the accuracy of image acquisition.

Description

Method for Determining the Connection Order and Feature Thresholds of Cascade Classifiers for Different Features

Technical field

The present invention relates to the field of digital image processing, and in particular to a method for determining the connection order and feature thresholds of a group of cascade classifiers for different features, and to a method for obtaining a selected image from an image using the cascade classifier group formed by that method.

Background art

Text detection and segmentation in natural scenes have many applications. With the spread of high-performance, low-cost, portable digital imaging devices, applications of scene text recognition are expanding rapidly. Using a camera connected to a mobile phone, PDA or other dedicated digital device, we can easily capture the text around us, such as road names, advertisements, traffic warnings and restaurant menus. Automatic recognition, translation and pronunciation of such text can greatly help overseas tourists and visually impaired people, as well as video retrieval programs and meeting processing.

Fully automatic extraction of text from images, especially natural scene images, remains a challenging problem. The difficulties include variations in character font and size, complex backgrounds, non-uniform lighting, shadows and image noise. In addition, the demands on image processing speed keep rising.

In recent years, work on extracting text from natural scene images has developed rapidly. There are currently two classes of methods for this task.

The first class consists of texture-based methods. In "Support Vector Machine-Based Text Detection in Digital Video", published in 2000, Shin et al. used a star-shaped pixel template to reveal the intrinsic characteristics of text. In "Finding Text Regions Using Localised Measures", published in September 2000, P. Clark et al. proposed five localized measures and combined them to obtain candidate text regions. Frequency-domain methods have also been used to detect text-like textures, for example the Fourier transform of short scan lines, the discrete cosine transform, the Gabor transform, wavelet decomposition and multi-resolution edge detection. We found that these methods work well for relatively small characters, such as lines of text in menus or documents, because small text usually has a strong texture response. For large characters, however, such as those on road signs or shop names, the strong texture response of complex backgrounds misleads these algorithms, leaving many large characters undetected.

The second class consists of methods based on connected components (CC). Color quantization, mathematical morphology operations and symmetric neighborhood filtering are commonly used to decompose the original image into candidate connected components. These algorithms handle both large and small characters effectively. To extract the text connected components from the candidates, however, heuristic methods are usually used, for example aspect ratio, alignment and merging analysis, layout analysis and multi-layer connected component analysis. The drawback of these methods is that all heuristic rules are applied in a fixed order and their thresholds are manually entered empirical values, which are usually unstable and cannot guarantee robust detection results. Alternatively, a strong classifier (for example a Support Vector Machine, SVM) can be used to extract the text connected components from the candidates. The drawback of that approach is that all features must be computed for every connected component, so the amount of computation and the time required are too large.

Inspired by face detection technology, the present invention uses a cascade classifier group to extract the connected components to be selected (for example, text connected components) from the candidate connected components, achieving a high detection rate while increasing image processing speed.

Summary of the invention

One object of the present invention is to propose a method for determining the connection order and feature thresholds of the cascade classifiers (h₁, h₂, ..., h_M) of a set of different features (F₁, F₂, ..., F_M). The cascade classifier group is used to extract the connected components to be selected from the candidate connected components obtained by decomposing an image, and the different features are related to the image to be selected. The method includes the following steps:

a. Obtain a plurality of connected components by decomposing at least one image and use them as the current samples, and use the cascade classifiers of the M different features as the current cascade classifiers of the respective features. The current samples comprise a positive example set P and a negative example set N; a positive example is a connected component to be selected, and a negative example is a non-selected connected component;

b. Feed the current samples in parallel to the current cascade classifiers of the respective features and perform one round of the iterative training, where the round index i is a positive integer with 0<i≤M. In each round, select from all current features taking part in that round the feature with the maximum false alarm rate, thereby determining the connection order of the cascade classifiers of the different features. The false alarm rate is the ratio of the number of connected components that the cascade classifier mistakes for selected ones in that round, although they are actually non-selected, to the number of current negative examples;

c. After each feature is selected, feed the current samples to the cascade classifier of the selected feature for training. During this training the false alarm rate and the detection rate both change continuously, and the threshold interval of the feature is obtained according to the minimum detection rate allowed for that feature, thereby determining the feature threshold interval of the cascade classifier of each feature. The detection rate is the ratio of the number of selected connected components correctly detected by a cascade classifier to the number of positive examples; and

d. After steps b and c, delete the feature selected in step b and its classifier so as to update the current features and the current classifiers. Keep the positive example set of this round unchanged, and take as the new negative example set the connected components that the cascade classifier mistook for selected ones when the threshold interval was obtained in step c, although they are actually non-selected. The current samples updated in this way are used for the next training round.
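Steps a to d can be sketched as the loop below. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the dictionary-based sample layout and, in particular, the rule used to derive a threshold interval (the narrowest window of sorted positive feature values that still retains the minimum detection rate) are hypothetical choices made for the example. The selection of the feature with the maximum false alarm rate follows step b as written.

```python
import math

def threshold_interval(pos_values, min_detection_rate):
    """Narrowest [lo, hi] that still contains the required share of positives (assumed rule)."""
    values = sorted(pos_values)
    keep = max(1, math.ceil(min_detection_rate * len(values)))
    best = (values[0], values[keep - 1])
    for start in range(1, len(values) - keep + 1):
        lo, hi = values[start], values[start + keep - 1]
        if hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best

def false_alarm_rate(neg_values, interval):
    """Share of current negatives inside the interval, i.e. mistaken for positives."""
    if not neg_values:
        return 0.0
    lo, hi = interval
    return sum(lo <= v <= hi for v in neg_values) / len(neg_values)

def train_cascade(positives, negatives, feature_names, min_detection_rate):
    """Return [(feature, (lo, hi)), ...] in connection order.

    positives / negatives: lists of dicts mapping feature name -> feature value.
    """
    remaining = list(feature_names)
    current_negatives = list(negatives)
    cascade = []
    while remaining:
        # Step b: evaluate every remaining feature on the current samples.
        candidates = {}
        for name in remaining:
            interval = threshold_interval([p[name] for p in positives], min_detection_rate)
            rate = false_alarm_rate([n[name] for n in current_negatives], interval)
            candidates[name] = (rate, interval)
        # Step b selects the feature with the maximum false alarm rate.
        chosen = max(remaining, key=lambda name: candidates[name][0])
        # Step c: fix the threshold interval of the chosen feature.
        lo, hi = candidates[chosen][1]
        cascade.append((chosen, (lo, hi)))
        # Step d: drop the feature, keep P, and keep only the negatives that passed as the new N.
        remaining.remove(chosen)
        current_negatives = [n for n in current_negatives if lo <= n[chosen] <= hi]
    return cascade
```

Because the positive set never changes, each interval depends only on the positives, while the false alarm rates shrink from round to round as the negative set is filtered.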

Decomposing the image into connected components in step a above further includes the following steps:

a1. processing the image with a nonlinear Niblack thresholding method; and

a2. decomposing the processed image into connected components.

Here, the nonlinear Niblack thresholding method adds an order-statistic filter to each of the background filter and the foreground filter of the standard Niblack method.

Another object of the present invention is to provide a method for obtaining a selected image from an image, comprising the following steps:

A. decomposing the image into connected components;

B. feeding the connected components to the first stage of a group of cascade classifiers for different features, cascaded according to the method described above, the features being related to the image to be selected; each cascade classifier discards non-selected connected components and passes the connected components to be selected on to the next classifier; and

C. combining the connected components output by the last classifier in the cascade classifier group to form the selected image.
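Step B can be sketched as follows. The sketch assumes, purely for illustration, that each stage is a pair of a feature function and a threshold interval and that a component passes a stage when its feature value lies inside the interval; `run_cascade` and the dictionary layout of a component are hypothetical names, not part of the source.

```python
def run_cascade(components, stages):
    """Pass candidate components through the stages in order and return the survivors.

    stages: list of (feature_fn, (lo, hi)) pairs in connection order.
    A feature is only computed for components that survived all earlier stages.
    """
    survivors = list(components)
    for feature_fn, (lo, hi) in stages:
        # Each stage discards the components it judges to be non-selected.
        survivors = [cc for cc in survivors if lo <= feature_fn(cc) <= hi]
    return survivors

stages = [
    (lambda cc: cc["area"], (10, 100)),     # cheap feature first
    (lambda cc: cc["aspect"], (1.0, 5.0)),  # evaluated only on survivors
]
candidates = [
    {"area": 5, "aspect": 2.0},
    {"area": 50, "aspect": 2.0},
    {"area": 60, "aspect": 9.0},
]
selected = run_cascade(candidates, stages)
```

Evaluating features lazily in this way is what lets the cascade spend little time on components that an early stage already rejects.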

A further object of the present invention is to provide an apparatus for obtaining a selected image from an image, the apparatus comprising:

a decomposition device for decomposing the image into connected components;

a group of cascade classifiers for different features, cascaded according to the method described above, wherein the connected components are input to the first stage of the cascade, and each cascade classifier discards non-selected connected components and passes the connected components to be selected on to the next classifier; and

an image synthesis device for combining the connected components output by the last classifier in the cascade classifier group to form the selected image.

Because the method of the invention uses a new nonlinear Niblack method to process the original image, a grayscale image can be decomposed efficiently into multiple candidate connected components, and the quality of the connected components is improved. In addition, the cascade classifier group trained by the above method can easily remove most non-text connected components and quickly focus on the connected components that may be text. This reduces the amount of computation, increases image processing speed and still yields a high detection rate.

Brief description of the drawings

Fig. 1 is a flow chart of a method for determining the connection order and feature thresholds of the cascade classifiers of a set of different features according to an embodiment of the present invention;

Fig. 2 is a flow chart of a method for obtaining a text image from an image according to a second embodiment of the present invention; and

Fig. 3 is a diagram of an apparatus for obtaining a text image from an image according to a third embodiment of the present invention.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the drawings and specific embodiments.

As mentioned above, the method of the present invention is inspired by face detection technology and uses a cascade classifier group to extract text connected components from candidate connected components. The candidate connected components are obtained by decomposing the original image, which may be a natural scene image. The text connected components are combined to form a text image, so that a text image can be obtained from a natural scene image.

How, then, can the cascade classifier group extract text connected components from the candidate connected components?

First, we propose 12 different features that can effectively distinguish text from non-text connected components. Each of these 12 features is then assigned to one classifier in the cascade classifier group, and the group is trained to determine the connection order and feature thresholds of the cascade classifiers of these features. A cascade classifier group connected in this way can quickly discard non-text connected components and output text connected components.

The 12 features, which reveal the intrinsic properties of text connected components, are described in detail next. They comprise geometric features, an edge contrast feature, shape regularity features, stroke features and spatial coherence features.

1. Geometric features

The geometric features are the area ratio (Area Ratio), the length ratio (Length Ratio) and the aspect ratio (Aspect Ratio). They reject a large number of obviously non-text connected components very effectively and at very low computational cost, so they sharply reduce the execution time of the whole algorithm.

The area ratio AreaRatio is the ratio of the area of the connected component to the area of the image. It is used to reject connected components that are too large or too small:

$Feature\_AreaRatio = \frac{Area(CC)}{Area(Picture)}$ (1)

The length ratio LengthRatio is used to reject connected components that are too long or too short:

$Feature\_LengthRatio = \frac{\max\{w, h\}}{\max\{PicW, PicH\}}$ (2)

The aspect ratio AspectRatio is used to reject connected components that are too thin:

$Feature\_AspectRatio = \max\left\{\frac{width(CC)}{height(CC)}, \frac{height(CC)}{width(CC)}\right\}$ (3)

In formulas (2) and (3), w is the width of the bounding box of the connected component, h is the height of that bounding box, PicW is the width of the image and PicH is the height of the image.
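Formulas (1) to (3) can be evaluated as in the following sketch. It assumes, for illustration only, that a connected component is represented as a set of (row, column) pixel coordinates, that Area(CC) is its pixel count, and that width(CC) and height(CC) are the bounding box dimensions; the function names are hypothetical.

```python
def bounding_box(cc):
    """Return (top, left, bottom, right) of a set of (row, col) pixels."""
    rows = [r for r, _ in cc]
    cols = [c for _, c in cc]
    return min(rows), min(cols), max(rows), max(cols)

def geometric_features(cc, pic_w, pic_h):
    """Area ratio (1), length ratio (2) and aspect ratio (3) of one component."""
    top, left, bottom, right = bounding_box(cc)
    w = right - left + 1
    h = bottom - top + 1
    return {
        "area_ratio": len(cc) / (pic_w * pic_h),
        "length_ratio": max(w, h) / max(pic_w, pic_h),
        "aspect_ratio": max(w / h, h / w),
    }

# A 1x4 horizontal bar in an image that is 10 pixels wide and 8 pixels high.
bar = {(0, 0), (0, 1), (0, 2), (0, 3)}
features = geometric_features(bar, pic_w=10, pic_h=8)
```

All three values need only the pixel count and the bounding box, which is why these features are cheap enough to run on every candidate.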

2. Edge contrast feature

The edge contrast feature (Edge Contrast) is the ratio of the overlap between the border of the connected component and the edge image of the original image to the border of the connected component:

$EdgeContrast = \frac{Border(CC) \cap Edge(Picture)}{Border(CC)}$ (4)

Here Border(CC) is the set of border pixels of the connected component, and Edge(Picture) is the edge detection image of the original image, taken as the union of the outputs of the Canny operator and the Sobel operator:

$Edge(Picture) = Canny(Picture) \cup Sobel(Picture)$ (5)

The edge contrast feature is the most important feature. It is based on a very general observation: regardless of complex backgrounds and non-uniform lighting, a text connected component is usually "highly enclosed" by its edge response. We therefore use formula (4) to measure how strongly a connected component is enclosed by edges. This feature takes full advantage of texture-based detection algorithms while also responding strongly to large characters. Moreover, it provides an image-independent measure of the edge contrast of each connected component.
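A minimal sketch of formula (4) follows. It assumes that the edge image has already been computed elsewhere and is supplied as a set of (row, column) edge pixels (the union of the two detector outputs), that the ratio is taken over pixel counts, and that a border pixel is a component pixel with at least one 4-neighbour outside the component. These representation choices and the function names are assumptions for the example.

```python
def border_pixels(cc):
    """Pixels of cc that have at least one 4-neighbour outside cc (assumed border definition)."""
    neighbours = ((-1, 0), (1, 0), (0, -1), (0, 1))
    return {(r, c) for r, c in cc
            if any((r + dr, c + dc) not in cc for dr, dc in neighbours)}

def edge_contrast(cc, edge_pixels):
    """Share of the component's border pixels that coincide with detected edges."""
    border = border_pixels(cc)
    return len(border & edge_pixels) / len(border)

# A filled 3x3 block: every pixel except the centre lies on the border.
block = {(r, c) for r in range(3) for c in range(3)}
canny_pixels = {(0, 0), (0, 1)}            # hypothetical detector outputs
sobel_pixels = {(0, 2), (1, 1), (5, 5)}
contrast = edge_contrast(block, canny_pixels | sobel_pixels)
```

In the example, three of the eight border pixels coincide with an edge pixel; the edge pixel at the centre of the block and the one outside it do not count.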

3. Shape regularity features

Text connected components tend to have more regular shapes than the noise connected components found in natural scenes. Based on this observation we propose four features: number of holes, contour roughness, compactness and occupy ratio. We find that text connected components have small values for the number of holes and the contour roughness and large values for the compactness and the occupy ratio, whereas the opposite holds for non-text connected components. These features are used to suppress connected components that have irregular shapes but a strong texture response.

$Feature\_ContourRoughness = \frac{|CC - open(imfill(CC), 2 \times 2)|}{|CC|}$ (6)

$Feature\_CCHoles = |imholes(CC)|$ (7)

$Feature\_Compact = \frac{Area(CC)}{[Perimeter(CC)]^{2}}$ (8)

$Feature\_OccupyRatio = \frac{Area(CC)}{Area(BoundingBox(CC))}$ (9)

In the formulas above, imfill(CC) is the operation that fills the holes inside the connected component, 2×2 is the structuring element of the morphological opening, and the morphological opening (open) smooths the connected component.
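Formulas (6) to (9) can be sketched on a pixel-set representation as shown below. The sketch makes several assumptions that the text does not fix: a component is a set of (row, column) pixels, holes are 4-connected background regions that do not reach the outside of the bounding box, the perimeter is the number of border pixels, and the 2×2 opening is computed as the union of all 2×2 blocks that fit inside the filled component. All function names are hypothetical.

```python
from collections import deque

NEIGHBOURS_4 = ((-1, 0), (1, 0), (0, -1), (0, 1))

def hole_regions(cc):
    """Background regions enclosed by cc (they never reach the 1-pixel margin around its box)."""
    rows = [r for r, _ in cc]
    cols = [c for _, c in cc]
    top, bottom = min(rows) - 1, max(rows) + 1
    left, right = min(cols) - 1, max(cols) + 1
    seen, holes = set(), []
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            if (r, c) in cc or (r, c) in seen:
                continue
            region, touches_margin = set(), False
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                pr, pc = queue.popleft()
                region.add((pr, pc))
                if pr in (top, bottom) or pc in (left, right):
                    touches_margin = True
                for dr, dc in NEIGHBOURS_4:
                    q = (pr + dr, pc + dc)
                    if (top <= q[0] <= bottom and left <= q[1] <= right
                            and q not in cc and q not in seen):
                        seen.add(q)
                        queue.append(q)
            if not touches_margin:
                holes.append(region)
    return holes

def open_2x2(pixels):
    """Morphological opening with a 2x2 structuring element."""
    block = ((0, 0), (0, 1), (1, 0), (1, 1))
    opened = set()
    for r, c in pixels:
        square = [(r + dr, c + dc) for dr, dc in block]
        if all(p in pixels for p in square):
            opened.update(square)
    return opened

def shape_features(cc):
    cc = set(cc)
    holes = hole_regions(cc)
    filled = cc.union(*holes)                      # imfill(CC)
    rows = [r for r, _ in cc]
    cols = [c for _, c in cc]
    box_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    perimeter = sum(any((r + dr, c + dc) not in cc for dr, dc in NEIGHBOURS_4)
                    for r, c in cc)
    return {
        "contour_roughness": len(cc - open_2x2(filled)) / len(cc),  # (6)
        "holes": len(holes),                                        # (7)
        "compactness": len(cc) / perimeter ** 2,                    # (8)
        "occupy_ratio": len(cc) / box_area,                         # (9)
    }
```

For a 2×2 block with one extra pixel attached to its side, the extra pixel is the only one removed by the opening, so the contour roughness is 1/5.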

4. Stroke statistics features

Characters are composed of strokes, so we propose two computationally more expensive features that reveal the stroke statistics of a connected component. These two features in effect check the "irregularity" of a connected component in terms of character strokes.

The first feature is the mean stroke width MeanStrokeWidth. It is based on the observation that the stroke width of characters is usually fairly small:

$Feature\_Stroke\_Mean = Mean(strokeWidth(skeleton(CC)))$ (10)

The second feature is the normalized standard deviation of the stroke width. It is based on the observation that the strokes of one character tend to have similar widths, so a connected component with a very large value of this feature is more likely to be noise:

$Feature\_Stroke\_Std = \frac{Deviation(strokeWidth(skeleton(CC)))}{Mean(strokeWidth(skeleton(CC)))}$ (11)

In the formulas above, skeleton is the morphological skeleton algorithm, which extracts the skeleton of the connected component to obtain a skeleton image; strokeWidth is the stroke width computed at each point of the skeleton image; and Mean averages over all points of the skeleton image to obtain the mean width.
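Given the stroke widths measured at the skeleton points, formulas (10) and (11) reduce to a mean and a normalized standard deviation. The sketch below assumes the widths have already been extracted (skeletonization and width measurement are not shown) and uses the population standard deviation, since the text does not say which variant Deviation denotes; `stroke_features` is a hypothetical name.

```python
from statistics import mean, pstdev

def stroke_features(stroke_widths):
    """Mean stroke width (10) and normalized stroke width deviation (11)."""
    avg = mean(stroke_widths)
    return {
        "stroke_mean": avg,
        "stroke_std": pstdev(stroke_widths) / avg,  # population std is an assumption
    }

uniform = stroke_features([2, 2, 2, 2])   # strokes of equal width
uneven = stroke_features([2, 4])          # widths vary around a mean of 3
```

Dividing by the mean makes the second feature independent of character size, so a large character with uniformly thick strokes still scores low.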

5. Spatial coherence features

The last two features exploit spatial coherence information to filter out non-text connected components. Noise tends to show little spatial regularity and clustering, which motivates these two features. The spatial coherence features are the spatial coherence area ratio (Spatial Coherence Area Ratio) and the spatial coherence boundary feature (Spatial Coherence Boundary Touching):

$Feature\_AreaRatio\_S = \frac{Area(imdilate(CC, 5 \times 5))}{Area(Picture)}$ (12)

$Feature\_Boundary\_S = Bound(imdilate(CC, 5 \times 5))$ (13)

In the formulas above, imdilate is the dilation of the connected component, and 5×5 is the structuring element of the dilation.

Once many obviously non-text connected components have been rejected, consider each layer (the Niblack output has two color layers, black and white). If a connected component grows very strongly after dilation with a small structuring element, it is very likely spatially correlated random noise. Text connected components do not behave this way: because of the structure of character strings, characters are usually separated by a certain spacing, so after dilation they do not stick together and grow into one very large connected component. By using the spatial coherence filters we can effectively reduce image noise.
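The following sketch illustrates one possible reading of formulas (12) and (13). It assumes that the whole layer is dilated with a 5×5 square, that the feature is measured on the merged component that contains the original one, and that Bound reports whether this merged component touches the image border. The text does not define Bound, so that interpretation, the pixel-set representation and the function names are all assumptions.

```python
from collections import deque

def dilate(pixels, pic_w, pic_h, size=5):
    """Dilate a pixel set with a size x size square, clipped to the image."""
    half = size // 2
    return {(r + dr, c + dc) for r, c in pixels
            for dr in range(-half, half + 1) for dc in range(-half, half + 1)
            if 0 <= r + dr < pic_h and 0 <= c + dc < pic_w}

def component_containing(seed, pixels):
    """8-connected component of `pixels` that contains `seed`."""
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                q = (r + dr, c + dc)
                if q in pixels and q not in region:
                    region.add(q)
                    queue.append(q)
    return region

def spatial_features(cc, layer, pic_w, pic_h):
    """cc: pixels of one component; layer: all foreground pixels of the same color layer."""
    grown_layer = dilate(layer, pic_w, pic_h)
    grown_cc = component_containing(next(iter(cc)), grown_layer)
    touches = any(r in (0, pic_h - 1) or c in (0, pic_w - 1) for r, c in grown_cc)
    return {
        "area_ratio_s": len(grown_cc) / (pic_w * pic_h),  # (12)
        "boundary_s": touches,                            # (13), assumed meaning
    }
```

With this reading, an isolated pixel grows to 25 pixels, while two pixels three columns apart merge into a single 40-pixel component, which is the growth that the text associates with spatially correlated noise.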

With the above 12 features that can effectively distinguish text from non-text connected components in place, each feature is assigned to one classifier in the cascade classifier group, and the group is trained. The training method has to solve two problems: first, in what order to arrange the features; second, what the threshold on each feature should be. Its advantage is that the classifiers can be cascaded from weak to strong, which ensures the accuracy of image acquisition while increasing image processing speed. Fig. 1 is a flow chart of the method for determining the connection order and feature thresholds of the cascade classifiers of a set of different features according to the first embodiment of the present invention.

Training starts by determining the training samples (step 110). For example, 200 pictures can be selected at random from a picture library and decomposed into connected components that serve as training samples (the method of decomposing an original picture into connected components is described in detail below). The training samples comprise a positive example set P and a negative example set N. Positive examples are connected components manually labeled as text, and negative examples are connected components manually labeled as non-text.

Each training sample (that is, each connected component) has two Boolean values. One is the ground truth label (GroundTruth), which states whether the sample is text: true for text, false for non-text. The other is the classifier output, which states whether the classifier considers the sample to be text: positive for text, negative for non-text. Accordingly, the false alarm rate (false-positive) is the ratio of non-text samples mistaken for text by the classifier to all non-text samples. The detection rate is the true-positive rate, the ratio of text samples correctly classified as text to all text samples. The false-rejection rate, also referred to here as false-negative, is the ratio of non-text samples that the classifier correctly judges to be non-text to all non-text samples (in standard terminology, this quantity is the true-negative rate).
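The two rates used throughout the training can be computed from the two Boolean values of each sample, as in this short sketch. The function name and the parallel-list layout are assumptions made for the example.

```python
def rates(ground_truth, predicted):
    """ground_truth / predicted: parallel lists of booleans (True means text)."""
    positives = sum(ground_truth)
    negatives = len(ground_truth) - positives
    true_pos = sum(g and p for g, p in zip(ground_truth, predicted))
    false_pos = sum((not g) and p for g, p in zip(ground_truth, predicted))
    return {
        "detection_rate": true_pos / positives,     # text accepted / all text
        "false_alarm_rate": false_pos / negatives,  # non-text accepted / all non-text
    }

# Four text samples (one missed) and five non-text samples (one accepted).
truth = [True, True, True, True, False, False, False, False, False]
output = [True, True, True, False, True, False, False, False, False]
result = rates(truth, output)
```

Here three of four text samples are detected and one of five non-text samples raises a false alarm.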

The positive example set P does not change during the whole training process, because every positive example (text connected component) is expected to pass through all classifiers. In other words, every classifier must "know" these positive examples, that is, it must learn them. The negative example set N is different: each classifier "intercepts" part of the negative examples, so each classifier in the cascade sees different negative examples. The first classifier sees all negative examples, the second sees only those that the first misclassified as text, and so on. From the point of view of a later classifier, it only needs to deal with the cases that the earlier classifiers could not separate correctly; the negative examples it has to handle are just the non-text connected components that passed all previous classifiers. The negative example set N therefore has to be changed in every round of training.

As mentioned above, the method of decomposing an original picture into connected components is described in detail next.

Decomposing an image into connected components is a critical step in connected-component-based methods. If the decomposition gives poor results, the performance of the whole algorithm drops sharply. Existing decomposition methods mainly pursue effectiveness and robustness.

This embodiment uses a new method for decomposing an image into connected components, which consists of two steps: first the original image is processed with a nonlinear Niblack thresholding method, and then the processed image is decomposed into connected components.

The key idea of the Niblack method is that the intensity of the text pixels of interest differs from the mean intensity of their neighborhood by more than k times the standard deviation of the neighborhood intensity. The method was originally used to binarize images. In this embodiment, we first process the image with this method and then decompose the processed image into candidate connected components, which adds efficiency and low implementation complexity to the effectiveness and robustness of existing approaches.

The nonlinear Niblack thresholding method adds an order-statistic filter to each of the background filter and the foreground filter of the standard Niblack method. Its formula is:

$NLNiblack(x, y) = \begin{cases} 1, & f(x, y) > T_{+}(x, y) \\ -1, & f(x, y) < T_{-}(x, y) \\ 0, & \text{otherwise} \end{cases}$ (14)

$T_{\pm}(x, y) = \hat{\mu}_{p_1}(x, y, W_B) \pm k \cdot \hat{\sigma}_{p_2}(x, y, W_F)$

$\hat{\mu}_{p_1} = Order[Mean(f(x, y), W_B), p_1, W_B]$

$\hat{\sigma}_{p_2} = Order[Deviation(f(x, y), W_F), p_2, W_F]$

Here k is an empirical value from the standard Niblack method; it is set to a value between 0.17 and 0.19, and preferably to 0.18 in this embodiment. f(x, y) is the pixel intensity at position (x, y) of the input image, Mean(·, W) is a mean filter with window width W, Deviation(·, W) is a standard deviation filter with window width W, and Order[·, p, W] is an order-statistic filter with percentile p and width W.

In this embodiment, in the background filter $\hat{\mu}_{p1}$, the filter width W_B is set to 1/16 of the original image width and the percentile p1 is set to 50%. This is because a large median filter can extract background objects without excluding their high-frequency components. This background filter can cope with non-uniform illumination in natural scenes.

In the foreground filter $\hat{\sigma}_{p2}$, the width W_F is 1/5 of W_B and p2 is set to 80%. For small regions with large variance, this high-percentile filter effectively propagates their influence to neighboring regions while suppressing local noise.

当然，上述的滤波器宽度和百分比都可以根据实际需要进行调整。Of course, the above filter width and percentage can be adjusted according to actual needs.
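The thresholding of equation (14) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: windows are square and clipped at the image border, the order-statistic filter is read as "take the p-th percentile of the already filtered map inside the window", and a minimum window width of 3 is imposed so that tiny test images still work. The helper names (`_window`, `_filter`, `_order`, `nl_niblack`) are hypothetical.

```python
import math


def _window(img, x, y, w):
    """Collect values in a w-by-w window centred on (x, y), clipped at the borders."""
    h_img, w_img = len(img), len(img[0])
    r = w // 2
    return [img[j][i]
            for j in range(max(0, y - r), min(h_img, y + r + 1))
            for i in range(max(0, x - r), min(w_img, x + r + 1))]


def _filter(img, w, reduce_fn):
    """Apply reduce_fn to the window around every pixel."""
    return [[reduce_fn(_window(img, x, y, w)) for x in range(len(img[0]))]
            for y in range(len(img))]


def _mean(vals):
    return sum(vals) / len(vals)


def _deviation(vals):
    m = _mean(vals)
    return math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))


def _order(p):
    """Order-statistic reducer: the value at percentile p (0..100) of the window."""
    def pick(vals):
        s = sorted(vals)
        return s[min(len(s) - 1, int(p / 100.0 * len(s)))]
    return pick


def nl_niblack(img, k=0.18, p1=50, p2=80):
    """Return a map of 1 / -1 / 0 labels following equation (14)."""
    w_b = max(3, len(img[0]) // 16)   # background window: 1/16 of image width (min 3 is an assumption)
    w_f = max(3, w_b // 5)            # foreground window: 1/5 of W_B (min 3 is an assumption)
    mu = _filter(_filter(img, w_b, _mean), w_b, _order(p1))
    sigma = _filter(_filter(img, w_f, _deviation), w_f, _order(p2))
    out = []
    for y, row in enumerate(img):
        out_row = []
        for x, f in enumerate(row):
            t_plus = mu[y][x] + k * sigma[y][x]
            t_minus = mu[y][x] - k * sigma[y][x]
            out_row.append(1 if f > t_plus else -1 if f < t_minus else 0)
        out.append(out_row)
    return out
```

On a uniform image every pixel is labeled 0, while an isolated bright pixel on a uniform background is labeled 1. The nested loops are quadratic in the window size and only meant to make the formula concrete.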

It is also worth noting that the image decomposition step need not use the nonlinear Niblack method; an existing technique for decomposing an image into connected components can be used instead and still achieves the object of the invention. However, because the connected components obtained with existing techniques are of lower quality, the overall performance of the method is somewhat reduced.

接下来，进行设定和初始化操作(步骤120)。Next, setup and initialization operations are performed (step 120).

Set the overall target detection rate of the cascaded classifier group (h₁, h₂, ..., h₁₂) to D_target = 0.95; the target detection rate is entered manually.

Initialize the variables: set the overall detection rate D₀ = 1.0, the negative example set N₁ = N, and the loop counter i = 0 (during the loop, i ranges over 0 < i ≤ M, that is, 0 < i ≤ 12), and initialize the feature set, which contains 12 features (F₁, F₂, ..., F₁₂). Classifiers and features are in one-to-one correspondence.

令循环次数i＝i+1(步骤130)。Let the number of loops i=i+1 (step 130).

Determine whether i is greater than M (step 140). If i is not greater than M, the i-th training cycle is performed; for example, if i = 1, the first cycle is performed. The first cycle is described in detail below as an example.

The samples in the positive example set P and the current negative example set N₁ are sent in parallel to every cascade classifier for training (step 150). Each classifier computes its feature value for all samples. For example, if the feature of the first classifier is the geometric feature "area ratio", the area ratio of every sample is computed, that is, the ratio of the area of the bounding box of the sample's connected component to the area of the whole image.
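As a small illustration of one such feature value, the area ratio can be computed as below. The bounding-box representation (inclusive pixel coordinates) and the function name are assumptions made for this sketch.

```python
def area_ratio(bbox, image_size):
    """Ratio of the bounding-box area to the image area.

    bbox = (x_min, y_min, x_max, y_max) in inclusive pixel coordinates (assumed layout);
    image_size = (width, height).
    """
    x_min, y_min, x_max, y_max = bbox
    box_area = (x_max - x_min + 1) * (y_max - y_min + 1)
    return box_area / (image_size[0] * image_size[1])
```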

After the feature values of all samples are obtained, a feature value distribution is plotted for the positive examples P and the negative examples N₁, with the feature value on the horizontal axis and the number of connected components on the vertical axis.

For each feature, a threshold interval with an initial value of (-∞, +∞) is set. If the feature value of a sample lies outside the threshold interval, the cascade classifier corresponding to that feature judges the sample to be a non-text connected component; if the feature value lies within the threshold interval, the classifier judges the sample to be a text connected component.

With the threshold interval at (-∞, +∞), every sample falls within it, so the detection rate d of each classifier is 1 and its false alarm rate f is also 1. For each feature, the threshold interval is then narrowed step by step so that the feature values of more and more samples fall outside it; positive and negative examples are progressively judged to be non-text connected components, and the detection rate d_1j and false alarm rate f_1j of each cascade classifier decrease. Narrowing stops at the last interval for which the detection rate d_1j of the classifier in the first training cycle is still not less than the overall detection rate D_{i-1} after the previous cycle. Here D_{i-1} = D₀ = 1.0. Because the distribution is discrete in practice, the detection rate usually cannot be brought down to exactly the bound and stops at or slightly above it.

With this threshold interval, the detection rate d_1j, the false alarm rate f_1j and the probability FR_1j of correctly discarding non-text connected components are computed for each cascade classifier, where FR_1j = 1 - f_1j is the ratio of the number of non-text connected components correctly discarded by a cascade classifier to the current number of negative examples.
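The text does not spell out how the interval is narrowed, so the following sketch fills that in with one possible rule, stated here as an assumption: among all windows of sorted positive feature values that keep the required share of positives, pick the one that lets the fewest negatives through. The function name `fit_interval` and the closed interval [lo, hi] are likewise assumptions.

```python
import math


def fit_interval(pos_values, neg_values, min_detection_rate):
    """Return ((lo, hi), d, f, FR) for one feature.

    Assumed narrowing rule: slide a window over the sorted positive values and
    keep the window that still contains the required share of positives while
    passing the fewest negatives.
    """
    pos = sorted(pos_values)
    keep = min(len(pos), max(1, math.ceil(min_detection_rate * len(pos) - 1e-9)))
    best = None
    for start in range(len(pos) - keep + 1):
        lo, hi = pos[start], pos[start + keep - 1]
        passed_neg = sum(lo <= v <= hi for v in neg_values)
        if best is None or passed_neg < best[2]:
            best = (lo, hi, passed_neg)
    lo, hi, _ = best
    d = sum(lo <= v <= hi for v in pos) / len(pos)                       # detection rate
    f = (sum(lo <= v <= hi for v in neg_values) / len(neg_values)
         if neg_values else 0.0)                                         # false alarm rate
    return (lo, hi), d, f, 1.0 - f                                       # FR = 1 - f
```

With a required detection rate of 1.0 the interval simply spans all positive values, which matches the first training cycle where the bound is D₀ = 1.0.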

From the current feature set, that is, the 12 features, select the feature feature_k with the maximum false alarm rate f_1j (step 160). The selected feature feature_k is the first feature, and its classifier is the first classifier of the cascaded classifier group.

The feature with the maximum false alarm rate is selected because the round of computation above shows that, under equal conditions, this feature lets the most non-text samples pass as text samples. It is therefore regarded as the least effective feature, with the weakest classification ability, and is placed at the front of the cascaded classifier group. Repeating this in each cycle yields a cascade ordered from weak to strong.
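The selection in step 160 amounts to an arg-max over the per-feature false alarm rates. A minimal sketch, with hypothetical feature names and made-up rates used only for illustration:

```python
def pick_weakest_feature(false_alarm_rates):
    """Return the feature whose classifier has the highest false alarm rate."""
    return max(false_alarm_rates, key=false_alarm_rates.get)


# Hypothetical rates from one training cycle (illustrative values only).
rates = {"area_ratio": 0.9, "length_ratio": 0.4, "contrast": 0.7}
first_stage = pick_weakest_feature(rates)
```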

Next, compute the quality ratio of the selected feature feature_k and its allowed minimum detection rate (step 170).

The quality ratio of the selected feature feature_k is γ = FR_k / ΣFR_1j, where FR_k is the probability that the cascade classifier corresponding to feature_k correctly discards non-text connected components in the first training cycle; it serves as the quality of that classifier and is obtained in step 160. ΣFR_1j is the sum, over all features in the first training cycle, of the probabilities that the corresponding cascade classifiers correctly discard non-text connected components. The ratio of the two is the quality ratio of the cascade classifier of the selected feature and measures how well the feature distinguishes text connected components from non-text connected components.

According to the detection rate allocation formula d_i = (D_target / D_{i-1})^γ, compute the allowed minimum detection rate d_min of feature_k, where D_{i-1} is the overall detection rate after the previous training cycle and i is the cycle number. Since this is the first cycle, D_{i-1} = D₀ = 1.0 and d_min = (D_target)^γ.
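As a worked example of step 170, the sketch below evaluates γ and d_min for made-up FR values; the function name and the numbers are illustrative assumptions, not data from the embodiment.

```python
def allowed_min_detection_rate(fr_selected, fr_all, d_target, d_prev):
    """d_min = (D_target / D_{i-1}) ** gamma, with gamma = FR_k / sum of all FR."""
    gamma = fr_selected / sum(fr_all)
    return (d_target / d_prev) ** gamma


# First cycle: D_0 = 1.0, three features with illustrative FR values.
d_min = allowed_min_detection_rate(0.1, [0.1, 0.3, 0.6], d_target=0.95, d_prev=1.0)
# gamma is 0.1 here, so d_min is 0.95 ** 0.1, roughly 0.9949.
```

A weak feature (small FR, hence small γ) is allowed to lose almost no positives, while a strong feature receives a lower bound and can cut more aggressively.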

下面具体描述该检测率分配公式是如何得到的。How the detection rate distribution formula is obtained is described in detail below.

Suppose a number of connected components are fed serially into a group of M cascade classifiers with different features and classified stage by stage. If any classifier judges a connected component to be a non-text connected component, the component is removed; if it judges the component to be a text connected component, the component is passed to the next-stage classifier for further classification. The following relations then follow directly:

$F = \prod_{i=1}^{M} f_i, \qquad D = \prod_{i=1}^{M} d_i \qquad (15)$

Each of the M classifiers has a detection rate d_i, and to each d_i corresponds a false alarm rate f_i. To simplify notation, the d_i are collected into a vector {d₁, d₂, ..., d_M}; the overall detection rate is then

$D = \prod_{i=1}^{M} d_i,$

and the overall false alarm rate is

$F = \prod_{i=1}^{M} f_i.$

If another set of detection rates {d₁′, d₂′, ..., d_M′} is set for these M classifiers, the corresponding false alarm rates are {f₁′, f₂′, ..., f_M′}, the overall detection rate is

$D' = \prod_{i=1}^{M} d_i',$

and the overall false alarm rate is

$F' = \prod_{i=1}^{M} f_i'.$

When D = D′, F = F′ does not necessarily hold. The aim is to select, for an overall detection rate D = D_target, the detection rate vector with the smallest overall false alarm rate F. How, then, can F be minimized while D is held fixed?

Taking logarithms of equation (15) shows that, in the log domain, the overall detection rate is distributed additively (linearly) across all classifiers:

$\log(F) = \sum_{i=1}^{M} \log(f_i), \qquad \log(D) = \sum_{i=1}^{M} \log(d_i) \qquad (16)$

Suppose the overall detection rate D is distributed linearly among all classifiers according to their "quality". Let the quality of the i-th classifier be Q_i and the sum of all classifier qualities be $Q = \sum_{i=1}^{M} Q_i$. The quality ratio γ_i of the i-th classifier is defined as:

$\gamma_i = \frac{Q_i}{\sum_{j=1}^{M} Q_j} \qquad (17)$

With D the overall detection rate, the allocation formula can be written as follows; the detection rate d_i assigned to the i-th classifier is:

$d_i = D^{\gamma_i} \qquad (18)$

From equation (15) we have:

$D = \prod_{i=1}^{M} d_i = \prod_{i=1}^{M} D^{\gamma_i} = D^{\sum_{i=1}^{M} \gamma_i} = D^{\sum_{i=1}^{M} \frac{Q_i}{Q}} = D \qquad (19)$

This shows that the allocation formula is, first of all, numerically consistent: the allocated rates multiply back to the overall detection rate D.
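Equation (19) can be checked numerically. The sketch below allocates an overall detection rate across three classifiers with made-up quality values (illustrative assumptions) and confirms that the product of the allocated rates recovers D.

```python
import math


def allocate(d_total, qualities):
    """Split an overall detection rate across classifiers per equations (17)-(18)."""
    q_sum = sum(qualities)
    return [d_total ** (q / q_sum) for q in qualities]


rates = allocate(0.95, [0.2, 0.5, 0.8])          # illustrative quality values
assert math.isclose(math.prod(rates), 0.95)      # equation (19): the product recovers D
assert rates[0] > rates[1] > rates[2]            # higher quality, lower allowed rate
```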

Because D ∈ [0, 1], the function D^γ is monotonically decreasing in γ. The better the "quality" of a classifier, the larger its γ and the smaller the detection rate d assigned to it. Good "quality" means the classifier excludes non-text most effectively, so it is allowed a lower detection rate d, which gives it more room to exclude non-text connected components. Lowering the detection rate amounts to setting stricter conditions, so more non-text connected components can be excluded. The "quality" of a classifier can be measured by its probability of correctly excluding non-text.

After the allowed minimum detection rate of the selected feature feature_k is obtained, all samples in the positive example set P and the current negative example set N₁ are sent to the cascade classifier h_k corresponding to the selected feature for training (step 180).

该分类器计算所有样例的特征值。例如，如果该特征是长度比率，则计算所有样例的长度比率，计算公式参照上文的描述。This classifier computes feature values for all examples. For example, if the feature is a length ratio, the length ratios of all samples are calculated, and the calculation formula refers to the above description.

A threshold interval with an initial value of (-∞, +∞) is set; a sample whose feature value lies outside the threshold interval is judged by the cascade classifier h_k to be a non-text connected component.

The threshold interval is narrowed step by step, so that positive and negative examples are progressively judged to be non-text connected components and the detection rate d_k and false alarm rate f_k of the cascade classifier h_k decrease. Narrowing stops at the last interval for which d_k is still not less than the allowed minimum detection rate d_min obtained in step 170; the threshold interval at that point is the threshold interval of the selected feature feature_k.

到目前为止，选取特征以及确定特征阈值区间的工作都已完毕。So far, the work of selecting features and determining feature threshold intervals has been completed.

接下来要更新变量，以用于下一次循环训练(步骤190)。Next, the variables are to be updated for the next cycle training (step 190).

The selected feature and its classifier are deleted to update the current feature set and the current classifiers. The non-text connected components that the cascade classifier mistook for text connected components when the threshold interval was obtained in step 180 form the new negative example set N_{i+1}, while the positive example set P stays unchanged; this updates the current samples. The current overall detection rate is then updated as D_i = D_{i-1} · d_min for the next training cycle.

The subsequent cycles proceed exactly as the first: each cycle selects one feature and obtains its threshold interval. The position in the cascade of the classifier for each selected feature is the cycle number i. When i exceeds M, the loop ends.
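Steps 120 to 190 can be put together in one loop, sketched below. Several points are assumptions rather than details from the text: samples are represented as dictionaries mapping a feature name to a precomputed feature value, the interval is narrowed by choosing the window of sorted positive values that passes the fewest negatives, intervals are closed, and γ falls back to an equal share when every FR is zero. The names `fit_interval` and `train_cascade` are hypothetical.

```python
import math


def fit_interval(pos_vals, neg_vals, min_rate):
    """Assumed narrowing rule: the window of sorted positives passing the fewest negatives."""
    pos = sorted(pos_vals)
    keep = min(len(pos), max(1, math.ceil(min_rate * len(pos) - 1e-9)))
    windows = [(pos[s], pos[s + keep - 1]) for s in range(len(pos) - keep + 1)]
    lo, hi = min(windows, key=lambda w: sum(w[0] <= v <= w[1] for v in neg_vals))
    f = sum(lo <= v <= hi for v in neg_vals) / len(neg_vals) if neg_vals else 0.0
    return (lo, hi), f


def train_cascade(positives, negatives, features, d_target=0.95):
    """Return [(feature, lo, hi), ...] in cascade order. Samples are dicts: feature -> value."""
    d_prev, remaining, cascade = 1.0, list(features), []
    while remaining:                                   # steps 130-140: i = 1..M
        # Step 150: fit every remaining feature against the bound D_{i-1}.
        f_rates = {}
        for name in remaining:
            _, f_rates[name] = fit_interval([s[name] for s in positives],
                                            [s[name] for s in negatives], d_prev)
        # Step 160: the feature with the maximum false alarm rate goes next.
        chosen = max(f_rates, key=f_rates.get)
        # Step 170: quality ratio and allowed minimum detection rate.
        fr_sum = sum(1.0 - f for f in f_rates.values())
        gamma = (1.0 - f_rates[chosen]) / fr_sum if fr_sum > 0 else 1.0 / len(remaining)
        d_min = (d_target / d_prev) ** gamma
        # Step 180: refit the chosen feature against d_min to get its threshold interval.
        (lo, hi), _ = fit_interval([s[chosen] for s in positives],
                                   [s[chosen] for s in negatives], d_min)
        cascade.append((chosen, lo, hi))
        # Step 190: drop the feature, keep only negatives that still pass, update D_i.
        remaining.remove(chosen)
        negatives = [s for s in negatives if lo <= s[chosen] <= hi]
        d_prev *= d_min
    return cascade
```

The positive set is left unchanged between cycles, as in the description, and the running product `d_prev` plays the role of D_i.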

按上述方法确定的连接顺序级联起来的级联分类器组可以快速地排除非文本连通分量，而将更多的时间花费在计算可能是文本的连通分量上。The cascaded classifier group cascaded according to the connection order determined by the above method can quickly exclude non-text connected components, and spend more time on computing connected components that may be text.

The features proposed in this embodiment are related to text images and can effectively distinguish text connected components from non-text connected components. The cascaded classifier group corresponding to this set of features can therefore extract the text connected components from the candidate connected components, and the required text image is obtained by combining those text connected components. Those skilled in the art will appreciate, however, that if the proposed features are related to other content to be selected, which may be any content that is to be obtained from the original image, the cascaded classifier group corresponding to that set of features can extract the connected components to be selected from the candidate connected components and combine them into the image to be selected, which is not limited to a text image. The cascaded classifier group determined by the method of this embodiment can thus obtain the connected components to be selected according to features related to the content to be selected.

图2是根据本发明第二实施例的从图像中获取文本图像的方法的流程图。Fig. 2 is a flow chart of a method for acquiring a text image from an image according to a second embodiment of the present invention.

首先，将原始图像分解为多个候选连通分量(步骤210)。这里的原始图像可以是自然场景图像。该步骤中可以先用非线性Niblack阈值化方法处理该原始图像；然后再将处理后的图像分解为多个连通分量。这里用非线性Niblack阈值化方法处理该原始图像的方法与第一实施例中的处理方法是相同的，此处不再赘述。用非线性Niblack阈值化方法可以快速而鲁棒地获取候选连通分量。First, the original image is decomposed into multiple candidate connected components (step 210). The original image here can be a natural scene image. In this step, the original image can be processed with a nonlinear Niblack thresholding method; and then the processed image can be decomposed into multiple connected components. Here, the method of processing the original image with the nonlinear Niblack thresholding method is the same as that in the first embodiment, and will not be repeated here. Candidate connected components can be obtained quickly and robustly with nonlinear Niblack thresholding method.
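Decomposing the thresholded map into connected components can be sketched with a breadth-first flood fill. The text does not state the connectivity, so 4-connectivity is assumed here, and the function name is hypothetical; the input is a label map such as the 1 / -1 / 0 output of the thresholding step.

```python
from collections import deque


def connected_components(label_map, target=1):
    """Group 4-connected pixels equal to `target`; return a list of (x, y) pixel lists."""
    h, w = len(label_map), len(label_map[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y0 in range(h):
        for x0 in range(w):
            if label_map[y0][x0] != target or seen[y0][x0]:
                continue
            queue, pixels = deque([(x0, y0)]), []
            seen[y0][x0] = True
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and not seen[ny][nx] and label_map[ny][nx] == target):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            components.append(pixels)
    return components
```

Calling the function once with `target=1` and once with `target=-1` yields candidates for both bright-on-dark and dark-on-bright content.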

其次，将多个候选连通分量送入根据第一实施例的方法所级联起来的一组不同特征的级联分类器的第一级，该特征与文本图像相关，每一个级联分类器丢弃非文本连通分量，并向下一级分类器输出文本连通分量(步骤220)。该级联分类器组的连接顺序和特征阈值按照第一实施例的方法确定。Secondly, a plurality of candidate connected components are sent to the first stage of a cascade classifier of a group of different features cascaded according to the method of the first embodiment, the feature is related to the text image, and each cascade classifier discards non-text connected components, and output text connected components to the next classifier (step 220). The connection sequence and feature threshold of the cascade classifier group are determined according to the method of the first embodiment.

Specifically, after the candidate connected components are input to the first stage of the cascaded classifier group, the first classifier computes, for its own feature, the feature value of every connected component it receives. The feature value of each connected component is compared with the threshold interval of that feature; connected components whose feature values lie outside the threshold interval are discarded as non-text connected components, and connected components whose feature values lie within it are output to the second-stage classifier as text connected components. In other words, a connected component rejected by the first classifier is never input to the second classifier and needs no further computation or judgment, which saves a large amount of computation time.

第二个分类器接收到第一个分类器输出的连通分量后，再进行相同的计算和分类工作，依此类推，直到最后一个分类器丢弃非文本连通分量，输出文本连通分量。After the second classifier receives the connected components output by the first classifier, it performs the same calculation and classification work, and so on, until the last classifier discards the non-text connected components and outputs the text connected components.
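The stage-by-stage filtering described above can be sketched as follows. The cascade is assumed to be a list of (feature name, lower bound, upper bound) triples with closed intervals, and `feature_fns` is a hypothetical mapping from feature name to a function that computes the feature value of a component.

```python
def run_cascade(components, cascade, feature_fns):
    """Pass components through the stages; drop any whose feature value leaves the interval."""
    survivors = list(components)
    for name, lo, hi in cascade:
        fn = feature_fns[name]
        # Later-stage features are only ever computed for components that survived so far.
        survivors = [c for c in survivors if lo <= fn(c) <= hi]
    return survivors
```

Because rejected components never reach later stages, the more expensive features near the end of the cascade are evaluated on a much smaller set.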

Optionally, the text connected components output by the cascaded classifier group may be further input to a strong classifier (step 230). The strong classifier is trained with the standard AdaBoost method, and its features are the same as those of the cascaded classifier group. For each connected component output by the cascaded classifier group, the strong classifier linearly combines all of its feature values and judges whether the component is a text connected component, discarding non-text connected components and outputting text connected components. Because all feature values of each connected component have already been computed in the cascaded classifier group, the strong classifier only needs to form the linear combination to obtain the combined value for that component. Accuracy is thus further improved at little additional computation cost.
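A strong classifier of this kind can be sketched in the usual AdaBoost form, a weighted vote of threshold stumps over the cached feature values. The stump parameters (threshold, polarity, weight alpha) would come from AdaBoost training, which is not shown; the representation and names below are assumptions for illustration only.

```python
def strong_classify(cached_features, stumps):
    """Weighted vote of decision stumps over feature values already computed by the cascade.

    stumps: list of (feature_name, threshold, polarity, alpha), assumed to come
    from a prior AdaBoost training run.
    """
    score = 0.0
    for name, threshold, polarity, alpha in stumps:
        vote = 1 if polarity * cached_features[name] < polarity * threshold else -1
        score += alpha * vote
    return score >= 0.0
```

No feature is recomputed here; the function only reads values that the cascade stages already produced.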

Of course, the object of the invention can also be achieved without the strong classifier; adding it further improves the accuracy of the final image.

最后，将步骤230中输出的文本连通分量组合形成文本图像(步骤240)。这样，我们就从原始图像中获得了我们需要的文本图像。Finally, the text connected components output in step 230 are combined to form a text image (step 240). This way, we get the text image we need from the original image.

在本实施例的方法中，由于使用了新的非线性Niblack方法来处理原始图像，可以高效地将灰度图像分解为多个候选连通分量，提高了连通分量的质量。另外，级联分类器组能够容易地去除大多数非文本连通分量，并快速关注认为可能是文本的连通分量。这样，降低了本方法的计算量，提高了图像处理速度，并能得到很高的检测率。In the method of this embodiment, since a new nonlinear Niblack method is used to process the original image, the grayscale image can be efficiently decomposed into multiple candidate connected components, and the quality of the connected components is improved. In addition, cascaded classifier groups are able to easily remove most non-text connected components and quickly focus on connected components that are considered likely to be text. In this way, the calculation amount of the method is reduced, the image processing speed is improved, and a high detection rate can be obtained.

Those skilled in the art will appreciate that, although the features of the cascaded classifier group in this embodiment are related to text images, the features may also be related to other content to be selected, in which case the method of this embodiment can be used to obtain any image to be selected from an image and is not limited to text images.

图3是根据本发明第三实施例的从图像中获取文本图像的装置图。装置300包括分解装置310，级联分类器组320，强分类器330以及图像合成装置340。Fig. 3 is a diagram of an apparatus for acquiring text images from images according to a third embodiment of the present invention. The device 300 includes a decomposition device 310 , a cascaded set of classifiers 320 , a strong classifier 330 and an image synthesis device 340 .

分解装置310用于将原始图像分解为多个连通分量。该分解装置310还包括处理装置312和图像分解装置316。处理装置312用非线性Niblack阈值化方法先处理原始图像，这里非线性Niblack阈值化方法与第一实施例相同。图像分解装置316将处理后的图像分解为多个连通分量。Decomposing means 310 is used to decompose the original image into multiple connected components. The decomposing device 310 also includes a processing device 312 and an image decomposing device 316 . The processing device 312 uses the nonlinear Niblack thresholding method to first process the original image, and the nonlinear Niblack thresholding method here is the same as that in the first embodiment. The image decomposition means 316 decomposes the processed image into a plurality of connected components.

The cascaded classifier group 320 is a group of cascade classifiers (h₁, h₂, ..., h₁₂) for a set of different features (F₁, F₂, ..., F₁₂), cascaded according to the method of the first embodiment; these features are related to text images. The connected components are input to the first stage of the cascaded classifier group; each cascade classifier discards non-text connected components and outputs text connected components to the next-stage classifier.

每个分类器中还包括计算装置，比较装置和输出装置。计算装置，用于根据本分类器对应的特征，计算接收到的所有连通分量的特征值。比较装置，将所有连通分量的特征值分别与该特征的阈值区间进行比较。输出装置，将特征值在该阈值区间之外的连通分量作为非文本连通分量丢弃；将特征值在该阈值区间内的连通分量作为文本连通分量输出给下一级分类器。Computing means, comparing means and output means are also included in each classifier. The calculation device is used to calculate the received feature values of all the connected components according to the features corresponding to the classifier. The comparing means compares the eigenvalues of all the connected components with the threshold interval of the feature respectively. The output device discards the connected components whose feature values are outside the threshold interval as non-text connected components; outputs the connected components whose feature values are within the threshold interval as text connected components to the next-level classifier.

The strong classifier 330 is a classifier trained with the standard AdaBoost method. Its features are the same as those of the cascaded classifier group 320; that is, the features of the strong classifier include all features of the cascaded classifier group 320. The strong classifier 330 linearly combines all feature values of each connected component output by the cascaded classifier group 320 and judges whether the component is a text connected component, discarding non-text connected components and outputting text connected components.

图像合成装置340，用于将强分类器330输出的文本连通分量组合形成要文本图像。Image synthesis means 340, configured to combine the text connected components output by the strong classifier 330 to form a desired text image.

Those skilled in the art will appreciate that, although the features of the cascaded classifier group 320 in this embodiment are related to text images, the features may also be related to other content to be selected, in which case the device of this embodiment can be used to obtain any image to be selected from an image and is not limited to text images.

The present invention has been described in detail with reference to the exemplary embodiments above. Various alternatives, modifications, variations, improvements and/or substantial equivalents, whether currently known or presently unforeseen, may be apparent to those of ordinary skill in the art. The exemplary embodiments above are therefore intended to illustrate rather than limit the invention, and various changes may be made without departing from its spirit and scope. Accordingly, the invention is intended to embrace all known or later-developed alternatives, modifications, variations, improvements and/or substantial equivalents.

Claims

1. A method for determining the connection order and feature thresholds of a group of cascade classifiers (h₁, h₂, ..., h_M) for a set of different features (F₁, F₂, ..., F_M), the cascaded classifier group formed by the method being used to obtain an image to be selected from an image, the different features being related to the image to be selected, wherein M is a positive integer ≥ 1, characterized in that the method comprises the following steps:

a. obtaining a plurality of connected components by decomposing at least one image as the current samples, and taking the cascade classifiers of the M different features as the current cascade classifier of each feature, the current samples comprising a positive example set P and a negative example set N, a positive example being a connected component labeled as to be selected and a negative example being a connected component labeled as not selected;

b. sending the current samples in parallel to the current cascade classifier of each feature and performing one of i training cycles, where i is a positive integer with 0 < i ≤ M, and selecting in each cycle, from all current features taking part in that cycle, the feature with the maximum false alarm rate, thereby determining the connection order of the cascade classifiers of the different features, wherein the false alarm rate is the ratio of the number of connected components that the cascade classifier mistakes for selected connected components in that cycle, but that are actually non-selected, to the current number of negative examples;

c. after each feature is selected, sending the current samples to the cascade classifier corresponding to the selected feature for training, during which the false alarm rate and the detection rate change continuously, and obtaining the threshold interval of the feature according to the allowed minimum detection rate of the feature, thereby determining the feature threshold interval of the cascade classifier of each different feature, wherein the detection rate is the ratio of the number of selected connected components correctly detected by a cascade classifier to the number of positive examples; and

d. after performing steps b and c, deleting the feature selected in step b and its classifier so as to update the current features and the current classifier of each feature, keeping the positive example set of this cycle unchanged, and taking as the new negative example set the connected components that the cascade classifier mistook for selected connected components when the threshold interval of the feature was obtained in step c but that are actually non-selected, so as to update the current samples for the next training cycle.

2. The method according to claim 1, wherein, decomposing the image into connected components in step a further comprises the steps of:

a1. process the image with nonlinear Niblack thresholding method;

a2. Decomposing the processed image into connected components.

3. The method according to claim 2, wherein the nonlinear Niblack thresholding method adds an order-statistic filter to each of the background filter and the foreground filter of the standard Niblack method.

4. The method according to claim 3, characterized in that the formula of the nonlinear Niblack thresholding method is:

$NLNiblack(x, y) = \begin{cases} 1, & f(x, y) > T_{+}(x, y) \\ -1, & f(x, y) < T_{-}(x, y) \\ 0, & \text{otherwise} \end{cases}$

$T_{\pm}(x, y) = \hat{\mu}_{p1}(x, y, W_B) \pm k \cdot \hat{\sigma}_{p2}(x, y, W_F)$

$\hat{\mu}_{p1} = \mathrm{Order}[\mathrm{Mean}(f(x, y), W_B), p1, W_B]$

$\hat{\sigma}_{p2} = \mathrm{Order}[\mathrm{Deviation}(f(x, y), W_F), p2, W_F]$

Where: k is set to 0.17-0.19 according to the standard Niblack method;

f(x, y) is the pixel intensity at the (x, y) position of the input image;

Mean(·, W) is a mean filter with a window width of W;

Deviation(·, W) is a standard deviation filter with a window width of W;

Order[·, p, W] is an order statistical filter with p as the percentage and W as the width.

5. The method of claim 4, wherein, in the background filter \hat{\mu}_{p_1}, the filter width W_B is set to 1/16 of the image width and the percentage p1 is set to 50%; and in the foreground filter \hat{\sigma}_{p_2}, the width W_F is 1/5 of W_B and p2 is set to 80%.
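The thresholding of claims 4 and 5 can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the claims: SciPy's `uniform_filter` and `percentile_filter` stand in for the Mean, Deviation and Order filters, the percentage p is read as a percentile, windows are clamped to at least 3 pixels, and k defaults to 0.18 (the middle of the claimed range). The function name `nl_niblack` is hypothetical.

```python
import numpy as np
from scipy.ndimage import percentile_filter, uniform_filter


def nl_niblack(image, k=0.18, p1=50, p2=80):
    """Label each pixel 1 (above T+), -1 (below T-) or 0 (otherwise)."""
    f = np.asarray(image, dtype=np.float64)
    # Window widths from claim 5; the lower bound of 3 pixels is an assumption.
    w_b = max(3, f.shape[1] // 16)   # W_B: 1/16 of the image width
    w_f = max(3, w_b // 5)           # W_F: 1/5 of W_B

    # Mean(f, W_B): local mean over the background window.
    mean_b = uniform_filter(f, size=w_b)
    # Deviation(f, W_F): local standard deviation over the foreground window.
    mean_f = uniform_filter(f, size=w_f)
    var_f = uniform_filter(f * f, size=w_f) - mean_f ** 2
    std_f = np.sqrt(np.maximum(var_f, 0.0))

    # Order[., p, W]: percentile filter applied to the filtered maps.
    mu_hat = percentile_filter(mean_b, percentile=p1, size=w_b)
    sigma_hat = percentile_filter(std_f, percentile=p2, size=w_f)

    t_plus = mu_hat + k * sigma_hat
    t_minus = mu_hat - k * sigma_hat
    out = np.zeros(f.shape, dtype=np.int8)
    out[f > t_plus] = 1
    out[f < t_minus] = -1
    return out
```

Connected components could then be extracted separately from the pixels labelled 1 and from those labelled -1, for example with `scipy.ndimage.label`.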

6. The method according to claim 5, wherein setting and initialization operations are performed before the cycle training of step b, comprising the following steps:

Setting the overall target detection rate D_target of the cascade classifier group (h₁, h₂, ..., h_j), where 0<j≤M, M>1;

Initializing variables: overall detection rate D₀=1.0, negative example set N₁=N, and cycle number i=0, where the range of i is 0<i≤M; and initializing the feature set, which contains j features (F₁, F₂, ..., F_j), 0<j≤M, M>1.

7. The method according to claim 6, wherein the cycle training in step b further comprises the following steps:

b1. Send the samples in the positive example set P and the current negative example set N_i to each cascade classifier in parallel, and calculate the feature values of all samples;

b2. For each feature, set a first threshold interval with an initial value of (-∞, +∞); when the feature value of a sample falls outside the first threshold interval, the sample is judged as a non-selected connected component by the cascade classifier corresponding to that feature;

b3. For each feature, continuously shrink the first threshold interval, so that positive examples and negative examples are progressively judged as non-selected connected components and the detection rate d_ij and false alarm rate f_ij of each cascade classifier decline continuously; stop shrinking the first threshold interval when the detection rate d_i of a classifier in the i-th training cycle has dropped to a value that is still not less than the overall detection rate D_{i-1} after the previous cycle;

b4. For the first threshold interval thus obtained, obtain the detection rate d_ij and the false alarm rate f_ij of each cascade classifier, and the probability FR_ij of correctly discarding non-selected connected components, where FR_ij = 1 - f_ij is the ratio of the number of non-selected connected components correctly discarded by a cascade classifier to the current number of negative examples; and

b5. In the current feature set, select the feature feature_k with the largest false alarm rate f_ij; the sequence number of the cascade classifier corresponding to the selected feature feature_k is the current cycle number i.

8. The method according to claim 7, further comprising the following steps after step b5:

b6. According to the result of step b5, calculate the quality ratio γ = FR_k/∑FR_ij of the selected feature feature_k, where FR_k is the probability that the cascade classifier corresponding to the selected feature feature_k correctly discards non-selected connected components in the i-th training cycle, which is taken as the quality of that cascade classifier, and ∑FR_ij is the sum, over all cascade classifiers in the i-th training cycle, of the probabilities of correctly discarding non-selected connected components;

b7. According to the detection rate allocation formula d_i = (D_target/D_{i-1})^γ, calculate the allowed minimum detection rate d_min of the feature feature_k, where D_{i-1} is the overall detection rate after the previous training cycle and i is the cycle number; and

b8. Update the current overall detection rate D_i = D_{i-1}·d_min.
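Steps b6 to b8 amount to a few lines of arithmetic. The following Python sketch is only an illustration; the function name `allocate_detection_rate` and its argument names are hypothetical and do not appear in the claims.

```python
def allocate_detection_rate(d_target, d_prev, fr_selected, fr_all):
    """Return (gamma, d_min, d_new) for one training cycle.

    d_target    -- overall target detection rate D_target
    d_prev      -- overall detection rate D_{i-1} after the previous cycle
    fr_selected -- FR_k of the classifier selected in step b5
    fr_all      -- FR_ij of every classifier still in the current feature set
    """
    gamma = fr_selected / sum(fr_all)        # step b6: ratio gamma
    d_min = (d_target / d_prev) ** gamma     # step b7: allowed minimum rate
    d_new = d_prev * d_min                   # step b8: updated overall rate
    return gamma, d_min, d_new
```

For example, with D_target = 0.9, D_{i-1} = 1.0, FR_k = 0.5 and FR values (0.5, 0.3, 0.2), the ratio γ is 0.5 and d_min is the square root of 0.9, roughly 0.949.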

9. The method according to claim 8, characterized in that, the detection rate allocation formula in the step b7 is obtained by the following method:

Assuming that the overall detection rate D is distributed linearly among all cascade classifiers according to their "quality", let the quality of the i-th cascade classifier be Q_i, so that the sum of the qualities of all cascade classifiers is

Q = \sum_{i=1}^{M} Q_i,

The quality ratio γ_i of the i-th cascade classifier is defined as:

\gamma_i = \frac{Q_i}{\sum_{j=1}^{M} Q_j}

Then the detection rate allocation formula is expressed as follows, that is, the detection rate d_i assigned to the i-th classifier is:

d_i = D^{\gamma_i}
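A short check, offered only as a sketch under the definitions above: because the ratios γ_i sum to one, multiplying the rates assigned to all M classifiers gives back the overall detection rate D.

```latex
\sum_{i=1}^{M} \gamma_i = \frac{\sum_{i=1}^{M} Q_i}{\sum_{j=1}^{M} Q_j} = 1
\quad\Longrightarrow\quad
\prod_{i=1}^{M} d_i = \prod_{i=1}^{M} D^{\gamma_i} = D^{\sum_{i=1}^{M} \gamma_i} = D
```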

10. The method of claim 9, wherein step c further comprises the steps of:

c1. Send the samples in the positive example set P and the current negative example set N_i to the cascade classifier h_k corresponding to the selected feature, and calculate the feature values of all samples;

c2. Set a second threshold interval with an initial value of (-∞, +∞); when the feature value of a sample falls outside the second threshold interval, the sample is judged as a non-selected connected component by the cascade classifier h_k; and

c3. Continuously shrink the second threshold interval, so that positive examples and negative examples are progressively judged as non-selected connected components and the detection rate d_k and false alarm rate f_k of the cascade classifier h_k decline continuously; stop shrinking the second threshold interval when d_k has dropped to a value that is still not less than the allowed minimum detection rate d_min obtained in step b7; the second threshold interval at this time is the threshold interval of the selected feature feature_k.
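Steps c1 to c3 do not say how the interval is shrunk, so the following Python sketch assumes one possible strategy: among all intervals bounded by positive-example feature values that keep the detection rate at or above d_min, it picks the one admitting the fewest negative examples. The function name `fit_threshold_interval` is hypothetical.

```python
import math

import numpy as np


def fit_threshold_interval(pos, neg, d_min):
    """Return ((lo, hi), detection_rate, false_alarm_rate) for one feature."""
    pos = np.sort(np.asarray(pos, dtype=float))
    neg = np.asarray(neg, dtype=float)
    # Number of positive examples that must remain inside the interval.
    n_keep = max(1, math.ceil(d_min * len(pos) - 1e-9))
    best = None
    # Slide a window of n_keep consecutive sorted positives (assumed strategy).
    for start in range(len(pos) - n_keep + 1):
        lo, hi = pos[start], pos[start + n_keep - 1]
        f = float(np.mean((neg >= lo) & (neg <= hi))) if len(neg) else 0.0
        if best is None or f < best[2]:
            best = (lo, hi, f)
    lo, hi, f = best
    d = float(np.mean((pos >= lo) & (pos <= hi)))
    return (float(lo), float(hi)), d, f
```

The negative examples whose feature values still fall inside the returned interval would then form the new negative example set described in step d of claim 1.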

11. The method according to any one of claims 1-10, wherein the image to be selected is a text image, the connected components to be selected are text connected components, the non-selected connected components are non-text connected components, and the features related to text images include: geometric features, edge contrast features, shape regularity features, stroke features, and spatial consistency features.

12. The method of claim 11, wherein the geometric features include area ratios, length ratios and aspect ratios, wherein,

The area ratio is the ratio of the area of the connected components to the image area, and its formula is:

\mathrm{Feature\_AreaRatio} = \frac{\mathrm{Area}(\mathrm{CC})}{\mathrm{Area}(\mathrm{Picture})}

The formula for the length ratio is:

\mathrm{Feature\_MLengthRatio} = \frac{\max\{w, h\}}{\max\{\mathrm{PicW}, \mathrm{PicH}\}}

The formula for aspect ratio is:

Feature_AspectRatio=max{w/h, h/w}

In the above formulas, CC represents the connected component, max{a, b} represents the larger of a and b, w represents the width of the bounding box of the connected component, h represents the height of the bounding box of the connected component, PicW indicates the width of the image, and PicH indicates the height of the image.

13. The method according to claim 11, wherein the edge contrast feature comprises edge contrast, and the edge contrast is the ratio of the degree of coincidence between the boundary of the connected component and the edge image of the original image to the boundary of the connected component, its formula being:

\mathrm{EdgeContrast} = \frac{\mathrm{Border}(\mathrm{CC}) \cap \mathrm{Edge}(\mathrm{Picture})}{\mathrm{Border}(\mathrm{CC})}

where Border(CC) is the boundary of the connected component and Edge(Picture) is the edge detection image of the original image, which is the union of the results of the Canny operator and the Sobel operator, its formula being:

Edge(Picture)＝Canny(Picture)∪Sobel(Picture)

14. The method according to claim 11, wherein the shape regularity features include connected component boundary roughness, number of holes, compactness and duty cycle, the formulas of which are:

Connected component boundary roughness:

\mathrm{Feature\_ContourRoughness} = \frac{|\mathrm{CC} - \mathrm{open}(\mathrm{imfill}(\mathrm{CC}), 2 \times 2)|}{|\mathrm{CC}|}

Number of holes:

Feature_CCHoles＝|imholes(CC)|

Compactness:

\mathrm{Feature\_Compact} = \frac{\mathrm{Area}(\mathrm{CC})}{[\mathrm{Perimeter}(\mathrm{CC})]^{2}}

Duty cycle:

\mathrm{Feature\_OccupyRatio} = \frac{\mathrm{Area}(\mathrm{CC})}{\mathrm{Area}(\mathrm{BoundingBox}(\mathrm{CC}))}

In the above formulas, imfill(CC) is the operation of filling the internal holes of the connected component; open() is the morphological opening operation, which is a smoothing operation on the connected component, and 2×2 is its structuring element; imholes(CC) indicates the number of holes in the connected component; Area(CC) indicates the area of the connected component; Perimeter(CC) indicates the perimeter of the connected component; and Area(BoundingBox(CC)) indicates the area of the bounding box of the connected component.
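The four shape features above can be approximated on a binary mask with SciPy, as in the following sketch. It is an illustration under assumptions: `scipy.ndimage` operations stand in for imfill, open and imholes, the perimeter is taken as the count of boundary pixels (the claims do not define it), and the function name `shape_features` is hypothetical.

```python
import numpy as np
from scipy import ndimage as ndi


def shape_features(cc):
    """Compute shape features of one connected component given as a 2-D mask."""
    cc = np.asarray(cc, dtype=bool)
    area = int(cc.sum())

    # imfill(CC), then morphological opening with a 2x2 structuring element.
    filled = ndi.binary_fill_holes(cc)
    opened = ndi.binary_opening(filled, structure=np.ones((2, 2), dtype=bool))
    # |CC - open(imfill(CC), 2x2)| / |CC|: pixels of CC removed by the opening.
    roughness = int(np.logical_and(cc, ~opened).sum()) / area

    # Number of holes: connected regions that filling added to the mask.
    _, holes = ndi.label(filled & ~cc)

    # Perimeter approximated as the number of boundary pixels (assumption).
    perimeter = int((cc & ~ndi.binary_erosion(cc)).sum())
    compact = area / perimeter ** 2

    rows, cols = np.nonzero(cc)
    bbox_area = int((rows.max() - rows.min() + 1) * (cols.max() - cols.min() + 1))
    return {
        "contour_roughness": roughness,
        "holes": int(holes),
        "compact": compact,
        "occupy_ratio": area / bbox_area,
    }
```

The mask should leave at least a one-pixel margin around the component so that hole filling is not affected by the array border.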

15. The method according to claim 11, wherein the stroke features include stroke average width and stroke width standard deviation, wherein,

Average stroke width:

Feature_Stroke_Mean=Mean(strokeWidth(skeleton(CC)))

Stroke width standard deviation:

\mathrm{Feature\_Stroke\_std} = \frac{\mathrm{Deviation}(\mathrm{strokeWidth}(\mathrm{skeleton}(\mathrm{CC})))}{\mathrm{Mean}(\mathrm{strokeWidth}(\mathrm{skeleton}(\mathrm{CC})))}

In the above formulas, skeleton is a morphological skeleton algorithm that extracts the skeleton of the connected component to obtain a skeleton map, strokeWidth is the stroke width computed at each point on the skeleton map, Mean() averages over all points on the skeleton map to give the average width, and Deviation() computes the standard deviation.

16. The method of claim 11, wherein the spatial consistency features include spatial consistency area ratios and spatial consistency boundary features, wherein,

Spatial Consistency Area Ratio:

\mathrm{Feature\_AreaRatio\_S} = \frac{\mathrm{Area}(\mathrm{imdilate}(\mathrm{CC}, 5 \times 5))}{\mathrm{Area}(\mathrm{Picture})}

Spatial Consistency Boundary Features:

Feature_Boundary_S = Bound(imdilate(CC, 5×5))

In the above formulas, imdilate is the operation of dilating the connected component, 5×5 is the structuring element of the dilation operation, Area(imdilate(CC, 5×5)) indicates the area of the connected component after dilation, and Bound(imdilate(CC, 5×5)) represents the boundary of the connected component after dilation.

17. A method for obtaining an image to be selected from an image, comprising the following steps:

A. Decompose the image into connected components;

B. Sending the connected components into the first stage of a group of cascade classifiers of different features cascaded according to the method of claim 1, the features being associated with the image to be selected, wherein each cascade classifier discards the non-selected connected components and outputs the connected components to be selected to the next classifier; and

C. Combining the connected components to be selected output by the last classifier in the cascade classifier group to form the image to be selected.

18. The method of claim 17, wherein step A further comprises the steps of:

A1. Processing the image with a nonlinear Niblack thresholding method;

A2. Decomposing the processed image into connected components.

19. The method according to claim 18, wherein the nonlinear Niblack thresholding method adds an order statistical filter to each of the background filter and the foreground filter of the standard Niblack method.

20. The method according to claim 19, wherein the formula of the nonlinear Niblack thresholding method is:

\mathrm{NLNiblack}(x, y) = \begin{cases} 1, & f(x, y) > T_{+}(x, y) \\ -1, & f(x, y) < T_{-}(x, y) \\ 0, & \text{otherwise} \end{cases}

T_{\pm}(x, y) = \hat{\mu}_{p_1}(x, y, W_B) \pm k \cdot \hat{\sigma}_{p_2}(x, y, W_F)

\hat{\mu}_{p_1} = \mathrm{Order}[\mathrm{Mean}(f(x, y), W_B), p_1, W_B]

\hat{\sigma}_{p_2} = \mathrm{Order}[\mathrm{Deviation}(f(x, y), W_F), p_2, W_F]

Where: k is set to 0.17-0.19 according to the standard Niblack method;

f(x, y) is the pixel intensity at the (x, y) position of the input image;

Mean(·, W) is a mean filter with a window width of W;

Deviation(·, W) is a standard deviation filter with a window width of W;

21. The method of claim 20, wherein, in the background filter \hat{\mu}_{p_1}, the filter width W_B is set to 1/16 of the image width and the percentage p1 is set to 50%; and in the foreground filter \hat{\sigma}_{p_2}, the width W_F is 1/5 of W_B and p2 is set to 80%.

22. The method according to claim 21, wherein the way in which each stage of classifier in the group of cascade classifiers of different features in step B discards non-selected connected components and outputs the connected components to be selected to the next stage of classifier includes the following steps:

B1. Calculating the feature values of all received connected components according to the feature corresponding to each classifier;

B2. Comparing the feature values of all connected components with the threshold interval of the feature; and

B3. Discarding the connected components whose feature values are outside the threshold interval as non-selected connected components, and outputting the connected components whose feature values are within the threshold interval, as connected components to be selected, to the next-stage classifier.
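Steps B1 to B3 describe a simple filter chain, sketched below in Python. The `Stage` class, the `run_cascade` function, the two example feature functions and the dictionary layout of a component are all hypothetical and chosen only for illustration; real threshold intervals would come from the training procedure of claim 1.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Component = Dict[str, float]  # hypothetical record: width, height, pixel area


@dataclass
class Stage:
    feature: Callable[[Component], float]  # computes one feature value (B1)
    lo: float                              # lower end of the threshold interval
    hi: float                              # upper end of the threshold interval


def aspect_ratio(cc: Component) -> float:
    return max(cc["w"] / cc["h"], cc["h"] / cc["w"])


def occupy_ratio(cc: Component) -> float:
    return cc["area"] / (cc["w"] * cc["h"])


def run_cascade(stages: List[Stage], components: List[Component]) -> List[Component]:
    """Pass components through each stage, keeping those inside the interval."""
    survivors = list(components)
    for stage in stages:
        # B2/B3: compare with the interval; discard values that fall outside.
        survivors = [cc for cc in survivors
                     if stage.lo <= stage.feature(cc) <= stage.hi]
    return survivors
```

Because each stage only sees what the previous stage kept, cheap features placed early in the chain reduce the number of components on which later, costlier features must be computed.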

23. The method according to claim 22, wherein the connected components output by the cascade classifier group are further input into a strong classifier, the strong classifier being a classifier trained by the standard Adaboost method, and the features of the strong classifier being the same as the features of the cascade classifier group.

24. The method according to claim 23, wherein the strong classifier forms a linear combination of all feature values of the connected components output by the cascade classifier group and judges whether each connected component is a connected component to be selected, thereby discarding non-selected connected components and outputting the connected components to be selected.

25. The method according to any one of claims 17-24, wherein the image to be selected is a text image, the connected components to be selected are text connected components, the non-selected connected components are non-text connected components, and the features related to text images include: geometric features, edge contrast features, shape regularity features, stroke features, and spatial consistency features.

26. The method of claim 25, wherein the geometric features include area ratios, length ratios and aspect ratios, wherein,

\mathrm{Feature\_AreaRatio} = \frac{\mathrm{Area}(\mathrm{CC})}{\mathrm{Area}(\mathrm{Picture})}

The formula for the length ratio is:

\mathrm{Feature\_MLengthRatio} = \frac{\max\{w, h\}}{\max\{\mathrm{PicW}, \mathrm{PicH}\}}

The formula for aspect ratio is:

Feature_AspectRatio=max{w/h, h/w}

27. The method according to claim 25, wherein the edge contrast feature comprises edge contrast, and the edge contrast is the ratio of the degree of coincidence between the boundary of the connected component and the edge image of the original image to the boundary of the connected component, its formula being:

\mathrm{EdgeContrast} = \frac{\mathrm{Border}(\mathrm{CC}) \cap \mathrm{Edge}(\mathrm{Picture})}{\mathrm{Border}(\mathrm{CC})}

Edge(Picture)＝Canny(Picture)∪Sobel(Picture)

28. The method of claim 25, wherein the shape regularity features include connected component boundary roughness, number of holes, compactness and duty cycle, the formulas of which are:

Connected component boundary roughness:

\mathrm{Feature\_ContourRoughness} = \frac{|\mathrm{CC} - \mathrm{open}(\mathrm{imfill}(\mathrm{CC}), 2 \times 2)|}{|\mathrm{CC}|}

Number of holes:

Feature_CCHoles=|imholes(CC)|

Compactness:

\mathrm{Feature\_Compact} = \frac{\mathrm{Area}(\mathrm{CC})}{[\mathrm{Perimeter}(\mathrm{CC})]^{2}}

Duty cycle:

\mathrm{Feature\_OccupyRatio} = \frac{\mathrm{Area}(\mathrm{CC})}{\mathrm{Area}(\mathrm{BoundingBox}(\mathrm{CC}))}

29. The method according to claim 25, wherein the stroke features include stroke average width and stroke width standard deviation, wherein,

Average stroke width:

Feature_Stroke_Mean=Mean(strokeWidth(skeleton(CC)))

Stroke width standard deviation:

\mathrm{Feature\_Stroke\_std} = \frac{\mathrm{Deviation}(\mathrm{strokeWidth}(\mathrm{skeleton}(\mathrm{CC})))}{\mathrm{Mean}(\mathrm{strokeWidth}(\mathrm{skeleton}(\mathrm{CC})))}

In the above formulas, skeleton is a morphological skeleton algorithm that extracts the skeleton of the connected component to obtain a skeleton map, strokeWidth is the stroke width computed at each point on the skeleton map, Mean() averages over all points on the skeleton map to give the average width, and Deviation() computes the standard deviation.

30. The method of claim 25, wherein the spatially consistent features include spatially consistent area ratios and spatially consistent boundary features, wherein,

\mathrm{Feature\_AreaRatio\_S} = \frac{\mathrm{Area}(\mathrm{imdilate}(\mathrm{CC}, 5 \times 5))}{\mathrm{Area}(\mathrm{Picture})}

Feature_Boundary_S = Bound(imdilate(CC, 5×5))

31. A device for acquiring an image to be selected from images, characterized in that the device comprises:

Decomposition means for decomposing the image into connected components;

A group of cascade classifiers of different features, cascaded according to the method of claim 1, wherein the connected components are input into the first stage of the group of cascade classifiers of different features, and each cascade classifier discards the non-selected connected components and outputs the connected components to be selected to the next classifier; and

The image synthesis device is used for combining the connected components to be selected output by the last classifier in the cascade classifier group to form the image to be selected.

32. The device of claim 31, wherein the decomposition device further comprises:

Processing means for processing the image with a nonlinear Niblack thresholding method;

Image decomposing means for decomposing the processed image into connected components.

33. The device according to claim 32, wherein the nonlinear Niblack thresholding method adds an order statistical filter to each of the background filter and the foreground filter of the standard Niblack method.

34. The device according to claim 33, wherein the formula of the nonlinear Niblack thresholding method is:

\mathrm{NLNiblack}(x, y) = \begin{cases} 1, & f(x, y) > T_{+}(x, y) \\ -1, & f(x, y) < T_{-}(x, y) \\ 0, & \text{otherwise} \end{cases}

T_{\pm}(x, y) = \hat{\mu}_{p_1}(x, y, W_B) \pm k \cdot \hat{\sigma}_{p_2}(x, y, W_F)

\hat{\mu}_{p_1} = \mathrm{Order}[\mathrm{Mean}(f(x, y), W_B), p_1, W_B]

\hat{\sigma}_{p_2} = \mathrm{Order}[\mathrm{Deviation}(f(x, y), W_F), p_2, W_F]

Where: k is set to 0.17-0.19 according to the standard Niblack method;

f(x, y) is the pixel intensity at the (x, y) position of the input image;

Mean(·, W) is a mean filter with a window width of W;

Deviation(·, W) is a standard deviation filter with a window width of W;

35. The apparatus of claim 34, wherein, in the background filter \hat{\mu}_{p_1}, the filter width W_B is set to 1/16 of the image width and the percentage p1 is set to 50%; and in the foreground filter \hat{\sigma}_{p_2}, the width W_F is 1/5 of W_B and p2 is set to 80%.

36. The apparatus of claim 35, wherein each stage of classifiers in the set of cascaded classifiers comprises:

A computing device, configured to calculate the feature values of all received connected components according to the feature corresponding to each classifier;

Comparing means for comparing the feature values of all connected components with the threshold interval of the feature; and

An output device, which discards the connected components whose feature values are outside the threshold interval as non-selected connected components, and outputs the connected components whose feature values are within the threshold interval, as connected components to be selected, to the next classifier.

37. The device according to claim 36, wherein the connected components output by the cascade classifier group are further input into a strong classifier, the strong classifier being a classifier trained by the standard Adaboost method, and the features of the strong classifier being the same as the features of the cascade classifier group.

38. The device according to claim 37, wherein the strong classifier forms a linear combination of all feature values of the connected components output by the cascade classifier group and judges whether each connected component is a connected component to be selected, thereby discarding non-selected connected components and outputting the connected components to be selected.

39. The device according to any one of claims 31-38, wherein the image to be selected is a text image, the connected components to be selected are text connected components, the non-selected connected components are non-text connected components, and the features related to text images include: geometric features, edge contrast features, shape regularity features, stroke features, and spatial consistency features.

40. The apparatus of claim 39, wherein the geometric features include area ratios, length ratios, and aspect ratios, wherein,

\mathrm{Feature\_AreaRatio} = \frac{\mathrm{Area}(\mathrm{CC})}{\mathrm{Area}(\mathrm{Picture})}

The formula for the length ratio is:

\mathrm{Feature\_MLengthRatio} = \frac{\max\{w, h\}}{\max\{\mathrm{PicW}, \mathrm{PicH}\}}

The formula for aspect ratio is:

Feature_AspectRatio=max{w/h, h/w}

41. The device according to claim 39, wherein the edge contrast feature comprises edge contrast, and the edge contrast is the ratio of the coincidence degree of the boundary of the connected component and the edge image of the original image to the boundary of the connected component, Its formula is:

\mathrm{EdgeContrast} = \frac{\mathrm{Border}(\mathrm{CC}) \cap \mathrm{Edge}(\mathrm{Picture})}{\mathrm{Border}(\mathrm{CC})}

Edge(Picture)＝Canny(Picture)∪Sobel(Picture)

42. The apparatus according to claim 39, wherein the shape regularity features include connected component boundary roughness, number of holes, compactness and duty cycle, the formulas of which are:

Connected component boundary roughness:

\mathrm{Feature\_ContourRoughness} = \frac{|\mathrm{CC} - \mathrm{open}(\mathrm{imfill}(\mathrm{CC}), 2 \times 2)|}{|\mathrm{CC}|}

Number of holes:

Feature_CCHoles=|imholes(CC)|

Compactness:

\mathrm{Feature\_Compact} = \frac{\mathrm{Area}(\mathrm{CC})}{[\mathrm{Perimeter}(\mathrm{CC})]^{2}}

Duty cycle:

\mathrm{Feature\_OccupyRatio} = \frac{\mathrm{Area}(\mathrm{CC})}{\mathrm{Area}(\mathrm{BoundingBox}(\mathrm{CC}))}

43. The device according to claim 39, wherein the stroke features include stroke average width and stroke width standard deviation, wherein,

Average stroke width:

Feature_Stroke_Mean=Mean(strokeWidth(skeleton(CC)))

Stroke width standard deviation:

\mathrm{Feature\_Stroke\_std} = \frac{\mathrm{Deviation}(\mathrm{strokeWidth}(\mathrm{skeleton}(\mathrm{CC})))}{\mathrm{Mean}(\mathrm{strokeWidth}(\mathrm{skeleton}(\mathrm{CC})))}

44. The apparatus of claim 39, wherein the spatially consistent features include a spatially consistent area ratio and a spatially consistent boundary feature, wherein,

Spatial Consistency Area Ratio:

\mathrm{Feature\_AreaRatio\_S} = \frac{\mathrm{Area}(\mathrm{imdilate}(\mathrm{CC}, 5 \times 5))}{\mathrm{Area}(\mathrm{Picture})}

Spatial Consistency Boundary Features:

Feature_Boundary_S = Bound(imdilate(CC, 5×5))