CN102208023B

CN102208023B - Method for recognizing and designing video captions based on edge information and distribution entropy

Info

Publication number: CN102208023B
Application number: CN 201110024330
Authority: CN
Inventors: 魏宝刚; 庄越挺; 袁杰; 鲁伟明
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2011-01-23
Filing date: 2011-01-23
Publication date: 2013-05-08
Anticipated expiration: 2031-01-23
Also published as: CN102208023A

Abstract

The invention discloses a video subtitle recognition method based on edge information and distribution entropy. A corner-enhanced edge detection method extracts the edge information of the image. The edge points are then connected and the connected domains collected, a segmentation algorithm splits the connected domains appropriately, a refinement operation determines their accurate positions, and a trailing filter and a joint entropy filter remove non-text regions, leaving only text regions. Each detected text region is first unified to white characters on a black background. Local threshold binarization, an expansion-based removal of edge noise points constrained by non-expandable (dam) points, and a noise removal operation based on counting surrounding edge points then produce a binary image, which is sent to OCR software for recognition. The method overcomes the sensitivity of conventional methods to language, subtitle arrangement, and background complexity. The segmentation algorithm and the joint entropy filter yield good detection results, and the improvements to the traditional binarization method raise recognition accuracy.

Description

Video subtitle recognition method based on edge information and distribution entropy

Technical field

The invention relates to a video subtitle recognition method based on edge information and distribution entropy. The method detects and extracts subtitles in video for OCR recognition and belongs to the field of computer image processing.

Background

With the development of the multimedia and electronics industries, more and more video is being produced, and organizing and retrieving it effectively has become a difficult problem. Many videos, such as TV news, sports events, films, and variety shows, carry subtitles added in post-production, and these subtitles are closely related to the video content. If the subtitles can be recognized effectively, they can be used to organize and retrieve the video material, which has strong practical value.

Video subtitle recognition consists of four steps: subtitle detection, subtitle localization, subtitle extraction, and OCR recognition. Subtitle detection determines the subtitle areas; subtitle localization finds the precise position of each line of subtitles; subtitle extraction binarizes the subtitle area, retaining only stroke pixels; the last step is generally left to commercial OCR software. Subtitle detection methods fall into four categories: edge-based, connected-domain-based, color-clustering-based, and texture-based.

Edge-based methods use edge filters to detect text edges and then merge them with morphological operations. The method published at the 8th International Conference on Document Analysis and Recognition (Proceedings of the 8th International Conference on Document Analysis and Recognition (ICDAR), 2005, 610-614) obtains four edge maps by edge detection, then uses the K-means algorithm to detect candidate text regions, and finally uses heuristic rules and projection analysis to confirm and refine the text regions. Edge-based methods work well when the background is simple, but less well when the background itself contains many edges. Texture-based methods extract texture features with Gabor filters, wavelet transforms, fast Fourier transforms, and the like, and then detect subtitle regions with machine learning methods such as neural networks and SVM classifiers. A method published in the Proceedings of the IEEE International Conference on Communication Technology (ICCT), 2008, 722-725 uses the Haar wavelet transform to locate large-font text by merging four small blocks of wavelet coefficients into one large block, and then uses morphological dilation and a neural network to improve the result. Connected-domain-based methods segment a frame into many small connected domains and then merge them into larger ones to locate subtitles. A method published in the Proceedings of the ACM International Multimedia Conference and Exhibition (MM), 2007, 847-850 uses confidence-based color clustering to remove noise, adaptively choosing the best color plane for binarization according to the difference in text contrast between color planes. Color-clustering-based methods assume that the text color within a video frame is uniform; this assumption fails in most cases, which limits their applicability. Because detection based on a single feature gives unsatisfactory results, many methods combine several of the above features.

For subtitle localization, grayscale projection is generally used. Subtitle extraction methods can be divided into color-based methods and stroke-based methods. Many color-based methods binarize the grayscale image with the Otsu method, but when the gray levels of the subtitle and the background are very close, Otsu cannot separate them well and therefore cannot remove noise well. A method published in IEEE Transactions on Circuits and Systems for Video Technology, 2005, 15(2): 243-255 and IEEE Transactions on Image Processing, 2009, 18(2): 401-411 uses a locally adaptive threshold with better discriminating power, combined with dam point labeling and inward filling, so that most noise points can be removed.

The subtitle detection methods above are useful attempts, but they do not distinguish subtitles from the background very well, and on their own they perform poorly on videos with varying languages, fonts, and text alignment. In addition, although existing subtitle extraction methods remove most noise, OCR software is very sensitive to noise points, so text recognition against complex backgrounds remains poor.

Summary of the invention

The purpose of the invention is to overcome the deficiencies of the prior art and provide a video subtitle recognition method based on edge information and distribution entropy.

The steps of the video subtitle recognition method based on edge information and distribution entropy are as follows:

1) Detect the difference between the current frame and the previously processed frame. If the difference is large, perform the subtitle recognition operations below; otherwise take the next frame and repeat the test;

2) Subtitle recognition begins with subtitle detection. Edge detection, edge point connection, connected domain collection and segmentation, and connected domain refinement with trailing filtering produce candidate text regions and their positions; a joint entropy filter then removes non-text regions, leaving only subtitle areas;

3) Perform duplicate detection on each subtitle area. If the area is not a duplicate, unify its color polarity to white characters on a black background and then perform subtitle extraction; otherwise process the next subtitle area;

4) In subtitle extraction, binarize the polarity-unified subtitle area, remove noise points, and send the result to OCR software for recognition.

Step 1), detecting the difference between the current frame and the previously processed frame, proceeds as follows. Let the current frame be I_i with edge binary map E_i, and let the previously processed frame, which is the frame five frames earlier, be I_{i-5} with edge binary map E_{i-5}. Let D_{i,i-5} = E_i ⊕ E_{i-5}. Let the subtitle areas detected last time be Area_{i-5,j}, and let pMES be the minimum, over those subtitle areas, of the accumulated sum of the edge binary map. The accumulated difference cFD of the subtitle areas in the current frame is then computed by equation (1):

[Equation (1): formula image not reproduced in the source text]

If cFD is less than or equal to pMES × 0.5, no subtitle recognition is needed for the current frame, and the frame five frames later is tested next; otherwise subtitle recognition is performed on the current frame. To further guard against missed subtitles, a counter ck is kept: each time cFD is less than or equal to pMES × 0.5, ck is incremented by 1; otherwise ck is reset to 0. If ck reaches 5, subtitle recognition is performed on the current frame regardless of the preceding test, and ck is reset to 0.
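The gating rule above can be sketched in Python. This is a minimal illustration, not the patent's implementation: equation (1) is not available in the text, so the sketch assumes that cFD is the number of XOR-difference pixels summed over all previously detected subtitle rectangles. The function name `frame_needs_recognition` and the `(top, bottom, left, right)` rectangle layout with exclusive end indices are hypothetical.

```python
import numpy as np

def frame_needs_recognition(edge_cur, edge_prev, prev_areas, pmes, ck,
                            ratio=0.5, max_skips=5):
    """Return (run_recognition, updated_ck) for the current frame.

    edge_cur, edge_prev: edge binary maps E_i and E_{i-5}.
    prev_areas: subtitle rectangles (top, bottom, left, right) found last time,
                with exclusive bottom/right indices (a hypothetical layout).
    pmes: minimum accumulated edge sum over those areas (pMES).
    """
    diff = np.logical_xor(edge_cur > 0, edge_prev > 0)  # D = E_i xor E_{i-5}
    # Assumption: cFD accumulates the differences inside every previous area.
    cfd = sum(int(diff[t:b, l:r].sum()) for (t, b, l, r) in prev_areas)
    if cfd <= pmes * ratio:
        ck += 1
        if ck == max_skips:      # force a re-check after 5 consecutive skips
            return True, 0
        return False, ck
    return True, 0               # large difference: recognize, reset counter
```

The forced re-check bounds how long a newly appearing subtitle outside the old subtitle areas can go unnoticed.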

Step 2), in which subtitle detection produces candidate text regions and the joint entropy filter removes non-text regions, proceeds as follows:

(1) Edge detection method

Given an image I, edges are detected with the Sobel operator, which consists of gradient templates in four directions: horizontal S_H, vertical S_V, diagonal S_LD, and anti-diagonal S_RD. The edge field S is computed by equation (2):

[Equation (2): formula image not reproduced in the source text]

where the direction term in equation (2) denotes, at pixel (x, y), the direction perpendicular to the direction of maximum absolute gradient, and k is an adjustment coefficient, set to 1 here. S is then quantized into 16 levels, denoted S', and the edge map EdgeMap is obtained by equation (3):

[Equation (3): formula image not reproduced in the source text]
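A corner-enhanced edge detector in the spirit of the description above can be sketched as follows. Because equations (2) and (3) are not available in the text, this sketch makes three explicit assumptions: the edge field is the strongest absolute directional response plus k times the response of the perpendicular template, the quantization is linear over the observed range, and EdgeMap keeps pixels at or above a hypothetical level `min_level`. The kernel orientation labels are also a convention chosen here.

```python
import numpy as np

# Four 3x3 Sobel-style templates; the direction labels are an assumed convention.
KERNELS = {
    "H":  np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float),
    "V":  np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float),
    "LD": np.array([[-2, -1, 0], [-1, 0, 1], [0, 1, 2]], dtype=float),
    "RD": np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]], dtype=float),
}
NAMES = ["H", "V", "LD", "RD"]
PERPENDICULAR = {"H": "V", "V": "H", "LD": "RD", "RD": "LD"}

def correlate3x3(img, kernel):
    """Apply a 3x3 template with edge-replicated borders."""
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def corner_enhanced_edge_map(gray, k=1.0, levels=16, min_level=4):
    """Binary edge map; min_level is a hypothetical stand-in for equation (3)."""
    resp = np.stack([np.abs(correlate3x3(gray, KERNELS[n])) for n in NAMES])
    best = resp.argmax(axis=0)                      # strongest direction per pixel
    perp = np.array([NAMES.index(PERPENDICULAR[n]) for n in NAMES])[best]
    strongest = np.take_along_axis(resp, best[None], axis=0)[0]
    across = np.take_along_axis(resp, perp[None], axis=0)[0]
    s = strongest + k * across                      # assumed form of equation (2)
    peak = s.max()
    if peak == 0:
        return np.zeros(gray.shape, dtype=np.uint8)
    s_q = np.floor(s / peak * (levels - 1)).astype(int)   # 16-level quantization
    return (s_q >= min_level).astype(np.uint8)
```

Adding the perpendicular response raises the score of pixels where two edge directions meet, which is why corners and stroke junctions, common in text, stand out.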

(2) Edge point connection method

In the edge map EdgeMap, if the distance between two edge points in the same row is smaller than a threshold T_d, all pixels between them are set to 1; that is, the gap between the two edge points is filled. T_d is determined by equation (4):

[Equation (4): formula image not reproduced in the source text]

where height and width are the height and width of image I;
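The row-wise gap filling can be sketched as below. Since equation (4) is not available in the text, the threshold is passed in as the parameter `t_d` rather than derived from the image size; the function name is illustrative.

```python
import numpy as np

def connect_edge_points(edge_map, t_d):
    """Fill the gap between edge points of the same row that are closer than t_d."""
    out = edge_map.copy()
    for y in range(edge_map.shape[0]):
        xs = np.flatnonzero(edge_map[y])            # columns of edge points in row y
        # Checking consecutive points is enough: any pair closer than t_d
        # is separated only by consecutive gaps that are also below t_d.
        for left, right in zip(xs[:-1], xs[1:]):
            if right - left < t_d:
                out[y, left:right + 1] = 1
    return out
```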

(3) Connected domain collection and segmentation method

Connected domains are collected from the EdgeMap obtained in the previous step. Connected domains whose height or width is less than 1% of the height or width of the whole image are removed, as are those whose minimum enclosing rectangle is smaller than 0.2% of the whole image area. Each remaining connected domain C is then segmented as follows:

a) For each row i of C, compute the area of the minimum enclosing rectangle of the part of C at and above row i and the area of the minimum enclosing rectangle of the part below row i, and add the two areas. Store the row number that gives the minimum sum in bR;

b) For each column j of C, compute the area of the minimum enclosing rectangle of the part of C at and to the left of column j and the area of the minimum enclosing rectangle of the part to the right of column j, and add the two areas. Store the column number that gives the minimum sum in bC;

c) Let mRA and mCA be the values defined by two formulas that are not reproduced in the source text (by their names and use, the minimum area sums from steps a) and b)). If mRA < mCA, connected domain C is split into two connected domains along row bR; otherwise it is split into two connected domains along column bC;

where t_c, b_c, l_c and r_c are the top row, bottom row, left column, and right column of region C;

To prevent over-segmentation, a connected region C is split only when both of the following conditions hold:

the fill ratio of the connected domain is less than 0.8;

the areas of both new connected domains are greater than 0.2% of the whole image area;
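The splitting rule can be illustrated with a short sketch that operates on a boolean mask of one connected domain. It rests on assumptions that the text does not confirm: mRA and mCA are taken to be the minimum area sums of steps a) and b), the fill ratio is the pixel count divided by the enclosing rectangle area, and the "area" of each new domain is its enclosing rectangle area. All function names are hypothetical.

```python
import numpy as np

def bbox_area(mask):
    """Area of the minimum enclosing rectangle of the non-zero pixels."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return 0
    return int((ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1))

def best_row_cut(mask):
    """Return (min_area_sum, row) over cuts placed just below each row."""
    sums = [bbox_area(mask[:i + 1]) + bbox_area(mask[i + 1:])
            for i in range(mask.shape[0] - 1)]
    if not sums:
        return None
    row = int(np.argmin(sums))
    return sums[row], row

def split_component(mask, image_area, max_fill=0.8, min_frac=0.002):
    """Split one connected domain once, or return it unchanged."""
    mask = np.asarray(mask).astype(bool)
    area = bbox_area(mask)
    if area == 0 or mask.sum() / area >= max_fill:
        return [mask]                       # dense enough: do not split
    row_cut = best_row_cut(mask)            # assumed (mRA, bR)
    col_cut = best_row_cut(mask.T)          # assumed (mCA, bC)
    if row_cut is None or col_cut is None:
        return [mask]
    first, second = mask.copy(), mask.copy()
    if row_cut[0] < col_cut[0]:             # split along row bR
        first[row_cut[1] + 1:, :] = False
        second[:row_cut[1] + 1, :] = False
    else:                                   # split along column bC
        first[:, col_cut[1] + 1:] = False
        second[:, :col_cut[1] + 1] = False
    if min(bbox_area(first), bbox_area(second)) <= min_frac * image_area:
        return [mask]                       # a part would be too small
    return [first, second]
```

A T-shaped domain, such as a subtitle line touching a vertical background edge, is cut where the two enclosing rectangles together cover the least area, which separates the line from the attached structure.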

(4) Connected domain refinement and trailing filter method

Before region refinement, connected domains whose height is more than twice their width are removed. This may wrongly delete vertically arranged subtitles; to handle vertical subtitles, the image only needs to be rotated by 90 degrees, with all other operations unchanged;

For each connected region C obtained in the previous step, its position is further refined as follows:

Input: the edge map edgeMap and the initial upper and lower boundary positions of connected domain C

Output: the refined upper and lower boundary positions

d) For each row of connected domain C, compute the span between its leftmost and rightmost non-zero pixels in edgeMap, and store it in the set cSA;

e) For each row of connected domain C, count its pixels in edgeMap, and store the count in a second set (the set's name is lost with the formula image; step i) below refers to it as oPNA);

f) Store the maximum value of cSA, and store the index of the row where it occurs in pSRN; likewise store the maximum value of the second set and the index of the row where it occurs (the names of these variables are not reproduced in the source text);

g) Four boundary rows are then determined. In each of four row ranges, the largest or the smallest row index that satisfies a threshold condition is taken, in the order largest, smallest, largest, smallest. The ranges, the conditions, and the names of the resulting rows are formula images not reproduced in the source text; step j) below uses rows named t_2 and b_2, which appear to come from this step;

h) The refined upper and lower boundary positions are set from these rows (the assignment formula is not reproduced in the source text);

where the two threshold coefficients used in step g) usually take the values 0.6 and 0.3;
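Steps d) to f) can be sketched as per-row profiles of the edge map. This sketch assumes the span is measured inclusively (rightmost minus leftmost plus one) and uses `oPNA` as the name of the pixel-count set; the boundary search of steps g) and h) is not included because its formulas are not available in the text.

```python
import numpy as np

def row_profiles(edge_map, top, bottom, left, right):
    """Return (cSA, oPNA, pSRN) for rows top..bottom (inclusive) of one domain."""
    spans, counts = [], []
    for y in range(top, bottom + 1):
        xs = np.flatnonzero(edge_map[y, left:right + 1])
        # Span between the leftmost and rightmost non-zero pixel of the row.
        spans.append(int(xs[-1] - xs[0] + 1) if xs.size else 0)
        counts.append(int(xs.size))          # number of non-zero pixels in the row
    peak_span_row = top + int(np.argmax(spans))   # row index of the widest span
    return spans, counts, peak_span_row
```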

The following trailing filter removes some non-subtitle connected domains:

i) After step g) above, continue scanning upward and downward in oPNA until the value at the current row falls below a threshold (the threshold formula is not reproduced in the source text); let the resulting row numbers be t_tail and b_tail;

j) Compute the tail length as follows:

tl_1 = t_2 - t_tail,  tl_2 = b_tail - b_2,  tl = max(tl_1, tl_2)

k) Filter with equation (5). If deleteFlag(C) is 1, the connected domain is not a subtitle area and should be deleted;

[Equation (5): formula image not reproduced in the source text]

where ub_c and ut_c denote the refined upper and lower boundary positions of connected domain C, respectively, and the two threshold coefficients in equation (5) usually take the values 0.2 and 0.3;

(5) Joint entropy filter

A joint entropy filter that combines the foreground pixel distribution entropy and the edge pixel distribution entropy is applied, leaving only subtitle areas;

For the foreground pixel distribution entropy, the minimum enclosing rectangle Rect[t_c, b_c, l_c, r_c] of a connected domain C, where t_c and b_c are the upper and lower bounds and l_c and r_c are the left and right bounds, is binarized with the Otsu threshold and then divided into 2 rows × 4 columns = 8 parts. The distribution entropy is computed by equation (6):

[Equation (6): formula image not reproduced in the source text]

where p_{i,j} is the ratio of non-zero pixels in the part at row i, column j;

For the edge pixel distribution entropy, the Sobel edge binary map inside the minimum enclosing rectangle Rect[t_c, b_c, l_c, r_c] of connected domain C is divided into 2 rows × 4 columns = 8 parts, and the distribution entropy is computed by equation (7):

[Equation (7): formula image not reproduced in the source text]

where e_{ij} is the number of edge pixels in the part at row i, column j, and e_r is the total number of edge pixels in the 8 parts.

A refined connected domain C is regarded as a subtitle area if its two entropies both satisfy their threshold conditions (the conditions are formula images not reproduced in the source text); otherwise it is a non-subtitle area and should be deleted. Experiments show the best results when the two thresholds are 6.4 and 2.76, respectively;
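The edge pixel distribution entropy can be illustrated as below. Equation (7) is not available in the text, so this sketch assumes it is the Shannon entropy of the shares e_ij / e_r over the 2 × 4 grid, with a base-2 logarithm. Under that assumption the value is bounded by 3 for 8 parts, which is at least compatible with the 2.76 threshold quoted above. The function name is hypothetical.

```python
import numpy as np

def edge_distribution_entropy(edge_rect, rows=2, cols=4):
    """Shannon entropy (base 2) of edge-pixel shares over a rows x cols grid."""
    binary = np.asarray(edge_rect) > 0
    counts = [float(cell.sum())
              for band in np.array_split(binary, rows, axis=0)
              for cell in np.array_split(band, cols, axis=1)]
    total = sum(counts)                      # e_r: edge pixels in all parts
    if total == 0:
        return 0.0
    probs = np.array([c / total for c in counts if c > 0])   # e_ij / e_r
    return float(-(probs * np.log2(probs)).sum())
```

Text tends to spread edge pixels evenly over the rectangle, giving a value near the maximum, while an isolated object concentrates them in a few parts and scores lower.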

For images that contain both horizontal and vertical subtitles, subtitle detection is performed on the original image and on the image rotated by 90 degrees, and the two sets of detection results are then merged and de-duplicated.

Step 3), duplicate detection followed by color polarity unification of each non-duplicate subtitle area, proceeds as follows:

(6) Duplicate detection

Detected subtitle areas are de-duplicated with a method that combines position and grayscale histogram, as follows:

l) Extract and store the positions Rect_i[t_i, b_i, l_i, r_i] and grayscale histograms GH_i{g_{i,0}, g_{i,1}, …, g_{i,255}} of all subtitle areas of the previously processed frame, where g_{i,k} is the number of pixels with gray level k in the i-th subtitle area; extract and store the positions Rect_j[t_j, b_j, l_j, r_j] and grayscale histograms GH_j{g_{j,0}, g_{j,1}, …, g_{j,255}} of all subtitle areas of the current frame;

m) Compute their position similarity and their grayscale histogram similarity (both formulas are images not reproduced in the source text), where the two area terms are the area of the common part of the two rectangles and the area of the larger of the two. If either similarity is greater than 0.8, the two areas are duplicate detections of the same region; one is removed and one is kept;
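The position test can be sketched as follows. The similarity formula itself is not available in the text; this sketch assumes it is the ratio of the common area to the area of the larger rectangle, as the surrounding definitions suggest, and treats the rectangle bounds as inclusive. The histogram similarity is omitted because its formula is not recoverable.

```python
def position_similarity(rect_a, rect_b):
    """Overlap area divided by the larger area; rects are (top, bottom, left, right)."""
    ta, ba, la, ra = rect_a
    tb, bb, lb, rb = rect_b
    # Height and width of the common part (inclusive bounds), clamped at zero.
    overlap_h = max(min(ba, bb) - max(ta, tb) + 1, 0)
    overlap_w = max(min(ra, rb) - max(la, lb) + 1, 0)
    area_a = (ba - ta + 1) * (ra - la + 1)
    area_b = (bb - tb + 1) * (rb - lb + 1)
    return overlap_h * overlap_w / max(area_a, area_b)

def is_duplicate_by_position(rect_a, rect_b, threshold=0.8):
    """Apply the 0.8 threshold from the text to the position similarity only."""
    return position_similarity(rect_a, rect_b) > threshold
```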

(7) Color polarity unification

The grayscale image of the subtitle area is unified to white characters on a black background as follows:

n) First binarize the grayscale subtitle area with the Otsu method, then convolve the binarized area with two 3×3 masks (not reproduced in the source text) and determine the edge color at each pixel by equation (8):

[Equation (8): formula image not reproduced in the source text]

Let N_w and N_b be the number of white edge pixels and the number of black edge pixels, respectively, and define their ratio (the symbol for the ratio is not reproduced in the source text);

o) For the edge map P obtained from equation (8), project the edge pixels onto the columns. Suppose the edge map is divided along the columns at {x_0, x_1, …, x_n}, where x_i is the midpoint of a continuous interval in which the column projection is 0. Build the rectangles Rect_i[1, height, x_i, x_{i+1}] in turn. Within each rectangle, scan the edge map P inward from the four sides and delete the first edge pixel encountered. Then recount the white edge pixels and the black edge pixels;

p) Define the ratio of the recounted numbers, and define a further quantity by equation (9):

[Equation (9): formula image not reproduced in the source text]

The color polarity of the subtitle area is then determined by five cases, whose conditions are formula images not reproduced in the source text. Their outcomes are:

(a) the characters are white;

(b) the characters are white or black, depending on a second condition;

(c) the characters are white;

(d) the characters are white or black, depending on a second condition;

(e) the characters are black;

q) Once the color polarity has been determined, if the characters are black, the grayscale image of the subtitle area is inverted; otherwise nothing is done.

Step 4), binarizing the polarity-unified subtitle area, removing noise points, and sending the result to OCR software, proceeds as follows:

r) Normalize the height of the grayscale subtitle area to 24, then extend it by 4 pixels at the top and 4 pixels at the bottom, so that the extended height is 32; call the result EI;

s) Initialize every pixel of the result binary image B to 1, then apply stepwise horizontal local-threshold binarization to EI: binarize with the Otsu method in a 16×32 local window, moving 8 pixels horizontally at each step. In the same way, apply stepwise vertical local-threshold binarization to EI: binarize with the Otsu method in an image_width×8 local window, moving 4 pixels vertically at each step. In each sub-window, wherever the gray value in EI is below the local threshold, the corresponding pixel in B is set to 0;

t) Set to 0 the pixels in B that are connected to pixels of value 1 in the extended area. To avoid setting stroke pixels to 0 as well, dam points are defined:

[Definition of dam points: formula image not reproduced in the source text]

where H_len(x,y) is the length of the longest horizontal run of consecutive 1s that contains pixel (x,y), and V_len(x,y) is the length of the longest vertical run of consecutive 1s that contains pixel (x,y). Dam points cannot be expanded into background pixels;

u) Obtain the edge information of EI with the Sobel operator. For each connected domain of value 1 in B, count the number epn of edge pixels that fall inside it or surround it. If epn < tepn, set all pixels of that connected domain to 0, removing it. tepn is determined by:

tepn = max(cheight, cwidth)

where cheight and cwidth are the height and width of the connected domain.

v) Send the binary image B to OCR software for recognition.
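Step s) above can be sketched with a small NumPy implementation of Otsu's method applied in sliding windows. This is an illustration under stated assumptions: EI is an 8-bit grayscale array, a pixel counts as "below the local threshold" when its value is less than or equal to the Otsu level of its window, a window with a single gray level clears nothing, and an extra final window is added so that trailing columns or rows are always covered. The function names are hypothetical.

```python
import numpy as np

def otsu_threshold(values):
    """Otsu level t: pixels <= t form the dark class. Returns -1 for flat input."""
    hist = np.bincount(values.ravel(), minlength=256).astype(float)
    levels = np.arange(hist.size)
    w0 = np.cumsum(hist)                      # pixels at or below each level
    w1 = hist.sum() - w0                      # pixels above each level
    m0 = np.cumsum(hist * levels)
    valid = (w0 > 0) & (w1 > 0)
    if not valid.any():
        return -1
    mean0 = m0[valid] / w0[valid]
    mean1 = (m0[-1] - m0[valid]) / w1[valid]
    between = w0[valid] * w1[valid] * (mean0 - mean1) ** 2   # between-class variance
    return int(levels[valid][between.argmax()])

def window_starts(length, size, step):
    """Window origins covering [0, length), with a final window flush to the end."""
    last = max(length - size, 0)
    return sorted(set(range(0, last + 1, step)) | {last})

def local_otsu_binarize(ei):
    """Binary image B: 1 where no local window classified the pixel as dark."""
    ei = np.asarray(ei).astype(np.int64)
    h, w = ei.shape
    b = np.ones((h, w), dtype=np.uint8)
    for x in window_starts(w, 16, 8):         # horizontal pass: 16-wide windows
        win = ei[:, x:x + 16]
        b[:, x:x + 16][win <= otsu_threshold(win)] = 0
    for y in window_starts(h, 8, 4):          # vertical pass: 8-high windows
        win = ei[y:y + 8, :]
        b[y:y + 8, :][win <= otsu_threshold(win)] = 0
    return b
```

Overlapping windows let a pixel be tested against several local thresholds, so a background pixel that survives one window can still be cleared by a neighboring one.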

Compared with the prior art, the invention has the following beneficial effects:

1) The subtitle detection algorithm overcomes the sensitivity of common detection algorithms to language, subtitle alignment, and background complexity. By enhancing the corner information characteristic of subtitles, applying the region segmentation algorithm, and combining these with the joint entropy filter, it yields detection results that are robust to changes in language, subtitle alignment, and background complexity;

2) The subtitle extraction algorithm removes further noise pixels beyond what general extraction algorithms achieve, which improves the subsequent OCR recognition accuracy to some extent;

3) The invention alleviates, to some extent, the problem of excessive repeated subtitles across video frames while preventing subtitles from being missed, and achieves good results on continuous video frame sequences.

Description of drawings

Fig. 1 is a framework diagram of video subtitle recognition;

Fig. 2 is a framework diagram of video subtitle detection;

Fig. 3 is an example of the video subtitle detection process applied to one frame image;

Fig. 4 is an example of video subtitle extraction applied to one subtitle area;

Detailed description

For a better understanding of the technical solution of the invention, the invention is further described below with reference to Figs. 1 and 2. Fig. 1 shows the framework of the video subtitle recognition method of the invention, and Fig. 2 shows the framework of the video subtitle detection method of the invention.

1

（1）边缘检测方法 (1) Edge detection method

2

其中

3

（2）边缘点连接方法 (2) Edge point connection method

4

a) 对于C中的每一行i

，得到该行及以上部分的最小包围矩形的面积

和该行以下部分的最小包围矩形的面积

，求出这两个面积的和，找出取得最小和的行号存储在bR中； a) for each row i in C

, to get the area of the minimum enclosing rectangle of the row and above

and the area of the smallest enclosing rectangle below the row

, calculate the sum of these two areas, find out the row number that obtains the minimum sum and store it in bR;

b) 对于C中的每一列j

，得到该列及左边部分的最小包围矩形的面积

和该列右边部分的最小包围矩形的面积

c) 令

，

,

连通域填充率小于0.8；

Connected domain filling rate is less than 0.8;

输入：边缘映射图edgeMap，连通域C的初始上下边界位置 Input: edge map edgeMap , the initial upper and lower boundary positions of the connected domain C

输出：精化后的上下界位置 Output: refined upper and lower bound positions

d) 对于连通域C的任意行，计算其在edgeMap中的左右非0像素跨距

，并存储在集合cSA中； d) For any row in the connected domain C , calculate its left and right non-zero pixel spans in edgeMap

, and stored in the set cSA;

e) 对于连通域C的任意行

，计算其在edgeMap中的行像素点数，并存储在集合

中，即有

； e) For any row in the connected domain C

in, that is

;

取

中的最大值存在

中，并将其序号存在

中； Pick

The maximum value in

, and store its sequence number in

middle;

g) 对于在范围内的所有行，取

的最大行序号

； g) for the For all rows in the range, take

The maximum row number of

;

对于在

范围内的所有行，取的最小行序号

； for in

For all rows in the range, take The minimum row number of

;

对于在

范围内的所有行，取

的最大行序号

； for in

For all rows in the range, take

The maximum row number of

;

对于在

范围内的所有行，取

的最小行序号

； for in

For all rows in the range, take

The minimum row number of

;

h) 令

，即得到精化后的上下界位置； h) orders

, that is, the upper and lower bound positions after refinement are obtained ;

其中和

通常取值为0.6和0.3； in and

Usually the values are 0.6 and 0.3;

i) 在上面步骤g) 完成后，继续在oPNA中向上和下扫描，直到当前行处的值小于，假设得到的行号分别为t _tail和 b _tail； i) After the above step g) is completed, continue to scan up and down in oPNA until the value at the current row is less than , assuming that the obtained line numbers are t _tail and b _tail respectively;

5 5

其中ub_c和ut_c分别表示连通域C精化后的上下界位置，而

和

通常取值为0.2和0.3； Among them, ub _c and ut _c represent the upper and lower bound positions of the connected domain C after refinement, and

and

Usually the values are 0.2 and 0.3;

（5）联合熵过滤器 (5) Joint entropy filter

6

7

对于任一精化后的连通域C，若其

且

和

and

The effect is best when taking 6.4 and 2.76 respectively;

（6）重复性检测 (6) Repeatability detection

m)计算它们的位置相似度

和灰度直方图相似度

，其中是它们的公共部分的面积，而

是它们中大的那个的面积，若

与有一个大于0.8，则是相同区域的重复检测，此时去掉一个，保留一个； m) Calculate their positional similarity

Similarity to grayscale histogram

,in is the area of their common part, and

is the area of the larger of them, if

and If one is greater than 0.8, it means repeated detection in the same area. At this time, one is removed and one is kept;

（7）颜色极统一 (7) Extremely uniform color

和

and

8

for their ratio;

o) For the edge map P obtained from formula (8) in the previous step, project the edge pixels by column, and let the edge map be decomposed along the columns into {x₀, x₁, …, x_n}, where x_i is the midpoint of a continuous interval in which the column projection of the edge map is 0. Establish the rectangles Rect_i[1, height, x_i, x_{i+1}] in turn, scan the edge map P within each rectangle from the four sides inward, delete the first edge pixel encountered, and recount the white edge pixels and black edge pixels, denoting the counts N_w' and N_b', respectively;

p) Define R₂ = N_w' / N_b' as their ratio, and define

\Delta R = \frac{R_2 - R_1}{\max(R_1, R_2)}, \quad -1 \leq \Delta R \leq 1 \qquad (9)

(a) If ΔR > T_2h, the subtitle is white;

(b) If T_2l ≤ ΔR ≤ T_2h, the subtitle is white when R₁ > T_2v and black when R₁ ≤ T_2v;

(c) If T_1h ≤ ΔR ≤ T_2l, the subtitle is white;

(d) If T_1l ≤ ΔR ≤ T_1h, the subtitle is white when R₁ > T_1v and black when R₁ ≤ T_1v;

(e) If ΔR < T_1l, the subtitle is black;

where T_1l = -0.25, T_1h = -0.15, T_1v = 1.2, T_2l = 0, T_2h = 0.35, T_2v = 0.8;

tepn = max(cheight, cwidth)

Example

As shown in Figures 3 and 4, an example of the recognition process for the subtitles contained in one frame image of a video is given. The specific steps of this example are described in detail below in conjunction with the method of the present invention:

For a given frame image, shown in Figure 3(a), the edge detection method (1) in claim 3 is used to obtain its corner-enhanced edge map; the result is shown in Figure 3(b);

(1) Taking the edge map obtained in the previous step as input, connect the edge points with the edge point connection method (2) in claim 3; the result is shown in Figure 3(c);

(2) Taking the map with connected edge points as input, use the connected domain collection and segmentation algorithm (3) in claim 3 to obtain the larger connected domains; the result is shown in Figure 3(d);

(3) For the connected domains obtained in the previous step, use the connected domain refinement and trailing filtering method (4) in claim 3 to obtain more accurate region positions and sizes and to perform preliminary filtering; the result is shown in Figure 3(e);

(4) For the connected domains remaining after filtering, use the joint entropy filter (5) in claim 3 to remove the non-subtitle regions; the final detection result is shown in Figure 3(f);

(5) For a specific subtitle region detected in the previous step, shown in Figure 4(a), first use the repetition detection (6) in claim 4 to determine whether it duplicates a previously detected region; if it does not, use the color polarity unification method (7) in claim 4 to unify the region into white characters on a black background;

(6) For the subtitle region after color polarity unification, use the binarization and denoising algorithm in claim 5 to obtain a better binary image; the result is shown in Figure 4(b);

(7) Use commercial OCR software to recognize the binary image; the result is shown in Figure 4(c).

As can be seen from the figures, this method detects the subtitle regions in video frames well and binarizes them, and the binarized result achieves good recognition accuracy.

Claims

1. A video subtitle recognition method based on edge information and distribution entropy, characterized in that its steps are as follows:

1) Detect the difference between the current frame and the previously processed frame; if the difference is large, perform the following subtitle recognition operation, otherwise take the next frame and repeat the judgment;

2) In subtitle recognition, first perform subtitle detection: use edge detection, edge point connection, connected domain collection and segmentation, and connected domain refinement and trailing filtering to obtain the candidate text regions and their positions, then use a joint entropy filter to remove the non-text regions, leaving only the subtitle regions;

3) Perform repetition detection on each subtitle region; if the region is not a repeat, unify its color polarity to white characters on a black background and then perform subtitle extraction, otherwise process the next subtitle region;

4) In subtitle extraction, binarize the subtitle region after color polarity unification, remove the noise points, and send the result to OCR software for recognition;

wherein the step of detecting the difference between the current frame and the previously processed frame, performing the following subtitle recognition operation if the difference is large, and otherwise taking the next frame for judgment, is as follows: let the current frame be I_i with edge binary image E_i, and let its previously processed frame be I_{i-5} with edge binary image E_{i-5}; let D_{i,i-5} = E_i ⊕ E_{i-5}; let any subtitle region detected last time be Area_{i-5,j}, and let pMES be the minimum, over the subtitle regions detected last time, of the cumulative sum of the edge binary image; the cumulative difference of the subtitle regions in the current frame is calculated as follows:

cFD = \sum_{j} D_{i,i-5}(Area_{i-5,j}) \qquad (1)

If cFD is less than or equal to pMES × 0.5, no subtitle recognition operation is needed for this frame, and the frame 5 frames later is taken for judgment; otherwise, the subtitle recognition operation must be performed on this frame. To further avoid missing subtitles, a counter ck is also kept: each time cFD is less than or equal to pMES × 0.5, ck is increased by 1, otherwise ck is reset to 0; if ck equals 5, the subtitle recognition operation is performed on this frame regardless of the previous judgment, and ck is reset to 0.
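The frame-skipping rule of claim 1 can be sketched as follows. This is a minimal illustration, assuming binary edge images stored as nested lists of 0/1 and subtitle regions given as inclusive `(top, bottom, left, right)` tuples; the names `xor_sum` and `FrameGate` are hypothetical and not taken from the patent.

```python
def xor_sum(e_cur, e_prev, rect):
    """Count pixels that differ between two binary edge images inside rect.

    rect is (top, bottom, left, right) with inclusive bounds (an assumption).
    """
    t, b, l, r = rect
    return sum(e_cur[y][x] ^ e_prev[y][x]
               for y in range(t, b + 1) for x in range(l, r + 1))


class FrameGate:
    """Decides whether a sampled frame needs the full subtitle recognition."""

    def __init__(self, ratio=0.5, max_skips=5):
        self.ratio = ratio          # the 0.5 factor applied to pMES
        self.max_skips = max_skips  # ck value that forces recognition
        self.ck = 0                 # counter of consecutive low-difference frames

    def needs_recognition(self, e_cur, e_prev, areas, pmes):
        # cFD: accumulated XOR difference over the previously detected regions.
        cfd = sum(xor_sum(e_cur, e_prev, a) for a in areas)
        if cfd <= pmes * self.ratio:
            self.ck += 1
            if self.ck == self.max_skips:
                self.ck = 0
                return True   # forced recognition after 5 low-difference frames
            return False
        self.ck = 0
        return True
```

With unchanged edge images the gate skips four sampled frames and forces recognition on the fifth, which is the behaviour the counter ck is meant to guarantee.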

2. The video subtitle recognition method based on edge information and distribution entropy according to claim 1, characterized in that, in said subtitle recognition, subtitle detection is performed first, using edge detection, edge point connection, connected domain collection and segmentation, and connected domain refinement and trailing filtering to obtain the candidate text regions and their positions, and a joint entropy filter is then used to remove the non-text regions, leaving only the subtitle regions, the steps being:

(1) Edge detection method

Given an image I, the Sobel operator is used to detect edges. The Sobel operator consists of gradient templates in four directions: horizontal S_H, vertical S_V, diagonal S_LD, and anti-diagonal S_RD. The edge strength is calculated by the following formula:

S = \max(|S_H|, |S_V|, |S_{LD}|, |S_{RD}|) + k \times |S_{\perp\text{-}MAX}| \qquad (2)

where ⊥-MAX denotes the direction perpendicular to the direction with the maximum absolute gradient value at pixel (x, y), and k is an adjustment coefficient whose value here is 1. S is then quantized into 16 levels, denoted S' after quantization, and the edge map EdgeMap is obtained with the following formula:

EdgeMap(x, y) = \begin{cases} 1, & S'(x, y) \geq 15 \\ 0, & S'(x, y) < 15 \end{cases} \qquad (3)
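A rough sketch of formulas (2) and (3) is given below. The claim does not list the four templates or the quantization rule, so the kernels here are the common 3×3 Sobel variants and the quantization is a linear scaling by the image maximum; both are assumptions, as are the function names.

```python
# Assumed 3x3 templates for the four directions (not listed in the claim).
K_H = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))
K_V = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
K_LD = ((-2, -1, 0), (-1, 0, 1), (0, 1, 2))
K_RD = ((0, 1, 2), (-1, 0, 1), (-2, -1, 0))
KERNELS = (K_H, K_V, K_LD, K_RD)
PERP = (1, 0, 3, 2)  # index of the template perpendicular to each template


def response(img, y, x, kernel):
    """Response of one 3x3 template centred on (y, x)."""
    return sum(kernel[dy][dx] * img[y + dy - 1][x + dx - 1]
               for dy in range(3) for dx in range(3))


def edge_strength(img, k=1.0):
    """Formula (2): maximum response plus k times the perpendicular response."""
    h, w = len(img), len(img[0])
    s = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):          # border pixels are left at 0
        for x in range(1, w - 1):
            g = [abs(response(img, y, x, kern)) for kern in KERNELS]
            m = max(range(4), key=g.__getitem__)
            s[y][x] = g[m] + k * g[PERP[m]]
    return s


def edge_map(img, k=1.0, levels=16):
    """Formula (3): keep only pixels that fall in the top quantization level."""
    s = edge_strength(img, k)
    s_max = max(max(row) for row in s)
    if s_max == 0:
        return [[0] * len(row) for row in s]
    top = levels - 1
    # Linear quantization by the image maximum is an assumption.
    return [[1 if min(top, int(v * levels / s_max)) >= top else 0 for v in row]
            for row in s]
```

The perpendicular term is what strengthens corner points: at a corner both the dominant direction and its perpendicular respond, so S rises above the value of a plain straight edge.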

(2) Edge point connection method

For the edge map EdgeMap, if the distance between two edge points in the same row is less than a threshold T_d, set the pixels between these two points in the EdgeMap to 1. T_d is determined by the following formula:

T_d = \max\left(4, \min\left(\left[\frac{\max(height, width)}{50}\right], 16\right)\right) \qquad (4)

where height and width are the height and width of image I, respectively;
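The connection rule and formula (4) can be illustrated with the short sketch below. It assumes that the square brackets in formula (4) mean rounding down and that only consecutive edge points in a row are compared; the function names are hypothetical.

```python
def connect_threshold(height, width):
    """Formula (4), reading [.] as the integer part (an assumption)."""
    return max(4, min(max(height, width) // 50, 16))


def connect_edge_points(edge_map, t_d):
    """Fill the gap between consecutive edge points in a row if it is below t_d."""
    out = [row[:] for row in edge_map]
    for row_in, row_out in zip(edge_map, out):
        prev = None  # column of the previous edge point in this row
        for x, v in enumerate(row_in):
            if v:
                if prev is not None and x - prev < t_d:
                    for c in range(prev + 1, x):
                        row_out[c] = 1
                prev = x
    return out
```

For a 480×720 frame the threshold is 14, and it is clamped to the range 4 to 16 for very small or very large frames.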

(3) Connected area collection and segmentation method

Collect the connected regions of the EdgeMap obtained in the previous step, remove those whose height or width is less than 1% of the height or width of the entire image, and remove those whose smallest enclosing rectangle is less than 0.2% of the entire image area; then use the following steps to segment each connected region C:

a) For each row i∈[t_c, b_c] in the connected region C, obtain the area Area_up(i) of the smallest enclosing rectangle of the part at and above that row and the area Area_down(i) of the smallest enclosing rectangle of the part below that row, compute the sum of these two areas, and store in bR the row number that gives the minimum sum;

b) For each column j∈[l_c, r_c] in the connected region C, obtain the area Area_left(j) of the smallest enclosing rectangle of the part at and to the left of that column and the area Area_right(j) of the smallest enclosing rectangle of the part to the right of that column, compute the sum of these two areas, and store in bC the column number that gives the minimum sum;

c) Let mRA = Area_up(bR) + Area_down(bR) and mCA = Area_left(bC) + Area_right(bC); if mRA < mCA, split the connected region C into two connected regions along row bR; otherwise, split it into two connected regions along column bC;

Among them, t _c , b _c , l _c and r _c are the upper boundary row number, lower boundary row number, left boundary column number and right boundary column number of connected area C respectively;

In order to prevent over-segmentation, the segmentation is only performed when the connected region C satisfies the following two conditions at the same time: ①The filling rate of the connected region is less than 0.8; ②The areas of the two new connected regions are both larger than 0.2% of the entire image area;
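Steps a) to c) and the two over-segmentation conditions can be sketched as below. The sketch assumes a region is a list of `(row, col)` pixels, that the filling rate is the pixel count divided by the enclosing-rectangle area, and that the "area" of each new region is its enclosing-rectangle area; it also skips the last row or column as a cut position so both parts are non-empty. These readings and all names are assumptions.

```python
def bbox_area(pixels):
    """Area of the smallest enclosing rectangle of a list of (row, col) pixels."""
    if not pixels:
        return 0
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)


def best_cut(pixels, axis):
    """Find the cut along axis (0 = rows, 1 = columns) minimising the area sum."""
    lo = min(p[axis] for p in pixels)
    hi = max(p[axis] for p in pixels)
    best = None
    for c in range(lo, hi):  # cutting at hi would leave the second part empty
        first = [p for p in pixels if p[axis] <= c]
        second = [p for p in pixels if p[axis] > c]
        total = bbox_area(first) + bbox_area(second)
        if best is None or total < best[0]:
            best = (total, c, first, second)
    return best


def split_region(pixels, image_area, max_fill=0.8, min_frac=0.002):
    """Return [pixels] unchanged, or the two parts of the chosen cut."""
    if len(pixels) / bbox_area(pixels) >= max_fill:
        return [pixels]                      # condition 1: region is dense enough
    row, col = best_cut(pixels, 0), best_cut(pixels, 1)
    if row is None and col is None:
        return [pixels]
    # Row cut only when mRA < mCA, column cut otherwise (step c).
    chosen = row if row is not None and (col is None or row[0] < col[0]) else col
    _, _, first, second = chosen
    if min(bbox_area(first), bbox_area(second)) <= min_frac * image_area:
        return [pixels]                      # condition 2: a part would be too small
    return [first, second]
```

The exhaustive search is quadratic in the region size, which is acceptable for a sketch; prefix and suffix bounding boxes would bring it down to linear time.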

(4) Connected area refinement and trailing filtering method

Before performing region refinement, remove the connected regions whose height is greater than twice their width. This may mistakenly delete vertical subtitles; to process vertical subtitles, it is enough to rotate the image by 90 degrees, and all other operations are exactly the same;

For each connected region C obtained in the previous step, the steps to further refine its position are as follows:

Input: edge map edgeMap, the initial upper and lower bound line numbers t _c , b _c of the connected region C

Output: refined upper and lower bound line numbers ut _c , ub _c

d) For any row i∈[t _c , b _c ] of the connected area C, calculate its left and right non-zero pixel span r _i -l _i in the edgeMap, and store it in the set cSA;

e) For any row i∈[t_c, b_c] of the connected region C, count the edge pixels in that row of edgeMap and store the count in the set oPNA;

f) Take the maximum value in cSA and store it in pCS, and store its serial number in pSRN;

Take the maximum value in oPNA and store it in pOPN, and store its serial number in pPRN;

g) For all rows within the range of i∈[t _c , pSRN], take the maximum row number t ₁ with cSA[i]<pCS×η ₁ ;

For all rows within the range of i∈[pSRN,b _c ], take the smallest row number b ₁ where cSA[i]<pCS×η ₁ ;

For all rows within the range of i∈[t _c ,pPRN], take the maximum row number t ₂ of oPNA[i]<pOPN×η ₂ ;

For all rows within the range of i∈[pPRN,b _c ], take the smallest row number b ₂ where oPNA[i]<pOPN×η ₂ ;

h) Make ut _c ＝max(t ₁ , t ₂ ), ub _c ＝min(b ₁ , b ₂ ), that is, get the refined upper and lower boundary row numbers ut _c , ub _c ;

Wherein η ₁ and η ₂ usually take a value of 0.6 and 0.3;
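Steps d) to h) can be sketched as follows. The sketch assumes the region's column range `l_c..r_c` is also available, treats oPNA as the per-row count of non-zero edge pixels, and falls back to the original bound when no row is below the threshold; those choices and the function name are assumptions.

```python
def refine_bounds(edge_map, t_c, b_c, l_c, r_c, eta1=0.6, eta2=0.3):
    """Return the refined (ut_c, ub_c) row bounds of a connected region."""
    rows = range(t_c, b_c + 1)
    csa, opna = {}, {}
    for i in rows:
        cols = [x for x in range(l_c, r_c + 1) if edge_map[i][x]]
        csa[i] = cols[-1] - cols[0] if cols else 0   # span r_i - l_i (step d)
        opna[i] = len(cols)                          # edge pixel count (step e)
    psrn = max(rows, key=csa.__getitem__)            # row of the widest span
    pprn = max(rows, key=opna.__getitem__)           # row with the most pixels

    def last_below(vals, peak_row, limit):
        hits = [i for i in range(t_c, peak_row + 1) if vals[i] < limit]
        return max(hits) if hits else t_c            # fallback is an assumption

    def first_below(vals, peak_row, limit):
        hits = [i for i in range(peak_row, b_c + 1) if vals[i] < limit]
        return min(hits) if hits else b_c            # fallback is an assumption

    t1 = last_below(csa, psrn, csa[psrn] * eta1)
    b1 = first_below(csa, psrn, csa[psrn] * eta1)
    t2 = last_below(opna, pprn, opna[pprn] * eta2)
    b2 = first_below(opna, pprn, opna[pprn] * eta2)
    return max(t1, t2), min(b1, b2)                  # step h
```

The returned rows are the nearest rows on each side of the peak where the span or the pixel count drops below its threshold, so the refined box hugs the dense band of text rows.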

Use the following trailing filtering method to remove some non-subtitle connected domains:

i) After step g) above is completed, continue scanning upward and downward in oPNA until the value at the current row is less than pOPN × η₃; let the resulting row numbers be t_tail and b_tail, respectively;

j) Use the following formula to calculate the length of the tail:

tl ₁ =t ₂ -t _tail ,tl ₂ =b _tail -b ₂ ,tl=max(tl ₁ ,tl ₂ )

k) Filter with the following formula; if deleteFlag(C) is 1, the connected region is not a subtitle region and should be deleted:

deleteFlag(C) = \begin{cases} 1, & tl > (ub_c - ut_c) \times \eta_4 \\ 0, & \text{otherwise} \end{cases} \qquad (5)

where ut_c and ub_c are the refined upper and lower boundary row numbers of the connected region C, and η₃ and η₄ usually take the values 0.2 and 0.3;
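The trailing filter of steps i) to k) can be sketched as below, assuming oPNA is a list indexed by row and that the scan starts at t₂ and b₂ and stops at the ends of the list; the function names are hypothetical.

```python
def tail_length(opna, t2, b2, eta3=0.2):
    """Steps i) and j): scan outward from t2 and b2 and return tl."""
    limit = max(opna) * eta3          # pOPN x eta3
    t_tail = t2
    while t_tail > 0 and opna[t_tail] >= limit:
        t_tail -= 1                   # keep moving up while the row is still dense
    b_tail = b2
    while b_tail < len(opna) - 1 and opna[b_tail] >= limit:
        b_tail += 1                   # keep moving down while the row is still dense
    return max(t2 - t_tail, b_tail - b2)


def delete_flag(tl, ut_c, ub_c, eta4=0.3):
    """Formula (5): 1 means the region is treated as a non-subtitle region."""
    return 1 if tl > (ub_c - ut_c) * eta4 else 0
```

A long tail means the edge density fades out slowly above or below the refined box, which is more typical of textured background than of a line of text.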

(5) Joint entropy filter

Use the joint entropy filter of the joint foreground pixel distribution entropy and edge pixel distribution entropy to filter, leaving only the subtitle area;

For the foreground pixel distribution entropy, take the smallest enclosing rectangle Rect[t_c, b_c, l_c, r_c] of a connected region C, where t_c and b_c are the row numbers of the upper and lower boundaries and l_c and r_c are the column numbers of the left and right boundaries; binarize it with the Otsu threshold, divide it into 2 rows × 4 columns = 8 parts, and calculate the distribution entropy with the following formula:

E_{FPD} = -\sum_{i,j} \left( p_{i,j} \ln p_{i,j} + (1 - p_{i,j}) \ln (1 - p_{i,j}) \right), \quad i \in \{1,2\},\ j \in \{1,2,3,4\} \qquad (6)

where p_{i,j} is the proportion of non-zero pixels in the part at row i, column j;

For the edge pixel distribution entropy, the Sobel edge binary image within the smallest enclosing rectangle Rect[t_c, b_c, l_c, r_c] of the connected region C is divided into 2 rows × 4 columns = 8 parts, and the distribution entropy is calculated with the following formula:

E_E = -\sum_{i,j} \left( \frac{e_{ij}}{e_r} \ln \frac{e_{ij}}{e_r} + \left(1 - \frac{e_{ij}}{e_r}\right) \ln \left(1 - \frac{e_{ij}}{e_r}\right) \right), \quad i \in \{1,2\},\ j \in \{1,2,3,4\} \qquad (7)

where e_{ij} is the number of edge pixels in the part at row i, column j, and e_r is the total number of edge pixels in the 8 parts.

For any refined connected region C, if its E_FPD > E_T1 and E_E > E_T2, it is considered a subtitle region; otherwise it is a non-subtitle region and should be deleted. In experiments, E_T1 = 6.4 and E_T2 = 2.76 gave the best results;
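Formulas (6) and (7) can be sketched as follows, assuming the binarized region and its edge image are nested lists of 0/1 that are cut into a 2×4 grid by integer division; the function names are hypothetical. With the natural logarithm, a sum of eight binary entropies is at most 8 ln 2 ≈ 5.55, so the stated threshold of 6.4 presumably relies on a different logarithm base or scaling; the sketch therefore takes the logarithm as a parameter.

```python
import math


def binary_entropy(p, log=math.log):
    """-(p log p + (1 - p) log(1 - p)), with 0 log 0 taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * log(p) + (1 - p) * log(1 - p))


def cell_counts(binary, rows=2, cols=4):
    """Return (non-zero count, cell area) for each of the rows x cols parts."""
    h, w = len(binary), len(binary[0])
    counts = []
    for i in range(rows):
        for j in range(cols):
            y0, y1 = i * h // rows, (i + 1) * h // rows
            x0, x1 = j * w // cols, (j + 1) * w // cols
            nz = sum(1 for y in range(y0, y1) for x in range(x0, x1) if binary[y][x])
            counts.append((nz, (y1 - y0) * (x1 - x0)))
    return counts


def foreground_entropy(binary, log=math.log):
    """Formula (6): p_ij is the share of non-zero pixels inside each part."""
    return sum(binary_entropy(nz / area, log)
               for nz, area in cell_counts(binary) if area)


def edge_entropy(edges, log=math.log):
    """Formula (7): e_ij / e_r is each part's share of all edge pixels."""
    counts = cell_counts(edges)
    e_r = sum(nz for nz, _ in counts)
    if e_r == 0:
        return 0.0
    return sum(binary_entropy(nz / e_r, log) for nz, _ in counts)


def is_subtitle(binary, edges, e_t1=6.4, e_t2=2.76, log=math.log):
    """Joint filter: both entropies must exceed their thresholds."""
    return foreground_entropy(binary, log) > e_t1 and edge_entropy(edges, log) > e_t2
```

Both entropies are largest when foreground and edge pixels are spread evenly over the eight parts, which is what a line of text tends to produce.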

For some images with both horizontal and vertical subtitles, subtitle detection is performed in the original image and the image obtained by rotating 90 degrees, and then the detection results of the two are combined to eliminate duplication.

3. The video subtitle recognition method based on edge information and distribution entropy according to claim 1, characterized in that the steps of performing repetition detection on the subtitle region, unifying its color polarity to white characters on a black background if the region is not a repeat and then performing subtitle extraction, and otherwise processing the next subtitle region, are:

(6) Repetition detection

Deduplicate the detected subtitle regions by combining position and grayscale histogram, as follows:

l) Extract and store the positions Rect_i[t_i, b_i, l_i, r_i] and grayscale histograms GH_i{g_i,0, g_i,1, …, g_i,k, …, g_i,255} of all subtitle regions of the previously processed frame, where g_i,k is the number of pixels with gray level k in the i-th subtitle region; extract and store the positions Rect_j[t_j, b_j, l_j, r_j] and grayscale histograms GH_j{g_j,0, g_j,1, …, g_j,255} of all subtitle regions of the current frame;

m) Calculate their position similarity Simi_Loc(i,j) and grayscale histogram similarity Simi_GHis(i,j), where Rect_i ∩ Rect_j is the area of their common part and max(Rect_i, Rect_j) is the area of the larger of the two; if either Simi_Loc(i,j) or Simi_GHis(i,j) is greater than 0.8, the two are repeated detections of the same region, in which case one is removed and one is kept;
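The repetition test of step m) can be sketched as below. The similarity formulas themselves are not reproduced in the text, so both functions are assumptions: the position similarity is taken as the overlap area divided by the larger rectangle's area, as the definitions of Rect_i ∩ Rect_j and max(Rect_i, Rect_j) suggest, and the histogram similarity uses histogram intersection as a stand-in.

```python
def rect_area(rect):
    """Area of an inclusive (top, bottom, left, right) rectangle, 0 if empty."""
    t, b, l, r = rect
    return max(0, b - t + 1) * max(0, r - l + 1)


def simi_loc(a, b):
    """Assumed position similarity: overlap area over the larger area."""
    inter = (max(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), min(a[3], b[3]))
    return rect_area(inter) / max(rect_area(a), rect_area(b))


def simi_ghis(hist_a, hist_b):
    """Stand-in histogram similarity (histogram intersection), not the patent's."""
    common = sum(min(x, y) for x, y in zip(hist_a, hist_b))
    return common / max(sum(hist_a), sum(hist_b))


def is_repeat(rect_a, hist_a, rect_b, hist_b, threshold=0.8):
    """Treat two detections as the same region if either similarity is high."""
    return simi_loc(rect_a, rect_b) > threshold or simi_ghis(hist_a, hist_b) > threshold
```

Two 10×10 rectangles shifted by one column overlap in 90 of 100 pixels, so under this reading they count as a repeat.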

(7) Color polarity unification

To unify the grayscale image of the subtitle region into white characters on a black background, take the following steps:

n) First binarize the grayscale subtitle region with the Otsu method, then apply the 3×3 masks

k_w(x, y) = \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix}

and

k_b(x, y) = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}

to obtain the edge map P:

P(x, y) = \begin{cases} White\_Edge, & K_w(x, y) > 0 \\ Black\_Edge, & K_b(x, y) > 0 \\ Non\_Edge, & K_w(x, y) \leq 0 \text{ and } K_b(x, y) \leq 0 \end{cases} \qquad (8)

Let N_w and N_b denote the number of white edge pixels and the number of black edge pixels, respectively, and define R₁ = N_w / N_b as their ratio;

o) For the edge map P obtained from formula (8) in the previous step, project the edge pixels by column, and let the edge map be decomposed along the columns into {x₀, x₁, …, x_n}, where x_i is the midpoint of a continuous interval in which the column projection of the edge map is 0. Establish the rectangles Rect_i[1, height, x_i, x_{i+1}] in turn, scan the edge map P within each rectangle from the four sides inward, delete the first edge pixel encountered, and recount the white edge pixels and black edge pixels, denoting the counts N_w' and N_b', respectively;

p) Define R₂ = N_w' / N_b' as their ratio, and define

\Delta R = \frac{R_2 - R_1}{\max(R_1, R_2)}, \quad -1 \leq \Delta R \leq 1 \qquad (9)

Use the following rules to determine the color polarity of the subtitle region:

(a) If ΔR > T_2h, the subtitle is white;

(b) If T_2l ≤ ΔR ≤ T_2h, the subtitle is white when R₁ > T_2v and black when R₁ ≤ T_2v;

(c) If T_1h ≤ ΔR ≤ T_2l, the subtitle is white;

(d) If T_1l ≤ ΔR ≤ T_1h, the subtitle is white when R₁ > T_1v and black when R₁ ≤ T_1v;

(e) If ΔR < T_1l, the subtitle is black;

where T_1l = -0.25, T_1h = -0.15, T_1v = 1.2, T_2l = 0, T_2h = 0.35, T_2v = 0.8;

q) After the color of the subtitle has been determined, if it is black, invert the grayscale image of the subtitle region; otherwise no operation is performed.
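The decision rules built on formula (9) and the inversion of step q) can be sketched as follows. The sketch takes the four edge pixel counts as given, assumes they are all positive, resolves values that sit exactly on a shared boundary in the order the rules are listed, and assumes 8-bit gray values; the function names are hypothetical.

```python
T1L, T1H, T1V = -0.25, -0.15, 1.2
T2L, T2H, T2V = 0.0, 0.35, 0.8


def delta_r(r1, r2):
    """Formula (9): relative change of the white/black edge ratio."""
    return (r2 - r1) / max(r1, r2)


def subtitle_color(n_w, n_b, n_w2, n_b2):
    """Return "white" or "black" from edge counts before and after border removal.

    All four counts are assumed to be positive.
    """
    r1 = n_w / n_b
    r2 = n_w2 / n_b2
    d = delta_r(r1, r2)
    if d > T2H:
        return "white"
    if d >= T2L:
        return "white" if r1 > T2V else "black"
    if d >= T1H:
        return "white"
    if d >= T1L:
        return "white" if r1 > T1V else "black"
    return "black"


def unify_polarity(gray, color):
    """Step q): invert the grayscale region when the subtitle is black."""
    if color == "black":
        return [[255 - v for v in row] for row in gray]
    return gray
```

For example, equal counts before and after border removal give ΔR = 0, which falls in the second band, so the result then depends only on whether R₁ exceeds 0.8.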

4. The video subtitle recognition method based on edge information and distribution entropy according to claim 1, characterized in that, in said subtitle extraction, the steps of binarizing the subtitle region after color polarity unification, removing the noise points, and sending the result to OCR software for recognition are:

r) Normalize the height of the grayscale subtitle region to 24, then extend it by 4 pixels at the top and at the bottom so that the height after extension is 32; denote the extended subtitle region image as EI;

s) Initialize each pixel of the resulting binary image B to 1, then perform stepwise horizontal local-threshold binarization on EI: binarize with the Otsu method in a 16×32 local window, stepping 8 pixels each time. In the same way, perform stepwise vertical local-threshold binarization on EI: binarize with the Otsu method in a local window of image_width×8, stepping 4 pixels vertically each time. In each sub-window, if a gray value in EI is lower than the local threshold, the corresponding pixel in B is set to 0; Pixels are also set to 0, defining dam points:

Dam points = {(x, y) | B(x, y) = 1 ∧ 1 ≤ min(H_len(x, y), V_len(x, y)) ≤ 4}

where H_len(x, y) is the length of the longest horizontal run of consecutive 1s containing pixel (x, y), and V_len(x, y) is the length of the longest vertical run of consecutive 1s containing pixel (x, y); a dam point cannot be expanded into a background pixel;

u) Use the Sobel operator to obtain the edge information of EI. For each connected region with value 1 in B, count the number epn of edge pixels falling in it or surrounding it. If epn < tepn, set all pixels in the connected region to 0, thereby removing it; tepn is determined by the following formula:

tepn=max(cheight,cwidth)

where cheight and cwidth are the height and width of the connected area, respectively;

v) Send binary image B to OCR software for recognition.
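The dam-point definition used during binarization can be sketched as below, assuming B is a nested list of 0/1 indexed as `b[y][x]` and that H_len and V_len are the lengths of the horizontal and vertical runs of 1s through a pixel; the function names are hypothetical.

```python
def run_lengths(line):
    """For each position, the length of the run of 1s containing it (0 for a 0)."""
    out = [0] * len(line)
    i = 0
    while i < len(line):
        if line[i]:
            j = i
            while j < len(line) and line[j]:
                j += 1
            for k in range(i, j):
                out[k] = j - i
            i = j
        else:
            i += 1
    return out


def dam_points(b, max_len=4):
    """Return the set of (x, y) with B = 1 and 1 <= min(H_len, V_len) <= max_len."""
    h, w = len(b), len(b[0])
    h_len = [run_lengths(row) for row in b]
    v_len = [run_lengths([b[y][x] for y in range(h)]) for x in range(w)]
    return {(x, y) for y in range(h) for x in range(w)
            if b[y][x] and 1 <= min(h_len[y][x], v_len[x][y]) <= max_len}
```

Thin strokes, at most 4 pixels across in one direction, come out as dam points, while the interior of a large blob does not, so the constraint protects character strokes during the expansion.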