CN101425088A

CN101425088A - Key frame extracting method and system based on chart partition

Info

Publication number: CN101425088A
Application number: CNA2008102250487A
Authority: CN
Inventors: 戴琼海; 高跃; 谢旭东
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2008-10-24
Filing date: 2008-10-24
Publication date: 2009-05-06

Abstract

The present invention provides a method and system for extracting key frames based on graph segmentation. The method includes the following steps: analyzing video shots, extracting frame features from all video frames in the video shots, and calculating the similarity between images of each frame , and form the intra-shot video frame similarity matrix D _N×N ; apply all the video frames in the shot to build a graph G=(V, E), where each frame in the shot is used as a node of V, node i and node The edge between j is determined by the similarity and positional relationship between the i-th frame and the j-th frame; apply the Normalized Cuts method to the graph G=(V, E) to segment the graph, and divide the graph into several segments; select each segment from the graph and The frame most similar to the other frames in the segment is used as the key frame. The present invention effectively obtains the key frame with stronger expressive ability to the original video shot.

Description

Key frame extraction method and system based on graph segmentation

技术领域 technical field

本发明涉及视频分析领域，特别是涉及一种基于图分割的关键帧提取方法和系统。The invention relates to the field of video analysis, in particular to a method and system for extracting key frames based on graph segmentation.

背景技术 Background technique

随着图像，视频处理技术的发展，人们应用处理的媒体信息量每天都呈几何级数增长，信息技术的发展带来了海量的视频数据。视频点播、数字交互电视、视频会议等媒体信息交换和应用形式已经完全融入人们的日常工作、学习和娱乐之中。但是，数字化后的媒体信息，尤其是数字化后的视频信息具有数据海量性，传统的文本数据分析、检索方法在进行视频信息管理时要耗费大量的时间和人力，效率低下。因此，如何高效、快捷地浏览和检索这些视频资料，成为人们日益迫切的需求。With the development of image and video processing technology, the amount of media information processed by people is increasing exponentially every day, and the development of information technology has brought massive video data. Media information exchange and application forms such as video on demand, digital interactive TV, and video conferencing have been fully integrated into people's daily work, study, and entertainment. However, the digitized media information, especially the digitized video information has a massive amount of data. Traditional text data analysis and retrieval methods consume a lot of time and manpower when managing video information, and are inefficient. Therefore, how to efficiently and quickly browse and retrieve these video materials has become an increasingly urgent need for people.

关键帧是反映一组镜头中主要信息内容的一帧或若干帧图像，可以简洁地表达镜头内容。当检索需要视频资料时，就不用从头到尾地查找一段视频，而是通过关键帧的非线性浏览来快速定位查询的内容。A key frame is one or several frames of images that reflect the main information content in a group of shots, and can express the shot content concisely. When video data is needed for retrieval, it is not necessary to search for a video from beginning to end, but to quickly locate the query content through non-linear browsing of key frames.

目前的视频关键帧提取方法主要可以分成三类。第一种方法是指定位置方法。指定位置方法是最简单的关键帧提取方法。这类方法不考虑视频的具体内容与视频的变化趋势，而采用相对固定的位置作为关键帧。比如确定镜头起止点后，直接取第一帧、最后一帧、中间帧或最接近所有帧平均值的一帧作为关键帧。这种技术虽然操作简单，计算迅速，可以实时得到关键帧，但不能保证视频里所有重要片段都有至少一个关键帧，也不能保证关键帧对镜头内容的代表性。The current video key frame extraction methods can be mainly divided into three categories. The first method is the specified location method. The specified position method is the simplest keyframe extraction method. This type of method does not consider the specific content of the video and the changing trend of the video, but uses a relatively fixed position as the key frame. For example, after determining the start and end points of the shot, directly take the first frame, the last frame, the middle frame or the frame closest to the average value of all frames as the key frame. Although this technology is simple to operate, fast to calculate, and can obtain key frames in real time, it cannot guarantee that all important segments in the video have at least one key frame, nor can it guarantee that the key frames are representative of the lens content.

第二种方法是分析镜头内发生显著内容变化的方法。这种方法顺序地处理视频序列，只关心视频在时间轴上的变化显著程度。第一个关键帧通常取镜头的第一帧，按顺序遍历所有帧，当变化达到一定程度(达到阈值)时，就将达到阈值的帧作为下一个关键帧。如在springer的期刊Machine Vision andApplications(vol.10，no.2p.51-65，1997)公布的方法中，从前一参考帧开始向后寻找，直到找到一帧到参考帧的距离大于阈值，就将这一帧的前一帧作为新的关键帧。然后从关键帧开始向后查找，直到找到一帧到这个新的关键帧的距离大于阈值，就将这一帧的前一帧作为下一个参考帧。这样得到的关键帧就代表前一参考帧到下一参考帧之间所有的帧。但是这种方法提取出来的关键帧与起始位置和阈值设定有很大关系。如果是采用累计变化的方法，即使变化很小的长视频也会产生较多关键帧，因此关键帧对视频重要片段的代表性可能不够。而且，由于是累计变化，与处理视频的方向也有关系，从后向前处理视频得到的结果与从前向后处理的结果不同。The second approach is to analyze where significant content changes occur within a shot. This approach processes video sequences sequentially and only cares about how significantly the video changes along the time axis. The first key frame usually takes the first frame of the lens, traverses all frames in order, and when the change reaches a certain level (reaches the threshold), the frame that reaches the threshold is taken as the next key frame. For example, in the method published in springer's journal Machine Vision and Applications (vol.10, no.2p.51-65, 1997), start looking backward from the previous reference frame until the distance between a frame and the reference frame is greater than the threshold, then Use the previous frame of this frame as the new keyframe. Then search backward from the key frame until the distance between a frame and the new key frame is found to be greater than the threshold, and the previous frame of this frame is used as the next reference frame. The key frame obtained in this way represents all the frames between the previous reference frame and the next reference frame. However, the key frames extracted by this method have a lot to do with the starting position and threshold setting. If the cumulative change method is used, even long videos with small changes will generate many key frames, so the key frames may not be representative of important segments of the video. Moreover, since it is a cumulative change, it is also related to the direction in which the video is processed, and the result obtained by processing the video from back to front is different from that from front to back.

第三种方法则是通过聚类分析将视频镜头的帧分成若干类，选取最靠近聚类中心的点表示聚类的点，最终形成视频序列的关键帧集合。目前主要的聚类方法，比如应用模糊C均值聚类等方法，聚类间相似度较低，并不能有效的使得聚类内的相似度足够大。The third method is to divide the frames of the video footage into several categories through cluster analysis, select the point closest to the cluster center to represent the cluster point, and finally form the key frame set of the video sequence. At present, the main clustering methods, such as fuzzy C-means clustering and other methods, have low similarity between clusters and cannot effectively make the similarity within clusters large enough.

总之，需要本领域技术人员迫切解决的一个技术问题就是：如何有效获得对原始视频镜头表示能力更强的关键帧。In a word, a technical problem that needs to be solved urgently by those skilled in the art is: how to effectively obtain key frames with stronger representation ability for the original video footage.

发明内容 Contents of the invention

本发明所要解决的技术问题是提供一种基于图分割的关键帧提取方法和系统，有效获得对原始视频镜头表示能力更强的关键帧。The technical problem to be solved by the present invention is to provide a method and system for extracting key frames based on graph segmentation, which can effectively obtain key frames with stronger representation ability for original video shots.

为了解决上述问题，本发明公开了一种基于图分割的关键帧提取方法，包括以下步骤：In order to solve the above problems, the present invention discloses a method for extracting key frames based on graph segmentation, comprising the following steps:

解析视频镜头，对视频镜头中的所有视频帧提取帧的特征，进行各帧图像之间相似度的计算，并形成镜头内视频帧相似度矩阵D_N×N，其中，D_ij中保存第i帧与第j帧的综合相似度；Analyze the video shot, extract the frame features for all the video frames in the video shot, calculate the similarity between the images of each frame, and form the similarity matrix D _{N×N of} the video frame in the shot, where D _ij saves the i-th The comprehensive similarity between the frame and the jth frame;

应用镜头内的全部视频帧建立一个图G＝(V，E)，其中镜头内的每一帧作为V的一个节点，节点i和节点j之间的边由第i帧和第j帧的相似度及位置关系确定；Apply all the video frames in the shot to build a graph G=(V, E), where each frame in the shot is used as a node of V, and the edge between node i and node j is determined by the similarity between the i-th frame and the j-th frame Determination of degree and position relationship;

对图G＝(V，E)应用Normalized Cuts方法进行分割，将图分成若干段；Applying the Normalized Cuts method to graph G=(V, E) is segmented, and the graph is divided into several segments;

从图中每段选取与所述段中其他帧最相似的一帧作为关键帧。From each segment in the graph, a frame that is most similar to other frames in said segment is selected as a key frame.

优选的，通过比较两帧图像的色彩直方图计算所述两帧图像之间的相似度。Preferably, the similarity between the two frames of images is calculated by comparing the color histograms of the two frames of images.

进一步，所述图G＝(V，E)中节点i和节点j之间的边通过以下步骤确定：Further, the edge between node i and node j in the graph G=(V, E) is determined through the following steps:

计算第i帧和第j帧的相似度D_ij；Calculate the similarity D _ij of the i-th frame and the j-th frame;

计算节点i和节点j之间的位置权重 $ω (i, j) = e^{- \frac{1}{σ} {(i - j)}^{2}},$ 其中σ为参数；Calculate the position weight between node i and node j $ω (i, j) = e^{- \frac{1}{σ} {(i - j)}^{2}},$ where σ is a parameter;

计算节点i和节点j之间的边e(i，j)＝ω(i，j)×D_ij。Calculate the edge e(i,j)=ω(i,j)×D _ij between node i and node j.

优选的，对图G＝(V，E)应用Normally Cuts方法进行图分割包括以下步骤：Preferably, applying the Normally Cuts method to graph G=(V, E) to carry out graph segmentation includes the following steps:

定义图的两个分段V′、V＂之间的相似度为 $cut (V', V'') = Σ_{i &Element; V', j &Element; V''}^{n} e (i, j);$ The similarity between two segments V', V" of the definition graph is $cut (V', V'') = Σ_{i &Element; V', j &Element; V''}^{no} e (i, j);$

定义图的连接指数为 $assoc (X, V) = Σ_{i &Element; X, j &Element; V}^{n} ω (i, j);$ Define the connection index of the graph as $assoc (x, V) = Σ_{i &Element; x, j &Element; V}^{no} ω (i, j);$

建立约束条件 $Ncut (V', V'') = \frac{cut (V', V'')}{assoc (V', V)} + \frac{cut (V', V'')}{assoc (V'', V)};$ Create constraints $Ncut (V', V'') = \frac{cut (V', V'')}{assoc (V', V)} + \frac{cut (V', V'')}{assoc (V'', V)};$

反复迭代，得到全局最小的Ncut(V′，V＂)，获得对该图的最佳分割。Repeat iterations to obtain the global minimum Ncut(V', V"), and obtain the best segmentation of the graph.

优选的，通过将所述段中所有视频帧的色彩直方图取平均，然后选择与该平均色彩直方图最接近的视频帧作为关键帧。Preferably, the color histograms of all video frames in the segment are averaged, and then the video frame closest to the average color histogram is selected as the key frame.

进一步，还包括计算所述关键帧在该镜头中的权重。Further, it also includes calculating the weight of the key frame in the shot.

进一步，所述关键帧在镜头中的权重通过以下计算方法获得：Further, the weight of the key frame in the shot is obtained by the following calculation method:

整个镜头内的视频帧数为NT，当前段内的视频帧数为NK；The number of video frames in the entire shot is NT, and the number of video frames in the current segment is NK;

当前关键帧在整个镜头中的权重为 $W = \frac{NK}{NT} .$ The weight of the current keyframe in the whole shot is $W = \frac{NK}{NT} .$

根据本发明的实施例，还公开了一种基于图分割的关键帧提取系统，该系统包括：According to an embodiment of the present invention, a key frame extraction system based on graph segmentation is also disclosed, the system includes:

视频帧相似度矩阵计算模块，用于解析视频镜头，对视频镜头中的所有视频帧提取帧的特征，进行各帧图像之间相似度的计算，并形成镜头内视频帧相似度矩阵D_N×N；The video frame similarity matrix calculation module is used to analyze the video shot, extract the frame features of all video frames in the video shot, calculate the similarity between each frame image, and form the video frame similarity matrix D _{N ×} in the shot _N ;

建图模块，用于应用镜头内的全部视频帧建立一个图G＝(V，E)，其中镜头内的每一帧作为V的一个节点；A graph building module, used to set up a graph G=(V, E) using all video frames in the shot, wherein each frame in the shot is used as a node of V;

节点之间边计算模块，用于根据第i帧和第j帧的相似度及位置关系确定节点i和节点j之间的边；The edge calculation module between nodes is used to determine the edge between node i and node j according to the similarity and positional relationship between the i-th frame and the j-th frame;

图分割模块，用于对图G＝(V，E)应用Normalized Cuts方法进行分割，将图分成若干段；The graph segmentation module is used to segment the graph G=(V, E) using the Normalized Cuts method, and the graph is divided into several sections;

关键帧选取模块，用于从图中每段选取与所述段中其他帧最相似的一帧作为关键帧，并计算所述关键帧在该镜头中的权重。The key frame selection module is used to select a frame most similar to other frames in the segment from each segment in the figure as a key frame, and calculate the weight of the key frame in the shot.

进一步，所述节点之间边计算模块包括：Further, the edge calculation module between the nodes includes:

计算第i帧和第j帧的相似度D_ij的子模块；Calculate the submodule of the similarity D _ij of the i-th frame and the j-th frame;

计算节点i和节点j之间的位置权重 $ω (i, j) = e^{- \frac{1}{σ} {(i - j)}^{2}}$ 的子模块，其中σ为参数；Calculate the position weight between node i and node j $ω (i, j) = e^{- \frac{1}{σ} {(i - j)}^{2}}$ The sub-module of , where σ is a parameter;

计算节点i和节点j之间的边e(i，j)＝ω(i，j)×D_ij的子模块。A submodule that calculates the edge e(i, j)=ω(i, j)×D _ij between node i and node j.

进一步，所述图分割模块包括：Further, the graph segmentation module includes:

用于定义图的两个分段V′、V＂之间的相似度为 $cut (V', V'') = Σ_{i &Element; V', j &Element; V''}^{n} e (i, j)$ 的子模块；The similarity between two segments V', V" used to define the graph is $cut (V', V'') = Σ_{i &Element; V', j &Element; V''}^{no} e (i, j)$ submodule of

用于定义图的连接指数为 $assoc (X, V) = Σ_{i &Element; X, j &Element; V}^{n} ω (i, j)$ 的子模块；The connectivity index used to define the graph is $assoc (x, V) = Σ_{i &Element; x, j &Element; V}^{no} ω (i, j)$ submodule of

用于建立约束条件 $Ncut (V', V'') = \frac{cut (V', V'')}{assoc (V', V)} + \frac{cut (V', V'')}{assoc (V'', V)}$ 的子模块；used to create constraints $Ncut (V', V'') = \frac{cut (V', V'')}{assoc (V', V)} + \frac{cut (V', V'')}{assoc (V'', V)}$ submodule of

用于反复迭代，得到全局最小的Ncut(V′，V＂)，获得对该图的最佳分割的子模块。It is used for repeated iterations to obtain the global minimum Ncut(V', V"), and obtain the sub-modules for the optimal segmentation of the graph.

与现有技术相比，本发明具有以下优点：Compared with the prior art, the present invention has the following advantages:

本发明的方法通过用视频片段的帧建立一个图，通过Normalized Cuts方法将图进行分割，并且限制分割的约束条件，从而得到对该图的最优划分。应用Normally Cuts进行图分割时所限制的约束条件可以保证两个分段之间具有较大的差异，而每个分段内具有较好的相似度。在获得的每个视频片段上将与其他帧最相似的帧作为关键帧，可以保持关键帧之间具有较大不相似度；并且，该视频片段内的视频帧数占整个镜头帧数的比例值作为该关键帧在镜头上的权重，使得获得的关键帧对原始视频的表达更加准确。同时，本发明方法设计简单，易于实现。The method of the present invention establishes a graph by using frames of video clips, divides the graph by the Normalized Cuts method, and limits the constraint conditions of the segmentation, so as to obtain the optimal division of the graph. The constraints imposed by the application of Normally Cuts for graph segmentation can ensure that there is a large difference between the two segments, and a good similarity within each segment. On each obtained video clip, the frame most similar to other frames is used as a key frame, which can keep a large dissimilarity between key frames; and, the number of video frames in this video clip accounts for the proportion of the entire lens frame number The value is used as the weight of the keyframe on the lens, so that the obtained keyframe expresses the original video more accurately. At the same time, the method of the invention is simple in design and easy to implement.

附图说明 Description of drawings

图1是本发明一种基于图分割的关键帧提取方法实施例步骤流程图；Fig. 1 is a kind of key frame extracting method embodiment step flow chart based on graph segmentation of the present invention;

图2是本发明一种基于图分割的关键帧提取系统结构框图；Fig. 2 is a kind of structural block diagram of key frame extraction system based on graph segmentation of the present invention;

图3是本发明实施例对采访录像提取关键帧的结果与应用现有技术提取关键帧的结果的对比示意图。FIG. 3 is a schematic diagram of a comparison between the result of extracting key frames from the interview video according to the embodiment of the present invention and the result of extracting key frames using the prior art.

具体实施方式 Detailed ways

为使本发明的上述目的、特征和优点能够更加明显易懂，下面结合附图和具体实施方式对本发明作进一步详细的说明。In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

参照图1，示出了本发明一种基于图分割的关键帧提取方法实施例，具体可以包括：Referring to Fig. 1, it shows an embodiment of a key frame extraction method based on graph segmentation of the present invention, which may specifically include:

步骤101、解析视频镜头，对视频镜头中的所有视频帧提取帧的特征，进行各帧图像之间相似度的计算，并形成镜头内视频帧相似度矩阵D_N×N，其中，D_ij中保存第i帧与第j帧的综合相似度。Step 101, analyze the video shot, extract frame features for all the video frames in the video shot, calculate the similarity between the images of each frame, and form the intra-shot video frame similarity matrix D _N×N , wherein, in D _ij Save the comprehensive similarity between frame i and frame j.

镜头，是一组时间上连续的帧序列，它代表一个场景中在时间上和空间上连续的动作，对应着摄像机的一次记录起停操作，也称为剪裁或拍摄。镜头是视频数据的最小单元。场景是一组语义上相关联及在时间上相邻的镜头的集合。A shot is a set of time-continuous frame sequences, which represent continuous actions in time and space in a scene, corresponding to a recording start-stop operation of the camera, also known as cutting or shooting. A shot is the smallest unit of video data. A scene is a collection of semantically related and temporally adjacent shots.

优选的，可以通过比较两帧图像的色彩直方图计算所述两帧图像之间的相似度。Preferably, the similarity between the two frames of images can be calculated by comparing the color histograms of the two frames of images.

色彩直方图，是最常用的表达图像颜色特征的方法，其优点是不受图像旋转和平移变化的影响，进一步借助归一化还可不受图像尺度变化的影响。颜色直方图简单描述一幅图像中颜色的全局分布，即不同色彩在整幅图像中所占的比例，特别适用于描述那些难以自动分割的图像和不需要考虑物体空间位置的图像。一般来说，两帧图像的色彩差异越大，则它们之间的相似程度越小，反之则相似程度越大。The color histogram is the most commonly used method to express the color characteristics of an image. Its advantage is that it is not affected by image rotation and translation changes, and it is not affected by image scale changes by further normalization. The color histogram simply describes the global distribution of colors in an image, that is, the proportion of different colors in the entire image, and is especially suitable for describing images that are difficult to automatically segment and images that do not need to consider the spatial position of objects. Generally speaking, the greater the color difference between two frames of images, the smaller the similarity between them, and vice versa.

步骤102、应用镜头内的全部视频帧建立一个图G＝(V，E)，其中镜头内的每一帧作为V的一个节点，节点i和节点j之间的边由第i帧和第j帧的相似度及位置关系确定。Step 102, apply all video frames in the shot to establish a graph G=(V, E), wherein each frame in the shot is used as a node of V, and the edge between node i and node j is composed of the i-th frame and the j-th frame The similarity and positional relationship of frames are determined.

无向图G＝(V，E)由两个集合V和E组成，其中，V是节点在有穷非空集合，E是边的有穷集合。本发明中，镜头内的每一帧作为V的一个节点。进一步，所述图G＝(V，E)中节点i和节点j之间的边通过以下步骤确定：An undirected graph G=(V, E) consists of two sets V and E, where V is a finite non-empty set of nodes, and E is a finite set of edges. In the present invention, each frame in the shot is regarded as a node of V. Further, the edge between node i and node j in the graph G=(V, E) is determined through the following steps:

步骤103、对图G＝(V，E)应用Normalized Cuts方法进行分割，将图分成若干段，具体可以包括以下步骤：Step 103, apply the Normalized Cuts method to graph G=(V, E) to segment, and the graph is divided into several sections, which may specifically include the following steps:

其中，V′及V＂是V中的任意两段，此两分段之间的相似度定义为V′中任一节点i和V＂中任一节点j之间的边e(i，j)之和；段之间的连接指数assoc(X，V)表示X中的节点到V中的点的位置权重的总和；当约束条件Ncut(V′，V＂)值最小时，保证了获得的图的分割满足段内视频帧有较大的相似度，段之间的视频帧有较小的相似度，此时获得最佳的分割。Among them, V' and V" are any two segments in V, and the similarity between these two segments is defined as the edge e(i, j ) sum; the connection index assoc(X, V) between segments represents the sum of the position weights of the nodes in X to the points in V; when the value of the constraint condition Ncut(V', V") is the smallest, it is guaranteed to obtain The segmentation of the graph satisfies that the video frames within the segment have a large similarity, and the video frames between the segments have a small similarity, and the best segmentation is obtained at this time.

Normalized Cuts方法是一种基于图划分，建立节点之间相似性度量的方法，其将图划分问题转化为一个利用求矩阵特征值得到次优解的问题。在具体的分割过程中，通过约束条件的建立，可以控制分割的程度，从而决定各分割块所携带信息量的大小。这种方法保证了全局最优性，因此可以得到较好的分割结果。有关该方法的具体理论属于本领域技术人员所熟知的，因此，不再赘述。另外，本领域技术人员当然也可以根据实际需要应用其他任何可行的数学模型对图进行分割，得到分割后段内视频帧有较大的相似度，段之间的视频帧有较小的相似度的最佳分割。The Normalized Cuts method is a method based on graph partitioning to establish a similarity measure between nodes, which transforms the graph partitioning problem into a problem of obtaining a suboptimal solution by finding the eigenvalues of the matrix. In the specific segmentation process, the degree of segmentation can be controlled through the establishment of constraint conditions, so as to determine the amount of information carried by each segmented block. This method guarantees the global optimality, so better segmentation results can be obtained. The specific theory of this method is well known to those skilled in the art, so it will not be repeated here. In addition, of course, those skilled in the art can also apply any other feasible mathematical model to segment the graph according to actual needs, so that after segmentation, the video frames in the segment have a relatively large similarity, and the video frames between segments have a small similarity. best split.

步骤104、从获得最佳分割的图中每段选取与所述段中其他帧最相似的一帧作为关键帧。Step 104: Select a frame most similar to other frames in the segment from each segment in the image obtained from the optimal segmentation as a key frame.

关键帧，是反映一组镜头中主要信息内容的一帧或若干帧图像，可以简洁地表达镜头内容。A key frame is one or several frames of images that reflect the main information content in a group of shots, and can express the contents of the shots concisely.

优选的，通过将所述段中所有视频帧的色彩直方图取平均，然后选择与该平均色彩直方图最接近的视频帧作为关键帧。该方法满足了针对关键帧的选取的两个基本要求：所选帧必须能够反映所在段中的主要事件，描述应尽可能准确完全；关键帧的抽取应尽量少而精确，保证了数据处理量应尽量小，计算不宜过于复杂。虽然在本发明实施例中采用色彩直方图作为特征进行关键帧的提取，但本发明并不对此进行限制，其他常用的图像比对算法如直接比较法、颜色特征法、纹理特征法、形状特征法、以及压缩域的图像比对算法等都在本发明的保护范围内。Preferably, the color histograms of all video frames in the segment are averaged, and then the video frame closest to the average color histogram is selected as the key frame. This method meets two basic requirements for the selection of key frames: the selected frame must be able to reflect the main events in the segment, and the description should be as accurate and complete as possible; the extraction of key frames should be as few and accurate as possible to ensure the amount of data processing It should be as small as possible, and the calculation should not be too complicated. Although in the embodiment of the present invention, the color histogram is used as a feature to extract the key frame, the present invention is not limited to this, other commonly used image comparison algorithms such as direct comparison method, color feature method, texture feature method, shape feature method method, and the image comparison algorithm in the compressed domain, etc. are all within the protection scope of the present invention.

此外，该步骤还包括计算所述关键帧在该镜头中的权重：In addition, this step also includes calculating the weight of the keyframe in the shot:

假设整个镜头内的视频帧数为NT，当前段内的视频帧数为NK，则当前关键帧在整个镜头中的权重为 $W = \frac{NK}{NT} .$ 通过为每个关键帧引入了权重因子，使得其对相应视频内容的描述更加精确有效。Assuming that the number of video frames in the entire shot is NT, and the number of video frames in the current segment is NK, the weight of the current key frame in the entire shot is $W = \frac{NK}{NT} .$ By introducing a weight factor for each key frame, it makes the description of the corresponding video content more accurate and effective.

应用本发明的方法，在检索所需的视频资料时，不必从头到尾地查找一段视频，而是通过关键帧的浏览来快速定位查询的内容，同时这种方式也有助于我们快速理解一段原始视频的内容，以决定是否为所需的视频资料。Applying the method of the present invention, when retrieving the required video material, it is not necessary to search for a section of video from the beginning to the end, but to quickly locate the content of the query through the browsing of key frames, and this method also helps us quickly understand a section of original video. The content of the video to determine whether it is the desired video material.

参照图2，示出了本发明一种基于图分割的关键帧提取系统实施例，具体可以包括：Referring to FIG. 2 , it shows an embodiment of a key frame extraction system based on graph segmentation of the present invention, which may specifically include:

视频帧相似度矩阵计算模块201，用于解析视频镜头，对视频镜头中的所有视频帧提取帧的特征，进行各帧图像之间相似度的计算，并形成镜头内视频帧相似度矩阵D_N×N；The video frame similarity matrix calculation module 201 is used to analyze the video shot, extract the frame features for all video frames in the video shot, perform the calculation of the similarity between each frame image, and form the video frame similarity matrix D _N in the shot _×N ;

建图模块202，用于应用镜头内的全部视频帧建立一个图G＝(V，E)，其中镜头内的每一帧作为V的一个节点；The graph building module 202 is used to set up a graph G=(V, E) using all the video frames in the shot, wherein each frame in the shot is used as a node of V;

节点之间边计算模块203，用于根据第i帧和第j帧的相似度及位置关系确定节点i和节点j之间的边；The edge calculation module 203 between nodes is used to determine the edge between node i and node j according to the similarity and positional relationship between the i frame and the j frame;

图分割模块204，用于对图G＝(V，E)应用Normalized Cuts方法进行分割，将图分成若干段；Figure segmentation module 204, is used for dividing into figure G=(V, E) application Normalized Cuts method, figure is divided into several sections;

关键帧选取模块205，用于从图中每段选取与所述段中其他帧最相似的一帧作为关键帧，并计算所述关键帧在该镜头中的权重。The key frame selection module 205 is configured to select a frame most similar to other frames in the segment from each segment in the figure as a key frame, and calculate the weight of the key frame in the shot.

优选的，对于所述节点之间边计算模块203的实现方式之一，其进一步包括：Preferably, for one of the implementations of the edge calculation module 203 between nodes, it further includes:

优选的，对于所述图分割模块204的实现方式，其进一步包括：Preferably, for the implementation of the graph segmentation module 204, it further includes:

本发明一种基于图分割的关键帧提取系统200在具体实现时，首先，由视频帧相似度矩阵计算模块201解析视频镜头，对视频镜头中的所有视频帧提取帧的特征，如色彩直方图相似度，进行各帧图像之间相似度的计算，并形成镜头内视频帧相似度矩阵D_N×N；然后，建图模块202将应用镜头内的全部视频帧建立无向图G＝(V，E)，其中镜头内的每一帧作为V的一个节点；利用节点之间边计算模块203，根据第i帧和第j帧的相似度及位置关系确定节点i和节点j之间的边；通过图分割模块204将图G＝(V，E)应用Normalized Cuts方法进行最佳分割，将图分成若干段；最后，利用关键帧选取模块205，从图中每段选取与所述段中其他帧最相似的一帧作为关键帧，并计算所述关键帧在该镜头中的权重。When a key frame extraction system 200 based on graph segmentation of the present invention is specifically implemented, first, the video frame similarity matrix calculation module 201 analyzes the video shots, and extracts frame features from all video frames in the video shots, such as a color histogram Similarity, to calculate the similarity between each frame of images, and form the video frame similarity matrix D _{N × N} in the shot; then, the mapping module 202 will use all the video frames in the shot to establish an undirected graph G=(V , E), wherein each frame in the shot is used as a node of V; the edge calculation module 203 between nodes is used to determine the edge between node i and node j according to the similarity and positional relationship between the i frame and the j frame ;By the graph segmentation module 204, graph G=(V, E) is applied to the Normalized Cuts method to carry out optimal segmentation, and the graph is divided into several sections; at last, utilize the key frame selection module 205 to select and describe the segment from each segment in the graph The frame to which the other frames are most similar is used as a keyframe, and the weight of said keyframe in that shot is calculated.

对于系统实施例而言，由于其与方法实施例基本相似，所以描述的比较简单，相关之处参见方法实施例的部分说明即可。As for the system embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for the related parts, please refer to the part of the description of the method embodiment.

图3(a)所示是本发明实施例中对一段采访录像提取的3个关键帧示意图。第一个关键帧提取了视频镜头中的第60帧，第二个关键帧提取了视频镜头中的第190帧，第三个关键帧提取了视频镜头中的第270帧。其中，该段采访录像仅包含一个镜头，长度为300帧，镜头中前一部分是一个建筑工人在镜头前说话，在视频的中后一部分镜头转向建筑工地。这期间镜头中的画面有一个比较大的变化，镜头从人脸到天空再到工地的变化。图3(b)中给出了依据IEEE多媒体汇刊(IEEE Transaction on Multimedia，vol.7，no.6p.1097-1105，2005)上提出的方法提取的关键帧结果示意图；提取出的关键帧分别为视频镜头中的第91、156、199、258帧。同时，图3(c)是采用随机抽取的方法获得的关键帧示意图；提取出的关键帧分别为视频镜头中的第0、99、199、299帧。从图3中可以看出，应用本发明的方法获得的关键帧能够对原始视频镜头做相对更好的表示，同时关键帧之间距离较大。Fig. 3(a) is a schematic diagram of three key frames extracted from an interview video in the embodiment of the present invention. The first keyframe extracts frame 60 of the video footage, the second keyframe extracts frame 190 of the video footage, and the third keyframe extracts frame 270 of the video footage. Among them, the interview video contains only one shot with a length of 300 frames. The first part of the shot is a construction worker talking in front of the camera, and the second part of the video turns to the construction site. During this period, there was a relatively big change in the scene in the shot, from the face to the sky to the construction site. Figure 3(b) shows the schematic diagram of the key frame results extracted according to the method proposed in IEEE Transaction on Multimedia (IEEE Transaction on Multimedia, vol.7, no.6p.1097-1105, 2005); the extracted key frame Frames 91, 156, 199, and 258 in the video footage, respectively. At the same time, Fig. 3(c) is a schematic diagram of key frames obtained by random extraction; the extracted key frames are frames 0, 99, 199, and 299 in the video footage, respectively. It can be seen from FIG. 3 that the key frames obtained by applying the method of the present invention can represent the original video shot relatively better, and at the same time, the distance between the key frames is relatively large.

以上对本发明所提供的一种基于图分割的关键帧提取方法和系统，进行了详细介绍，本文中应用了具体个例对本发明的原理及实施方式进行了阐述，以上实施例的说明只是用于帮助理解本发明的方法及其核心思想；同时，对于本领域的一般技术人员，依据本发明的思想，在具体实施方式及应用范围上均会有改变之处，综上所述，本说明书内容不应理解为对本发明的限制。The method and system for extracting key frames based on graph segmentation provided by the present invention have been introduced in detail above. In this paper, specific examples have been used to illustrate the principle and implementation of the present invention. The description of the above embodiments is only for To help understand the method of the present invention and its core idea; at the same time, for those of ordinary skill in the art, according to the idea of the present invention, there will be changes in the specific implementation and scope of application. In summary, the content of this specification It should not be construed as limiting the invention.

Claims

1, a kind of extraction method of key frame of cutting apart based on figure is characterized in that, may further comprise the steps:

Resolve video lens,, carry out calculation of similarity degree between each two field picture, and form frame of video similarity matrix D in the camera lens the feature of all frame of video extraction frames in the video lens _{N * N}, wherein, D _IjMiddle comprehensive similarity of preserving i frame and j frame;

Use all videos frame in the camera lens set up a figure G=(V, E), wherein each frame in the camera lens is as the node of V, the limit between node i and the node j is determined by the similarity and the position relation of i frame and j frame;

(V E) uses Normalized Cuts method and cuts apart, and figure is divided into plurality of sections to figure G=;

From figure, choose a frame the most similar as key frame for every section to other frames in described section.

2, the extraction method of key frame of cutting apart based on figure according to claim 1 is characterized in that, calculates similarity between described two two field pictures by the color histogram of two two field pictures relatively.

3, the extraction method of key frame of cutting apart based on figure according to claim 1 is characterized in that, described figure G=(V, E) limit between middle node i and the node j is determined by following steps:

Calculate the similarity D of i frame and j frame _Ij

Position weight between computing node i and the node j

ω (i, j) = e^{- \frac{1}{σ} {(i - j)}^{2}}

, wherein σ is a parameter;

Limit e between computing node i and the node j (i, j)=ω (i, j) * D _Ij

4, the extraction method of key frame of cutting apart based on figure according to claim 1 is characterized in that, to figure G=(V, E) use Normally Cuts method and carry out figure and cut apart and may further comprise the steps:

Similarity between two segmentation V ', the V ＂ of definition figure is

cut (V', V'') = Σ_{i &Element; V', j &Element; V''}^{n} e (i, j);

The connection index of definition figure is

assoc (X, V) = Σ_{i &Element; X, j &Element; V}^{n} ω (i, j);

Set up constraint condition

Ncut (V', V'') = \frac{cut (V', V'')}{assoc (V', V)} + \frac{cut (V', V'')}{assoc (V'', V)};

Iterate, obtain the minimum Ncut of the overall situation (V ', V ＂), obtain optimal segmentation this figure.

5, the extraction method of key frame of cutting apart based on figure according to claim 1 is characterized in that, is averaged by the color histogram with all frame of video in described section, select then with the immediate frame of video of this average color histogram as key frame.

6, the extraction method of key frame of cutting apart based on figure according to claim 1 is characterized in that, also comprises calculating the weight of described key frame in this camera lens.

7, the extraction method of key frame of cutting apart based on figure according to claim 6 is characterized in that, the weight of described key frame in camera lens obtains by following computing method:

Video frame number in the whole camera lens is NT, and the video frame number in the present segment is NK;

The weight of current key frame in whole camera lens is

W = \frac{NK}{NT} .

8, a kind of key-frame extraction system of cutting apart based on figure is characterized in that, comprising:

Frame of video similarity matrix computing module is used to resolve video lens, to the feature of all frame of video extraction frames in the video lens, carries out calculation of similarity degree between each two field picture, and forms frame of video similarity matrix D in the camera lens _{N * N}

Build module, be used to use all videos frame in the camera lens set up a figure G=(V, E), wherein each frame in the camera lens is as the node of V;

Limit computing module between the node is used for according to the similarity of i frame and j frame and the limit between definite node i of position relation and the node j;

Figure is cut apart module, is used for that (V E) uses Normalized Cuts method and cuts apart, and figure is divided into plurality of sections to figure G=;

Key frame is chosen module, is used for choosing a frame the most similar to other frames described section as key frame from every section in figure, and calculates the weight of described key frame in this camera lens.

9, the key-frame extraction system of cutting apart based on figure according to claim 8 is characterized in that, calculates similarity between described two two field pictures by the color histogram of two two field pictures relatively.

10, the key-frame extraction system of cutting apart based on figure according to claim 8 is characterized in that the limit computing module further comprises between the described node:

Calculate the similarity D of i frame and j frame _IjSubmodule;

Position weight between computing node i and the node j

ω (i, j) = e^{- \frac{1}{σ} {(i - j)}^{2}}

Submodule, wherein σ is a parameter;

Limit e between computing node i and the node j (i, j)=ω (i, j) * D _IjSubmodule.

11, the key-frame extraction system of cutting apart based on figure according to claim 8 is characterized in that described figure is cut apart module and further comprised:

The similarity that is used between two segmentation V ', the V ＂ of definition figure is

cut (V', V'') = Σ_{i &Element; V', j &Element; V''}^{n} e (i, j)

Submodule;

The connection index that is used for definition figure is

assoc (X, V) = Σ_{i &Element; X, j &Element; V}^{n} ω (i, j)

Submodule;

Be used to set up constraint condition

Ncut (V', V'') = \frac{cut (V', V'')}{assoc (V', V)} + \frac{cut (V', V'')}{assoc (V'', V)}

Submodule;

Be used to iterate, obtain the minimum Ncut of the overall situation (V ', V ＂), obtain submodule the optimal segmentation of this figure.