CN104050247A

CN104050247A - Method for realizing quick retrieval of mass videos

Info

Publication number: CN104050247A
Application number: CN201410245315.2A
Authority: CN
Inventors: 逯利军; 钱培专; 董建磊; 张树民; 曹晶; 李克民; 高瑞
Original assignee: SHANGHAI MEIQI PUYUE COMMUNICATION TECHNOLOGY Co Ltd
Current assignee: SHANGHAI MEIQI PUYUE COMMUNICATION TECHNOLOGY Co Ltd
Priority date: 2014-06-04
Filing date: 2014-06-04
Publication date: 2014-09-17
Anticipated expiration: 2034-06-04
Also published as: CN104050247B

Abstract

The invention relates to a method for realizing fast retrieval of massive videos, which includes extracting spatial feature vectors for each frame of video images in a video stream in a video library to obtain a video feature sequence; extracting key feature vectors from the spatial feature vectors; The key feature vectors of all video files establish a distributed storage index library; extract the key feature vector set of the video to be retrieved and extract the video index file of the video to be retrieved; according to the video index file of the video to be retrieved in the distributed storage index library Video similarity retrieval and output video retrieval results whose similarity is greater than the system preset value. Using this structure to achieve fast retrieval of massive videos, representative visual words are used instead of key frames to completely represent video information, without a lot of redundancy, and very compact, speeding up the retrieval speed, and having the ability to concurrently process massive data , has a wider range of applications.

Description

A Method for Fast Retrieval of Massive Videos

技术领域 technical field

本发明涉及多媒体信息技术领域，尤其涉及多媒体信息检索、数据挖掘和视频处理领域，具体是指一种实现海量视频快速检索的方法。 The invention relates to the technical field of multimedia information, in particular to the fields of multimedia information retrieval, data mining and video processing, and specifically refers to a method for realizing fast retrieval of massive videos. the

背景技术 Background technique

随着多媒体信息技术飞速发展，视频分享网站的出现，互联网视频数量迅速增长，且成几何级数上升。通过网络发布、分享和检索视频成为了人们的一种生活方式。面对海量的多媒体数据，如何快速的检索到相同或相似的视频成为了当前业界和学术界研究的热点。 With the rapid development of multimedia information technology and the emergence of video sharing websites, the number of Internet videos has grown rapidly, and has increased exponentially. Publishing, sharing and retrieving videos through the Internet has become a way of life for people. In the face of massive multimedia data, how to quickly retrieve the same or similar videos has become a research hotspot in the industry and academia. the

传统的基于关键帧的视频检索方法主要倾向于视频检索的准确性，但计算复杂度极高，要花费若干分钟才能完成一次检索任务。面对海量的网路视频，传统的视频比对技术已不能胜任。当前面向互联网的视频检索技术，借鉴了传统文本搜索引擎的核心思想，把视频特征看作视频单词(visual word)，构建视频文件的倒排索引，实现对海量视频文件的快速索引。 Traditional video retrieval methods based on key frames are mainly inclined to the accuracy of video retrieval, but the computational complexity is extremely high, and it takes several minutes to complete a retrieval task. In the face of massive online videos, traditional video comparison technology is no longer competent. The current Internet-oriented video retrieval technology draws on the core ideas of traditional text search engines, regards video features as visual words, constructs an inverted index of video files, and realizes fast indexing of massive video files. the

成功的匹配有赖于检索视频和参考视频自身信息的丰富程度和自身信息被表达、描述的程度。面向互联网视频检索方法在提取关键帧的时候，往往不是按照传统的方法，先进行镜头切分，再提取镜头关键帧，因为提取关键帧的位置会受到视频帧率、分辨率等因素的影响，关键帧不能稳定、可靠的提取。更为简单可行方法是对视频每隔1秒做一次采样，作为关键帧。其实相当于增加了采样的频率，采样频率越高，原有信息表达的越充分，但计算量就会越大。通过增加采样频率来增加信息表达的程度，这样会导致，既有信息被过度表达产生冗余，又有信息没有被充分表达，导致信息丢失。而线性采样会使丢失的信息具有随机性，因为视频信息并不是线性表达的。随机丢失的信息会降低检索的准确性和稳定性。另外一方面，传统的关键帧提取方法，一般信息变化较小的地方提取较少的关键帧，在视频帧变过较大的地方提取较多的关键帧，会产生比较紧致而较完整的表达信息，其程度取决于聚类或分割的阈值。检索视频和参考视频往往会受到各种噪声干扰，比如视频分辨率变差，网络丢包，视频丢帧，低帧率，视频插入，视频编辑等，会使原有的视频信息混有噪声，或导致部分信息丢失而不再完整。传统的视频关键帧提取方法过于理想化，a)未考虑外界干扰的复杂性，适当冗余度是必要的，b)其用于提取关键帧的特征并未针对海量检索任务而构建，相关的方法并不合适直接用来提取关键帧。如何选择恰当的检索特征，使得构建的关键帧序列的帧数量最少，视频镜头信息表达的相对完整而又有适当冗余，成为了面向海量数据检索技术丞待解决的关键问题。 Successful matching depends on the richness of the retrieval video and the reference video's own information and the degree to which its own information is expressed and described. Internet-oriented video retrieval methods often do not follow the traditional method when extracting key frames, first segment the shot, and then extract the key frames of the shot, because the location of the key frame is affected by factors such as video frame rate and resolution. Keyframes cannot be extracted stably and reliably. A simpler and feasible method is to sample the video every 1 second as a key frame. In fact, it is equivalent to increasing the sampling frequency. The higher the sampling frequency, the more fully the original information can be expressed, but the greater the amount of calculation will be. By increasing the sampling frequency to increase the degree of information expression, this will lead to both information being over-expressed and resulting in redundancy, and information not being fully expressed, resulting in information loss. However, linear sampling will make the lost information random, because the video information is not expressed linearly. Randomly missing information reduces the accuracy and stability of retrieval. On the other hand, the traditional key frame extraction method generally extracts fewer key frames where the information changes less, and extracts more key frames where the video frame changes too much, resulting in a more compact and complete image. Express information, the degree of which depends on the clustering or segmentation threshold. Retrieval videos and reference videos are often disturbed by various noises, such as video resolution degradation, network packet loss, video frame loss, low frame rate, video insertion, video editing, etc., which will make the original video information mixed with noise, Or cause partial information to be lost and no longer complete. The traditional video key frame extraction method is too idealistic, a) the complexity of external interference is not considered, and appropriate redundancy is necessary, b) the features used to extract key frames are not constructed for massive retrieval tasks, related method is not suitable for directly extracting keyframes. How to select the appropriate retrieval features to minimize the number of frames in the constructed key frame sequence and to express relatively complete and redundant video shot information has become a key problem to be solved for mass data retrieval technology. the

发明内容 Contents of the invention

本发明的目的是克服了上述现有技术的缺点，提供了一种能够实现采用代表性的视觉单词代替关键帧、既无大量冗余、又十分紧凑、加快检索速度、具有海量数据并发处理能力、具有更广泛应用范围的实现海量视频快速检索的方法。 The purpose of the present invention is to overcome the shortcomings of the above-mentioned prior art, and to provide a method that can realize the use of representative visual words instead of key frames, has neither a large amount of redundancy, but is also very compact, speeds up retrieval, and has massive data concurrent processing capabilities. , A method for realizing fast retrieval of massive videos with wider application scope. the

为了实现上述目的，本发明的实现海量视频快速检索的方法具有如下构成： In order to achieve the above object, the method for realizing the fast retrieval of massive videos of the present invention has the following composition:

该实现海量视频快速检索的方法，其主要特点是，所述的方法包括以下步骤： The method for realizing the fast retrieval of massive video, its main feature is that described method comprises the following steps:

(1)对视频库的视频流中各帧视频图像分别提取空间特征向量得到视频特征序列； (1) Extract the spatial feature vectors for each frame of video images in the video stream of the video library to obtain the video feature sequence;

(2)在所述的视频特征序列的空间特征向量中提取关键特征向量； (2) extract key feature vector in the spatial feature vector of described video feature sequence;

(3)根据视频库中所有视频文件的关键特征向量建立所有视频文件的分布式存储索引库； (3) set up the distributed storage index library of all video files according to the key feature vector of all video files in the video library;

(4)提取待检索视频的关键特征向量集并提取该待检索视频的视频索引文件； (4) Extract the key feature vector set of the video to be retrieved and extract the video index file of the video to be retrieved;

(5)根据所述的待检索视频的视频索引文件在所述的分布式存储索引库中进行视频相似度检索并输出相似度大于系统预设值的视频检索结果。 (5) Perform video similarity retrieval in the distributed storage index library according to the video index file of the video to be retrieved, and output video retrieval results whose similarity is greater than the system preset value. the

较佳地，所述的空间特征向量包括所对应帧图像的灰度空间分布特征和纹理空间分布特征，所述的对视频库的视频流中各帧视频图像分别提取空间特征向量，包括以下步骤： Preferably, the spatial feature vector includes the grayscale spatial distribution feature and the texture spatial distribution feature of the corresponding frame image, and the extraction of the spatial feature vector for each frame video image in the video stream of the video library includes the following steps :

(11)计算得到视频库的视频流中各帧视频图像的灰度图像和边缘纹理图像； (11) calculate the grayscale image and the edge texture image of each frame video image in the video stream of the video storehouse;

(12)计算各帧视频图像的灰度图像的中心空间特征和边界空间特征并得到由所述的中心空间特征和边界空间特征构成的该帧视频图像的灰度空间分布特征； (12) calculate the central space feature and the boundary space feature of the grayscale image of each frame video image and obtain the grayscale space distribution feature of this frame video image formed by the center space feature and the boundary space feature;

(13)计算各帧视频图像的边缘纹理图像的纹理空间分布特征。 (13) Calculate the texture spatial distribution feature of the edge texture image of each frame of video image. the

更佳地，所述的计算得到视频库的视频流中各帧视频图像的灰度图像和边缘纹理图像，包括以下步骤： Preferably, the grayscale image and the edge texture image of each frame video image in the video stream of the video storehouse are obtained by the calculation, comprising the following steps:

(111)将视频库的视频流中的各帧视频图像划分成数个同样大小的子图像并计算各个子图像的灰度值和纹理边缘点数目； (111) divide each frame video image in the video stream of the video storehouse into several sub-images of the same size and calculate the gray value and the number of texture edge points of each sub-image;

(112)计算各帧视频图像的各个子图像的灰度值得到该帧视频图像的灰度图像； (112) calculate the grayscale value of each sub-image of each frame video image to obtain the grayscale image of this frame video image;

(113)计算各帧视频图像的各个子图像的纹理边缘点数目得到该帧视频图像的边缘纹理图像。 (113) Calculate the number of texture edge points of each sub-image of each frame of video image to obtain the edge texture image of the frame of video image. the

更佳地，所述的计算各帧视频图像的灰度图像的中心空间特征和边界空间特征，具体为： More preferably, the central space feature and the boundary space feature of the grayscale image of each frame video image are calculated, specifically:

计算各帧视频图像的灰度图像的局部二值模式的中心空间特征和边界空间特征； Calculate the center space feature and boundary space feature of the local binary pattern of the grayscale image of each frame video image;

所述的计算各帧视频图像的边缘纹理图像的纹理空间分布特征，具体为： The texture space distribution feature of the edge texture image of the described calculation each frame video image is specifically:

计算各帧视频图像的边缘纹理图像的局部二值模式的纹理空间分布特征。 The texture spatial distribution feature of the local binary pattern of the edge texture image of each frame video image is calculated. the

更佳地，所述的空间特征向量还包括颜色直方图特征，所述的对视频库的视频流中各帧视频图像分别提取空间特征向量，还包括以下步骤： Preferably, said spatial feature vector also includes a color histogram feature, and said extracting a spatial feature vector for each frame video image in the video stream of the video library also includes the following steps:

(14)计算各帧视频图像的颜色直方图特征。 (14) Calculate the color histogram feature of each frame of video image. the

较佳地，所述的在所述的视频特征序列的空间特征向量中提取关键特征向量，包括以下步骤： Preferably, the extraction of the key feature vector in the spatial feature vector of the video feature sequence comprises the following steps:

(21)将所述的视频特征序列的第一个空间特征向量默认为关键特征向量； (21) the first spatial feature vector of the video feature sequence is defaulted as the key feature vector;

(22)计算各个空间特征向量与前一关键特征向量的马氏距离； (22) Calculate the Mahalanobis distance between each spatial feature vector and the previous key feature vector;

(23)将大于系统预设阈值的马氏距离所对应的空间特征向量提取为关键特征向量。 (23) Extract the spatial feature vector corresponding to the Mahalanobis distance greater than the system preset threshold as the key feature vector. the

较佳地，所述的根据视频库中所有视频文件的关键特征向量建立所有视频文件的分布式存储索引库，包括以下步骤： Preferably, the described key feature vector of all video files in the video storehouse is set up the distributed storage index storehouse of all video files, comprises the following steps:

(31)建立所述的视频特征序列中关键特征向量的子空间投影直方图并记录各个关键特征向量在所对应视频中出现的频次； (31) establish the subspace projection histogram of the key feature vector in the video feature sequence and record the frequency of occurrence of each key feature vector in the corresponding video;

(32)建立视频库的所有视频文件的倒排索引文件； (32) set up the inverted index file of all video files of video storehouse;

(33)建立视频库的所有视频文件的分布式索引数据库。 (33) Establish a distributed index database of all video files in the video library. the

更佳地，所述的建立视频特征序列中关键特征向量的子空间投影直方图，具体为： Preferably, the subspace projection histogram of the key feature vector in the described establishment of the video feature sequence is specifically:

将视频特征序列中关键特征向量投影到灰度子空间、纹理子空间和颜色子空间中并获得各个关键特征向量的子空间投影直方图。 The key feature vectors in the video feature sequence are projected into the gray subspace, texture subspace and color subspace, and the subspace projection histogram of each key feature vector is obtained. the

更进一步地，所述的记录各个关键特征向量在所对应视频中出现的频次，具体为： Furthermore, the frequency of recording each key feature vector in the corresponding video is specifically:

记录各个关键特征向量所对应的子空间投影直方图中表示该关键特征向量在视频中出现频次的特征值。 A feature value representing the frequency of appearance of the key feature vector in the video is recorded in the subspace projection histogram corresponding to each key feature vector. the

更进一步地，所述的建立视频库的所有视频文件的倒排索引文件，包括以下步骤： Further, the described inverted index file of all video files of building video storehouse, comprises the following steps:

(321)统计视频库中各个视频文件所对应的关键特征向量集合并构成该视频库的统计关键特征向量库； (321) the corresponding key feature vector set of each video file in the statistical video storehouse and constitute the statistical key feature vector storehouse of this video storehouse;

(322)建立所述的统计关键特征向量库中的各个关键特征向量对应的拥有该关键特征向量的文档集合； (322) Establishing a document set corresponding to each key feature vector in the statistical key feature vector library having the key feature vector;

(323)将关键特征向量集合的文档按照所含关键特征向量的数量从多到少进行排序； (323) sort the documents of the key feature vector set according to the number of key feature vectors contained in them from many to few;

(324)根据各个子空间建立视频库的所有视频文件的倒排索引文件。 (324) Establish an inverted index file of all video files in the video library according to each subspace. the

再进一步地，所述的建立视频库的所有视频文件的分布式索引数据库，包括以下步骤： Further still, the described distributed index database of all video files of setting up video storehouse, comprises the following steps:

(331)利用基于p-stable的局部敏感哈希算法将各个子空间的关键特征向量映射到一维空间； (331) Utilize the local sensitive hash algorithm based on p-stable to map the key feature vectors of each subspace to one-dimensional space;

(332)基于Hadoop分布式文件系统架构采用name_node维护哈希表并采用data_node保存索引数据为所有视频文件的分布式索引数据库。 (332) Based on the Hadoop distributed file system architecture, name_node is used to maintain the hash table and data_node is used to store index data as a distributed index database for all video files. the

更佳地，所述的根据所述的待检索视频的视频索引文件在所述的分布式存储索引库中进行视频相似度检索，具体为： More preferably, the video similarity retrieval is performed in the distributed storage index library according to the video index file of the video to be retrieved, specifically:

(51)计算待检索视频子空间投影直方图和视频库中各个视频子空间投影直方图的交作为待检索视频和视频库中各个视频的相似度； (51) Computing the intersection of the video subspace projection histogram to be retrieved and each video subspace projection histogram in the video library as the similarity of each video in the video to be retrieved and the video library;

(52)根据待检索视频和视频库中各个视频的关键特征向量的时空结构一致性剔除不符合时空结构一致性要求的视频文件。 (52) According to the time-space structure consistency of the video to be retrieved and the key feature vectors of each video in the video library, video files that do not meet the time-space structure consistency requirements are eliminated. the

更进一步地，所述的输出相似度大于系统预设值的视频检索结果，包括以下步骤： Furthermore, the video retrieval result of the output similarity greater than the system preset value includes the following steps:

(52)提取待检索视频的关键特征向量的各子空间投影直方图并将各个关键特征向量在各个子空间内映射为哈希值； (52) Extract each subspace projection histogram of the key feature vector of the video to be retrieved and map each key feature vector into a hash value in each subspace;

(53)通过所述的倒排索引文件选取分布式索引数据库中相似度符合系统预设要求的视频文件作为输出； (53) select the video file whose similarity in the distributed index database meets the system preset requirements through the inverted index file as output;

(54)计算待检索视频和视频库中各个视频的关键特征向量的时空结构一致性并输出与所述的待检索视频的相似度大于系统预设值的视频文件。 (54) Calculate the spatio-temporal structure consistency of the video to be retrieved and the key feature vectors of each video in the video database and output a video file whose similarity with the video to be retrieved is greater than the system preset value. the

采用了该发明中的实现海量视频快速检索的方法，具有如下有益效果： Adopting the method for realizing massive video fast retrieval in this invention has the following beneficial effects:

本发明主要针对构建视频索引信息的完整性和索引特征的选择问题，提出了一种基于视频指纹的子空间方法，解决当前面向海量数据的快速、鲁棒的检索问题。首先，本专利采用新颖的关键帧提取方法，用关键特征向量的提取代替关键帧提取，直接用代表性的视觉特征代替关键帧，相当于在特征空间对原始视频进行了编码，完整的表达了视频信息，既无大量冗余，又非常紧凑，并克服了当前关键帧提取参数选择问题。其次，把每一个视觉特征映射成一维哈希值，按照视觉特征的哈希值所在范围，选择合适的HDFS(Hadoop Distributed File System，Hadoop分布式文件系统)的name_node(名称节点)和data_node(数据节点)，即加快检索速度，又使之具有海量数据并发处理的能力，具有更广泛的应用范围。 The invention mainly aims at the integrity of video index information and the selection of index features, and proposes a subspace method based on video fingerprints to solve the current fast and robust retrieval problem facing massive data. First of all, this patent adopts a novel key frame extraction method, replaces key frame extraction with key feature vector extraction, and directly replaces key frames with representative visual features, which is equivalent to encoding the original video in the feature space, completely expressing The video information has neither a large amount of redundancy, but also is very compact, and overcomes the current key frame extraction parameter selection problem. Secondly, each visual feature is mapped into a one-dimensional hash value, and the name_node (name node) and data_node (data node) of the appropriate HDFS (Hadoop Distributed File System, Hadoop Distributed File System) are selected according to the range of the hash value of the visual feature. node), which not only speeds up the retrieval speed, but also enables it to have the ability to process massive data concurrently, and has a wider range of applications. the

附图说明 Description of drawings

图1为本发明的实现海量视频快速检索的方法的流程图。 FIG. 1 is a flowchart of a method for realizing fast retrieval of massive videos in the present invention. the

图2为本发明的实现海量视频快速检索的方法应用于具体实施例的流程图。 FIG. 2 is a flow chart of a method for realizing fast retrieval of massive videos of the present invention applied to a specific embodiment. the

图3为本发明的将视频帧序列映射到视频特征序列的流程图。 Fig. 3 is a flowchart of mapping a video frame sequence to a video feature sequence in the present invention. the

图4为本发明的计算灰度空间分布特征的流程图。 Fig. 4 is a flow chart of calculating gray-scale spatial distribution characteristics in the present invention. the

图5为本发明的提取关键特征向量的流程图。 Fig. 5 is a flow chart of extracting key feature vectors in the present invention. the

具体实施方式 Detailed ways

为了能够更清楚地描述本发明的技术内容，下面结合具体实施例来进行进一步的描述。 In order to describe the technical content of the present invention more clearly, further description will be given below in conjunction with specific embodiments. the

本发明公开了一种海量视频快速检索方法及系统，其中该方法包括：把视频帧序列映射到空间特征向量组成的视频特征序列，提取其中具有代表性的特征作为视频特征序列的关键特征向量；通过哈希函数映射所述关键特征向量，根据映射得到的哈希值所在的哈希桶，构建分布式索引；根据待检索视频的关键特征向量集，计算各对应哈希值所在哈希桶编号，提取对应特征的视频索引文件，通过投票方式获取候选视频文件，计算待检索视频和候选视频文件的相似度，输出相似度大于一定阈值的作为检索结果。 The invention discloses a method and system for fast retrieval of massive videos, wherein the method includes: mapping a video frame sequence to a video feature sequence composed of spatial feature vectors, and extracting representative features among them as key feature vectors of the video feature sequence; Map the key feature vector through a hash function, construct a distributed index according to the hash bucket where the mapped hash value is located; calculate the number of the hash bucket where each corresponding hash value is located according to the key feature vector set of the video to be retrieved , extract the video index files corresponding to the features, obtain candidate video files by voting, calculate the similarity between the video to be retrieved and the candidate video files, and output the similarity greater than a certain threshold as the retrieval result. the

如图1所示，本发明的实现海量视频快速检索的方法包括以下步骤： As shown in Figure 1, the method for realizing massive video fast retrieval of the present invention comprises the following steps:

在一种优选的实施方式中，所述的空间特征向量包括所对应帧图像的灰度空间分布特征和纹理空间分布特征，因此，所述的对视频库的视频流中各帧视频图像分别提取空间特征向量得到视频特征序列，包括以下步骤： In a preferred embodiment, the spatial feature vector includes the grayscale spatial distribution feature and the texture spatial distribution feature of the corresponding frame image, therefore, each frame video image in the video stream of the video library is extracted respectively The spatial feature vector obtains the video feature sequence, including the following steps:

在一种优选的实施方式中，计算灰度图像和边缘纹理图像可以采用以下这种方式，即所述的计算得到视频库的视频流中各帧视频图像的灰度图像和边缘纹理图像，包括以下步骤： In a preferred embodiment, the grayscale image and the edge texture image can be calculated in the following manner, that is, the grayscale image and the edge texture image of each frame of the video image in the video stream of the video library are calculated, including The following steps:

其中中心空间特征和边界空间特征可以是基于局部二值模式的中心空间特征和边界空间特征。 The central spatial feature and the boundary spatial feature may be the central spatial feature and the boundary spatial feature based on local binary patterns. the

其中，纹理空间分布特征可以是基于局部二值模式的纹理空间分布特征。 Wherein, the texture spatial distribution feature may be a texture spatial distribution feature based on a local binary mode. the

在一种更优选的实施方式中，所述的空间特征向量还可以进一步包括颜色直方图特征，使得空间特征向量更能代表视频特征，即所述的对视频库的视频流中各帧视频图像分别提取空间特征向量，还包括以下步骤： In a more preferred embodiment, the spatial feature vector can further include color histogram features, so that the spatial feature vector can better represent the video features, that is, each frame video image in the video stream of the video library Extracting spatial feature vectors respectively, also includes the following steps:

在一种优选的实施方式中，提取关键特征向量包括以下步骤： In a preferred embodiment, extracting key feature vectors includes the following steps:

在一种优选的实施方式中，建立分布式存储索引库包括以下步骤： In a preferred embodiment, establishing a distributed storage index library includes the following steps:

更进一步地，子空间可以是灰度子空间和纹理子空间，还可以包括颜色子空间，因此所述的建立视频特征序列中关键特征向量的子空间投影直方图，具体为： Furthermore, the subspace can be a grayscale subspace and a texture subspace, and can also include a color subspace, so the subspace projection histogram of the key feature vector in the described establishment of the video feature sequence is specifically:

(322)建立所述的统计关键特征向量库中的各个关键特征向量对应的拥有该关键特征向量的文档集合； (322) set up the document set corresponding to each key feature vector in the statistical key feature vector library with the key feature vector;

更进一步地，所述的建立视频库的所有视频文件的分布式索引数据库，包括以下步骤： Further, the described distributed index database of all video files of building video storehouse, comprises the following steps:

(4)提取待检索视频的关键特征向量集并提取该待检索视频的视频索引文件；在具体实施应用中，此处提取待检索视频的关键特征向量可以采用如步骤(1)和(2)中的关键特征向量提取方法。 (4) Extract the key feature vector set of the video to be retrieved and extract the video index file of the video to be retrieved; in specific implementation applications, extract the key feature vector of the video to be retrieved here as steps (1) and (2) The key feature vector extraction method in . the

在一种优选的实施方式中，所述的根据所述的待检索视频的视频索引文件在所述的分布式存储索引库中进行视频相似度检索，具体为： In a preferred embodiment, the video similarity retrieval is performed in the distributed storage index library according to the video index file of the video to be retrieved, specifically:

在一种优选的实施方式中，所述的输出相似度大于系统预设值的视频检索结果，包括以下步骤： In a preferred embodiment, the video retrieval result whose similarity is greater than the system preset value includes the following steps:

下面以一个具体实施例来进一步阐述本发明的实现海量视频快速检索的方法，如图2所示，在具体应用中，该方法包括以下步骤： The method for realizing massive video fast retrieval of the present invention is further described below with a specific embodiment, as shown in Figure 2, in specific application, this method comprises the following steps:

(1)视频空间特征编码，即将视频帧序列映射到视频特征序列；如图3所示，具体包括以下子步骤： (1) Video space feature encoding, which is to map the video frame sequence to the video feature sequence; as shown in Figure 3, it specifically includes the following sub-steps:

(11)从视频流读取一帧视频图像，把图像划分成MxN个同样大小的子图像，计算各子图像灰度值和纹理边缘点数目； (11) Read a frame of video image from the video stream, divide the image into MxN sub-images of the same size, and calculate the gray value of each sub-image and the number of texture edge points;

(12)计算灰度图像两种类型的LBP(Local binary pattern，局部二值模式)空间特征v_gray，如图4所示中心特征(f)和边界特征(g)，由中心特征和边界特征共同构成8位的视频帧的空间分布特征，见图4中的(h)； (12) Calculate the two types of LBP (Local binary pattern, local binary pattern) spatial features v_gray of the grayscale image, as shown in Figure 4. The center feature (f) and the boundary feature (g) are shared by the center feature and the boundary feature Constitute the spatial distribution characteristics of 8-bit video frames, see (h) in Fig. 4;

(13)同上，计算边缘纹理图像的LBP空间分布特征v_texture，简单起见，可统计图像块内部边缘的个数作为纹理复杂度的度量值，其计算结果同上为一8位空间纹理分布特征； (13) As above, calculate the LBP spatial distribution feature v_texture of the edge texture image. For simplicity, the number of internal edges of the image block can be counted as a measure of texture complexity, and the calculation result is an 8-bit spatial texture distribution feature as above;

(14)结合v_gray和v_texture特征，构造多元帧特征v＝(v_gray,v_texture)，我们把一个帧特征v称为一个帧视觉单词(visual word)； (14) Combining v_gray and v_texture features to construct multiple frame features v=(v_gray, v_texture), we call a frame feature v a frame visual word (visual word);

(15)除计算图形的灰度和纹理LBP空间特征外，还可以添加其他帧特征，比如8或16bins的颜色直方图v_color_his_16，此时v＝(v_gray,v_texture,v_color_his_16)；该帧特征构成方式可以克服单个特征子空间不能很好表达视频帧的缺陷。 (15) In addition to calculating the grayscale and texture LBP spatial features of graphics, other frame features can also be added, such as the color histogram v_color_his_16 of 8 or 16 bins, at this time v=(v_gray, v_texture, v_color_his_16); the frame feature composition method It can overcome the defect that a single feature subspace cannot express video frames well. the

本专利不考虑时间特征，因为检索视频的时间特征受低帧率或缺帧等其他干扰因素的影响具有很大的不确定性，根据时间序列构造的帧的时空特征很有可能是错误的。而是在相似检索过程中验证时间顺序的一致性。 This patent does not consider the time feature, because the time feature of the retrieved video is affected by other interference factors such as low frame rate or lack of frames, which has great uncertainty, and the spatiotemporal features of the frames constructed according to the time series are likely to be wrong. Instead, temporal order consistency is verified during similarity retrieval. the

(2)视频关键特征提取，即提取其中具有代表性的特征作为视频特征序列的关键特征向量；如图5所示，具体包括以下子步骤： (2) Video key feature extraction, that is, extracting representative features as the key feature vector of the video feature sequence; as shown in Figure 5, specifically including the following sub-steps:

(21)将视频特征序列的第一个空间特征向量作为默认关键特征向量； (21) The first spatial feature vector of the video feature sequence is used as the default key feature vector;

(22)提取当前第n帧的空间特征向量v(n)，如果当前特征v(n)与前一关键特征向量(v(m),m)的马氏距离大于阈值thrsh，考虑到噪声因素，本文1<＝thrsh<＝2，则当前帧为关键特征向量，记为(v(n),n)。 (22) Extract the spatial feature vector v(n) of the current nth frame, if the Mahalanobis distance between the current feature v(n) and the previous key feature vector (v(m),m) is greater than the threshold thrsh, considering the noise factor , in this paper 1<=thrsh<=2, then the current frame is the key feature vector, denoted as (v(n),n). the

两个不同的特征向量v1和v2表达了不同的视频内容。用具有代表性的关键特征向量key vecotor代替传统的关键帧向量，不但省去了关键帧提取这一步骤，而用源生特征来表达视频内容更加直接、准确。解决了视频索引信息的完整性和索引特征的选择问题。 Two different feature vectors v1 and v2 express different video contents. Replacing the traditional key frame vector with a representative key feature vector key vector not only saves the step of key frame extraction, but also expresses video content more directly and accurately with original features. The completeness of video index information and the selection of index features are solved. the

我们把关键特征向量(key vector)，称为视觉单词(visual word)，visual word的集合称为视觉词汇表(visual vocabulary)。单个视频文件的特征向量集的直方图称为特征直方图 (vector histogram或者visual word histogram)。为了使key vector具有丰富的表达能力和抽象概括能力，key vector由不同但独立的sub vector构成空间灰度分布特征Gray-LBP vector，空间纹理分布特征Texture-LBP vector和color vector组成，可简单表示为key vector＝{Gray-LBP,Texture-LBP,Color}。由不同的抽象特征概念空间共同构成乘性描述空间实现了key vector丰富的表达能力和抽象概括能力。 We call the key feature vector (key vector) a visual word, and the collection of visual words is called a visual vocabulary. The histogram of the feature vector set of a single video file is called a feature histogram (vector histogram or visual word histogram). In order to make the key vector have rich expression ability and abstract generalization ability, the key vector is composed of different but independent sub vectors to form the gray-LBP vector of the spatial gray distribution feature, the Texture-LBP vector and the color vector of the spatial texture distribution feature, which can be simply expressed It is key vector={Gray-LBP, Texture-LBP, Color}. The multiplicative description space composed of different abstract feature concept spaces realizes the rich expressive ability and abstract generalization ability of key vector. the

本专利与其他关键帧提取不同之处在于，本发明是直接在视频流中提取关键特征，而不是传统意义上的关键帧提取。 The difference between this patent and other key frame extraction is that the present invention directly extracts key features from video streams, rather than traditional key frame extraction. the

传统关键帧提取是利用关键帧提取算法提取关键帧，然后利用提取的关键帧再提取检索特征，提取关键帧所用的方法和提取关键帧后计算的检索特征并不完全等同，有时候差异很大，这样会导致描述不准确；这也是传统检索特征准确性不够高的原因之一。 Traditional key frame extraction is to use the key frame extraction algorithm to extract key frames, and then use the extracted key frames to extract retrieval features. The method used to extract key frames is not exactly the same as the retrieval features calculated after extracting key frames, and sometimes the difference is very large , which will lead to inaccurate description; this is one of the reasons why the accuracy of traditional retrieval features is not high enough. the

(3)视频帧序列到视频视觉单词直方图的映射，具体包括以下子步骤： (3) The mapping of the video frame sequence to the video visual word histogram specifically includes the following sub-steps:

(31)由于视觉单词可能具有很高的维度，比如(f_gray,f_texture,f_color_his_16)的维数(8,8,16)共32维，其内存需求近1GB，我们把32维空间分别再重投到f_gray8位子空间，f_texture8位子空间，f_color_his_16位子空间中，分别统计他们在子空间的直方图，其内存需求显著降低，不足70MB，单个视频文件的直方图大小几乎不超过10MB； (31) Since visual words may have very high dimensions, for example, the dimensions (8, 8, 16) of (f_gray, f_texture, f_color_his_16) have a total of 32 dimensions, and its memory requirement is nearly 1GB. We re-project the 32-dimensional space respectively Go to the f_gray8-bit subspace, f_texture8-bit subspace, and f_color_his_16-bit subspace, respectively count their histograms in the subspaces, the memory requirements are significantly reduced, less than 70MB, and the histogram size of a single video file hardly exceeds 10MB;

(32)子空间投影直方图的bin(某个子空间特征，比如8位LBP特征)的数值代表该特征在视频中出现的频次，为了保持同一个bin内部该特征在时间的分布，采用如下方式记录bin内容： (32) The value of the bin (a subspace feature, such as an 8-bit LBP feature) of the subspace projection histogram represents the frequency of the feature appearing in the video. In order to maintain the time distribution of the feature in the same bin, the following method is used Record bin content:

bin：(该特征出现的频率为n1+n2+…+nk的和，帧号T1，连续出现次数n1，T2,n2,…,Tk,nk)。 bin: (The frequency of occurrence of this feature is the sum of n1+n2+...+nk, frame number T1, number of consecutive occurrences n1, T2, n2,..., Tk, nk). the

(4)建立视频文件倒排索引文件，具体包括以下几个子步骤： (4) Establish the video file inverted index file, which specifically includes the following sub-steps:

(41)统计视频库中每一个视频对应的视觉单词集合，构成视频库的统计视觉词库VwSet。根据 (41) Statistically count the set of visual words corresponding to each video in the video library to form the statistical visual word library VwSet of the video library. according to

Vw_i(在视觉词库中第i个视觉单词)建立拥有该视觉单词的文档集合{vf1,vf2,vf3,…,vfni}，ni为文档集合大小； Vw_i (the i-th visual word in the visual lexicon) establishes a document collection {vf1, vf2, vf3,...,vfni} with the visual word, ni is the size of the document collection;

(42)视觉单词文档集合的文档按所含视觉单词的多少从大到小排序； (42) The documents of the visual word document collection are sorted from large to small according to the number of visual words contained;

(43)由于高维视觉单词投影到低维特征子空间，根据各子空间建立倒排索引文件。 (43) Since high-dimensional visual words are projected into low-dimensional feature subspaces, an inverted index file is established according to each subspace. the

(5)建立分布式存储索引库，具体包括以下步骤： (5) Establish a distributed storage index library, specifically including the following steps:

(51)利用基于p-stable的局部敏感哈希算法(LSH)把子空间特征f_v，(比如 f_colo_his_16)映射到一维空间[0-Range)； (51) Use the p-stable-based Local Sensitive Hash Algorithm (LSH) to map the subspace feature f_v, (such as f_colo_his_16) to a one-dimensional space [0-Range);

(52)采用hadoop的HDFS文件系统架构，用name_node维护LSH表，用data_node保存索引数据。 (52) Adopt the HDFS file system architecture of hadoop, maintain the LSH table with name_node, and save index data with data_node. the

(6)视频相似度计算，具体包括以下步骤： (6) Video similarity calculation, specifically including the following steps:

检索视频Vq子空间直方图为{Bin_q_1,Bin_q_2,…,Bin_q_M}，M为特征子空间大小，视频库视频Vi直方图为{Bin_i_1,Bin_i_2,…,Bin_i_M}，Bin_id_n中，id为视频唯一编号，n为直方图bin的序号， The histogram of the retrieved video Vq subspace is {Bin_q_1, Bin_q_2,...,Bin_q_M}, M is the size of the feature subspace, the histogram of the video library Vi is {Bin_i_1, Bin_i_2,...,Bin_i_M}, in Bin_id_n, id is the unique number of the video , n is the serial number of the histogram bin,

Bin_id_n为该特征出现的次数； Bin_id_n is the number of occurrences of this feature;

(61)视频相似度为直方图的交， (61) Video similarity is the intersection of histograms,

$sim sim ((Vq wxya,, Vi Vi)) = = {Σ Σ}_{11}^{M m} min min ((Bin Bin__q q__k k,, Bin Bin__i i__k k)) / / {Σ Σ}_{11}^{M m} max max ((Bin Bin__q q__k k,, Bin Bin__i i__k k))$

(62)如果相似度大于阈值thrsh_sim，比较视觉单词的时间序列关系。直方图时间序列信息在步骤(32)已做记录，其算法如下： (62) If the similarity is greater than the threshold thrsh_sim, compare the time series relationship of visual words. Histogram time series information has been recorded in step (32), and its algorithm is as follows:

按照检索视频视觉单词在时间的出现的顺序表示视频，比如{(Vq_vw1,Bin_k1),(Vq_vw2,Bin_k2),…, Represent the video in the order in which the visual words of the retrieved video appear in time, such as {(Vq_vw1,Bin_k1),(Vq_vw2,Bin_k2),…,

(Vq_vwl,Bin_kl))},其中vw1为在时间上第一个出现的视觉单词，vw2是随后出现的，Bin_k1表示该视觉单词所在直方图Bin的序号为k1,kl直方图bin的总数目； (Vq_vwl, Bin_kl))}, where vw1 is the first visual word that occurs in time, vw2 occurs subsequently, and Bin_k1 represents that the sequence number of the histogram Bin where the visual word is located is k1, the total number of kl histogram bins;

(63)如果检索视频中的视觉单词Vq_vw(x)出现的时间早于Vq_vw(y),x<y,则匹配的视频直方图对应的序号为Bin_kx的Bin所包含的相同视觉单词的所有序列号中，至少有一个小于Bin_ky对应的其中的一个序号；我们认为检索视觉单词出现的先后顺序应该和相似视频中同样的视觉单词出现的先后顺序一致，即相应的时空结构具有一致性，通过时间顺序可取去除大量疑似相似视频。 (63) If the visual word Vq_vw(x) in the retrieved video appears earlier than Vq_vw(y), x<y, then the matching video histogram corresponds to all sequences of the same visual word contained in the Bin with the sequence number Bin_kx Among the serial numbers, at least one is smaller than one of the serial numbers corresponding to Bin_ky; we believe that the sequence of appearance of the retrieval visual words should be consistent with the sequence of appearance of the same visual words in similar videos, that is, the corresponding spatio-temporal structures are consistent, and through time The order is desirable to remove a large number of suspected similar videos. the

(7)对视频进行检索，具体采用以下方式： (7) Retrieve the video, specifically in the following ways:

提取检索视频视觉单词直方图，把视觉单词特征在各个子空间映射为哈希值，确定访问哈希桶所在的name_node和data_nodes，通过倒排索引视频文件，选取最为相似的前20％作为输出，然后计算时空结构的一致性，按相似度大小输出相似度大于0.7的所有被检索到的视频文件。 Extract the histogram of visual words from the retrieved video, map the visual word features into hash values in each subspace, determine the name_node and data_nodes where the access hash bucket is located, and select the most similar top 20% as output through inverted index video files. Then calculate the consistency of the space-time structure, and output all retrieved video files with a similarity greater than 0.7 according to the similarity. the

本发明主要针对构建视频索引信息的完整性和索引特征的选择问题，提出了一种基于视频指纹的子空间方法，解决当前面向海量数据的快速、鲁棒的检索问题。首先，本专利采用新颖的关键帧提取方法，用关键特征向量的提取代替关键帧提取，直接用代表性的视觉特征代替关键帧，相当于在特征空间对原始视频进行了编码，完整的表达了视频信息，既无大量冗余，又非常紧凑，并克服了当前关键帧提取参数选择问题。其次，把每一个视觉特征映射成一维哈希值，按照视觉特征的哈希值所在范围，选择合适的HDFS(Hadoop Distributed File System，Hadoop分布式文件系统)的name_node(名称节点)和data_node(数据节点)，即加快检索速度，又使之具有海量数据并发处理的能力，具有更广泛的应用范围。 The invention mainly aims at the integrity of video index information and the selection of index features, and proposes a subspace method based on video fingerprints to solve the current fast and robust retrieval problem facing massive data. First of all, this patent adopts a novel key frame extraction method, replaces key frame extraction with key feature vector extraction, and directly replaces key frames with representative visual features, which is equivalent to encoding the original video in the feature space and completely expressing The video information has neither a large amount of redundancy, but also is very compact, and overcomes the current key frame extraction parameter selection problem. Secondly, each visual feature is mapped into a one-dimensional hash value, and the name_node (name node) and data_node (data node) of the appropriate HDFS (Hadoop Distributed File System, Hadoop Distributed File System) are selected according to the range of the hash value of the visual feature. node), which not only speeds up the retrieval speed, but also enables it to have the ability to process massive data concurrently, and has a wider range of applications. the

在此说明书中，本发明已参照其特定的实施例作了描述。但是，很显然仍可以作出各种修改和变换而不背离本发明的精神和范围。因此，说明书和附图应被认为是说明性的而非限制性的。 In this specification, the invention has been described with reference to specific embodiments thereof. However, it is obvious that various modifications and changes can be made without departing from the spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded as illustrative rather than restrictive. the

Claims

1. A method for realizing massive video fast retrieval, is characterized in that, described method comprises the following steps:

(1) Each frame of video images in the video stream of the video library extracts the spatial feature vector to obtain the video feature sequence;

(2) extract key feature vector in the spatial feature vector of described video feature sequence;

(3) set up the distributed storage index library of all video files according to the key feature vector of all video files in the video library;

(4) extract the key feature vector set of the video to be retrieved and extract the video index file of the video to be retrieved;

(5) Perform video similarity retrieval in the distributed storage index library according to the video index file of the video to be retrieved, and output video retrieval results whose similarity is greater than the system preset value.

2. the method for realizing massive video fast retrieval according to claim 1, is characterized in that, described spatial characteristic vector comprises the grayscale spatial distribution characteristic and the texture spatial distribution characteristic of corresponding frame image, and described pair video storehouse Each frame video image in the video stream extracts the spatial feature vector respectively, comprises the following steps:

(11) calculate the grayscale image and the edge texture image of each frame video image in the video stream of the video storehouse;

(12) calculate the center space feature and boundary space feature of the grayscale image of each frame video image and obtain the grayscale space distribution feature of this frame video image that is made up of described center space feature and boundary space feature;

(13) Calculate the texture spatial distribution feature of the edge texture image of each frame of video image.

3. the method for realizing massive video fast retrieval according to claim 2, is characterized in that, described calculation obtains the grayscale image and the edge texture image of each frame video image in the video stream of video storehouse, comprises the following steps:

(111) divide each frame video image in the video flow of video bank into several sub-images of the same size and calculate the gray value and texture edge point number of each sub-image;

(112) calculate the grayscale value of each sub-image of each frame video image to obtain the grayscale image of this frame video image;

(113) Calculate the number of texture edge points of each sub-image of each frame of video image to obtain the edge texture image of the frame of video image.

4. the method for realizing mass video fast retrieval according to claim 2, is characterized in that, the central space feature and the boundary space feature of the grayscale image of each frame video image of the described calculation, specifically:

Calculate the central space feature and the boundary space feature of the local binary pattern of the grayscale image of each frame video image;

The texture space distribution feature of the edge texture image of the described calculation each frame video image is specifically:

The texture spatial distribution feature of the local binary pattern of the edge texture image of each frame video image is calculated.

5. the method for realizing massive video fast retrieval according to claim 2, is characterized in that, described spatial feature vector also comprises color histogram feature, and each frame video image extracts respectively in the video flow of described video storehouse The spatial feature vector also includes the following steps:

(14) Calculate the color histogram feature of each frame of video image.

6. the method for realizing massive video fast retrieval according to claim 1, is characterized in that, described extracting key feature vector in the spatial feature vector of described video feature sequence, comprises the following steps:

(21) the first spatial feature vector of the video feature sequence is defaulted as the key feature vector;

(22) Calculate the Mahalanobis distance between each spatial eigenvector and the previous key eigenvector;

(23) Extract the spatial feature vector corresponding to the Mahalanobis distance greater than the system preset threshold as the key feature vector.

7. the method for realizing massive video fast retrieval according to claim 1, is characterized in that, described according to the key feature vector of all video files in the video library, the distributed storage index storehouse of all video files is set up, comprises the following steps:

(31) establish the subspace projection histogram of the key feature vector in the video feature sequence and record the frequency of occurrence of each key feature vector in the corresponding video;

(32) set up the inverted index file of all video files of video storehouse;

(33) Establish a distributed index database of all video files in the video library.

8. the method for realizing mass video fast retrieval according to claim 7, is characterized in that, the subspace projection histogram of the key feature vector in the described establishment video feature sequence is specifically:

The key feature vectors in the video feature sequence are projected into the gray subspace, texture subspace and color subspace, and the subspace projection histogram of each key feature vector is obtained.

9. the method for realizing massive video fast retrieval according to claim 8, is characterized in that, the frequency that each key feature vector of described recording occurs in the corresponding video is specifically:

A feature value representing the frequency of appearance of the key feature vector in the video is recorded in the subspace projection histogram corresponding to each key feature vector.

10. the method for realizing massive video fast retrieval according to claim 8, is characterized in that, the inverted index file of all video files of described setting up video storehouse, comprises the following steps:

(321) the corresponding key feature vector set of each video file in the statistical video library and constitute the statistical key feature vector library of the video library;

(322) Establishing a collection of documents corresponding to each key feature vector in the statistical key feature vector library having the key feature vector;

(323) sorting the documents of the key feature vector set according to the number of key feature vectors contained therein from more to less;

(324) Establish an inverted index file of all video files in the video library according to each subspace.

11. the method for realizing massive video fast retrieval according to claim 10, is characterized in that, the distributed index database of all video files of described setting up video storehouse, comprises the following steps:

(331) utilizing a locality-sensitive hash algorithm based on p-stable to map the key feature vectors of each subspace to a one-dimensional space;

(332) Based on the Hadoop distributed file system architecture, name_node is used to maintain the hash table and data_node is used to store index data as a distributed index database for all video files.

12. The method for realizing fast retrieval of massive videos according to claim 7, wherein the video similarity retrieval is performed in the distributed storage index library according to the video index file of the video to be retrieved ,Specifically:

(51) Computing the intersection of each video subspace projection histogram in the video subspace projection histogram to be retrieved and the video library as the similarity of each video in the video to be retrieved and the video library;

(52) According to the time-space structure consistency of the video to be retrieved and the key feature vectors of each video in the video library, video files that do not meet the time-space structure consistency requirements are eliminated.

13. the method for realizing massive video fast retrieval according to claim 12, is characterized in that, described output similarity is greater than the video retrieval result of system default value, comprises the following steps:

(52) extract each subspace projection histogram of the key feature vector of the video to be retrieved and map each key feature vector into a hash value in each subspace;

(53) selecting a video file whose similarity in the distributed index database meets the system preset requirements through the inverted index file as output;

(54) Calculate the spatio-temporal structure consistency of the video to be retrieved and the key feature vectors of each video in the video database and output a video file whose similarity with the video to be retrieved is greater than the system preset value.