CN111368143A - Video similarity retrieval method and device, electronic equipment and storage medium

Info

Publication number: CN111368143A
Application number: CN202010177728.7A
Authority: CN
Inventors: 李沁
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2020-03-13
Filing date: 2020-03-13
Publication date: 2020-07-03

Abstract

The application relates to a video similarity retrieval method, a video similarity retrieval device, electronic equipment and a storage medium. The method comprises the following steps: acquiring a to-be-retrieved video clip of a to-be-retrieved video; parsing the to-be-retrieved video clip to obtain a first feature vector; and comparing the first feature vector with a video feature library to obtain a retrieval result for the to-be-retrieved video. The technical scheme performs retrieval using a feature vector extracted from the video clip rather than the traditional image feature vector. While remaining feasible, this enhances the stability and robustness of the features and also improves retrieval accuracy.

Description

A video similarity retrieval method, device, electronic device and storage medium

Technical Field

The present application relates to the field of video retrieval, and in particular to a video similarity retrieval method, apparatus, electronic device and storage medium.

Background

Video retrieval is an important means of handling copyright detection, infringement queries and video duplicate checking. At present, video retrieval methods are based on the images in a video: key frames are extracted from the video clip to be retrieved and compared with the key frames of the videos in the library to measure their similarity. This approach has the following drawbacks:

1. A single frame cannot stand in for the content of the video clip to be retrieved. Using frame-to-frame comparison results as the mapping relationship between video clips is a significant limitation that reduces the robustness of the algorithm: even slight displacement or distortion of a key frame may lead to inaccurate results.

2. Retrieval by key-frame extraction puts retrieval accuracy and retrieval speed in opposition. Maintaining high accuracy requires a shorter key-frame extraction interval and more frames to compare, which inevitably takes more time. Conversely, speeding up retrieval requires a longer extraction interval and fewer comparisons, which may lead to a higher false detection rate.

3. The results depend on the image feature extraction method and the feature comparison method, so the features must be adjusted case by case, which increases the complexity of the hyperparameters.

Summary of the Invention

In order to solve the above technical problems, or at least partially solve them, the present application provides a video similarity retrieval method, apparatus, electronic device and storage medium.

In a first aspect, an embodiment of the present application provides a video similarity retrieval method, including:

acquiring a to-be-retrieved video clip of a to-be-retrieved video;

parsing the to-be-retrieved video clip to obtain a first feature vector;

comparing the first feature vector with a video feature library to obtain a retrieval result of the to-be-retrieved video.

Optionally, parsing the to-be-retrieved video clip to obtain the first feature vector includes:

inputting the to-be-retrieved video clip into a feature extraction model, the feature extraction model including a plurality of 3D convolutional layers;

convolving the to-be-retrieved video clip sequentially with the plurality of 3D convolutional layers to obtain the first feature vector.

Optionally, the video feature library includes second feature vectors, each second feature vector being obtained by inputting a video clip from the video library into the feature extraction model.

Optionally, comparing the first feature vector with the video feature library to obtain the retrieval result of the to-be-retrieved video includes:

obtaining, from the second feature vectors, a third feature vector that matches the first feature vector;

obtaining the video set corresponding to the third feature vector;

establishing a mapping relationship between the to-be-retrieved video and all videos in the video set;

using the mapping relationship as the retrieval result of the to-be-retrieved video.

Optionally, obtaining, from the second feature vectors, the third feature vector that matches the first feature vector includes:

calculating the similarity between the first feature vector and each second feature vector;

using a second feature vector whose similarity satisfies a preset condition as the third feature vector.

Optionally, the method further includes:

obtaining at least two adjacent first feature vectors;

when a video in the video set includes at least two adjacent third feature vectors that match the at least two adjacent first feature vectors and have the same timing information, confirming that the to-be-retrieved video is identical or partially identical to that video in the video set.

Optionally, the method further includes:

determining the repetition rate of the to-be-retrieved video according to the retrieval result;

when the repetition rate is greater than or equal to a preset threshold, performing a corresponding processing operation on the to-be-retrieved video.

In a second aspect, an embodiment of the present application provides a video similarity retrieval apparatus, including:

a determination module, configured to determine the to-be-retrieved video clip of the to-be-retrieved video;

an extraction module, configured to extract at least one first feature vector based on the to-be-retrieved video;

a retrieval module, configured to compare the first feature vector with a preset video library to obtain a retrieval result of the to-be-retrieved video.

In a third aspect, the present application provides an electronic device, including a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;

the memory is configured to store a computer program;

the processor is configured to implement the above method steps when executing the computer program.

In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the above method steps when executed by a processor.

Compared with the prior art, the above technical solutions provided by the embodiments of the present application have the following advantages: retrieval is performed using a feature vector extracted from the to-be-retrieved video clip rather than the traditional image feature vector. While remaining feasible, this enhances the stability and robustness of the features and also improves retrieval accuracy.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a flowchart of a video similarity retrieval method provided by an embodiment of the present application;

FIG. 2 is a working schematic diagram of a feature extraction model provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a mapping relationship between a to-be-retrieved video and a video set provided by an embodiment of the present application;

FIG. 4 is a flowchart of a video similarity retrieval method provided by another embodiment of the present application;

FIG. 5 is a flowchart of a video similarity retrieval method provided by another embodiment of the present application;

FIG. 6 is a block diagram of a video similarity retrieval apparatus provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed Description

In order to make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in the present application without creative effort fall within the protection scope of the present application.

The present application provides a video similarity retrieval method, apparatus, electronic device and storage medium. The method provided by the embodiments of the present invention can be applied to any electronic device as needed, for example a server or a terminal; no specific limitation is imposed here, and for convenience of description it is hereinafter referred to as the electronic device.

A video similarity retrieval method provided by an embodiment of the present invention is introduced first.

FIG. 1 is a flowchart of a video similarity retrieval method provided by an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:

Step S11, acquiring a to-be-retrieved video clip of a to-be-retrieved video;

Step S12, parsing the to-be-retrieved video clip to obtain a first feature vector;

Step S13, comparing the first feature vector with a video feature library to obtain a retrieval result of the to-be-retrieved video.

The video similarity retrieval method provided in this embodiment performs retrieval using a feature vector extracted from the to-be-retrieved video clip instead of traditional image features. While remaining feasible, this enhances the stability and robustness of the features and also improves retrieval accuracy.

In this embodiment, the to-be-retrieved video is acquired and then divided into runs of a preset number of consecutive frames, yielding the to-be-retrieved video clips of the to-be-retrieved video.

The preset number of frames in this embodiment is 16: every 16 consecutive frames form one to-be-retrieved video clip, so retrieval is performed with a clip of less than one second as the retrieval unit. This significantly reduces the retrieval granularity, improves recognition accuracy, and allows retrieval results to be given for different clips of the same video.
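The segmentation step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_into_clips` is hypothetical, clips are assumed to be non-overlapping, and a trailing remainder shorter than 16 frames is assumed to be dropped, since the source does not say how incomplete clips are handled.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")

CLIP_LEN = 16  # preset number of consecutive frames per clip in this embodiment


def split_into_clips(frames: Sequence[Frame], clip_len: int = CLIP_LEN) -> List[List[Frame]]:
    """Split a frame sequence into consecutive, non-overlapping clips.

    Assumption: frames left over after the last full clip are discarded.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    n_full = len(frames) // clip_len
    return [list(frames[i * clip_len:(i + 1) * clip_len]) for i in range(n_full)]


# 40 dummy "frames" (integers stand in for images) give two full 16-frame clips.
clips = split_into_clips(list(range(40)))
```

Each resulting clip would then be passed to the feature extraction model described next.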

In this embodiment, parsing the to-be-retrieved video clip to obtain the first feature vector is implemented as follows: the to-be-retrieved video clip is input into a feature extraction model that includes a plurality of 3D convolutional layers, and the 3D convolutional layers convolve the clip in sequence to obtain the first feature vector.

FIG. 2 is a working schematic diagram of the feature extraction model provided by an embodiment of the present application. As shown in FIG. 2, 16 consecutive frames are input into the pre-trained feature extraction model, which performs the convolution computation and outputs a 256-dimensional array; this 256-dimensional array is used as the first feature vector.

In this embodiment, each of the 16 consecutive frames is a 3-channel image 112 pixels long and 112 pixels wide. The feature extraction model used in this embodiment includes six convolutional layers. A convolutional layer extracts features from its input data and contains multiple convolution kernels; each element of a kernel corresponds to a weight coefficient and a bias (bias vector), similar to a neuron of a feedforward neural network. The parameters of a convolutional layer include kernel size, stride and padding, which together determine the size of the layer's output feature map. The kernel size can be any value smaller than the input image size; the larger the kernel, the more complex the input features that can be extracted.
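For reference, the standard relation between these three parameters and the output size along a single dimension is given below. The symbols are introduced here for illustration and do not appear in the source: n is the input size, k the kernel size, s the stride and p the padding added on each side.

\[
n_{\text{out}} = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1
\]

As a worked instance, a 112-pixel input dimension with k = 3, s = 2 and p = 0 gives \(\lfloor (112 - 3)/2 \rfloor + 1 = 55\). This assumes that no padding is added at all; whether that is what "padding 0" denotes in this embodiment is not stated in the source.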

The convolutional layers used in this embodiment are 3D convolutional layers. Using 3D convolution to compress the data keeps feature extraction fast and compression efficient, saving computation time and reducing storage cost. The parameters of each layer are as follows:

Conv1 convolution kernel: 1×3×3×3×64, stride [1,1,2,2,1], padding 0;

Conv2 convolution kernel: 1×5×5×64×128, stride [1,1,5,5,1], padding 0;

Conv3 convolution kernel: 3×1×1×128×256, stride [1,2,1,1,1], padding 0;

Conv4 convolution kernel: 3×1×1×256×512, stride [1,2,1,1,1], padding 0;

Conv5 convolution kernel: 2×2×2×512×1536, stride [1,2,2,2,1], padding 0;

Conv6 convolution kernel: 2×7×7×1536×256, stride [1,1,1,1,1], padding 0.

The feature extraction model is trained as follows: training samples are obtained, where each training sample may be 16 consecutive frames, and a preset convolutional neural network model is trained on these samples to obtain the feature extraction model.

In this embodiment, comparing the first feature vector with the video feature library to obtain the retrieval result of the to-be-retrieved video is implemented as follows.

First, the video feature library and the video library are obtained. The video library is a collection of complete videos, and the video feature library stores the second feature vectors. The second feature vectors are obtained as follows: each video in the video library is divided into video clips of 16 frames each, and the resulting clips are input into the pre-trained feature extraction model to obtain the second feature vectors.

It should be noted that this embodiment convolves clips of 16 consecutive frames with the 3D convolutional layers of the feature extraction model. On the one hand, this makes video feature extraction fast and the convolution efficient, saving time. On the other hand, using a clip of 16 consecutive frames as the retrieval unit significantly reduces the retrieval granularity compared with the prior art and improves the accuracy of the retrieval results.

Then, a third feature vector matching the first feature vector is obtained from the second feature vectors, the video set corresponding to the third feature vector is obtained, a mapping relationship between the to-be-retrieved video and all videos in the video set is established, and the mapping relationship is used as the retrieval result of the to-be-retrieved video.

Optionally, the third feature vector matching the first feature vector is obtained from the second feature vectors by calculating the similarity between the first feature vector and each second feature vector, and taking the second feature vectors whose similarity satisfies a preset condition as third feature vectors. As an example, the similarity between the first feature vector and every second feature vector in the video feature library is calculated, and each second feature vector whose similarity is greater than or equal to 0 is taken as a third feature vector. The similarity referred to in this embodiment is the cosine similarity. Given a 256-dimensional first feature vector A and second feature vector B, the cosine similarity is calculated as follows:

\[
\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{256} A_i B_i}{\sqrt{\sum_{i=1}^{256} A_i^2} \, \sqrt{\sum_{i=1}^{256} B_i^2}}
\]

In the formula, \(\|A\|\) is the modulus of vector A, and \(\|B\|\) is the modulus of vector B.
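The cosine similarity between two feature vectors can be computed as in the sketch below, which uses only the Python standard library. Returning 0.0 for a zero vector is an assumption made here to avoid division by zero; the source does not cover that case.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b (dot product over product of norms)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

The result lies in [-1, 1]: vectors pointing in the same direction score 1, orthogonal vectors score 0, and opposite vectors score -1.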

After the third feature vectors are obtained, the video clip corresponding to each third feature vector is determined, the video to which each clip belongs is looked up, and the video set is assembled from those videos; this is the video set corresponding to the third feature vectors.

Establishing the mapping relationship between the to-be-retrieved video and all videos in the video set may consist of collecting the video clips that correspond to each to-be-retrieved video clip, sorting those clips by similarity, and discarding those whose similarity is below a preset similarity. This yields the mapping relationship between each to-be-retrieved video clip and the videos in the video set.

For example, the to-be-retrieved video M includes to-be-retrieved video clips M1, M2, M3 and M4, and the video set includes video Q, video P, video O, video G, video K and video R.

The mapping relationship is as follows: to-be-retrieved video clip M1 corresponds to video clip Q2 (similarity 95%) and video clip P2 (similarity 90%).

To-be-retrieved video clip M2 corresponds to video clip Q3 (similarity 99%) and video clip P3 (similarity 96%).

To-be-retrieved video clip M3 corresponds to video clip O1 (similarity 98%) and video clip G1 (similarity 94%).

To-be-retrieved video clip M4 corresponds to video clip K2 (similarity 97%) and video clip R3 (similarity 90%).

This gives the mapping relationship between the to-be-retrieved video M and all videos in the video set.
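The matching and mapping steps described above can be sketched as follows. All names (`build_mapping`, `video_set`, the tuple layouts) and the toy 2-dimensional vectors are hypothetical and chosen for illustration; a real feature library would hold 256-dimensional vectors and would likely use an indexed search rather than this brute-force scan.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]
LibraryEntry = Tuple[str, int, Vector]  # (video id, clip index, second feature vector)
Match = Tuple[str, int, float]          # (video id, clip index, similarity)


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_mapping(query_vectors: List[Vector],
                  library: List[LibraryEntry],
                  min_similarity: float) -> Dict[int, List[Match]]:
    """For each query clip, keep library clips at or above the threshold, best first."""
    mapping: Dict[int, List[Match]] = {}
    for q_idx, q_vec in enumerate(query_vectors):
        matches = [(vid, c_idx, cosine(q_vec, vec)) for vid, c_idx, vec in library]
        matches = [m for m in matches if m[2] >= min_similarity]
        matches.sort(key=lambda m: m[2], reverse=True)
        mapping[q_idx] = matches
    return mapping


def video_set(mapping: Dict[int, List[Match]]) -> Set[str]:
    """Videos that own at least one matched clip."""
    return {vid for matches in mapping.values() for vid, _, _ in matches}


# Toy data: clip 2 of Q and clip 2 of P resemble the query clip; clip 1 of O does not.
library = [("Q", 2, [1.0, 0.0]), ("P", 2, [0.9, 0.1]), ("O", 1, [0.0, 1.0])]
mapping = build_mapping([[1.0, 0.0]], library, min_similarity=0.9)
```

Here `mapping[0]` lists the Q clip before the P clip, and `video_set(mapping)` contains only Q and P, mirroring the structure of the M1 example above.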

FIG. 3 is a flowchart of a video similarity retrieval method provided by another embodiment of the present application. As shown in FIG. 3, the method further includes the following steps:

Step S21, obtaining at least two adjacent first feature vectors;

Step S22, when a video in the video set includes at least two adjacent third feature vectors that match the at least two adjacent first feature vectors and have the same timing information, confirming that the to-be-retrieved video is identical or partially identical to that video in the video set.

This embodiment verifies the match using adjacent first feature vectors and the third feature vectors in the video library, which makes it possible to find the video corresponding to a given part of the to-be-retrieved video and thereby improves the validity and accuracy of the retrieval result.

FIG. 4 is a schematic diagram of the mapping relationship between a to-be-retrieved video and a video set provided by an embodiment of the present application. As shown in FIG. 4, the to-be-retrieved video includes a first part Z₁, a second part Z₂ and a third part Z₃.

Suppose the first part of the to-be-retrieved video includes to-be-retrieved video clips Z₁₁, Z₁₂, Z₁₃ and Z₁₄. Retrieval shows that Z₁₁ corresponds to clip A_j1 of video A in the video set, Z₁₂ corresponds to clip A_j2 of video A, Z₁₃ corresponds to clip A_j3 of video A, and Z₁₄ corresponds to clip A_j4 of video A.

Timing information is then used for the judgment: if the four clips in the first part of the to-be-retrieved video have the same timing information as the four clips of video A in the video set, the first part of the to-be-retrieved video is considered to be partially identical to video A.

At the same time, Z₁₁ corresponds to clip B_j1 of video B in the video set, Z₁₂ corresponds to clip B_j2 of video B, Z₁₃ corresponds to clip B_j3 of video B, and Z₁₄ corresponds to clip B_j4 of video B.

Timing information is again used for the judgment: if the four clips in the first part of the to-be-retrieved video have the same timing information as the four clips of video B in the video set, the first part of the to-be-retrieved video is considered to be partially identical to video B.

The second part of the to-be-retrieved video includes to-be-retrieved video clips Z₂₁, Z₂₂, Z₂₃ and Z₂₄. Retrieval shows that Z₂₁ corresponds to clip C_j1 of video C in the video set, Z₂₂ corresponds to clip C_j2 of video C, and Z₂₃ corresponds to clip C_j3 of video C.

Timing information is then used for the judgment: if the matched clips in the second part of the to-be-retrieved video have the same timing information as the corresponding clips of video C in the video set, the second part of the to-be-retrieved video is considered to be partially identical to video C.

The third part of the to-be-retrieved video includes to-be-retrieved video clips Z₃₁ and Z₃₂. Retrieval shows that Z₃₁ corresponds to clip D_j1 of video D in the video set, and Z₃₂ corresponds to clip D_j2 of video D.

Timing information is then used for the judgment: if the two clips in the third part of the to-be-retrieved video have the same timing information as the two clips of video D in the video set, the third part of the to-be-retrieved video is considered to be partially identical to video D.
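One possible reading of the timing check is sketched below. The source does not define "same timing information" precisely, so this sketch assumes it means that adjacent query clips match clips with consecutive indices in the same library video. The function name and the data layout are hypothetical.

```python
from typing import List, Set, Tuple

ClipRef = Tuple[str, int]  # (video id, clip index within that video)


def temporally_consistent_videos(adjacent_matches: List[List[ClipRef]]) -> Set[str]:
    """Return videos whose matched clips advance in step with the adjacent query clips.

    adjacent_matches[i] holds the library clips matched by the i-th of a run of
    adjacent query clips. Assumption: a video passes when, for some starting clip
    index j, it matches clip j + i for every query clip i in the run.
    """
    if len(adjacent_matches) < 2:
        return set()  # the check needs at least two adjacent feature vectors
    later = [set(m) for m in adjacent_matches[1:]]
    result: Set[str] = set()
    for vid, start in adjacent_matches[0]:
        if all((vid, start + k) in later[k - 1] for k in range(1, len(adjacent_matches))):
            result.add(vid)
    return result


# Four adjacent query clips: video A matches clips 5, 6, 7, 8 in order,
# while video B's third match (clip 9) breaks the sequence 2, 3, 4, 5.
matches = [
    [("A", 5), ("B", 2)],
    [("A", 6), ("B", 3)],
    [("A", 7), ("B", 9)],
    [("A", 8), ("B", 5)],
]
consistent = temporally_consistent_videos(matches)
```

With this toy data only video A is reported, which shows how the check filters out videos that merely share similar-looking clips in a different order.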

FIG. 5 is a flowchart of a video similarity retrieval method provided by another embodiment of the present application. As shown in FIG. 5, the method further includes the following steps:

Step S31, determining the repetition rate of the to-be-retrieved video according to the retrieval result;

Step S32, when the repetition rate is greater than or equal to a preset threshold, performing a corresponding processing operation on the to-be-retrieved video.

In this embodiment, the repetition rate of the to-be-retrieved video is determined from the retrieval result, and the repetition rate is used to decide whether the to-be-retrieved video is a redundant video. As an example, the to-be-retrieved video includes a first part, a second part and a third part, where the first and second parts correspond to a video in the video set. The combined duration of the first and second parts is calculated, and the repetition rate is computed from that duration and the total duration of the to-be-retrieved video. For instance, if the first and second parts together last 4.5 min and the to-be-retrieved video lasts 15 min in total, the repetition rate is 30%.

When the repetition rate is greater than or equal to the preset threshold, a corresponding processing operation is performed on the to-be-retrieved video. As an example, when the repetition rate exceeds a preset threshold of 90%, the to-be-retrieved video is confirmed to be a redundant video and can be removed, which prevents large numbers of redundant videos from consuming the resources of the video library.
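The repetition-rate arithmetic can be sketched as below. The function names are hypothetical; the 90% default threshold is taken from the example above, and the comparison uses "greater than or equal to" as in the step description.

```python
def repetition_rate(matched_seconds: float, total_seconds: float) -> float:
    """Fraction of the to-be-retrieved video that matches library content."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return matched_seconds / total_seconds


def is_redundant(rate: float, threshold: float = 0.9) -> bool:
    """True when the repetition rate reaches the preset threshold."""
    return rate >= threshold


# The example from the text: 4.5 min matched out of a 15 min video.
rate = repetition_rate(4.5 * 60, 15 * 60)
redundant = is_redundant(rate)
```

For the 4.5 min out of 15 min example the rate is 0.3, well below the 90% threshold, so the video would not be treated as redundant.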

The video similarity retrieval method disclosed in this embodiment can also be applied to video marking. For example, consider a 24-hour live stream that contains several programs of interest to a user along with other ordinary videos, where the goal is to automatically locate the start time of each program of interest so as to remind the user to watch or to trigger other operations. Since a program of interest has a relatively fixed opening title sequence, that sequence can be extracted and stored as a sample in the video library. The live stream is then used as the to-be-retrieved video and searched against the video library. When the similarity between a clip of the stream and a sample in the library exceeds a threshold, the current clip can be taken to be the program's title sequence, and the user is prompted to begin subsequent operations.
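A minimal sketch of this marking scenario follows. It assumes the live stream has already been cut into clips and converted to feature vectors, and that the title sequence is represented by a single sample vector; `find_program_start` and the toy 2-dimensional vectors are hypothetical.

```python
import math
from typing import Iterable, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def find_program_start(stream_clip_vectors: Iterable[Vector],
                       title_vector: Vector,
                       threshold: float) -> Optional[int]:
    """Index of the first stream clip whose similarity to the title sample exceeds the threshold."""
    for idx, vec in enumerate(stream_clip_vectors):
        if cosine(vec, title_vector) > threshold:
            return idx
    return None  # no clip resembled the title sequence


# Toy stream: only the third clip resembles the stored title sample.
stream = [[0.0, 1.0], [0.1, 1.0], [1.0, 0.05]]
start_idx = find_program_start(stream, title_vector=[1.0, 0.0], threshold=0.9)
```

Because the function accepts any iterable, it could in principle consume clip vectors as they arrive from a live stream; multiplying the returned index by the clip duration would give an approximate start time.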

In addition, the video similarity retrieval method disclosed in this embodiment can be used to find and recommend similar videos. For example, after watching a video, a user may want to see more content related to it. The video the user watched is then used as the to-be-retrieved video and compared against the entire video library, the retrieved videos are sorted in descending order of similarity, and the sorted result is used as the video content recommended to the user.
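The ranking step for recommendation might look like the sketch below. The source only says that retrieved videos are sorted by similarity; scoring each video by its best-matching clip is an assumption made here, and `rank_videos` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple


def rank_videos(clip_matches: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Rank videos in descending order of similarity.

    clip_matches holds (video id, clip similarity) pairs.
    Assumption: a video's score is the highest similarity among its matched clips.
    """
    best: Dict[str, float] = {}
    for vid, sim in clip_matches:
        if sim > best.get(vid, float("-inf")):
            best[vid] = sim
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


ranked = rank_videos([("Q", 0.95), ("P", 0.90), ("Q", 0.99), ("O", 0.98)])
```

Other aggregation choices, such as the mean similarity or the number of matched clips per video, would fit the same description equally well.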

FIG. 6 is a block diagram of a video similarity retrieval apparatus provided by an embodiment of the present application. The apparatus may be implemented as part or all of an electronic device through software, hardware, or a combination of the two. As shown in FIG. 6, the apparatus includes:

a determination module 41, configured to determine the to-be-retrieved video segment of the video to be retrieved;

an extraction module 42, configured to extract at least one first feature vector based on the video to be retrieved;

a retrieval module 43, configured to compare the first feature vector with a preset video library to obtain a retrieval result of the video to be retrieved.

In this embodiment, the extraction module 42 is specifically configured to obtain a pre-trained feature extraction model and to input the to-be-retrieved video segment into the feature extraction model to obtain the first feature vector.

In this embodiment, the video feature library includes second feature vectors, each of which is obtained by inputting a video segment in the video library into the feature extraction model.

In this embodiment, the retrieval module 43 includes:

a first obtaining submodule, configured to obtain, from the second feature vectors, a third feature vector that matches the first feature vector;

a second obtaining submodule, configured to obtain the video set corresponding to the third feature vector;

a storage submodule, configured to establish a mapping relationship between the video to be retrieved and all videos in the video set, and to use the mapping relationship as the retrieval result of the video to be retrieved.

The first obtaining submodule is specifically configured to calculate the similarity between the first feature vector and each second feature vector, and to use each second feature vector whose similarity satisfies a preset condition as a third feature vector.
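The cooperation of the submodules above can be sketched as follows. The sketch assumes cosine similarity as the similarity measure and "similarity greater than or equal to a threshold" as the preset condition; neither is fixed by the text here. The names `LibraryEntry`, `find_third_vectors`, `video_set`, and `retrieval_result` are hypothetical.

```python
import math
from typing import Dict, List, NamedTuple, Sequence, Set


class LibraryEntry(NamedTuple):
    """One second feature vector plus the library segment it came from."""
    video_id: str
    segment_index: int
    vector: Sequence[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_third_vectors(
    first_vector: Sequence[float], library: List[LibraryEntry], threshold: float = 0.8
) -> List[LibraryEntry]:
    """First obtaining submodule: keep the second feature vectors whose
    similarity to the first feature vector meets the assumed condition."""
    return [e for e in library if cosine_similarity(first_vector, e.vector) >= threshold]


def video_set(third_vectors: List[LibraryEntry]) -> Set[str]:
    """Second obtaining submodule: videos that own the third feature vectors."""
    return {e.video_id for e in third_vectors}


def retrieval_result(query_id: str, videos: Set[str]) -> Dict[str, List[str]]:
    """Storage submodule: map the video to be retrieved to every video in the set."""
    return {query_id: sorted(videos)}


if __name__ == "__main__":
    library = [
        LibraryEntry("a", 0, [1.0, 0.0]),
        LibraryEntry("a", 1, [0.0, 1.0]),
        LibraryEntry("b", 0, [3.0, 4.0]),
        LibraryEntry("c", 0, [2.0, 0.1]),
    ]
    matches = find_third_vectors([1.0, 0.0], library)
    print(retrieval_result("query", video_set(matches)))
```

A linear scan is used only to keep the sketch short; a large feature library would normally be searched with an approximate nearest-neighbor index.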

The video similarity retrieval apparatus provided in this embodiment further includes an acquisition module, configured to acquire at least two adjacent first feature vectors and, when a video in the video set includes at least two adjacent third feature vectors that match the at least two adjacent first feature vectors and have the same timing information, to confirm that the video to be retrieved is identical or partially identical to that video in the video set.
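One possible reading of this timing check is sketched below. It interprets "same timing information" as the matched library segments being adjacent and in the same order as the adjacent query segments; this interpretation, the input layout, and the name `has_consistent_run` are assumptions made for illustration.

```python
from typing import Dict, List, Set


def has_consistent_run(matches: List[Set[int]], min_run: int = 2) -> bool:
    """Check one candidate video for temporally consistent matches.

    matches[i] is the set of segment indices in the candidate video whose
    feature vectors matched the i-th segment of the video to be retrieved.
    Returns True if min_run consecutive query segments matched min_run
    consecutive candidate segments in the same order.
    """
    # previous[j] = length of the consistent run ending at candidate segment j
    # for the previous query segment.
    previous: Dict[int, int] = {}
    for segment_matches in matches:
        current = {j: previous.get(j - 1, 0) + 1 for j in segment_matches}
        if any(length >= min_run for length in current.values()):
            return True
        previous = current
    return False


if __name__ == "__main__":
    print(has_consistent_run([{5}, {6}]))   # adjacent and in order
    print(has_consistent_run([{5}, {9}]))   # matched, but not adjacent
```

Requiring a longer run through `min_run` trades recall for fewer false positives from isolated, coincidental segment matches.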

The video similarity retrieval apparatus provided in this embodiment further includes a processing module, configured to determine the repetition rate of the video to be retrieved according to the retrieval result and, when the repetition rate is greater than or equal to a preset threshold, to perform a corresponding processing operation on the video to be retrieved.

An embodiment of the present application further provides an electronic device. As shown in FIG. 7, the electronic device may include a processor 1501, a communication interface 1502, a memory 1503, and a communication bus 1504, wherein the processor 1501, the communication interface 1502, and the memory 1503 communicate with one another through the communication bus 1504.

The memory 1503 is configured to store a computer program.

The processor 1501 is configured to implement the steps of the foregoing embodiments when executing the computer program stored in the memory 1503.

The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, it is represented by a single thick line in the figure, but this does not mean that there is only one bus or only one type of bus.

The communication interface is used for communication between the above electronic device and other devices.

The memory may include a random access memory (RAM), and may also include a non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The present application further provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the foregoing embodiments.

It should be noted that, because the above apparatus, electronic device, and computer-readable storage medium embodiments are substantially similar to the method embodiments, they are described relatively briefly; for related details, refer to the corresponding parts of the description of the method embodiments.

It should further be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not preclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The above descriptions are merely specific embodiments of the present invention, provided to enable those skilled in the art to understand or implement the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features claimed herein.

Claims

1. A video similarity retrieval method, characterized by comprising:

obtaining a to-be-retrieved video segment of a video to be retrieved;

analyzing the to-be-retrieved video segment to obtain a first feature vector;

comparing the first feature vector with a video feature library to obtain a retrieval result of the video to be retrieved.

2. The method according to claim 1, wherein analyzing the to-be-retrieved video segment to obtain the first feature vector comprises:

inputting the to-be-retrieved video segment into a feature extraction model, the feature extraction model including a plurality of 3D convolutional layers;

sequentially convolving, by the plurality of 3D convolutional layers, the to-be-retrieved video segment to obtain the first feature vector.

3. The method according to claim 2, wherein the video feature library comprises a second feature vector, and the second feature vector is obtained by inputting video segments in the video library into the feature extraction model.

4. The method according to claim 3, wherein comparing the first feature vector with the video feature library to obtain the retrieval result of the video to be retrieved comprises:

obtaining, from the second feature vector, a third feature vector that matches the first feature vector;

obtaining the video set corresponding to the third feature vector;

establishing a mapping relationship between the video to be retrieved and all videos in the video set;

using the mapping relationship as the retrieval result of the video to be retrieved.

5. The method according to claim 4, wherein obtaining, from the second feature vector, the third feature vector that matches the first feature vector comprises:

calculating the similarity between the first feature vector and the second feature vector;

using the second feature vector whose similarity satisfies a preset condition as the third feature vector.

6. The method according to claim 4, wherein the method further comprises:

obtaining at least two adjacent first feature vectors;

when a video in the video set includes at least two adjacent third feature vectors that match the at least two adjacent first feature vectors and have the same timing information, confirming that the video to be retrieved is identical or partially identical to the video in the video set.

7. The method according to claim 1, wherein the method further comprises:

determining the repetition rate of the video to be retrieved according to the retrieval result;

when the repetition rate is greater than or equal to a preset threshold, performing a corresponding processing operation on the video to be retrieved.

8. A video similarity retrieval device, characterized by comprising:

a determination module, configured to determine a to-be-retrieved video segment of a video to be retrieved;

an extraction module, configured to extract at least one first feature vector based on the video to be retrieved;

a retrieval module, configured to compare the first feature vector with a preset video library to obtain a retrieval result of the video to be retrieved.

9. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

the memory is configured to store a computer program;

the processor is configured to implement the method steps of any one of claims 1-7 when executing the computer program.

10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the method steps of any one of claims 1-7 are implemented.