
CN106874827A - Video frequency identifying method and device - Google Patents

Video frequency identifying method and device

Info

Publication number
CN106874827A
Authority
CN
China
Prior art keywords
face
frame image
video
target
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510925602.2A
Other languages
Chinese (zh)
Inventor
Not announced (inventor name withheld)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd, Qizhi Software Beijing Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201510925602.2A
Publication of CN106874827A
Legal status: Pending (current)


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to a video recognition method and device. A target video is acquired; the target video is divided according to a first preset frame interval to obtain multiple video clips; first frame images are extracted from each video clip according to a second preset frame interval; the first frame images that contain face information are extracted to obtain second face frame images; based on a preset recognition model, the face identities in the second face frame images are recognized and the face identifiers contained in the second face frame images are determined; and a three-way correspondence table of the face identifier, the second face frame image, and the video clip is formed according to the correspondence between the face identifier and the second face frame image and the correspondence between the second face frame image and the video clip to which it belongs. Based on the determined correspondence between video clips and face identifiers, the user can then be pushed only those video clips in which the actor he or she wishes to watch appears.

Description

Video recognition method and device

Technical Field

The present disclosure relates to the technical field of image processing, and in particular to a video recognition method and device.

Background Art

With the rapid development of society and the continuous advancement of science and technology, the amount of information people can access has grown geometrically, and people increasingly need information search technology to mine useful information out of this massive volume of data.

Current information search technology works well for text and can quickly locate articles containing a user's preset keywords. For a video, however, if the user wishes to watch only the clips in which his or her favorite actor appears, the only option is to search by dragging the progress bar or pressing the fast-forward key, which is time-consuming, laborious, and imprecise.

Summary of the Invention

To solve the prior-art problem that the time periods in which an actor appears in a video cannot be located, the present disclosure provides a video recognition method and device. The video is divided into clips, face recognition is performed on the divided clips, and the correspondence between video clips and face identities is determined, so that the user can be pushed only those video clips in which the actor he or she wishes to watch appears. The method realizes face recognition and face-based video clip positioning effectively and quickly, and improves the user experience of watching videos.

The present disclosure provides a video recognition method and device. The technical solution is as follows.

According to a first aspect of the embodiments of the present disclosure, a video recognition method is provided, including:

acquiring a target video;

dividing the target video according to a first preset frame interval to obtain multiple video clips;

extracting a first frame image from each of the video clips according to a second preset frame interval;

detecting whether the first frame image contains face information, and extracting the first frame images that contain face information to obtain second face frame images;

recognizing, based on a preset recognition model, the face identity in the second face frame image, and determining the face identifier contained in the second face frame image;

forming a three-way correspondence table of the face identifier, the second face frame image, and the video clip according to the correspondence between the face identifier and the second face frame image and the correspondence between the second face frame image and the video clip to which the second face frame image belongs.

According to a second aspect of the embodiments of the present disclosure, a video recognition device is provided, including:

a first acquisition module, configured to acquire a target video;

a first division module, configured to divide the target video according to a first preset frame interval to obtain multiple video clips;

a second division module, configured to extract a first frame image from each of the video clips according to a second preset frame interval;

a detection module, configured to detect whether the first frame image contains face information, and to extract the first frame images that contain face information to obtain second face frame images;

a recognition module, configured to recognize, based on a preset recognition model, the face identity in the second face frame image and determine the face identifier contained in the second face frame image;

a matching module, configured to form a three-way correspondence table of the face identifier, the second face frame image, and the video clip according to the correspondence between the face identifier and the second face frame image and the correspondence between the second face frame image and the video clip to which the second face frame image belongs.

The method and device provided by the embodiments of the present disclosure can bring the following beneficial effects. A target video is acquired; the target video is divided according to a first preset frame interval to obtain multiple video clips; first frame images are extracted from each video clip according to a second preset frame interval; it is detected whether the first frame images contain face information, and the first frame images containing face information are extracted to obtain second face frame images; based on a preset recognition model, the face identities in the second face frame images are recognized and the face identifiers contained in the second face frame images are determined; and a three-way correspondence table of the face identifier, the second face frame image, and the video clip is formed according to the correspondence between the face identifier and the second face frame image and the correspondence between the second face frame image and the video clip to which it belongs. Based on the determined correspondence between video clips and face identifiers, the user can then be pushed only those video clips in which the actor he or she wishes to watch appears. The method realizes face recognition and face-based video clip positioning effectively and quickly, and improves the user experience of watching videos.

It should be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.

Fig. 1 is a flowchart of a video recognition method according to an exemplary embodiment;

Fig. 2 is a flowchart of a video recognition method according to another exemplary embodiment;

Fig. 3 is a schematic diagram of a video division manner in the embodiment shown in Fig. 2;

Fig. 4 is a flowchart of a video recognition device according to an exemplary embodiment;

Fig. 5 is a flowchart of a video recognition device according to another exemplary embodiment.

The above drawings show specific embodiments of the present disclosure, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the disclosed concept in any way, but rather to illustrate the concept of the present disclosure for those skilled in the art by reference to specific embodiments.

Detailed Description

Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.

Fig. 1 is a flowchart of a video recognition method according to an exemplary embodiment. As shown in Fig. 1, the video recognition method of this embodiment can be applied to a video provider's video server or to the terminal (client device) of the party receiving the video. The following takes application in a video server as an example.

The video processing method of this embodiment includes the following steps.

In step 101, a target video is acquired.

Specifically, a video is essentially a series of still images joined together. Generally, when the images change at more than 24 frames per second, the human eye, owing to persistence of vision, cannot distinguish individual still frames and perceives a smooth, continuous visual effect; such a sequence of continuous frames is called a video. By recognizing faces in the successive frame images that make up the target video, the actors appearing in the target video can be identified.

In step 102, the target video is divided according to the first preset frame interval to obtain multiple video clips.

Specifically, as noted above, a video stream can be composed of individual still frames mainly because the human eye has limited ability to distinguish rapidly changing single still frames, so a video made up of still frames appears as a smooth, continuous visual effect. The target video can therefore be divided into individual video clips according to the number of still frames contained in a given interval. From the point of view of the user's viewing experience, the first preset frame interval can be measured in minutes, for example 0.5 minutes or 1 minute, so that when clips featuring the user's favorite actor are extracted from the target video, each clip flows well and does not feel like an abrupt jump between individual frames.
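
A minimal sketch of this clip division, assuming OpenCV is available and the clip length is expressed directly as a frame count; none of this code appears in the original disclosure, and the function and variable names are illustrative only:

```python
import cv2

def split_into_clips(video_path, first_interval_frames):
    """Divide a video into clips of `first_interval_frames` frames each.

    Returns a list of (clip_id, start_frame, end_frame) tuples; the frames
    themselves stay in the source file and are addressed by index.
    """
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.release()

    clips = []
    for clip_id, start in enumerate(range(0, total, first_interval_frames)):
        end = min(start + first_interval_frames, total) - 1
        clips.append((clip_id, start, end))
    return clips

# Example: for a 25 fps video, a 1-minute clip interval is 25 * 60 frames.
# clips = split_into_clips("target_video.mp4", first_interval_frames=25 * 60)
```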

In step 103, a first frame image is extracted from each video clip according to the second preset frame interval.

Specifically, even after the complete video has been segmented, the number of frame images contained in each clip is still large; as noted above, one second of video can contain dozens of still frames. Performing face recognition on every frame of every clip would require an enormous amount of computation and would be slow. Instead, certain specific frame images can be extracted from each clip and scanned to obtain the face feature information they contain. How these specific frames are extracted can depend on the processing capability of the processor: if the processor is powerful, the second preset frame interval can be smaller. Since a first frame image may or may not contain face information, a smaller second preset frame interval increases the probability of extracting frame images that do contain face information from a clip. Preferably, the first preset frame interval is greater than the second preset frame interval.
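
A possible sketch of this sampling step, again assuming OpenCV; the helper name and the (start_frame, end_frame) clip representation follow the previous sketch and are assumptions:

```python
import cv2

def sample_first_frame_images(video_path, start_frame, end_frame, second_interval):
    """Extract every `second_interval`-th frame of a clip as a candidate
    'first frame image' for face detection."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    for idx in range(start_frame, end_frame + 1, second_interval):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the sampled frame
        ok, frame = cap.read()
        if ok:
            frames.append((idx, frame))
    cap.release()
    return frames
```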

In step 104, it is detected whether the first frame image contains face information, and the first frame images that contain face information are extracted to obtain second face frame images.

Specifically, detecting whether a first frame image contains face information means searching the image according to a certain strategy to determine whether it contains face information, which may be information about a single face or about multiple faces, and marking the positions at which the face information appears so as to establish the coordinate position of each face in the frame image. The first frame images are then filtered, and those containing face information are extracted to obtain the second face frame images.
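
The disclosure does not fix a particular detector; as one hedged example, OpenCV's Haar cascade (an AdaBoost-based detector, consistent with the AdaBoost mention later in this description) could be used roughly as follows:

```python
import cv2

# OpenCV's bundled frontal-face Haar cascade; an AdaBoost-trained detector.
_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def select_face_frames(candidate_frames):
    """Keep only the sampled frames in which at least one face is found.

    `candidate_frames` is a list of (frame_index, image) pairs; the result
    also carries the detected face bounding boxes (x, y, w, h)."""
    face_frames = []
    for idx, image in candidate_frames:
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        boxes = _cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(boxes) > 0:
            face_frames.append((idx, image, boxes))
    return face_frames
```

The scaleFactor and minNeighbors values above are ordinary defaults and would normally be tuned against the video material at hand.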

In step 105, based on a preset recognition model, the face identity in the second face frame image is recognized, and the face identifier contained in the second face frame image is determined.

Specifically, many algorithms exist in the prior art for recognizing the identity of a face in an image, and different recognition models can be obtained from different algorithms. For example, a large number of face pictures can be collected as sample data and used to train an artificial neural network, yielding a neural network model with learning capability; the trained model is then used to recognize the face image to be identified and produce a recognition result. This trained artificial neural network model is the preset recognition model. After all second face frame images have been preprocessed, they are fed into the preset recognition model as input data, and the recognition result for the faces appearing in each second face frame image is obtained, that is, the face identifier contained in the second face frame image. The face identifier can be the name of an actor in the video.
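
The preset recognition model is not specified here beyond being a trained neural network. One common way to realize the idea, sketched below as an assumption rather than as the disclosed method, is to compare face embeddings against per-actor reference vectors; `embed` and `actor_centroids` are placeholders the caller must supply:

```python
import numpy as np

def recognize_faces(face_frames, embed, actor_centroids, threshold=0.6):
    """Assign actor names (face identifiers) to detected faces.

    `embed(image, box)` is assumed to return a unit-length feature vector for
    the face crop; `actor_centroids` maps actor name -> reference vector.
    Faces whose best cosine similarity falls below `threshold` are ignored.
    Returns a list of (frame_index, [actor names]).
    """
    labelled = []
    for idx, image, boxes in face_frames:
        names = set()
        for box in boxes:
            vec = embed(image, box)
            sims = {name: float(np.dot(vec, ref))
                    for name, ref in actor_centroids.items()}
            best = max(sims, key=sims.get)
            if sims[best] >= threshold:
                names.add(best)
        if names:
            labelled.append((idx, sorted(names)))
    return labelled
```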

In step 106, a three-way correspondence table of the face identifier, the second face frame image, and the video clip is formed according to the correspondence between the face identifier and the second face frame image and the correspondence between the second face frame image and the video clip to which the second face frame image belongs.

Specifically, through this three-way correspondence table, a particular face identifier can be located quickly and the video clips containing that face identifier can be obtained, so that those clips can be extracted and played back in sequence, allowing the user to watch only the clips in which his or her favorite actor appears.
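
A small sketch of how such a three-way table could be assembled from the outputs of the previous steps; the data shapes and names are assumptions carried over from the earlier sketches:

```python
from collections import defaultdict

def build_correspondence_table(labelled_frames, clips):
    """Build the face-identifier / face-frame / clip correspondence.

    `labelled_frames` is a list of (frame_index, [actor names]);
    `clips` is a list of (clip_id, start_frame, end_frame).
    Returns {actor: [(frame_index, clip_id), ...]}.
    """
    def clip_of(frame_index):
        for clip_id, start, end in clips:
            if start <= frame_index <= end:
                return clip_id
        return None

    table = defaultdict(list)
    for frame_index, names in labelled_frames:
        clip_id = clip_of(frame_index)
        if clip_id is None:
            continue
        for name in names:
            table[name].append((frame_index, clip_id))
    return dict(table)
```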

In this embodiment, a target video is acquired; the target video is divided according to the first preset frame interval to obtain multiple video clips; first frame images are extracted from each video clip according to the second preset frame interval; it is detected whether the first frame images contain face information, and the first frame images containing face information are extracted to obtain second face frame images; based on the preset recognition model, the face identities in the second face frame images are recognized and the face identifiers contained in the second face frame images are determined; and a three-way correspondence table of the face identifier, the second face frame image, and the video clip is formed according to the correspondence between the face identifier and the second face frame image and the correspondence between the second face frame image and the video clip to which it belongs. Based on the determined correspondence between video clips and face identifiers, the user can then be pushed only those video clips in which the actor he or she wishes to watch appears. The method realizes face recognition and face-based video clip positioning effectively and quickly, and improves the user experience of watching videos.

Fig. 2 is a flowchart of a video recognition method according to another exemplary embodiment. As shown in Fig. 2, the video processing method of this embodiment can be applied to a video provider's video server or to the terminal (client device) of the party receiving the video. The following takes application in a video server as an example. The method of this embodiment includes the following steps.

In step 201, a target video is acquired.

In step 202, the target video is divided according to the first preset frame interval to obtain multiple video clips.

In step 203, a first frame image is extracted from each video clip according to the second preset frame interval.

The first preset frame interval is greater than the second preset frame interval. Preferably, the second preset frame interval is 5 still frames.

In step 204, it is detected whether the first frame image contains face information, and the first frame images that contain face information are extracted to obtain second face frame images.

In step 205, a target face identifier corresponding to the target video is obtained from the description information of the target video.

Specifically, the description information refers to the program introduction of the target video, which usually includes a cast list of the main actors. The actor names in the cast list can be used as face identifiers, and the target video is recognized according to these face identifiers to determine which video clips contain them. A single frame image may contain multiple face identifiers; labelling every face identifier in the image is far less efficient than labelling only a specified face identifier. Obtaining the target face identifier first therefore speeds up locating the target face in the target video.
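
A trivial sketch of pulling candidate face identifiers out of such description metadata; the `"cast"` key is an assumed field name, not something defined by the disclosure:

```python
def target_face_ids_from_description(description):
    """Return actor names from the video's description metadata, e.g.
    description = {"title": "...", "cast": ["Actor A", "Actor B"]}."""
    return [name.strip() for name in description.get("cast", []) if name.strip()]
```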

In step 206, a first preset recognition model corresponding to the target face identifier is retrieved from a recognition model database according to the target face identifier.

In step 207, based on the first preset recognition model, the face identity in the second face frame image is recognized, and third face frame images are determined among the second face frame images, a third face frame image being a second face frame image that contains the target face identifier.

Specifically, the first preset recognition model is a targeted recognition model that recognizes a given target face more specifically. For example, a recognition model trained on 100,000 photos of 100 celebrities can, after training, quickly recognize other photos of those 100 celebrities; a corresponding model can likewise be trained on 100,000 (or some other number of) photos of 10 celebrities or of a single celebrity. Generally speaking, under the same training conditions, the narrower the scope to which a recognition model applies, the higher its face recognition accuracy. Therefore, by retrieving from the recognition model database the first preset recognition model corresponding to the target face identifier and recognizing the face identities in the second face frame images based on this specific model, the third face frame images containing the target face identifier can be determined among the second face frame images, improving the accuracy of recognizing the target face.
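
A hedged sketch of this targeted filtering step; `model_db` is assumed to map an actor name to a per-actor classifier callable, which is an illustrative convention rather than how the disclosure necessarily stores its models:

```python
def select_third_face_frames(face_frames, model_db, target_name):
    """Filter detected face frames down to those containing the target actor.

    `face_frames` is the (frame_index, image, boxes) list from detection;
    `model_db[target_name](image, box)` is assumed to return True when the
    face crop shows the target actor. Returns the matching frame indices.
    """
    is_target = model_db[target_name]
    third_frames = []
    for idx, image, boxes in face_frames:
        if any(is_target(image, box) for box in boxes):
            third_frames.append(idx)
    return third_frames
```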

In step 208, a three-way correspondence table of the target face identifier, the third face frame image, and the video clip is formed according to the correspondence between the target face identifier and the third face frame image and the correspondence between the third face frame image and the video clip to which the third face frame image belongs.

Optionally, before step 206 of retrieving from the recognition model database the first preset recognition model corresponding to the target face identifier, the method may further include:

retrieving, according to the target face identifier, a target face picture data package corresponding to the target face identifier from a picture database;

using the target face picture data package as training samples, and training to obtain the first preset recognition model corresponding to the target face identifier.

Specifically, the target face picture data package contains a preset number of face images corresponding to the target face identifier, that is, the training samples. Generally, the more training samples there are, the higher the recognition accuracy of the trained model, although the specific number of samples also depends on the properties of the algorithm used. The training algorithm can be a deep convolutional neural network.
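
As an illustrative stand-in for the deep convolutional neural network mentioned here, a very small PyTorch binary classifier could be trained on the picture data package roughly as follows; the architecture, input size, and hyperparameters are arbitrary choices for the sketch, not values from the disclosure:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

class SmallFaceCNN(nn.Module):
    """Tiny binary classifier: is the 64x64 RGB face crop the target actor?"""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 8 * 8, 2)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

def train_target_model(dataset, epochs=5, lr=1e-3):
    """`dataset` yields (image_tensor[3,64,64], label), label 1 for the
    target actor and 0 otherwise, e.g. built from the picture data package."""
    model = SmallFaceCNN()
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
    return model
```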

Optionally, after the three-way correspondence table of the face identifier, the second face frame image, and the video clip has been formed, the method further includes:

receiving a video push request sent by a terminal, the video push request containing the face identifier to be pushed;

looking up the three-way correspondence table according to the face identifier to be pushed, and pushing the video clips corresponding to that face identifier to the terminal.

Specifically, a user can install an application suited to this video recognition method on a terminal (mobile phone, tablet, etc.) and enter the name of the actor he or she wishes to watch. Based on that name, the cloud side determines, from the three-way correspondence table obtained by analyzing the target video in advance, the video clips corresponding to the actor and pushes them to the terminal, so that the user watches only those parts of the target video in which the favorite actor appears, improving the viewing experience.
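
A minimal sketch of serving such a push request from the correspondence table built earlier; the table and clip representations are the assumed ones carried over from the previous sketches:

```python
def handle_push_request(table, clips, face_id):
    """Look up the clips for `face_id` and return them in playback order.

    `table` is the {actor: [(frame_index, clip_id), ...]} mapping built
    earlier; `clips` maps clip_id -> (start_frame, end_frame)."""
    if face_id not in table:
        return []
    clip_ids = sorted({clip_id for _, clip_id in table[face_id]})
    return [(clip_id, clips[clip_id]) for clip_id in clip_ids]
```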

The following example illustrates how this video recognition method locates a particular actor in a video. Referring to Fig. 3, the target video (for example, "Running Man (奔跑吧兄弟)") is first divided into video clips, shown as A1 to A6 in Fig. 3, that is, six clips. The main actor information contained in the target video (for example "Yang Ying (杨颖)", "Deng Chao (邓超)", "Zheng Kai (郑凯)") is obtained from the description information of the target video, such as the cast and crew introduction or the film synopsis. The face picture data packages corresponding to these actor identifiers are retrieved from the database, for example a large number of pictures of "Yang Ying"; those pictures are used as training samples to train a recognition model able to decide whether a target face is "Yang Ying". Because each video clip is made up of individual frame images, and one second of video typically contains dozens of frames (more for high-definition video), performing face detection on every frame of the target video would waste resources and be inefficient. At the same time, for a user watching the scenes of a favorite actor, truncating clips at minute granularity is reasonable, whereas jumping through the video second by second harms the viewing experience; the clip length is therefore preferably set to half a minute or one minute. Likewise, the first frame images extracted from a clip for face detection need not be every frame; they can be extracted and checked at the second preset frame interval, i.e. with a preset step size. As shown at B in Fig. 3, a certain number of frame images are extracted from each clip as the first frame images B to be detected. Face detection is then performed on the extracted first frame images B; the detection algorithm can be the AdaBoost iterative algorithm, which effectively improves both the detection rate and the detection accuracy for face images. As shown at C in Fig. 3, the second face frame images C, i.e. those first frame images B in which a face was detected, are extracted for face recognition. The second face frame images C are fed into the previously obtained recognition models for "Yang Ying", "Deng Chao", and "Zheng Kai", yielding the third face frame images D1, D2, D3, and D4 shown in Fig. 3, where D1 contains "Yang Ying", D2 contains "Deng Chao", D3 contains "Yang Ying" and "Deng Chao", and D4 contains "Deng Chao" and "Zheng Kai". As shown in Table 1, the correspondence between the third face frame images and the video clips is determined, forming the three-way correspondence table of the target face identifier, the third face frame image, and the video clip.

Table 1. Three-way correspondence table of the target face identifier, the third face frame image, and the video clip

If a push request is received in which the user chooses to watch the video clips in which "Yang Ying" appears, clips A2 and A3 can be played for the user in succession, quickly locating for the user the footage of the actor he or she wishes to see.
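
Tying the earlier sketches together, an end-to-end pass over one video for one actor might look roughly like this; it reuses the helper functions defined in the sketches above and inherits all of their assumptions:

```python
def locate_actor_clips(video_path, actor_name, model_db,
                       first_interval, second_interval):
    """Split the video, sample frames, detect faces, keep frames showing
    `actor_name`, and return the ids of the clips to push."""
    clips = split_into_clips(video_path, first_interval)

    face_frames = []
    for clip_id, start, end in clips:
        candidates = sample_first_frame_images(video_path, start, end,
                                               second_interval)
        face_frames.extend(select_face_frames(candidates))

    target_frames = select_third_face_frames(face_frames, model_db, actor_name)

    clip_ids = set()
    for idx in target_frames:
        for clip_id, start, end in clips:
            if start <= idx <= end:
                clip_ids.add(clip_id)
    return sorted(clip_ids)
```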

In summary, this embodiment divides the video into clips, builds a recognition model specific to a particular face, and recognizes that face in each clip based on the specific model. This effectively improves recognition efficiency and makes it possible to quickly push to the user video clips containing only the actor he or she wishes to watch. The method realizes face recognition and face-based video clip positioning effectively and quickly, and improves the user experience of watching videos.

The following are device embodiments of the present disclosure, which can be used to carry out the method embodiments of the present disclosure. For details not disclosed in the device embodiments, please refer to the method embodiments of the present disclosure.

Fig. 4 is a flowchart of a video recognition device according to an exemplary embodiment. As shown in Fig. 4, the video recognition device can be implemented as part or all of an electronic device through software, hardware, or a combination of the two. The video processing device may include the following modules.

The first acquisition module 41 is configured to acquire a target video. The first division module 42 is configured to divide the target video according to a first preset frame interval to obtain multiple video clips. The second division module 43 is configured to extract a first frame image from each video clip according to a second preset frame interval. The detection module 44 is configured to detect whether the first frame image contains face information and to extract the first frame images containing face information to obtain second face frame images. The recognition module 45 is configured to recognize, based on a preset recognition model, the face identity in the second face frame image and to determine the face identifier contained in the second face frame image. The matching module 46 is configured to form a three-way correspondence table of the face identifier, the second face frame image, and the video clip according to the correspondence between the face identifier and the second face frame image and the correspondence between the second face frame image and the video clip to which the second face frame image belongs.

In this embodiment, a target video is acquired; the target video is divided according to the first preset frame interval to obtain multiple video clips; first frame images are extracted from each video clip according to the second preset frame interval; it is detected whether the first frame images contain face information, and the first frame images containing face information are extracted to obtain second face frame images; based on the preset recognition model, the face identities in the second face frame images are recognized and the face identifiers contained in the second face frame images are determined; and a three-way correspondence table of the face identifier, the second face frame image, and the video clip is formed according to the correspondence between the face identifier and the second face frame image and the correspondence between the second face frame image and the video clip to which it belongs. Based on the determined correspondence between video clips and face identifiers, the user can then be pushed only those video clips in which the actor he or she wishes to watch appears. The method realizes face recognition and face-based video clip positioning effectively and quickly, and improves the user experience of watching videos.

Fig. 5 is a flowchart of a video recognition device according to another exemplary embodiment. The video recognition device can be implemented as part or all of an electronic device through software, hardware, or a combination of the two. Building on the above device embodiment, the first preset frame interval is greater than the second preset frame interval.

Optionally, the video recognition device further includes:

a second acquisition module 47, configured to obtain a target face identifier corresponding to the target video from the description information of the target video.

Correspondingly, the recognition module 45 includes:

a retrieval submodule 451, configured to retrieve, according to the target face identifier, a first preset recognition model corresponding to the target face identifier from a recognition model database;

a recognition submodule 452, configured to recognize the face identity in the second face frame image based on the first preset recognition model;

a determination submodule 453, configured to determine third face frame images among the second face frame images, a third face frame image being a second face frame image that contains the target face identifier.

Correspondingly, the matching module 46 is specifically configured to form a three-way correspondence table of the target face identifier, the third face frame image, and the video clip according to the correspondence between the target face identifier and the third face frame image and the correspondence between the third face frame image and the video clip to which the third face frame image belongs.

Optionally, the video recognition device further includes:

a picture acquisition module 48, configured to retrieve, according to the target face identifier, a target face picture data package corresponding to the target face identifier from a picture database;

a training module 49, configured to use the target face picture data package as training samples and to train the first preset recognition model corresponding to the target face identifier.

Optionally, the video recognition device further includes:

a receiving module 50, configured to receive a video push request sent by a terminal, the video push request containing the face identifier to be pushed;

a lookup module 51, configured to look up the three-way correspondence table according to the face identifier to be pushed and to push the video clips corresponding to that face identifier to the terminal.

With regard to the devices in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.

Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A video recognition method, characterized in that the method comprises: acquiring a target video; dividing the target video according to a first preset frame interval to obtain multiple video clips; extracting a first frame image from each of the video clips according to a second preset frame interval; detecting whether the first frame image contains face information, and extracting the first frame images containing face information to obtain second face frame images; recognizing, based on a preset recognition model, the face identity in the second face frame image, and determining the face identifier contained in the second face frame image; and forming a three-way correspondence table of the face identifier, the second face frame image, and the video clip according to the correspondence between the face identifier and the second face frame image and the correspondence between the second face frame image and the video clip to which the second face frame image belongs.

2. The method according to claim 1, characterized in that the first preset frame interval is greater than the second preset frame interval.

3. The method according to claim 1, characterized in that, before recognizing, based on the preset recognition model, the face identity in the second face frame image and determining the face identifier contained in the second face frame image, the method further comprises: obtaining a target face identifier corresponding to the target video from the description information of the target video; correspondingly, recognizing, based on the preset recognition model, the face identity in the second face frame image and determining the face identifier contained in the second face frame image comprises: retrieving, according to the target face identifier, a first preset recognition model corresponding to the target face identifier from a recognition model database, recognizing the face identity in the second face frame image based on the first preset recognition model, and determining third face frame images among the second face frame images, a third face frame image being a second face frame image that contains the target face identifier; and correspondingly, forming the three-way correspondence table of the face identifier, the second face frame image, and the video clip according to the correspondence between the face identifier and the second face frame image and the correspondence between the second face frame image and the video clip to which the second face frame image belongs comprises: forming a three-way correspondence table of the target face identifier, the third face frame image, and the video clip according to the correspondence between the target face identifier and the third face frame image and the correspondence between the third face frame image and the video clip to which the third face frame image belongs.

4. The method according to claim 3, characterized in that, before retrieving, according to the target face identifier, the first preset recognition model corresponding to the target face identifier from the recognition model database, the method further comprises: retrieving, according to the target face identifier, a target face picture data package corresponding to the target face identifier from a picture database; and using the target face picture data package as training samples, and training to obtain the first preset recognition model corresponding to the target face identifier.

5. The method according to any one of claims 1 to 4, characterized in that, after forming the three-way correspondence table of the face identifier, the second face frame image, and the video clip, the method further comprises: receiving a video push request sent by a terminal, the video push request containing a face identifier to be pushed; and looking up the three-way correspondence table according to the face identifier to be pushed, and pushing the video clip corresponding to the face identifier to be pushed to the terminal.

6. A video recognition device, characterized in that the device comprises: a first acquisition module, configured to acquire a target video; a first division module, configured to divide the target video according to a first preset frame interval to obtain multiple video clips; a second division module, configured to extract a first frame image from each of the video clips according to a second preset frame interval; a detection module, configured to detect whether the first frame image contains face information and to extract the first frame images containing face information to obtain second face frame images; a recognition module, configured to recognize, based on a preset recognition model, the face identity in the second face frame image and determine the face identifier contained in the second face frame image; and a matching module, configured to form a three-way correspondence table of the face identifier, the second face frame image, and the video clip according to the correspondence between the face identifier and the second face frame image and the correspondence between the second face frame image and the video clip to which the second face frame image belongs.

7. The device according to claim 6, characterized in that the first preset frame interval is greater than the second preset frame interval.

8. The device according to claim 6, characterized in that it further comprises: a second acquisition module, configured to obtain a target face identifier corresponding to the target video from the description information of the target video; correspondingly, the recognition module comprises: a retrieval submodule, configured to retrieve, according to the target face identifier, a first preset recognition model corresponding to the target face identifier from a recognition model database; a recognition submodule, configured to recognize the face identity in the second face frame image based on the first preset recognition model; and a determination submodule, configured to determine third face frame images among the second face frame images, a third face frame image being a second face frame image that contains the target face identifier; and correspondingly, the matching module is specifically configured to form a three-way correspondence table of the target face identifier, the third face frame image, and the video clip according to the correspondence between the target face identifier and the third face frame image and the correspondence between the third face frame image and the video clip to which the third face frame image belongs.

9. The device according to claim 8, characterized in that it further comprises: a picture acquisition module, configured to retrieve, according to the target face identifier, a target face picture data package corresponding to the target face identifier from a picture database; and a training module, configured to use the target face picture data package as training samples and to train the first preset recognition model corresponding to the target face identifier.

10. The device according to any one of claims 6 to 9, characterized in that the device further comprises: a receiving module, configured to receive a video push request sent by a terminal, the video push request containing a face identifier to be pushed; and a lookup module, configured to look up the three-way correspondence table according to the face identifier to be pushed and to push the video clip corresponding to the face identifier to be pushed to the terminal.
CN201510925602.2A 2015-12-14 2015-12-14 Video frequency identifying method and device Pending CN106874827A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510925602.2A CN106874827A (en) 2015-12-14 2015-12-14 Video frequency identifying method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510925602.2A CN106874827A (en) 2015-12-14 2015-12-14 Video frequency identifying method and device

Publications (1)

Publication Number Publication Date
CN106874827A true CN106874827A (en) 2017-06-20

Family

ID=59178785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510925602.2A Pending CN106874827A (en) 2015-12-14 2015-12-14 Video frequency identifying method and device

Country Status (1)

Country Link
CN (1) CN106874827A (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090169065A1 (en) * 2007-12-28 2009-07-02 Tao Wang Detecting and indexing characters of videos by NCuts and page ranking
CN102087704A (en) * 2009-12-08 2011-06-08 索尼公司 Information processing apparatus, information processing method, and program
CN102110399A (en) * 2011-02-28 2011-06-29 北京中星微电子有限公司 Method, device and system for assisting explication
CN103049459A (en) * 2011-10-17 2013-04-17 天津市亚安科技股份有限公司 Feature recognition based quick video retrieval method
CN102799637A (en) * 2012-06-27 2012-11-28 北京邮电大学 Method for automatically generating main character abstract in television program
CN103702117A (en) * 2012-09-27 2014-04-02 索尼公司 Image processing apparatus, image processing method, and program
KR101382948B1 (en) * 2012-11-22 2014-04-09 한국과학기술원 An accuracy improving method for automatic recognition of characters in a video by utilizing casting information
CN103488764A (en) * 2013-09-26 2014-01-01 天脉聚源(北京)传媒科技有限公司 Personalized video content recommendation method and system
CN104636413A (en) * 2013-11-07 2015-05-20 三星泰科威株式会社 Video search system and method
CN104298748A (en) * 2014-10-13 2015-01-21 中南民族大学 Device and method for face search in videos

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIANG Bin, DUAN Fu: "Video face retrieval method using singular value decomposition and improved PCA", Computer Engineering and Applications (《计算机工程与应用》) *
GAO Guangyu: "Research on structure parsing and automatic cataloguing of film and television video", China Doctoral Dissertations Full-text Database (Electronic Journal), Information Science and Technology *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107316462A (en) * 2017-08-30 2017-11-03 济南浪潮高新科技投资发展有限公司 A kind of flow statistical method and device
CN108111603A (en) * 2017-12-21 2018-06-01 广东欧珀移动通信有限公司 Information recommendation method and device, terminal equipment and storage medium
CN108446390A (en) * 2018-03-22 2018-08-24 百度在线网络技术(北京)有限公司 Method and apparatus for pushed information
CN108446390B (en) * 2018-03-22 2022-01-04 百度在线网络技术(北京)有限公司 Method and device for pushing information
CN109034100A (en) * 2018-08-13 2018-12-18 成都盯盯科技有限公司 Face pattern detection method, device, equipment and storage medium
CN109241345A (en) * 2018-10-10 2019-01-18 百度在线网络技术(北京)有限公司 Video locating method and device based on recognition of face
CN109635158A (en) * 2018-12-17 2019-04-16 杭州柚子街信息科技有限公司 For the method and device of video automatic labeling, medium and electronic equipment
CN109815805A (en) * 2018-12-18 2019-05-28 深圳壹账通智能科技有限公司 Method, device, storage medium and electronic device for automatically identifying drowning
CN110942027A (en) * 2019-11-26 2020-03-31 浙江大华技术股份有限公司 Method and device for determining occlusion strategy, storage medium and electronic device
CN111414517A (en) * 2020-03-26 2020-07-14 成都市喜爱科技有限公司 Video face analysis method and device and server
CN111414517B (en) * 2020-03-26 2023-05-19 成都市喜爱科技有限公司 Video face analysis method, device and server
CN112507824A (en) * 2020-11-27 2021-03-16 长威信息科技发展股份有限公司 Method and system for identifying video image features

Similar Documents

Publication Publication Date Title
CN106874827A (en) Video frequency identifying method and device
US10591281B2 (en) Apparatus and methods for facial recognition and video analytics to identify individuals in contextual video streams
US10735494B2 (en) Media information presentation method, client, and server
US20130148898A1 (en) Clustering objects detected in video
WO2017185630A1 (en) Emotion recognition-based information recommendation method and apparatus, and electronic device
US20120213490A1 (en) Facial detection, recognition and bookmarking in videos
WO2017071227A1 (en) Video processing method and system, video player and cloud server
US20170116465A1 (en) Video processing method and system, video player and cloud server
EP2639745A1 (en) Object identification in images or image sequences
CN107247919A (en) The acquisition methods and system of a kind of video feeling content
WO2017166472A1 (en) Advertisement data matching method, device, and system
US12354404B2 (en) Obtaining artist imagery from video content using facial recognition
CN103984778B (en) A kind of video retrieval method and system
CN114390368A (en) Live video data processing method and device, equipment and readable medium
CN107872713A (en) Short processing system for video, method and device
CN114329063B (en) Video clip detection method, device and equipment
CN105488516A (en) Image processing method and apparatus
CN114302231B (en) Video processing method and device, electronic equipment and storage medium
WO2023160515A1 (en) Video processing method and apparatus, device and medium
CN112507824A (en) Method and system for identifying video image features
CN114501132B (en) Resource processing method and device, electronic equipment and storage medium
CN113327308B (en) Method and device for generating emoticon package pictures
CN104396262A (en) Synchronized movie summary
JP2020052750A (en) Image processing apparatus, image processing method, program, and recording medium
CN119545124A (en) Plot highlight generation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170620