WO2018157746A1 - Recommendation method and apparatus for video data
- Publication number
- WO2018157746A1 (PCT/CN2018/076784; priority application CN2018076784W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video data
- feature information
- target
- quality
- quality feature
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/466—Learning process for intelligent management, e.g. learning user preferences for recommending movies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/466—Learning process for intelligent management, e.g. learning user preferences for recommending movies
- H04N21/4668—Learning process for intelligent management, e.g. learning user preferences for recommending movies for recommending content, e.g. movies
Definitions
- The present invention relates to the field of data processing technologies, and in particular to a method for recommending video data, a device for recommending video data, a method for generating a video data detection model, a device for generating a video data detection model, a method for identifying video data, and a device for identifying video data.
- E-commerce websites have begun to use video content for shopping guidance and marketing: corresponding text information is input according to operational needs, appropriate video frames are selected from the video library, and, according to the text semantics, a video of a suitable scene is constructed from those frames and recommended to the target user.
- Embodiments of the present application are provided to overcome the above problems, or at least partially solve them, by providing a video data recommendation method, a video data recommendation device, a method for generating a video data detection model, a device for generating a video data detection model, a method for identifying video data, and a corresponding device for identifying video data.
- the present application discloses a method for recommending video data, including:
- the target video data is recommended to the user.
- the preset video data detection model is generated by:
- Training is performed by using quality feature information of the plurality of forward sample video data and negative sample video data to generate a video data detection model.
- the quality feature information includes image pixel feature information, continuous frame image object migration feature information, continuous frame image motion feature information, different frequency domain feature information of the image frame, image frame wavelet transform feature information, and/or image rotation operator feature information.
- the step of separately extracting quality feature information of the plurality of sample video data includes:
- the pixel information is separately subjected to convolution operation and pooling processing to obtain image pixel feature information.
- the step of separately extracting quality feature information of the plurality of sample video data includes:
- the number and frequency of occurrences of the objects in the adjacent two frames of images are respectively determined to obtain continuous frame image object migration feature information.
- the step of separately extracting quality feature information of the plurality of sample video data includes:
- the geometric parameters of the shape features of the motion objects in the adjacent two frames of images are respectively determined to obtain continuous frame image motion feature information.
- the step of separately extracting quality feature information of the plurality of sample video data includes:
- the amplitude difference and the phase difference of the adjacent two frames of images are respectively determined to obtain different frequency domain feature information of the image frame.
- the step of separately extracting quality feature information of the plurality of sample video data includes:
- the change values of the wavelet coefficients of the adjacent two frames of images are respectively determined to obtain image frame wavelet transform feature information.
- the step of separately extracting quality feature information of the plurality of sample video data includes:
- the change values of the rotation operators of the adjacent two frames of images are respectively determined to obtain image rotation operator feature information.
- the step of training by using the quality feature information of the plurality of forward sample video data and the negative sample video data to generate the video data detection model includes:
- the target quality feature information is used to train the neural network model to generate a video data detection model.
- the step of identifying target quality feature information from the normalized quality feature information includes:
- quality feature information whose information entropy exceeds the first preset threshold is identified as the target quality feature information.
- it also includes:
- the plurality of users are clustered into a plurality of user groups according to the attribute information, and the user groups have corresponding user labels.
- the step of identifying the quality feature information by using a preset video data detection model to obtain target video data includes:
- the video data whose quality score exceeds the second preset threshold is extracted as target video data.
- the step of recommending the target video data to a user includes:
- the target video data is recommended to the target user group.
- the target video data has a corresponding video tag
- the step of determining a target user group among the multiple user groups includes:
- the present application discloses a method for generating a video data detection model, including:
- Training is performed by using quality feature information of the plurality of forward sample video data and negative sample video data to generate a video data detection model.
- the quality feature information includes image pixel feature information, continuous frame image object migration feature information, continuous frame image motion feature information, different frequency domain feature information of the image frame, image frame wavelet transform feature information, and/or image rotation operator feature information.
- the present application discloses a method for identifying video data, including:
- a recommendation device for video data including:
- An obtaining module configured to acquire one or more video data to be detected
- An extraction module configured to separately extract quality feature information of each video data to be detected
- An identification module configured to identify the quality feature information by using a preset video data detection model to obtain target video data
- a recommendation module for recommending the target video data to a user.
- the preset video data detection model is generated by calling the following module:
- a quality feature information extraction module configured to separately extract quality feature information of the plurality of sample video data, where the plurality of sample video data includes a plurality of forward sample video data and negative sample video data;
- the video data detection model generating module is configured to perform training by using the quality feature information of the plurality of forward sample video data and the negative sample video data to generate a video data detection model.
- the quality feature information includes image pixel feature information, continuous frame image object migration feature information, continuous frame image motion feature information, different frequency domain feature information of the image frame, image frame wavelet transform feature information, and/or image rotation operator feature information.
- the quality feature information extraction module includes:
- a pixel information extraction submodule configured to extract pixel information of each frame image of each sample video data
- the pixel information processing sub-module is configured to perform convolution operation and pooling processing on the pixel information to obtain image pixel feature information.
- the quality feature information extraction module further includes:
- an object recognition sub-module for identifying the objects in each frame image of each sample video data;
- the object processing sub-module is configured to respectively determine the number and frequency of occurrences of the objects in the adjacent two frames of images to obtain continuous frame image object migration feature information.
- the quality feature information extraction module further includes:
- a motion object recognition submodule configured to identify a shape feature of the motion object in each frame image of each sample video data
- the action object processing sub-module is configured to respectively determine geometric parameters of the shape features of the action objects in the adjacent two frames of images to obtain continuous frame image action feature information.
- the quality feature information extraction module further includes:
- An amplitude and phase determination sub-module for determining a magnitude and a phase of each frame image of each sample video data
- the amplitude and phase processing sub-module is configured to respectively determine amplitude difference and phase difference of adjacent two frames of images to obtain different frequency domain feature information of the image frame.
- the quality feature information extraction module further includes:
- a wavelet coefficient determining submodule for determining a wavelet coefficient of each frame image of each sample video data
- the wavelet coefficient processing sub-module is configured to respectively determine the variation values of the wavelet coefficients of the adjacent two frames of images to obtain image frame wavelet transform feature information.
- the quality feature information extraction module further includes:
- a rotation operator determining sub-module for determining a rotation operator of each frame image of each sample video data
- the rotation operator processing sub-module is configured to respectively determine a variation value of a rotation operator of the adjacent two frames of images to obtain image rotation operator feature information.
- the video data detection model generating module includes:
- a normalization processing sub-module configured to normalize the quality feature information of the plurality of forward sample video data and negative sample video data to obtain normalized quality feature information;
- a target quality feature information identifying submodule configured to identify target quality feature information from the normalized quality feature information
- the video data detection model generation submodule is configured to perform neural network model training by using the target quality feature information, and generate a video data detection model.
- the target quality feature information identifying submodule includes:
- An information entropy determining unit configured to determine an information entropy of the normalized quality feature information
- the target quality feature information identifying unit is configured to identify the quality feature information that the information entropy exceeds the first preset threshold as the target quality feature information.
- generating the preset video data detection model further invokes the following modules:
- An attribute information obtaining module configured to acquire attribute information of multiple users
- the user group clustering module is configured to cluster the plurality of users into a plurality of user groups according to the attribute information, where the user group has a corresponding user label.
- the identifying module includes:
- a quality feature information identifying sub-module configured to identify, by using a preset video data detection model, the quality feature information of the one or more video data to be detected, respectively, to obtain a quality score of each video data to be detected;
- the target video data extraction sub-module is configured to extract video data whose quality score exceeds a second preset threshold as target video data.
- the recommendation module includes:
- a target user group determining submodule configured to determine a target user group among the plurality of user groups
- the target video data recommendation submodule is configured to recommend the target video data to the target user group.
- the target video data has a corresponding video tag
- the target user group determining submodule includes:
- the target user group determining unit is configured to determine, as the target user group, a user group whose user tag is the same as the video tag of the target video data.
- a device for generating a video data detection model including:
- a quality feature information extraction module configured to separately extract quality feature information of the plurality of sample video data, where the plurality of sample video data includes a plurality of forward sample video data and negative sample video data;
- the video data detection model generating module is configured to perform training by using the quality feature information of the plurality of forward sample video data and the negative sample video data to generate a video data detection model.
- the quality feature information includes image pixel feature information, continuous frame image object migration feature information, continuous frame image motion feature information, different frequency domain feature information of the image frame, image frame wavelet transform feature information, and/or image rotation operator feature information.
- an apparatus for identifying video data including:
- An obtaining module configured to acquire one or more video data to be detected
- a sending module configured to send the one or more video data to be detected to a server, where the server is configured to separately identify the one or more video data to be detected to obtain a recognition result, where the recognition result includes one or more candidate video data;
- a receiving module configured to receive the one or more candidate video data returned by the server
- a determining module configured to determine target video data in the one or more candidate video data
- a presentation module for presenting the target video data.
- the embodiments of the present application include the following advantages:
- One or more video data to be detected are acquired, quality feature information of each video data to be detected is extracted, and the quality feature information is then identified using a preset video data detection model to obtain target video data, which is recommended to the user; in this way, high-quality video data can be quickly selected using a deep learning model.
- The embodiments of the present application solve the problem that the prior art can only rely on manual identification of video segments to recommend to users, improving both the recognition efficiency of video data and the accuracy of recommendation.
- FIG. 1 is a flow chart showing the steps of Embodiment 1 of a method for recommending video data according to the present application;
- FIG. 2 is a flow chart of steps of a second embodiment of a method for recommending video data according to the present application
- FIG. 3 is a schematic block diagram of a method for recommending video data according to the present application.
- FIG. 4 is a flow chart showing the steps of an embodiment of a method for generating a video data detection model according to the present application
- FIG. 5 is a flow chart showing the steps of an embodiment of a method for identifying video data according to the present application
- FIG. 6 is a structural block diagram of an embodiment of a device for recommending video data according to the present application.
- FIG. 7 is a structural block diagram of an embodiment of a device for generating a video data detection model according to the present application.
- FIG. 8 is a structural block diagram of an embodiment of an apparatus for identifying video data according to the present application.
- Referring to FIG. 1, a flow chart of a first embodiment of a method for recommending video data according to the present application is shown. Specifically, the method may include the following steps:
- Step 101 Acquire one or more video data to be detected
- the video data to be detected may be a ready-made video segment obtained from various sources, or may be a video segment synthesized in real time by extracting multiple video frames from a video library according to a certain rule.
- The embodiment of the present application does not limit the specific source and type of the video data.
- Step 102 Extract quality characteristic information of each video data to be detected, respectively.
- the quality feature information of the video data may be feature information for identifying the quality of the video data, for example, image pixels of the video data, content displayed by the image, and the like. By identifying the quality characteristic information of the video data, it is possible to check the fluency, consistency, and the like of the video clip.
- The type of quality feature information to be extracted and the manner of extraction may be determined by a person skilled in the art according to actual needs, and are not limited in the embodiment of the present application.
- Step 103 Identify the quality feature information by using a preset video data detection model to obtain target video data.
- the preset video data detection model may be generated by training a plurality of sample video data in the training sample set, so that each quality feature information of the video data to be detected may be identified.
- the plurality of sample video data in the training sample set may include a plurality of forward sample video data and a plurality of negative sample video data
- the forward sample video data may be a video segment with better video quality, for example, a video clip with good fluency and coherence and a relatively uniform overall style between video frames.
- the forward sample video data can be obtained by manual marking or web crawling; contrary to the forward sample video data,
- the negative sample video data is a video segment with poor fluency, coherence, and overall style consistency between video frames.
- such negative sample video data can be obtained by randomly synthesizing multiple video frames.
- the source and the identification manner of the forward sample video data and the negative sample video data are not limited in the embodiment of the present application.
- the quality feature information of the forward sample video data and the negative sample video data may be separately extracted and used for model training to generate a video data detection model; then, after the quality feature information of the video data to be detected is extracted, the video data detection model is used to identify that quality feature information to obtain target video data.
- the target video data may be a video clip of good quality obtained after being identified by the video data detection model.
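The detect-and-score flow described in steps 101 to 103 amounts to training a binary classifier on positive (forward) and negative samples and then scoring unseen clips. As a rough illustration only — the embodiment's actual model is a neural network, and the four-dimensional feature vectors and cluster locations below are synthetic stand-ins — a minimal logistic-regression "detection model" might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic quality-feature vectors: forward samples cluster high, negative low.
pos = rng.normal(loc=1.0, scale=0.3, size=(50, 4))   # "good" clips
neg = rng.normal(loc=-1.0, scale=0.3, size=(50, 4))  # "bad" spliced clips
X = np.vstack([pos, neg])
y = np.array([1] * 50 + [0] * 50)

# Logistic regression by gradient descent as a stand-in detection model.
w, b = np.zeros(4), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

def quality_score(clip_features):
    """Score a clip's feature vector in [0, 1]; higher means better quality."""
    return 1.0 / (1.0 + np.exp(-(clip_features @ w + b)))

good = quality_score(np.array([1.1, 0.9, 1.0, 1.2]))
bad = quality_score(np.array([-1.0, -0.8, -1.1, -0.9]))
print(good > 0.5, bad < 0.5)
```

Clips whose score exceeds a preset threshold would then be kept as target video data, mirroring the second-preset-threshold extraction the claims describe.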
- Step 104 recommending the target video data to a user.
- when the target video data is recommended to the user, the target video segment may be played in the user interface, or the target video segment may be pushed to the user.
- the specific manner of recommending the target video data is not limited in this embodiment of the present application.
- In the embodiment of the present application, one or more video data to be detected are acquired, quality feature information of each video data to be detected is extracted, and the quality feature information is then identified using a preset video data detection model to obtain target video data, which is recommended to the user.
- The deep learning model in the embodiment of the present application can quickly screen out high-quality video data, solving the problem that the prior art can only rely on manual identification of video segments to recommend to users, and improving both the recognition efficiency of video data and the accuracy of recommendation.
- the method may include the following steps:
- Step 201 Extract quality feature information of a plurality of sample video data, where the plurality of sample video data includes a plurality of forward sample video data and negative direction sample video data;
- Referring to FIG. 3, a functional block diagram of a method for recommending video data of the present application is shown.
- The embodiment of the present application performs feature extraction on the training sample set, performs deep learning modeling, and then uses the trained model to evaluate the video data to be detected and output corresponding quality scores; in the modeling process, users are simultaneously clustered into user groups according to their attribute information, so that videos can be recommended to user communities.
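The user-group clustering mentioned above could, for instance, be a simple k-means over user attribute vectors; the patent does not name a specific clustering algorithm, and the attributes below (age, daily watch minutes) are invented for illustration:

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Toy k-means: assign users to nearest centers, then recompute centers."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assign each user to the nearest cluster center.
        labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

# Hypothetical user attribute vectors: (age, daily watch minutes).
users = np.array([[18, 120], [22, 110], [20, 130],   # young heavy viewers
                  [55, 20],  [60, 15],  [58, 25]], float)
labels, _ = kmeans(users, k=2)
print(labels)
```

Each resulting group would then carry a user label, against which target videos' tags can later be matched for recommendation.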
- The forward sample video data may be a video segment with better video quality, for example, one with good fluency and coherence and a uniform overall style between video frames. Such forward sample video data can usually be obtained by manual marking: an operator checks the fluency and coherence of a video segment and the overall style between its video frames, and marks segments that score well on these criteria as forward sample video data. It can also be obtained through web crawling, that is, by capturing high-quality videos with high click-through rates and many likes from video websites as forward sample video data.
- Contrary to the forward sample video data, the negative sample video data is a video segment with poor fluency, coherence, and overall style consistency between video frames. Such negative sample video data can be obtained by randomly synthesizing multiple video frames. For example, scattered video frame segments can be randomly extracted from multiple categories (such as travel, religion, and electronic products) and then randomly combined and spliced; the resulting segments contain many incoherent and semantically inconsistent transitions, so such spliced video segments can be used as negative sample video data.
- the obtained forward sample video data and negative sample video data can then be used as a training sample set for subsequent model training.
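The random-splicing construction of negative samples can be sketched as follows; the category names and frame labels are placeholders standing in for real video frames:

```python
import random

def synthesize_negative_sample(category_clips, n_segments=4, seed=7):
    """Randomly splice short segments from clips of unrelated categories
    (e.g. travel, religion, electronics) into one incoherent 'video'."""
    rng = random.Random(seed)
    spliced = []
    for _ in range(n_segments):
        category = rng.choice(list(category_clips))
        clip = rng.choice(category_clips[category])
        start = rng.randrange(len(clip))
        spliced.extend(clip[start:start + 2])   # grab a short segment
    return spliced

clips = {
    "travel":      [["beach1", "beach2", "beach3"]],
    "religion":    [["temple1", "temple2"]],
    "electronics": [["phone1", "phone2", "phone3"]],
}
negative = synthesize_negative_sample(clips)
print(negative)  # frames jump across unrelated categories
```

The incoherent jumps between categories are exactly what the quality features below are designed to detect.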
- the quality feature information of the plurality of sample video data in the training sample set may be separately extracted first.
- the quality feature information may include image pixel feature information, continuous frame image object migration feature information, continuous frame image motion feature information, different frequency domain feature information of the image frame, image frame wavelet transform feature information, and/or image rotation operator feature information.
- the following describes a method for extracting the above six kinds of feature information one by one.
- For the image pixel feature information, pixel information of each frame image of each sample video data may be extracted, and then subjected to a convolution operation and pooling to obtain the image pixel feature information.
- Specifically, each frame of a video segment is captured as an image, so the pixel information in each frame image can be extracted separately as a feature set to be processed. The pixel information in the feature set is then convolved, and the feature set obtained after the convolution operation is further max-pooled, thereby obtaining the image pixel feature information.
- In this way, the most significant description of the pixel information can be obtained: the resulting features not only have reduced dimensionality but can still express the original semantics of the image.
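The convolution-plus-max-pooling step above can be sketched in a few lines of numpy; the 6x6 frame and the edge kernel are toy stand-ins, not the model's learned filters:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (cross-correlation) over a grayscale frame."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max-pooling, halving each spatial dimension."""
    h, w = fmap.shape[0] // size * size, fmap.shape[1] // size * size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

frame = np.arange(36, dtype=float).reshape(6, 6)   # stand-in 6x6 frame
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)     # simple vertical-edge filter
features = max_pool(conv2d(frame, edge_kernel))
print(features.shape)  # (2, 2): reduced-dimension pixel features
```

Note how pooling shrinks the 4x4 convolution output to 2x2 while keeping the strongest responses, which is the dimensionality reduction the text refers to.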
- For the continuous frame image object migration feature information, the objects in each frame image of each sample video data may be identified, and then the number and frequency of occurrences of the objects in adjacent frames determined to obtain the continuous frame image object migration feature information.
- Specifically, each frame image may be analyzed separately, the objects in each frame image identified and extracted, and the frames sorted in chronological order, so that the changes in the objects between adjacent frames can be determined.
- Alternatively, only some of the adjacent image frame pairs may be selected for comparison according to actual needs; the number of adjacent image frames selected is not limited in this embodiment of the present application.
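The object-migration statistic — how many objects persist, appear, or vanish between adjacent frames — might be computed along these lines, assuming an upstream object detector has already labeled each frame (the labels here are illustrative):

```python
from collections import Counter

def object_migration_features(frames_objects):
    """For each adjacent frame pair, count objects that persist, appear,
    or vanish; `frames_objects` is a list of detected labels per frame."""
    features = []
    for prev, curr in zip(frames_objects, frames_objects[1:]):
        p, c = Counter(prev), Counter(curr)
        persisted = sum((p & c).values())   # objects present in both frames
        appeared = sum((c - p).values())    # newly appearing objects
        vanished = sum((p - c).values())    # objects that disappeared
        features.append((persisted, appeared, vanished))
    return features

frames = [["person", "car"], ["person", "car", "dog"], ["dog"]]
print(object_migration_features(frames))
```

Smooth footage keeps the persisted count high, while randomly spliced segments spike the appear/vanish counts — the signal the detection model can learn from.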
- For the continuous frame image motion feature information, the embodiment of the present application can identify the shape features of the moving objects in each frame image of each sample video data, and then determine the geometric parameters of those shape features in adjacent frames to obtain the continuous frame image motion feature information.
- Specifically, the moving object in each frame image can be identified and its geometric boundary determined; the geometric boundary in each frame is then compared with that in the previous frame, the geometric parameters of the object's shape features are calculated according to a geometric affine transformation, and these geometric parameters are used as the continuous frame image motion feature information.
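Recovering the affine geometric parameters from matched boundary points of a moving object can be sketched with a least-squares fit; the rectangular boundary coordinates below are hypothetical:

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2-D affine transform mapping boundary points src -> dst.
    Returns the 2x3 parameter matrix [A | t]."""
    n = len(src)
    # Linear system for x' = a*x + b*y + tx and y' = c*x + d*y + ty.
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = src
    A[1::2, 5] = 1.0
    rhs = dst.reshape(-1)
    params, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    a, b, c, d, tx, ty = params
    return np.array([[a, b, tx], [c, d, ty]])

# Hypothetical object boundary in frame t, shifted by (3, 1) in frame t+1.
prev_shape = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 2.0], [0.0, 2.0]])
next_shape = prev_shape + np.array([3.0, 1.0])
M = estimate_affine(prev_shape, next_shape)
print(np.round(M, 3))  # translation column ~ (3, 1), linear part ~ identity
```

The recovered parameters (rotation/scale in the 2x2 block, translation in the last column) are exactly the kind of geometric parameters the text proposes as motion features.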
- For the different frequency domain feature information of the image frame, the amplitude and phase of each frame image of each sample video data may be determined, and then the amplitude difference and phase difference of adjacent frames determined to obtain the different frequency domain feature information of the image frame.
- Specifically, a Fourier transform of each frame image may first be performed and its spectral features extracted; the amplitude and phase of each of a plurality of different spectral components are taken as the feature set of each frame image, and the amplitude difference and phase difference between adjacent frames are then calculated to obtain the different frequency domain feature information of the image frame.
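A sketch of this frequency-domain feature using numpy's FFT; the intuition is that adjacent frames of a fluent video differ less in their spectra than frames across a splice (the 8x8 random frames are synthetic stand-ins):

```python
import numpy as np

def freq_features(frame_a, frame_b):
    """Mean amplitude and phase differences between two frames' 2-D spectra."""
    fa, fb = np.fft.fft2(frame_a), np.fft.fft2(frame_b)
    amp_diff = np.abs(np.abs(fa) - np.abs(fb)).mean()
    phase_diff = np.abs(np.angle(fa) - np.angle(fb)).mean()
    return amp_diff, phase_diff

rng = np.random.default_rng(1)
frame = rng.random((8, 8))
similar = frame + 0.01 * rng.random((8, 8))   # smooth transition
unrelated = rng.random((8, 8))                # abrupt splice

amp_s, _ = freq_features(frame, similar)
amp_u, _ = freq_features(frame, unrelated)
print(amp_s < amp_u)  # fluent adjacent frames differ less in spectrum
```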
- the embodiment of the present application may determine the wavelet coefficients of each frame image of each sample video data, and then determine the change in the wavelet coefficients between each pair of adjacent frames to obtain the image frame wavelet transform feature information.
- specifically, wavelet transform processing may be performed on each frame image to obtain the corresponding wavelet coefficients; the frames are then sorted in time order, the wavelet coefficients of each pair of adjacent frames are compared, and the difference in the coefficients is used as the wavelet transform feature information.
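The wavelet-coefficient change above can be sketched with a single-level Haar transform (chosen here only for illustration; the embodiment does not prescribe a particular wavelet), applied to a 1-D row of pixel intensities per frame:

```python
def haar_coeffs(signal):
    """Single-level Haar transform: (approximation, detail) coefficients."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_change(frame_a, frame_b):
    """Element-wise change in Haar coefficients between adjacent frames."""
    (a_app, a_det), (b_app, b_det) = haar_coeffs(frame_a), haar_coeffs(frame_b)
    return ([y - x for x, y in zip(a_app, b_app)],
            [y - x for x, y in zip(a_det, b_det)])

# Hypothetical pixel rows from two adjacent frames
app_change, det_change = wavelet_change([4, 2, 6, 6], [8, 2, 6, 6])
print(app_change, det_change)  # [2.0, 0.0] [2.0, 0.0]
```

A sudden jump in these coefficient differences between frames would indicate abrupt content change, which the detection model can learn to weigh.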
- for the image rotation operator feature information, the rotation operator of each frame image of each sample video data may be first determined, and then the change value of the rotation operator between each pair of adjacent frames determined, to obtain the image rotation operator feature information.
- specifically, the rotation operator of each frame image may be first calculated, the frames sorted in time order, and the change value of the rotation operator between adjacent frames determined, to obtain the image rotation operator feature information.
- the rotation operator of each frame image may be calculated using the SIFT (Scale-Invariant Feature Transform) algorithm, an algorithm for detecting local features. It obtains features by finding the key points (feature points) of an image together with their scale and orientation descriptors, and uses them for image feature-point matching; its essence is to find key points in different scale spaces and calculate their orientations.
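The orientation-assignment step of SIFT mentioned above can be illustrated in miniature: around a pixel of interest, SIFT builds a histogram of gradient orientations and takes the dominant bin as the key point's direction. The sketch below is a deliberate simplification (no scale space, Gaussian weighting, or interpolation, and the image values are hypothetical), showing only that orientation-histogram idea:

```python
import math

def dominant_orientation(image, x, y, radius=1, bins=8):
    """Histogram of gradient orientations around (x, y); returns the
    dominant bin's angle in degrees. A toy version of SIFT's orientation
    assignment, not the full SIFT pipeline."""
    hist = [0.0] * bins
    for j in range(y - radius, y + radius + 1):
        for i in range(x - radius, x + radius + 1):
            dx = image[j][i + 1] - image[j][i - 1]   # horizontal gradient
            dy = image[j + 1][i] - image[j - 1][i]   # vertical gradient
            mag = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 360
            hist[int(angle // (360 / bins)) % bins] += mag
    return hist.index(max(hist)) * (360 / bins)

# Toy image whose intensity increases left to right -> gradients along +x
img = [[c * 10 for c in range(6)] for _ in range(6)]
angle = dominant_orientation(img, 2, 2)
print(angle)  # 0.0 (dominant gradient direction along the x axis)
```

Comparing such orientations between adjacent frames gives the rotation-operator change value described above.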
- Step 202 Perform training by using quality feature information of the plurality of forward sample video data and negative sample video data to generate a video data detection model.
- the quality feature information may be used for model training to generate a video data detection model.
- in this embodiment, the quality feature information of the plurality of forward sample video data and negative sample video data may be normalized to obtain normalized quality feature information, and missing values of the quality feature information may be complemented during normalization. The target quality feature information is then identified from the normalized quality feature information and used for neural network model training to generate the video data detection model.
- identifying the target quality feature information may consist of screening out highly discriminative feature information.
- specifically, the information entropy of the normalized quality feature information may first be determined. The larger the information entropy, the richer the information carried by the feature, the greater its importance, and the more it should be retained. Therefore, quality feature information whose information entropy exceeds a first preset threshold can be identified as the target quality feature information.
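A minimal sketch of the entropy-based screening above (feature values are assumed already normalized to [0, 1] and are discretized into bins before computing entropy; the threshold and sample data are hypothetical):

```python
import math
from collections import Counter

def information_entropy(values, bins=4):
    """Shannon entropy of a normalized feature column, after binning."""
    binned = [min(int(v * bins), bins - 1) for v in values]
    total = len(binned)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(binned).values())

def select_target_features(features, threshold):
    """Keep features whose entropy exceeds the first preset threshold."""
    return {name: col for name, col in features.items()
            if information_entropy(col) > threshold}

features = {
    "pixel": [0.1, 0.4, 0.6, 0.9],    # spread across bins -> high entropy
    "constant": [0.5, 0.5, 0.5, 0.5]  # single bin -> zero entropy
}
selected = select_target_features(features, threshold=1.0)
print(sorted(selected))  # ['pixel']
```

Only the surviving columns are fed to the neural network training step.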
- in this embodiment, the personalized feature information of the user may also be integrated, so that when the video data to be detected is identified, the evaluation of the video data can be combined with user attributes, improving the relevance and effectiveness of the recommended video data.
- specifically, attribute information of multiple users may be acquired, and the multiple users clustered into multiple user groups according to the attribute information, each user group having a corresponding user label, so that the attribute information of the users can be effectively integrated when the video data in the training sample set is used for model training.
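As an illustration of grouping users into labelled user groups (the embodiment does not prescribe a clustering algorithm; the single-attribute grouping and the attribute values below are hypothetical simplifications of a real clustering step such as k-means):

```python
from collections import defaultdict

def cluster_users(users, attribute):
    """Group users by one attribute; the attribute value doubles as the
    user label of each group. Real deployments would cluster on several
    attributes at once, but the label bookkeeping is the same."""
    groups = defaultdict(list)
    for user in users:
        groups[user[attribute]].append(user["id"])
    return dict(groups)

users = [
    {"id": "u1", "interest": "sports"},
    {"id": "u2", "interest": "fashion"},
    {"id": "u3", "interest": "sports"},
]
groups = cluster_users(users, "interest")
print(groups["sports"])  # ['u1', 'u3']
```

Each group's label is later compared against the video tags of the target video data when choosing a target user group.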
- Step 203 Acquire one or more video data to be detected.
- the video data to be detected may be a video segment synthesized in real time by extracting a plurality of video frames according to a certain rule in a video library.
- for example, when an e-commerce website uses video content for shopping guidance and marketing, multiple video frames matching the input text content may be extracted from a massive video library, and the multiple video frames are then combined into video clips according to certain rules.
- the video data to be detected may be determined by other methods in the art.
- the video data to be detected may also be an off-the-shelf video segment obtained from various paths, which is not limited in this embodiment of the present application.
- Step 204 Extract quality feature information of each video data to be detected, respectively.
- the quality feature information of the video data to be detected may also include image pixel feature information, continuous frame image object migration feature information, continuous frame image motion feature information, frequency domain feature information of the image frames, image frame wavelet transform feature information, and/or image rotation operator feature information.
- for the method of extracting the foregoing quality feature information, refer to step 201; it is not described again in this step.
- Step 205 Identify, by using a preset video data detection model, the quality feature information of the one or more video data to be detected to obtain a quality score of the one or more video data to be detected.
- in this embodiment, the quality feature information may be identified by using the trained video data detection model, each video data to be detected is scored based on the recognition result, and a corresponding quality score is output.
- Step 206 Extract video data whose quality score exceeds a second preset threshold as target video data.
- video data whose quality score exceeds the second preset threshold can be extracted as target video data.
- a person skilled in the art can determine the size of the second preset threshold according to actual needs, which is not limited by the embodiment of the present application.
- the video data with the highest quality score can be directly selected as the target video data, which is not limited in this embodiment of the present application.
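Steps 205–206 above reduce to a simple filter once each candidate carries a quality score (the scores and the threshold below are hypothetical):

```python
def select_target_videos(scored_videos, second_threshold, top_only=False):
    """scored_videos: {video_id: quality score from the detection model}.
    Either keep every video above the second preset threshold, or take
    only the single highest-scoring video."""
    if top_only:
        return [max(scored_videos, key=scored_videos.get)]
    return [vid for vid, score in scored_videos.items()
            if score > second_threshold]

scores = {"clip_a": 0.92, "clip_b": 0.55, "clip_c": 0.81}
print(sorted(select_target_videos(scores, second_threshold=0.8)))
print(select_target_videos(scores, second_threshold=0.8, top_only=True))
```

Both selection modes match the two alternatives the text describes: a threshold screen or a single best candidate.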
- Step 207 Determine a target user group among the plurality of user groups
- the identified target video data may include a corresponding video tag to reflect the classification or other information of the video data.
- the target user group for which the target video data is intended may be identified by comparing the video tag with the user tags of the user groups. For example, the user group whose user tag is the same as the video tag of the target video data may be determined to be the target user group.
- a person skilled in the art may also determine the target user group in other manners, which is not limited by the embodiment of the present application.
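The tag comparison in step 207 can be sketched as matching each target video's tags against the user labels of the groups (the tags and groups below are hypothetical):

```python
def target_user_groups(video_tags, user_groups):
    """video_tags: set of tags attached to the target video data.
    user_groups: {user label: list of user ids}.
    A group is a target group when its label matches a video tag."""
    return {label: members for label, members in user_groups.items()
            if label in video_tags}

groups = {"sports": ["u1", "u3"], "fashion": ["u2"]}
targets = target_user_groups(video_tags={"sports"}, user_groups=groups)
print(targets)  # {'sports': ['u1', 'u3']}
```

The target video data is then recommended to the members of the matching groups (step 208).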
- Step 208 Recommend the target video data to the target user group.
- the target video data may be recommended to the target user group.
- the video clip can be recommended to a potential consumer group, improving the user service experience and improving the user conversion rate.
- referring to FIG. 4, a flow chart of the steps of a method for generating a video data detection model of the present application is shown; the method may specifically include the following steps:
- Step 401 Extract quality feature information of a plurality of sample video data, where the plurality of sample video data includes a plurality of forward sample video data and negative direction sample video data;
- Step 402 Perform training by using quality feature information of the plurality of forward sample video data and negative sample video data to generate a video data detection model.
- the quality feature information may include image pixel feature information, continuous frame image object migration feature information, continuous frame image motion feature information, different frequency domain feature information of the image frame, and image frame wavelet transform feature information. And/or image rotation operator feature information.
- the method for generating the video data detection model in the step 401 to the step 402 of the present embodiment is similar to the step 201 to the step 202 in the second embodiment of the video data recommendation method, and can be referred to each other.
- referring to FIG. 5, a flow chart of the steps of an embodiment of a method for identifying video data according to the present application is shown; the method may specifically include the following steps:
- Step 501 Acquire one or more video data to be detected.
- a user interface may be provided.
- an interactive interface is displayed on the display screen of the terminal, and the user may submit a detection request for one or more video data through the interaction interface.
- the video data may be an off-the-shelf video segment obtained from various channels, or may be a video segment that is synthesized in real time by extracting a plurality of video frames according to a certain rule in the video library.
- the specific source and type of the video data are not limited in this embodiment of the present application.
- Step 502 Send the one or more video data to be detected to a server, where the server is configured to separately identify the one or more video data to be detected to obtain a recognition result, where the identification result includes One or more candidate video data;
- the terminal may send one or more video data to be detected to the server, and the server completes the identification of the video data to obtain a corresponding recognition result.
- the identification result may include one or more candidate video data, and each candidate video data includes a corresponding quality score.
- the process of identifying the one or more video data to be detected by the server is similar to the step 201 to step 205 in the foregoing embodiment, and may be referred to each other.
- Step 503 Receive the one or more candidate video data returned by the server.
- the server may return one or more candidate video data included in the identification result to the terminal.
- Step 504 Determine target video data in the one or more candidate video data.
- since each candidate video data has a corresponding quality score, the target video data may be determined according to the level of the quality score.
- generally, the higher the quality score, the better the quality of the corresponding video data can be considered to be. Therefore, the video data with the highest quality score can be used as the target video data; alternatively, candidate video data whose quality score exceeds a certain threshold can be taken as a screening range, and the target video data then determined from that range according to the actual requirements of the service. The specific manner of determining the target video data is not limited in this embodiment of the present application. Of course, there may be more than one target video data, and this application does not limit this.
- the target video data may also be determined by the terminal according to information input by the user, for example selected by the user from the multiple candidate video data, which is not limited in this embodiment of the present application.
- Step 505 Present the target video data.
- the terminal may display the target video data on the interaction interface, for example, the specific information of the target video data may be displayed, or the target video data may be directly played, which is not limited in this embodiment of the present application.
- in this embodiment, the user can directly submit an identification request for video data through the interaction interface, and the server identifies the video data targeted by the request, so that the user can complete the detection of the video data according to actual needs, improving the convenience of judging the quality of video data.
- referring to FIG. 6, a structural block diagram of a device for recommending video data of the present application is shown; the device may specifically include the following modules:
- the obtaining module 601 is configured to acquire one or more video data to be detected
- the extracting module 602 is configured to separately extract quality feature information of each video data to be detected
- the identification module 603 is configured to identify the quality feature information by using a preset video data detection model to obtain target video data.
- the recommendation module 604 is configured to recommend the target video data to the user.
- the preset video data detection model may be generated by calling the following module:
- a quality feature information extraction module configured to separately extract quality feature information of the plurality of sample video data, where the plurality of sample video data may include a plurality of forward sample video data and negative direction sample video data;
- the video data detection model generating module is configured to perform training by using the quality feature information of the plurality of forward sample video data and the negative sample video data to generate a video data detection model.
- the quality feature information may include image pixel feature information, continuous frame image object migration feature information, continuous frame image motion feature information, different frequency domain feature information of the image frame, and image frame wavelet transform feature information. And/or image rotation operator feature information.
- the quality feature information extraction module may specifically include the following submodules:
- a pixel information extraction submodule configured to extract pixel information of each frame image of each sample video data
- the pixel information processing sub-module is configured to perform convolution operation and pooling processing on the pixel information to obtain image pixel feature information.
- the quality feature information extraction module may further include the following sub-modules:
- An object recognition sub-module for identifying the objects in each frame image of each sample video data
- the object processing sub-module is configured to respectively determine the number and frequency of occurrences of the objects in the adjacent two frames of images to obtain continuous frame image object migration feature information.
- the quality feature information extraction module may further include the following sub-modules:
- a motion object recognition submodule configured to identify a shape feature of the motion object in each frame image of each sample video data
- the action object processing sub-module is configured to respectively determine geometric parameters of the shape features of the action objects in the adjacent two frames of images to obtain continuous frame image action feature information.
- the quality feature information extraction module may further include the following sub-modules:
- An amplitude and phase determination sub-module for determining a magnitude and a phase of each frame image of each sample video data
- the amplitude and phase processing sub-module is configured to respectively determine the amplitude difference and the phase difference of the adjacent two frames of images to obtain different frequency domain characteristic information of the image frame.
- the quality feature information extraction module may further include the following sub-modules:
- a wavelet coefficient determining submodule for determining a wavelet coefficient of each frame image of each sample video data
- the wavelet coefficient processing sub-module is configured to respectively determine the variation values of the wavelet coefficients of the adjacent two frames of images to obtain image frame wavelet transform feature information.
- the quality feature information extraction module may further include the following sub-modules:
- a rotation operator determining sub-module for determining a rotation operator of each frame image of each sample video data
- the rotation operator processing sub-module is configured to respectively determine a variation value of a rotation operator of the adjacent two frames of images to obtain image rotation operator feature information.
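The pixel-feature submodules above (convolution followed by pooling over each frame's pixel information) can be sketched in pure Python on a tiny grayscale frame (the kernel and pixel values are hypothetical; a real model would learn many kernels):

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)] for y in range(out_h)]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling."""
    return [[max(feature_map[y + i][x + j]
                 for i in range(size) for j in range(size))
             for x in range(0, len(feature_map[0]) - size + 1, size)]
            for y in range(0, len(feature_map) - size + 1, size)]

frame = [[1, 2, 0, 1],
         [3, 1, 1, 0],
         [0, 2, 2, 1],
         [1, 0, 1, 3]]
edge_kernel = [[1, -1]]  # horizontal intensity difference
pooled = max_pool(conv2d(frame, edge_kernel))
print(pooled)  # [[2], [1]]
```

The pooled map is the (much compressed) image pixel feature information that the detection model consumes.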
- the video data detection model generating module may specifically include the following submodules:
- a normalization processing sub-module configured to normalize quality characteristic information of the plurality of forward sample video data and negative-direction sample video data to obtain normalized quality feature information
- a target quality feature information identifying submodule configured to identify target quality feature information from the normalized quality feature information
- the video data detection model generation submodule is configured to perform neural network model training by using the target quality feature information, and generate a video data detection model.
- the target quality feature information identifying submodule may specifically include the following units:
- An information entropy determining unit configured to determine an information entropy of the normalized quality feature information
- the target quality feature information identifying unit is configured to identify the quality feature information that the information entropy exceeds the first preset threshold as the target quality feature information.
- generating the preset video data detection model may also invoke the following modules:
- An attribute information obtaining module configured to acquire attribute information of multiple users
- the user group clustering module is configured to cluster the plurality of users into a plurality of user groups according to the attribute information, where the user group has a corresponding user label.
- the identification module 603 may specifically include the following sub-modules:
- a quality feature information identifying sub-module configured to identify, by using a preset video data detection model, the quality feature information of the one or more video data to be detected, respectively, to obtain a quality score of the one or more video data to be detected
- the target video data extraction sub-module is configured to extract video data whose quality score exceeds a second preset threshold as target video data.
- the recommendation module 604 may specifically include the following submodules:
- a target user group determining submodule configured to determine a target user group among the plurality of user groups
- the target video data recommendation submodule is configured to recommend the target video data to the target user group.
- the target video data may have a corresponding video label
- the target user group determining sub-module may specifically include the following units:
- the target user group determining unit is configured to determine a user group whose user tag is the same as the video tag of the target video data as the target user group.
- referring to FIG. 7, a structural block diagram of an embodiment of a device for generating a video data detection model of the present application is shown; the device may specifically include the following modules:
- the quality feature information extraction module 701 is configured to separately extract quality feature information of the plurality of sample video data, where the plurality of sample video data may include a plurality of forward sample video data and negative direction sample video data;
- the video data detection model generating module 702 is configured to perform training by using the quality feature information of the plurality of forward sample video data and the negative sample video data to generate a video data detection model.
- the quality feature information may include image pixel feature information, continuous frame image object migration feature information, continuous frame image motion feature information, different frequency domain feature information of the image frame, and image frame wavelet transform feature information. And/or image rotation operator feature information.
- referring to FIG. 8, a structural block diagram of an embodiment of an apparatus for identifying video data according to the present application is shown; the apparatus may specifically include the following modules:
- the obtaining module 801 is configured to acquire one or more video data to be detected
- a sending module 802 configured to send the one or more video data to be detected to a server, where the server is configured to separately identify the one or more video data to be detected to obtain a recognition result, where
- the recognition result may include one or more candidate video data;
- the receiving module 803 is configured to receive the one or more candidate video data returned by the server;
- a determining module 804 configured to determine target video data in the one or more candidate video data
- a presentation module 805 is configured to present the target video data.
- for the device embodiments, the description is relatively simple, and for relevant parts, reference may be made to the description of the method embodiments.
- embodiments of the embodiments of the present application can be provided as a method, apparatus, or computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, embodiments of the present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
- the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
- the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
- Memory is an example of a computer readable medium.
- Computer readable media includes both permanent and non-persistent, removable and non-removable media.
- Information storage can be implemented by any method or technology. The information can be computer readable instructions, data structures, modules of programs, or other data.
- Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device.
- As defined herein, computer readable media does not include transitory computer readable media, such as modulated data signals and carrier waves.
- Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
- These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
- the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device.
- the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
- The method for recommending video data, the device for recommending video data, the method for generating a video data detection model, the device for generating a video data detection model, the method for identifying video data, and the device for identifying video data provided by the present application have been described in detail above.
- The principles and implementations of the present application are described herein using specific examples; the description of the above embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, the content of this specification should not be construed as limiting the present application.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Image Analysis (AREA)
Abstract
Provided are a recommendation method and apparatus for video data. The recommendation method comprises: acquiring one or more pieces of video data to be detected; respectively extracting quality feature information about each piece of video data to be detected; recognising the quality feature information using a pre-set video data detection model, so as to obtain target video data; and recommending the target video data to a user. In the embodiments of the present application, video data of high quality can be quickly screened out using a deep learning model. The embodiments of the present application solve the problem in the art that a video clip can only be recommended to a user by relying on artificial recognition, thereby improving the recognition efficiency for video data and the accuracy rate of recommendation.
Description
The present application claims priority to Chinese Patent Application No. 201710113741.4, filed on February 28, 2017, and entitled "Recommendation Method and Apparatus for Video Data", the entire contents of which are incorporated herein by reference.
The present application relates to the field of data processing technologies, and in particular to a method for recommending video data, a device for recommending video data, a method for generating a video data detection model, a device for generating a video data detection model, a method for identifying video data, and a device for identifying video data.
The development of e-commerce has significantly improved the convenience of people's daily life. Through e-commerce websites, people can easily purchase goods and complete payment, saving shopping time.
In order to better help users understand the characteristics of target products, e-commerce websites have begun to use video content for shopping guidance and marketing: corresponding text information is input according to operational needs, suitable video frames are selected from a video library, a video of an appropriate scene is constructed from those frames according to the text semantics, and the video is recommended to target users.
However, in practical applications, after massive video content is extracted and synthesized into videos, the quality of the synthesized videos still needs to be detected and evaluated so that the best videos can be selected for delivery to target users. In the prior art, detecting and evaluating video quality mainly relies on manual review by operators; this not only consumes a large amount of operational resources, but in most cases manual review also cannot process the synthesized videos in real time.
Summary of the Invention
In view of the above problems, embodiments of the present application are provided in order to offer a method for recommending video data, a device for recommending video data, a method for generating a video data detection model, a device for generating a video data detection model, a method for identifying video data, and a corresponding device for identifying video data that overcome the above problems or at least partially solve them.
In order to solve the above problems, the present application discloses a method for recommending video data, including:
acquiring one or more video data to be detected;
extracting quality feature information of each video data to be detected, respectively;
identifying the quality feature information by using a preset video data detection model to obtain target video data; and
recommending the target video data to the user.
Optionally, the preset video data detection model is generated as follows:
extracting quality feature information of a plurality of sample video data, respectively, the plurality of sample video data including a plurality of forward sample video data and negative sample video data; and
performing training by using the quality feature information of the plurality of forward sample video data and negative sample video data to generate the video data detection model.
Optionally, the quality feature information includes image pixel feature information, continuous frame image object migration feature information, continuous frame image motion feature information, frequency domain feature information of the image frames, image frame wavelet transform feature information, and/or image rotation operator feature information.
Optionally, the step of separately extracting the quality feature information of the plurality of sample video data includes:
extracting pixel information of each frame image of each sample video data; and
performing convolution and pooling operations on the pixel information to obtain image pixel feature information.
可选地,所述分别提取多个样本视频数据的质量特征信息的步骤包括:Optionally, the step of separately extracting quality feature information of the plurality of sample video data includes:
识别每个样本视频数据的每一帧图像中的物体对象;Identifying an object object in each frame of image of each sample video data;
分别确定相邻两帧图像中的物体对象出现的次数和频率,以获得连续帧图像物体迁移特征信息。The number and frequency of occurrences of the object objects in the adjacent two frames of images are respectively determined to obtain continuous frame image object migration feature information.
可选地,所述分别提取多个样本视频数据的质量特征信息的步骤包括:Optionally, the step of separately extracting quality feature information of the plurality of sample video data includes:
识别每个样本视频数据的每一帧图像中的动作对象的形状特征;Identifying a shape feature of the action object in each frame image of each sample video data;
分别确定相邻两帧图像中的动作对象的形状特征的几何参数,以获得连续帧图像动作特征信息。The geometric parameters of the shape features of the motion objects in the adjacent two frames of images are respectively determined to obtain continuous frame image motion feature information.
可选地,所述分别提取多个样本视频数据的质量特征信息的步骤包括:Optionally, the step of separately extracting quality feature information of the plurality of sample video data includes:
确定每个样本视频数据的每一帧图像的幅值和相位;Determining the amplitude and phase of each frame of image of each sample video data;
分别确定相邻两帧图像的幅值差和相位差,以获得图像帧不同的频域特征信息。The amplitude difference and the phase difference of the adjacent two frames of images are respectively determined to obtain different frequency domain feature information of the image frame.
可选地,所述分别提取多个样本视频数据的质量特征信息的步骤包括:Optionally, the step of separately extracting quality feature information of the plurality of sample video data includes:
确定每个样本视频数据的每一帧图像的小波系数;Determining a wavelet coefficient of each frame image of each sample video data;
分别确定相邻两帧图像的小波系数的变化值,以获得图像帧小波变换特征信息。The change values of the wavelet coefficients of the adjacent two frames of images are respectively determined to obtain image frame wavelet transform feature information.
可选地,所述分别提取多个样本视频数据的质量特征信息的步骤包括:Optionally, the step of separately extracting quality feature information of the plurality of sample video data includes:
确定每个样本视频数据的每一帧图像的旋转算子;Determining a rotation operator for each frame of image of each sample video data;
分别确定相邻两帧图像的旋转算子的变化值,以获得图像旋转算子特征信息。The change values of the rotation operators of the adjacent two frames of images are respectively determined to obtain image rotation operator feature information.
可选地,所述采用所述多个正向样本视频数据和负向样本视频数据的质量特征信息进行训练,生成视频数据检测模型的步骤包括:Optionally, the step of training by using the quality feature information of the plurality of forward sample video data and the negative sample video data to generate the video data detection model includes:
对所述多个正向样本视频数据和负向样本视频数据的质量特征信息进行归一化处理,以获得归一化的质量特征信息;Normalizing the quality feature information of the plurality of forward sample video data and negative sample video data to obtain normalized quality feature information;
补全所述归一化的质量特征信息的缺失值;Completing the missing value of the normalized quality feature information;
从所述归一化的质量特征信息中识别出目标质量特征信息;Identifying target quality feature information from the normalized quality feature information;
采用所述目标质量特征信息进行神经网络模型训练,生成视频数据检测模型。The target quality feature information is used to train the neural network model to generate a video data detection model.
可选地,所述从所述归一化的质量特征信息中识别出目标质量特征信息的步骤包括:Optionally, the step of identifying target quality feature information from the normalized quality feature information includes:
确定所述归一化的质量特征信息的信息熵;Determining an information entropy of the normalized quality feature information;
识别所述信息熵超过第一预设阈值的质量特征信息为目标质量特征信息。The quality feature information identifying that the information entropy exceeds the first preset threshold is the target quality feature information.
可选地,还包括:Optionally, it also includes:
获取多个用户的属性信息;Obtain attribute information of multiple users;
根据所述属性信息,将所述多个用户聚类为多个用户群体,所述用户群体具有相应的用户标签。And the plurality of users are clustered into a plurality of user groups according to the attribute information, and the user groups have corresponding user labels.
可选地,所述采用预设的视频数据检测模型对所述质量特征信息进行识别,以获得目标视频数据的步骤包括:Optionally, the step of identifying the quality feature information by using a preset video data detection model to obtain target video data includes:
采用预设的视频数据检测模型分别对所述一个或多个待检测的视频数据的质量特征信息进行识别,以获得所述一个或多个待检测的视频数据的质量分值;Determining quality characteristic information of the one or more video data to be detected by using a preset video data detection model to obtain a quality score of the one or more video data to be detected;
提取所述质量分值超过第二预设阈值的视频数据为目标视频数据。The video data whose quality score exceeds the second preset threshold is extracted as target video data.
可选地,所述向用户推荐所述目标视频数据的步骤包括:Optionally, the step of recommending the target video data to a user includes:
在所述多个用户群体中确定目标用户群体;Determining a target user group among the plurality of user groups;
向所述目标用户群体推荐所述目标视频数据。The target video data is recommended to the target user group.
可选地,所述目标视频数据具有相应的视频标签,所述在所述多个用户群体中确定目标用户群体的步骤包括:Optionally, the target video data has a corresponding video tag, and the step of determining a target user group among the multiple user groups includes:
确定与所述目标视频数据的视频标签相同的用户标签所对应的用户群体为目标用户群体。Determining a user group corresponding to the same user tag of the video tag of the target video data as a target user group.
为了解决上述问题,本申请公开了一种视频数据检测模型的生成方法,包括:In order to solve the above problem, the present application discloses a method for generating a video data detection model, including:
分别提取多个样本视频数据的质量特征信息,所述多个样本视频数据包括多个正向样本视频数据和负向样本视频数据;Extracting quality feature information of the plurality of sample video data, the plurality of sample video data including a plurality of forward sample video data and negative direction sample video data;
采用所述多个正向样本视频数据和负向样本视频数据的质量特征信息进行训练,生成视频数据检测模型。Training is performed by using quality feature information of the plurality of forward sample video data and negative sample video data to generate a video data detection model.
可选地,所述质量特征信息包括图像像素特征信息,连续帧图像物体迁移特征信息,连续帧图像动作特征信息,图像帧不同的频域特征信息,图像帧小波变换特征信息,和/或,图像旋转算子特征信息。Optionally, the quality feature information includes image pixel feature information, continuous frame image object migration feature information, continuous frame image motion feature information, different frequency domain feature information of the image frame, image frame wavelet transform feature information, and/or, Image rotation operator feature information.
为了解决上述问题,本申请公开了一种视频数据的识别方法,包括:In order to solve the above problem, the present application discloses a method for identifying video data, including:
获取一个或多个待检测的视频数据;Obtaining one or more video data to be detected;
将所述一个或多个待检测的视频数据发送至服务器,所述服务器用于分别对所述一个或多个待检测的视频数据进行识别,以获得识别结果,所述识别结果包括一个或多个候选视频数据;Sending the one or more video data to be detected to a server, where the server is configured to separately identify the one or more video data to be detected to obtain a recognition result, where the recognition result includes one or more Candidate video data;
接收所述服务器返回的所述一个或多个候选视频数据;Receiving the one or more candidate video data returned by the server;
在所述一个或多个候选视频数据中确定目标视频数据;Determining target video data in the one or more candidate video data;
展现所述目标视频数据。Presenting the target video data.
为了解决上述问题,本申请公开了一种视频数据的推荐装置,包括:In order to solve the above problem, the present application discloses a recommendation device for video data, including:
获取模块,用于获取一个或多个待检测的视频数据;An obtaining module, configured to acquire one or more video data to be detected;
提取模块,用于分别提取每个待检测的视频数据的质量特征信息;An extraction module, configured to separately extract quality feature information of each video data to be detected;
识别模块,用于采用预设的视频数据检测模型对所述质量特征信息进行识别,以获得目标视频数据;An identification module, configured to identify the quality feature information by using a preset video data detection model to obtain target video data;
推荐模块,用于向用户推荐所述目标视频数据。a recommendation module for recommending the target video data to a user.
可选地,所述预设的视频数据检测模型通过调用如下模块生成:Optionally, the preset video data detection model is generated by calling the following module:
质量特征信息提取模块,用于分别提取多个样本视频数据的质量特征信息,所述多个样本视频数据包括多个正向样本视频数据和负向样本视频数据;a quality feature information extraction module, configured to separately extract quality feature information of the plurality of sample video data, where the plurality of sample video data includes a plurality of forward sample video data and negative sample video data;
视频数据检测模型生成模块,用于采用所述多个正向样本视频数据和负向样本视频数据的质量特征信息进行训练,生成视频数据检测模型。The video data detection model generating module is configured to perform training by using the quality feature information of the plurality of forward sample video data and the negative sample video data to generate a video data detection model.
可选地,所述质量特征信息包括图像像素特征信息,连续帧图像物体迁移特征信息,连续帧图像动作特征信息,图像帧不同的频域特征信息,图像帧小波变换特征信息,和/或,图像旋转算子特征信息。Optionally, the quality feature information includes image pixel feature information, continuous frame image object migration feature information, continuous frame image motion feature information, different frequency domain feature information of the image frame, image frame wavelet transform feature information, and/or, Image rotation operator feature information.
可选地,所述质量特征信息提取模块包括:Optionally, the quality feature information extraction module includes:
像素信息提取子模块,用于提取每个样本视频数据的每一帧图像的像素信息;a pixel information extraction submodule, configured to extract pixel information of each frame image of each sample video data;
像素信息处理子模块,用于分别对所述像素信息进行卷积运算和池化处理,以获得图像像素特征信息。The pixel information processing sub-module is configured to perform convolution operation and pooling processing on the pixel information to obtain image pixel feature information.
可选地,所述质量特征信息提取模块还包括:Optionally, the quality feature information extraction module further includes:
物体对象识别子模块,用于识别每个样本视频数据的每一帧图像中的物体对象;An object object recognition sub-module for identifying an object object in each frame image of each sample video data;
物体对象处理子模块,用于分别确定相邻两帧图像中的物体对象出现的次数和频率,以获得连续帧图像物体迁移特征信息。The object object processing sub-module is configured to respectively determine the number and frequency of occurrences of the object objects in the adjacent two frames of images to obtain continuous frame image object migration feature information.
可选地,所述质量特征信息提取模块还包括:Optionally, the quality feature information extraction module further includes:
动作对象识别子模块,用于识别每个样本视频数据的每一帧图像中的动作对象的形状特征;a motion object recognition submodule, configured to identify a shape feature of the motion object in each frame image of each sample video data;
动作对象处理子模块,用于分别确定相邻两帧图像中的动作对象的形状特征的几何参数,以获得连续帧图像动作特征信息。The action object processing sub-module is configured to respectively determine geometric parameters of the shape features of the action objects in the adjacent two frames of images to obtain continuous frame image action feature information.
可选地,所述质量特征信息提取模块还包括:Optionally, the quality feature information extraction module further includes:
幅值和相位确定子模块,用于确定每个样本视频数据的每一帧图像的幅值和相位;An amplitude and phase determination sub-module for determining a magnitude and a phase of each frame image of each sample video data;
幅值和相位处理子模块,用于分别确定相邻两帧图像的幅值差和相位差,以获得图像帧不同的频域特征信息。The amplitude and phase processing sub-module is configured to respectively determine amplitude difference and phase difference of adjacent two frames of images to obtain different frequency domain feature information of the image frame.
可选地,所述质量特征信息提取模块还包括:Optionally, the quality feature information extraction module further includes:
小波系数确定子模块,用于确定每个样本视频数据的每一帧图像的小波系数;a wavelet coefficient determining submodule for determining a wavelet coefficient of each frame image of each sample video data;
小波系数处理子模块,用于分别确定相邻两帧图像的小波系数的变化值,以获得图像帧小波变换特征信息。The wavelet coefficient processing sub-module is configured to respectively determine the variation values of the wavelet coefficients of the adjacent two frames of images to obtain image frame wavelet transform feature information.
可选地,所述质量特征信息提取模块还包括:Optionally, the quality feature information extraction module further includes:
旋转算子确定子模块,用于确定每个样本视频数据的每一帧图像的旋转算子;a rotation operator determining sub-module for determining a rotation operator of each frame image of each sample video data;
旋转算子处理子模块,用于分别确定相邻两帧图像的旋转算子的变化值,以获得图像旋转算子特征信息。The rotation operator processing sub-module is configured to respectively determine a variation value of a rotation operator of the adjacent two frames of images to obtain image rotation operator feature information.
可选地,所述视频数据检测模型生成模块包括:Optionally, the video data detection model generating module includes:
归一化处理子模块,用于对所述多个正向样本视频数据和负向样本视频数据的质量特征信息进行归一化处理,以获得归一化的质量特征信息;a normalization processing sub-module, configured to normalize quality characteristic information of the plurality of forward sample video data and negative-direction sample video data to obtain normalized quality feature information;
缺失值补全子模块,用于补全所述归一化的质量特征信息的缺失值;a missing value completion sub-module for complementing the missing value of the normalized quality feature information;
目标质量特征信息识别子模块,用于从所述归一化的质量特征信息中识别出目标质量特征信息;a target quality feature information identifying submodule, configured to identify target quality feature information from the normalized quality feature information;
视频数据检测模型生成子模块,用于采用所述目标质量特征信息进行神经网络模型训练,生成视频数据检测模型。The video data detection model generation submodule is configured to perform neural network model training by using the target quality feature information, and generate a video data detection model.
可选地,所述目标质量特征信息识别子模块包括:Optionally, the target quality feature information identifying submodule includes:
信息熵确定单元,用于确定所述归一化的质量特征信息的信息熵;An information entropy determining unit, configured to determine an information entropy of the normalized quality feature information;
目标质量特征信息识别单元,用于识别所述信息熵超过第一预设阈值的质量特征信息为目标质量特征信息。The target quality feature information identifying unit is configured to identify the quality feature information that the information entropy exceeds the first preset threshold as the target quality feature information.
可选地,生成所述预设的视频数据检测模型还调用如下模块:Optionally, generating the preset video data detection model further invokes the following modules:
属性信息获取模块,用于获取多个用户的属性信息;An attribute information obtaining module, configured to acquire attribute information of multiple users;
用户群体聚类模块,用于根据所述属性信息,将所述多个用户聚类为多个用户群体,所述用户群体具有相应的用户标签。The user group clustering module is configured to cluster the plurality of users into a plurality of user groups according to the attribute information, where the user group has a corresponding user label.
可选地,所述识别模块包括:Optionally, the identifying module includes:
质量特征信息识别子模块,用于采用预设的视频数据检测模型分别对所述一个或多个待检测的视频数据的质量特征信息进行识别,以获得所述一个或多个待检测的视频数据的质量分值;a quality feature information identifying sub-module, configured to identify, by using a preset video data detection model, quality characteristic information of the one or more video data to be detected, respectively, to obtain the one or more video data to be detected. Quality score
目标视频数据提取子模块,用于提取所述质量分值超过第二预设阈值的视频数据为目标视频数据。The target video data extraction sub-module is configured to extract video data whose quality score exceeds a second preset threshold as target video data.
可选地,所述推荐模块包括:Optionally, the recommendation module includes:
目标用户群体确定子模块,用于在所述多个用户群体中确定目标用户群体;a target user group determining submodule, configured to determine a target user group among the plurality of user groups;
目标视频数据推荐子模块,用于向所述目标用户群体推荐所述目标视频数据。The target video data recommendation submodule is configured to recommend the target video data to the target user group.
可选地,所述目标视频数据具有相应的视频标签,所述目标用户群体确定子模块包括:Optionally, the target video data has a corresponding video tag, and the target user group determining submodule includes:
目标用户群体确定单元,用于确定与所述目标视频数据的视频标签相同的用户标签所对应的用户群体为目标用户群体。The target user group determining unit is configured to determine a user group corresponding to the same user tag of the video tag of the target video data as a target user group.
为了解决上述问题,本申请公开了一种视频数据检测模型的生成装置,包括:In order to solve the above problem, the present application discloses a device for generating a video data detection model, including:
质量特征信息提取模块,用于分别提取多个样本视频数据的质量特征信息,所述多个样本视频数据包括多个正向样本视频数据和负向样本视频数据;a quality feature information extraction module, configured to separately extract quality feature information of the plurality of sample video data, where the plurality of sample video data includes a plurality of forward sample video data and negative sample video data;
视频数据检测模型生成模块,用于采用所述多个正向样本视频数据和负向样本视频数据的质量特征信息进行训练,生成视频数据检测模型。The video data detection model generating module is configured to perform training by using the quality feature information of the plurality of forward sample video data and the negative sample video data to generate a video data detection model.
可选地,所述质量特征信息包括图像像素特征信息,连续帧图像物体迁移特征信息, 连续帧图像动作特征信息,图像帧不同的频域特征信息,图像帧小波变换特征信息,和/或,图像旋转算子特征信息。Optionally, the quality feature information includes image pixel feature information, continuous frame image object migration feature information, continuous frame image motion feature information, different frequency domain feature information of the image frame, image frame wavelet transform feature information, and/or, Image rotation operator feature information.
为了解决上述问题,本申请公开了一种视频数据的识别装置,包括:In order to solve the above problem, the present application discloses an apparatus for identifying video data, including:
获取模块,用于获取一个或多个待检测的视频数据;An obtaining module, configured to acquire one or more video data to be detected;
发送模块,用于将所述一个或多个待检测的视频数据发送至服务器,所述服务器用于分别对所述一个或多个待检测的视频数据进行识别,以获得识别结果,所述识别结果包括一个或多个候选视频数据;a sending module, configured to send the one or more video data to be detected to a server, where the server is configured to separately identify the one or more video data to be detected to obtain a recognition result, and the identifying The result includes one or more candidate video data;
接收模块,用于接收所述服务器返回的所述一个或多个候选视频数据;a receiving module, configured to receive the one or more candidate video data returned by the server;
确定模块,用于在所述一个或多个候选视频数据中确定目标视频数据;a determining module, configured to determine target video data in the one or more candidate video data;
展现模块,用于展现所述目标视频数据。a presentation module for presenting the target video data.
与背景技术相比,本申请实施例包括以下优点:Compared with the background art, the embodiments of the present application include the following advantages:
本申请实施例,通过获取一个或多个待检测的视频数据,并分别提取每个待检测的视频数据的质量特征信息,然后采用预设的视频数据检测模型对所述质量特征信息进行识别,以获得目标视频数据,进而向用户推荐所述目标视频数据,通过采用深度学习模型能够迅速筛选出优质的视频数据,本申请实施例解决了现有技术中只能依靠人工识别并向用户推荐视频片段的问题,提高了对视频数据的识别效率以及推荐的准确率。In the embodiment of the present application, one or more video data to be detected are acquired, and quality characteristic information of each video data to be detected is separately extracted, and then the quality feature information is identified by using a preset video data detection model. The target video data is obtained, and the target video data is recommended to the user, and the high-quality video data can be quickly selected by using the deep learning model. The embodiment of the present application solves the problem that the prior art can only rely on manual identification and recommend the video to the user. The problem of the segment improves the recognition efficiency of the video data and the accuracy of the recommendation.
图1是本申请的一种视频数据的推荐方法实施例一的步骤流程图;1 is a flow chart showing the steps of Embodiment 1 of a method for recommending video data according to the present application;
图2是本申请的一种视频数据的推荐方法实施例二的步骤流程图;2 is a flow chart of steps of a second embodiment of a method for recommending video data according to the present application;
图3是本申请的一种视频数据的推荐方法的原理框图;3 is a schematic block diagram of a method for recommending video data according to the present application;
图4是本申请的一种视频数据检测模型的生成方法实施例的步骤流程图;4 is a flow chart showing the steps of an embodiment of a method for generating a video data detection model according to the present application;
图5是本申请的一种视频数据的识别方法实施例的步骤流程图;5 is a flow chart showing the steps of an embodiment of a method for identifying video data according to the present application;
图6是本申请的一种视频数据的推荐装置实施例的结构框图;6 is a structural block diagram of an embodiment of a device for recommending video data according to the present application;
图7是本申请的一种视频数据检测模型的生成装置实施例的结构框图;7 is a structural block diagram of an embodiment of a device for generating a video data detection model according to the present application;
图8是本申请的一种视频数据的识别装置实施例的结构框图。FIG. 8 is a structural block diagram of an embodiment of an apparatus for identifying video data according to the present application.
为使本申请的上述目的、特征和优点能够更加明显易懂,下面结合附图和具体实施方式对本申请作进一步详细的说明。The above described objects, features and advantages of the present application will become more apparent and understood.
参照图1,示出了本申请的一种视频数据的推荐方法实施例一的步骤流程图,具体可以包括如下步骤:Referring to FIG. 1 , a flow chart of a first embodiment of a method for recommending video data according to the present application is shown. Specifically, the method may include the following steps:
步骤101,获取一个或多个待检测的视频数据;Step 101: Acquire one or more video data to be detected;
在本申请实施例中,所述待检测的视频数据可以是从各种途径获取的现成的视频片段,也可以是在视频库中根据某种规则提取多个视频帧实时合成的视频片段,本申请实施例对视频数据的具体来源和类型不作限定。In the embodiment of the present application, the video data to be detected may be an off-the-shelf video segment obtained from various ways, or may be a video segment that is synthesized in real time by extracting multiple video frames according to a certain rule in the video library. The application embodiment does not limit the specific source and type of video data.
步骤102,分别提取每个待检测的视频数据的质量特征信息;Step 102: Extract quality characteristic information of each video data to be detected, respectively.
在本申请实施例中,视频数据的质量特征信息可以是用于识别所述视频数据的质量的特征信息,例如,视频数据的图像像素、图像所展示的内容等特征信息。通过对视频数据的质量特征信息进行识别,能够对视频片段的流畅度、连贯性等进行检验。In the embodiment of the present application, the quality feature information of the video data may be feature information for identifying the quality of the video data, for example, image pixels of the video data, content displayed by the image, and the like. By identifying the quality characteristic information of the video data, it is possible to check the fluency, consistency, and the like of the video clip.
当然,本领域技术人员可以根据实际需要,具体确定所要提取的质量特征信息的类型及提取方式,本申请实施例对此不作限定。Certainly, the type of the quality feature information to be extracted and the manner of the extraction are determined by a person skilled in the art according to actual needs, which is not limited by the embodiment of the present application.
步骤103,采用预设的视频数据检测模型对所述质量特征信息进行识别,以获得目标视频数据;Step 103: Identify the quality feature information by using a preset video data detection model to obtain target video data.
在本申请实施例中,预设的视频数据检测模型可以通过对训练样本集中的多个样本视频数据进行训练生成,从而可以用于对待检测的视频数据的各个质量特征信息进行识别。In the embodiment of the present application, the preset video data detection model may be generated by training a plurality of sample video data in the training sample set, so that each quality feature information of the video data to be detected may be identified.
在具体实现中,训练样本集中的多个样本视频数据可以包括多个正向样本视频数据和多个负向样本视频数据,所述正向样本视频数据可以是视频质量较好的视频片段,例如,流畅度和连贯性较好、各个视频帧之间的整体风格较一致的视频片段,通常此类正向样本视频数据可以通过人工打标或者网络爬取获得;与正向样本视频数据相反,所述负向样本视频数据则是流畅度、连贯性以及各个视频帧之间的整体风格一致性较差的视频片段,通常此类负向样本视频数据可以通过对多个视频帧进行随机合成获得,本申请实施例对正向样本视频数据和负向样本视频数据的来源和识别方式不作限定。In a specific implementation, the plurality of sample video data in the training sample set may include a plurality of forward sample video data and a plurality of negative sample video data, and the forward sample video data may be a video segment with better video quality, for example, A video clip with better fluency and coherence and a more uniform overall style between video frames. Usually such forward sample video data can be obtained by manual marking or web crawling; contrary to the forward sample video data, The negative sample video data is a video segment with poor fluency, coherence, and overall style consistency between video frames. Generally, such negative sample video data can be obtained by randomly synthesizing multiple video frames. The source and the identification manner of the forward sample video data and the negative sample video data are not limited in the embodiment of the present application.
在集合多个正向样本视频数据和负向样本视频数据形成训练样本集后,可以分别提取所述正向样本视频数据和负向样本视频数据的质量特征信息,并进行模型训练,从而生成视频数据检测模型;进而可以在提取待检测的视频数据的质量特征信息后,采用所 述视频数据检测模型对所述质量特征信息进行识别,获得目标视频数据。After the plurality of forward sample video data and the negative sample video data are aggregated to form a training sample set, the quality feature information of the forward sample video data and the negative sample video data may be respectively extracted, and model training is performed to generate a video. The data detection model is further configured to: after extracting the quality feature information of the video data to be detected, use the video data detection model to identify the quality feature information to obtain target video data.
在本申请实施例中,所述目标视频数据可以是经视频数据检测模型识别后获得的质量较好的视频片段。In the embodiment of the present application, the target video data may be a video clip of good quality obtained after being identified by the video data detection model.
步骤104,向用户推荐所述目标视频数据。 Step 104, recommending the target video data to a user.
在具体实现中,向用户推荐目标视频数据可以是在用户界面播放所述目标视频片段,也可以是将所述目标视频片段推送给用户,本申请实施例对推荐目标视频数据的具体方式不作限定。In a specific implementation, the target video data is recommended to the user, and the target video segment may be played in the user interface, or the target video segment may be pushed to the user. The specific manner of recommending the target video data is not limited in this embodiment of the present application. .
在本申请实施例中,通过获取一个或多个待检测的视频数据,并分别提取每个待检测的视频数据的质量特征信息,然后采用预设的视频数据检测模型对所述质量特征信息进行识别,以获得目标视频数据,进而向用户推荐所述目标视频数据,本申请实施例采用深度学习模型能够迅速筛选出优质的视频数据,解决了现有技术中只能依靠人工识别并向用户推荐视频片段的问题,提高了对视频数据的识别效率以及推荐的准确率。In the embodiment of the present application, one or more video data to be detected are acquired, and quality characteristic information of each video data to be detected is separately extracted, and then the quality feature information is performed by using a preset video data detection model. The target video data is obtained to obtain the target video data, and the target video data is recommended to the user. The deep learning model in the embodiment of the present application can quickly screen out high-quality video data, and solves the problem that the prior art can only rely on manual identification and recommend to the user. The problem of video clips improves the efficiency of recognition of video data and the accuracy of recommendations.
参照图2,示出了本申请的一种视频数据的推荐方法实施例二的步骤流程图,具体可以包括如下步骤:Referring to FIG. 2, a flow chart of the steps of the second embodiment of the method for recommending video data of the present application is shown. Specifically, the method may include the following steps:
步骤201,分别提取多个样本视频数据的质量特征信息,所述多个样本视频数据包括多个正向样本视频数据和负向样本视频数据;Step 201: Extract quality feature information of a plurality of sample video data, where the plurality of sample video data includes a plurality of forward sample video data and negative direction sample video data;
如图3所示,是本申请的一种视频数据的推荐方法的原理框图。本申请实施例通过对训练样本集进行特征抽取,进而进行深度学习建模,然后采用训练好的模型对待检测视频数据进行评估,输出相应的质量分值,同时,在建模过程中通过融合用户属性信息对用户群体进行聚类,从而实现向用户群体的视频推荐。As shown in FIG. 3, it is a functional block diagram of a method for recommending video data of the present application. The embodiment of the present application performs feature extraction on the training sample set, and then performs deep learning modeling, and then uses the trained model to evaluate the detected video data, outputs corresponding quality scores, and simultaneously integrates users in the modeling process. The attribute information clusters the user groups to implement video recommendations to the user community.
在本申请实施例中,所述正向样本视频数据可以是视频质量较好的视频片段,例如,流畅度和连贯性较好、各个视频帧之间的整体风格较一致的视频片段,通常此类正向样本视频数据可以通过人工打标获得,由运营人员对视频片段的流畅度、连贯性以及各个视频帧之间的整体风格进行检验,从而将流畅度和连贯性较好、各个视频帧之间的整体风格较一致的视频片段标记为正向样本视频数据,还可以通过网络爬取获得,即通过从视频网站上截取一些点击率高,点赞数多的优质视频,作为网络爬取的正向样本视频数据。In the embodiment of the present application, the forward sample video data may be a video segment with better video quality, for example, a video segment with better fluency and coherence and a uniform overall style between video frames, usually The class forward sample video data can be obtained by manual marking. The operator checks the fluency and consistency of the video segment and the overall style between the video frames, so that the fluency and coherence are better, and the video frames are better. The video clips with more consistent overall style are marked as forward sample video data, and can also be obtained through web crawling, that is, by capturing some high-quality videos with high click-through rate and many praises from the video website, as a network crawling Forward sample video data.
与正向样本视频数据相反,所述负向样本视频数据则是流畅度、连贯性以及各个视频帧之间的整体风格一致性较差的视频片段,通常此类负向样本视频数据可以通过对多 个视频帧进行随机合成获得。例如,可以从多个类别中(比如旅游、宗教、电子产品中)随机分别抽取一些零散的视频帧片段,然后将抽取出的视频帧片段随意组合拼接,这些随意组合拼接而成的视频片段必然存在大量的不连贯和语义不一致,从而可以将此类拼接而成的视频片段作为负向样本视频数据。Contrary to the forward sample video data, the negative sample video data is a video segment with poor fluency, coherence, and overall style consistency between video frames. Usually such negative sample video data can pass through Multiple video frames are obtained by random synthesis. For example, some scattered video frame segments can be randomly extracted from multiple categories (such as travel, religion, and electronic products), and then the extracted video frame segments can be randomly combined and spliced. There are a large number of inconsistencies and semantic inconsistencies, so that such spliced video segments can be used as negative sample video data.
当然,本领域技术人员还可以按照其他方式获取正向样本视频数据和负向样本视频数据,本申请实施例对此不作限定。Of course, those skilled in the art can also obtain the forward sample video data and the negative sample video data in other manners, which is not limited in this embodiment of the present application.
然后,可以将获得的正向样本视频数据和负向样本视频数据作为训练样本集,供后续的模型训练使用。The obtained forward sample video data and negative sample video data can then be used as a training sample set for subsequent model training.
在具体实现中,可以首先分别提取训练样本集中的多个样本视频数据的质量特征信息。In a specific implementation, the quality feature information of the plurality of sample video data in the training sample set may be separately extracted first.
作为本申请实施例的一种示例,所述质量特征信息可以包括图像像素特征信息,连续帧图像物体迁移特征信息,连续帧图像动作特征信息,图像帧不同的频域特征信息,图像帧小波变换特征信息,和/或,图像旋转算子特征信息。As an example of the embodiment of the present application, the quality feature information may include image pixel feature information, continuous frame image object migration feature information, continuous frame image motion feature information, different frequency domain feature information of the image frame, and image frame wavelet transform. Feature information, and/or image rotation operator feature information.
下面逐一对上述六种特征信息的提取方式作一说明。The following describes a method for extracting the above six kinds of feature information one by one.
在本申请实施例中,对于图像像素特征信息,可以提取每个样本视频数据的每一帧图像的像素信息,然后分别对所述像素信息进行卷积运算和池化处理,以获得图像像素特征信息。In the embodiment of the present application, for image pixel feature information, pixel information of each frame image of each sample video data may be extracted, and then the pixel information is separately subjected to convolution operation and pooling processing to obtain image pixel features. information.
通常,图像是通过截取视频片段的每一帧获得的,因此,可以分别抽取每一帧图像中的像素信息,作为待处理的特征集合,然后对所述特征集合中的像素信息进行卷积运算,并对卷积运算后获得的特征集合进一步进行池化处理(max-pooling),从而获得图像像素特征信息。Generally, an image is obtained by intercepting each frame of a video segment. Therefore, pixel information in each frame image can be extracted separately as a feature set to be processed, and then the pixel information in the feature set is convoluted. And further performing pooling processing (max-pooling) on the feature set obtained after the convolution operation, thereby obtaining image pixel feature information.
本申请实施例对图像像素进行处理后,可以获得像素信息的最显著描述,在处理之后,相应的特征不仅维度降低了,而且更能表达图像原有的语义含义。After the image pixels are processed in the embodiment of the present application, the most salient description of the pixel information can be obtained; after the processing, the corresponding features not only have a reduced dimension, but also better express the original semantic meaning of the image.
在本申请实施例中,可以通过识别每个样本视频数据的每一帧图像中的物体对象,然后分别确定相邻两帧图像中的物体对象出现的次数和频率,以获得连续帧图像物体迁移特征信息。In the embodiment of the present application, the objects in each frame image of each sample video data may be identified, and then the number and frequency of occurrences of the objects in two adjacent frames of images may be respectively determined to obtain the continuous-frame image object migration feature information.
在具体实现中,可以分别对每一帧图像进行序列分析,对各帧图像中的物体对象进行识别抽取,然后按照各帧的时间先后顺序进行排序,进而确定出相邻两帧图像中的物体对象出现的次数、频率,以及物体对象之间关联出现的次数、关联出现的概率等信息,作为连续帧图像物体迁移特征信息。In a specific implementation, sequence analysis may be performed on each frame image separately, the objects in each frame image are identified and extracted, and the frames are then sorted in chronological order; the number of times and the frequency with which an object appears in two adjacent frames of images, as well as information such as the number of times objects appear in association and the probability of such associated appearances, are then determined as the continuous-frame image object migration feature information.
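The counting of object occurrences across adjacent frames might be sketched as follows. The per-frame detections here are hypothetical labels, and the actual object detector and the exact statistics used by the application are not specified; this sketch only shows how persistence counts and frequencies over adjacent frame pairs could be derived.

```python
from collections import Counter

def migration_features(frame_objects):
    """Count how often each detected object persists between adjacent frames,
    and turn the counts into frequencies over all adjacent frame pairs."""
    pairs = list(zip(frame_objects, frame_objects[1:]))
    persist = Counter()
    for prev, cur in pairs:
        for obj in set(prev) & set(cur):  # object appears in both adjacent frames
            persist[obj] += 1
    n = max(len(pairs), 1)
    return {obj: cnt / n for obj, cnt in persist.items()}

# Hypothetical detections for four consecutive frames.
frames = [{"cat", "sofa"}, {"cat", "sofa"}, {"cat"}, {"dog"}]
feats = migration_features(frames)
```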
需要说明的是,在确定相邻两帧图像中的物体对象出现的次数、频率等信息时,可以根据实际需要选择部分的相邻图像帧,本申请实施例对选择的相邻图像帧的数量不作限定。It should be noted that, when determining information such as the number of times and the frequency with which an object appears in two adjacent frames of images, some of the adjacent image frames may be selected according to actual needs; the number of adjacent image frames selected is not limited in this embodiment of the present application.
与连续帧图像物体迁移特征信息的提取方式类似,本申请实施例在提取连续帧图像动作特征信息时,可以通过识别每个样本视频数据的每一帧图像中的动作对象的形状特征,然后分别确定相邻两帧图像中的动作对象的形状特征的几何参数,以获得连续帧图像动作特征信息。Similar to the extraction of the continuous-frame image object migration feature information, when extracting the continuous-frame image motion feature information, the embodiment of the present application may identify the shape feature of the action object in each frame image of each sample video data, and then respectively determine the geometric parameters of the shape features of the action objects in two adjacent frames of images to obtain the continuous-frame image motion feature information.
例如,可以分别对每一帧图像中的动作对象进行识别,并确定出该动作对象的几何形状边界,然后将每一帧图像中的动作的几何形状边界与前一帧图像中的动作的几何形状边界进行比较,按照几何仿射变换计算动作对象的形状特征的几何参数,并将该几何参数作为连续帧图像动作特征信息。For example, the action object in each frame image may be separately identified and its geometric shape boundary determined; the geometric shape boundary of the action in each frame image is then compared with that in the previous frame image, the geometric parameters of the shape feature of the action object are calculated according to a geometric affine transformation, and these geometric parameters are used as the continuous-frame image motion feature information.
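One way to realize the affine comparison of shape boundaries is a least-squares fit of the transform mapping one frame's boundary points onto the next. The boundary points below are hypothetical, and this is only a sketch of the idea, not the application's claimed computation.

```python
import numpy as np

def estimate_affine(prev_pts, cur_pts):
    """Least-squares estimate of the 2x3 affine transform mapping the object's
    boundary points in the previous frame onto those in the current frame."""
    ones = np.ones((len(prev_pts), 1))
    A = np.hstack([prev_pts, ones])               # (n, 3) homogeneous coordinates
    params, *_ = np.linalg.lstsq(A, cur_pts, rcond=None)
    return params.T                               # rows: [a, b, tx], [c, d, ty]

# Hypothetical boundary points of an action object, shifted by (2, -1) between frames.
prev_pts = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
cur_pts = prev_pts + np.array([2., -1.])
M = estimate_affine(prev_pts, cur_pts)            # geometric parameters as features
```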
在本申请实施例中,对于图像帧不同的频域特征信息,可以通过确定每个样本视频数据的每一帧图像的幅值和相位,然后分别确定相邻两帧图像的幅值差和相位差,以获得图像帧不同的频域特征信息。In the embodiment of the present application, for the frequency domain difference feature information of image frames, the amplitude and phase of each frame image of each sample video data may be determined, and then the amplitude difference and the phase difference between two adjacent frames of images may be respectively determined to obtain the frequency domain difference feature information of the image frames.
在具体实现中,可以首先对每一帧图像做傅里叶变换并抽取频谱系特征,然后抽取各个多个不同频谱系的幅值,相位特征,将这些特征都作为每一帧图像的特征集合,然后对于相邻两帧的幅值差和相位差异性进行计算,得到相邻两帧图像的幅值差和相位差。In a specific implementation, a Fourier transform may first be performed on each frame image and the spectral features extracted; the amplitude and phase features of each of a plurality of different spectral bands are then extracted and taken together as the feature set of each frame image, and the amplitude difference and phase difference between two adjacent frames of images are then calculated.
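The amplitude and phase differences of adjacent frames' Fourier transforms could be computed along these lines. This is a minimal NumPy sketch: the grouping into separate spectral bands described above is omitted, and the test frames are hypothetical.

```python
import numpy as np

def freq_features(prev_frame, cur_frame):
    """Per-coefficient amplitude and phase differences of adjacent frames' 2-D FFTs."""
    f1, f2 = np.fft.fft2(prev_frame), np.fft.fft2(cur_frame)
    amp_diff = np.abs(f2) - np.abs(f1)        # amplitude difference
    phase_diff = np.angle(f2) - np.angle(f1)  # phase difference
    return amp_diff, phase_diff

# Two identical hypothetical frames: both differences should vanish.
frame = np.random.default_rng(0).random((8, 8))
amp_d, phase_d = freq_features(frame, frame)
```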
对于小波变换特征信息,本申请实施例可以通过确定每个样本视频数据的每一帧图像的小波系数,然后分别确定相邻两帧图像的小波系数的变化值,以获得图像帧小波变换特征信息。For the wavelet transform feature information, the embodiment of the present application may determine the wavelet coefficients of each frame image of each sample video data, and then respectively determine the change values of the wavelet coefficients between two adjacent frames of images to obtain the image frame wavelet transform feature information.
具体地,可以对每一帧图像做小波变换处理,获得相应的小波系数,然后将各帧图像按照时间先后进行排序,分别计算相邻两帧图像之间的小波系数的变化情况,抽取小波系数变化的差值作为小波变换特征信息。Specifically, wavelet transform processing may be performed on each frame image to obtain the corresponding wavelet coefficients; the frames are then sorted in chronological order, the changes of the wavelet coefficients between two adjacent frames of images are respectively calculated, and the differences of the wavelet coefficient changes are extracted as the wavelet transform feature information.
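A minimal sketch of the wavelet-coefficient difference, using a hand-rolled single-level Haar transform in place of whichever wavelet basis the application actually uses (which it does not specify); the two frames are hypothetical.

```python
import numpy as np

def haar_coeffs(frame):
    """Single-level 1-D Haar transform along rows: average band + detail band."""
    even, odd = frame[:, ::2], frame[:, 1::2]
    return np.hstack([(even + odd) / 2.0, (even - odd) / 2.0])

def wavelet_change(prev_frame, cur_frame):
    """Difference of adjacent frames' wavelet coefficients, used as a feature."""
    return haar_coeffs(cur_frame) - haar_coeffs(prev_frame)

# Hypothetical adjacent frames: all-zero followed by all-one.
prev, cur = np.zeros((4, 4)), np.ones((4, 4))
delta = wavelet_change(prev, cur)  # average band changes, detail band does not
```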
在本申请实施例中,对于图像旋转算子特征信息,可以首先确定每个样本视频数据的每一帧图像的旋转算子,然后分别确定相邻两帧图像的旋转算子的变化值,获得图像旋转算子特征信息。In the embodiment of the present application, for the image rotation operator feature information, the rotation operator of each frame image of each sample video data may first be determined, and then the change values of the rotation operators between two adjacent frames of images may be respectively determined to obtain the image rotation operator feature information.
具体地,可以首先计算每一帧图像的旋转算子,然后将各帧图像按照时间先后进行排序,确定相邻两帧图像之间的旋转算子的变化值,得到图像旋转算子特征信息。Specifically, the rotation operator of each frame image may be first calculated, and then each frame image is sorted in time series, and the change value of the rotation operator between the adjacent two frames of images is determined to obtain image rotation operator feature information.
在具体实现中,计算每一帧图像的旋转算子可以采用SIFT(Scale-invariant feature transform,尺度不变特征转换)算法,该算法是一种检测局部特征的算法,通过求一幅图中的特征点及其尺度和方向描述子得到特征并进行图像特征点匹配,其实质是在不同的尺度空间上查找关键点(特征点),并计算出关键点的方向。In a specific implementation, the rotation operator of each frame image may be calculated by using the SIFT (Scale-invariant feature transform) algorithm, an algorithm for detecting local features that obtains features from the feature points of an image together with their scale and direction descriptors and performs image feature point matching; its essence is to find key points (feature points) in different scale spaces and calculate the directions of the key points.
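Full SIFT is well beyond a short sketch. As a rough stand-in for a per-frame rotation descriptor, the following computes the dominant gradient orientation of a frame, which echoes SIFT's orientation-assignment step; it is a simplification for illustration, not the SIFT algorithm itself, and the test frame is hypothetical.

```python
import numpy as np

def dominant_orientation(frame):
    """Rough stand-in for a rotation descriptor: magnitude-weighted histogram of
    gradient directions, returning the lower edge of the dominant bin (degrees)."""
    gy, gx = np.gradient(frame.astype(float))   # derivatives along rows, columns
    angles = np.degrees(np.arctan2(gy, gx)) % 360.0
    hist, edges = np.histogram(angles, bins=36, range=(0, 360),
                               weights=np.hypot(gx, gy))
    return edges[np.argmax(hist)]

# A horizontal ramp: gradients point along +x, i.e. 0 degrees.
ramp = np.tile(np.arange(8, dtype=float), (8, 1))
angle = dominant_orientation(ramp)
```

The per-frame angles could then be differenced between adjacent frames, as described above for the rotation operator.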
以上对如何提取视频数据的图像像素特征信息,连续帧图像物体迁移特征信息,连续帧图像动作特征信息,图像帧不同的频域特征信息,图像帧小波变换特征信息和图像旋转算子特征信息进行了介绍,本领域技术人员还可以采用其他方式抽取上述特征信息,本申请实施例对此不作限定。The foregoing describes how to extract the image pixel feature information, continuous-frame image object migration feature information, continuous-frame image motion feature information, frequency domain difference feature information of image frames, image frame wavelet transform feature information, and image rotation operator feature information of the video data; those skilled in the art may also extract the above feature information in other manners, which is not limited in this embodiment of the present application.
步骤202,采用所述多个正向样本视频数据和负向样本视频数据的质量特征信息进行训练,生成视频数据检测模型;Step 202: Perform training by using the quality feature information of the plurality of positive sample video data and negative sample video data to generate a video data detection model.
在分别获得样本视频数据的多种类型的质量特征信息后,可以采用所述质量特征信息进行模型训练,从而生成视频数据检测模型。After obtaining the plurality of types of quality feature information of the sample video data respectively, the quality feature information may be used for model training to generate a video data detection model.
在具体实现中,可以首先对所述多个正向样本视频数据和负向样本视频数据的质量特征信息进行归一化处理,获得归一化的质量特征信息,并补全所述归一化的质量特征信息的缺失值,然后从所述归一化的质量特征信息中识别出目标质量特征信息,进而采用所述目标质量特征信息进行神经网络模型训练,生成视频数据检测模型。In a specific implementation, the quality feature information of the plurality of positive sample video data and negative sample video data may first be normalized to obtain normalized quality feature information, and the missing values of the normalized quality feature information may be filled in; the target quality feature information is then identified from the normalized quality feature information, and neural network model training is performed by using the target quality feature information to generate the video data detection model.
在本申请实施例中,识别目标质量特征信息可以是筛选出高判别性的特征信息,具体地,可以首先确定所述归一化的质量特征信息的信息熵。由于信息熵越大的特征,蕴含的信息也越丰富,进而特征的重要性也越大,越应该保留,因此,可以识别所述信息熵超过第一预设阈值的质量特征信息为目标质量特征信息。In the embodiment of the present application, identifying the target quality feature information may be screening out highly discriminative feature information. Specifically, the information entropy of the normalized quality feature information may first be determined. The larger the information entropy of a feature, the richer the information it carries and thus the greater its importance, so the more it should be retained; therefore, the quality feature information whose information entropy exceeds a first preset threshold may be identified as the target quality feature information.
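The entropy-based screening of normalized features might look like the following sketch. The histogram binning, the entropy threshold, and the feature matrix are all assumptions for illustration; the application does not fix any of them.

```python
import numpy as np

def select_by_entropy(features, threshold):
    """Keep feature columns whose histogram-based entropy exceeds the threshold.

    `features`: (n_samples, n_features) matrix, already normalized to [0, 1].
    """
    keep = []
    for j in range(features.shape[1]):
        hist, _ = np.histogram(features[:, j], bins=10, range=(0, 1))
        p = hist / hist.sum()
        p = p[p > 0]
        entropy = -np.sum(p * np.log2(p))  # larger entropy -> richer information
        if entropy > threshold:
            keep.append(j)
    return keep

rng = np.random.default_rng(1)
X = np.column_stack([rng.random(100),     # spread-out feature -> high entropy
                     np.full(100, 0.5)])  # constant feature   -> zero entropy
cols = select_by_entropy(X, threshold=1.0)
```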
在本申请实施例中,在生成视频数据检测模型时,还可以融合进用户的个性化的特征信息,从而使对待检测的视频数据进行识别时,能够对视频数据的评价与用户属性相结合,提高推荐视频数据的针对性和有效性。In the embodiment of the present application, when the video data detection model is generated, the personalized feature information of users may also be integrated, so that when the video data to be detected is identified, the evaluation of the video data can be combined with the user attributes, improving the relevance and effectiveness of the recommended video data.
在具体实现中,可以获取多个用户的属性信息,然后根据所述属性信息,将所述多个用户聚类为多个用户群体,所述用户群体具有相应的用户标签,从而在对训练样本集中的视频数据进行模型训练时,可以有效融合用户的属性信息。In a specific implementation, the attribute information of multiple users may be acquired, and the multiple users may then be clustered into multiple user groups according to the attribute information, where each user group has a corresponding user label, so that the attribute information of the users can be effectively integrated when model training is performed on the video data in the training sample set.
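The clustering of users by attribute information could be sketched with a minimal k-means. The attribute vectors, the choice of k, and k-means itself are assumptions; the application does not name a clustering algorithm.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: cluster users' numeric attribute vectors into k groups."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assign each user to the nearest center, then recompute the centers.
        labels = np.argmin(((points[:, None] - centers) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels

# Hypothetical 2-D user attributes (e.g. age, purchase frequency), two clumps.
users = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [7.8, 8.1]])
groups = kmeans(users, k=2)  # each resulting group would get a user label
```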
步骤203,获取一个或多个待检测的视频数据;Step 203: Acquire one or more video data to be detected.
在本申请实施例中,所述待检测的视频数据可以是在视频库中根据某种规则提取多个视频帧实时合成的视频片段。例如,在电子商务网站使用视频内容进行导购及营销时,可以根据输入的文本内容,从海量的视频库中提取出与所述文本内容相匹配的多个视频帧,然后将所述多个视频帧按照一定规则组合成视频片段。当然,本领域技术人员还可以采用其他方式确定待检测的视频数据,例如,所述待检测的视频数据也可以是从各种途径获取的现成的视频片段,本申请实施例对此不作限定。In the embodiment of the present application, the video data to be detected may be a video segment synthesized in real time by extracting a plurality of video frames from a video library according to a certain rule. For example, when an e-commerce website uses video content for shopping guidance and marketing, a plurality of video frames matching input text content may be extracted from a massive video library according to the text content, and the plurality of video frames are then combined into a video segment according to certain rules. Of course, those skilled in the art may also determine the video data to be detected in other manners; for example, the video data to be detected may also be an off-the-shelf video segment obtained through various channels, which is not limited in this embodiment of the present application.
步骤204,分别提取每个待检测的视频数据的质量特征信息;Step 204: Extract quality feature information of each video data to be detected, respectively.
与样本视频数据类似,待检测的视频数据的质量特征信息也可以包括图像像素特征信息,连续帧图像物体迁移特征信息,连续帧图像动作特征信息,图像帧不同的频域特征信息,图像帧小波变换特征信息,和/或,图像旋转算子特征信息。Similar to the sample video data, the quality feature information of the video data to be detected may also include image pixel feature information, continuous-frame image object migration feature information, continuous-frame image motion feature information, frequency domain difference feature information of image frames, image frame wavelet transform feature information, and/or image rotation operator feature information.
对于上述质量特征信息的提取方法可以参见步骤201,本步骤对此不再赘述。For the method for extracting the foregoing quality feature information, refer to step 201, which is not described in this step.
步骤205,采用预设的视频数据检测模型分别对所述一个或多个待检测的视频数据的质量特征信息进行识别,以获得所述一个或多个待检测的视频数据的质量分值;Step 205: Identify, by using a preset video data detection model, the quality feature information of the one or more video data to be detected to obtain a quality score of the one or more video data to be detected.
在具体实现中,在完成视频检测模型的构建,以及待检测视频数据的质量特征信息的提取后,便可以采用已经训练好的视频检测模型对所述质量特征信息进行识别,并依据识别结果对每一个待检测的视频数据进行评分,输出相应的质量分值。In a specific implementation, after the construction of the video data detection model and the extraction of the quality feature information of the video data to be detected are completed, the trained video data detection model may be used to identify the quality feature information, each video data to be detected is scored according to the recognition result, and a corresponding quality score is output.
步骤206,提取所述质量分值超过第二预设阈值的视频数据为目标视频数据;Step 206: Extract video data whose quality score exceeds a second preset threshold as target video data.
通常,质量分值越高,其对应的视频数据的质量越好,该视频数据的流畅度和连贯性也较好、各个视频帧之间的整体风格也会相对较一致。因此,可以将质量分值超过第二预设阈值的视频数据提取为目标视频数据。本领域技术人员可以根据实际需要确定第二预设阈值的大小,本申请实施例对此不作限定。当然,还可以直接选择质量分值最高的视频数据作为目标视频数据,本申请实施例对此亦不作限定。Generally, the higher the quality score, the better the quality of the corresponding video data: its fluency and coherence are better, and the overall style across its video frames is relatively more consistent. Therefore, the video data whose quality score exceeds the second preset threshold may be extracted as the target video data. Those skilled in the art may determine the size of the second preset threshold according to actual needs, which is not limited in this embodiment of the present application. Of course, the video data with the highest quality score may also be directly selected as the target video data, which is likewise not limited in this embodiment of the present application.
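The threshold-based selection of target video data reduces to a simple filter over the model's scores. The video ids, scores, and threshold below are hypothetical.

```python
def pick_targets(scores, threshold):
    """Select the ids of videos whose model quality score exceeds the threshold."""
    return [vid for vid, s in scores.items() if s > threshold]

# Hypothetical quality scores produced by the detection model.
scores = {"v1": 0.91, "v2": 0.42, "v3": 0.77}
targets = pick_targets(scores, threshold=0.75)
```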
步骤207,在所述多个用户群体中确定目标用户群体;Step 207: Determine a target user group among the plurality of user groups;
在本申请实施例中,由于在构建视频数据检测模型的过程中加入了用户的属性信息,因此,识别出的目标视频数据可以包括有相应的视频标签,以体现该视频数据的分类或其他信息。In the embodiment of the present application, since the attribute information of users is added in the process of constructing the video data detection model, the identified target video data may include a corresponding video tag to reflect the classification or other information of the video data.
在具体实现中,可以根据视频标签与用户群体的用户标签的比对,识别出该目标视频数据所针对的目标用户群体。例如,可以确定与所述目标视频数据的视频标签相同的用户标签所对应的用户群体为目标用户群体。当然,本领域技术人员还可以采用其他方式确定目标用户群体,本申请实施例对此不作限定。In a specific implementation, the target user group at which the target video data is aimed may be identified by comparing the video tag with the user tags of the user groups. For example, the user group whose user tag is the same as the video tag of the target video data may be determined as the target user group. Of course, those skilled in the art may also determine the target user group in other manners, which is not limited in this embodiment of the present application.
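Matching a target video's tags against the user groups' labels might be sketched as follows; the tags and group ids are hypothetical examples of the comparison described above.

```python
def match_groups(video_tags, group_tags):
    """Return the user groups whose label matches any of the video's tags."""
    return [g for g, tag in group_tags.items() if tag in video_tags]

# Hypothetical labels attached by the model and by the user clustering.
video_tags = {"sports", "outdoor"}
group_tags = {"g1": "sports", "g2": "cooking", "g3": "outdoor"}
targets = match_groups(video_tags, group_tags)
```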
步骤208,向所述目标用户群体推荐所述目标视频数据。 Step 208, recommend the target video data to the target user group.
在本申请实施例中,在分别确定目标视频数据和目标用户群体后,便可以将所述目标视频数据推荐给目标用户群体。In the embodiment of the present application, after the target video data and the target user group are separately determined, the target video data may be recommended to the target user group.
例如,对于电子商务网站的视频导购,可以在确定出优质的导购视频片段后,将该视频片段推荐给潜在的消费群体,提升用户服务体验,提高用户转化率。For example, for a video shopping guide of an e-commerce website, after determining a high-quality shopping guide video clip, the video clip can be recommended to a potential consumer group, improving the user service experience and improving the user conversion rate.
参照图4,示出了本申请的一种视频数据检测模型的生成方法实施例的步骤流程图,具体可以包括如下步骤:Referring to FIG. 4, a flow chart of the steps of a method for generating a video data detection model of the present application is shown, which may specifically include the following steps:
步骤401,分别提取多个样本视频数据的质量特征信息,所述多个样本视频数据包括多个正向样本视频数据和负向样本视频数据;Step 401: Separately extract the quality feature information of a plurality of sample video data, where the plurality of sample video data include a plurality of positive sample video data and negative sample video data;
步骤402,采用所述多个正向样本视频数据和负向样本视频数据的质量特征信息进行训练,生成视频数据检测模型。Step 402: Perform training by using the quality feature information of the plurality of positive sample video data and negative sample video data to generate a video data detection model.
在本申请实施例中,所述质量特征信息可以包括图像像素特征信息,连续帧图像物体迁移特征信息,连续帧图像动作特征信息,图像帧不同的频域特征信息,图像帧小波变换特征信息,和/或,图像旋转算子特征信息。In the embodiment of the present application, the quality feature information may include image pixel feature information, continuous-frame image object migration feature information, continuous-frame image motion feature information, frequency domain difference feature information of image frames, image frame wavelet transform feature information, and/or image rotation operator feature information.
由于本实施例步骤401-步骤402中所述的视频数据检测模型生成方法与上述视频数据的推荐方法实施例二中步骤201-步骤202类似,可以相互参阅,本实施例对此不再赘述。Since the method for generating the video data detection model described in step 401 to step 402 of this embodiment is similar to step 201 to step 202 in the second embodiment of the video data recommendation method, they may be referred to each other, and details are not described again in this embodiment.
参照图5,示出了本申请的一种视频数据的识别方法实施例的步骤流程图,具体可以包括如下步骤:Referring to FIG. 5, a flow chart of steps of an embodiment of a method for identifying video data according to the present application is shown. Specifically, the method may include the following steps:
步骤501,获取一个或多个待检测的视频数据;Step 501: Acquire one or more video data to be detected.
在本申请实施例中,可以提供一用户界面,例如,在终端的显示屏上展现一交互界面,用户可以通过该交互界面,提交针对一个或多个视频数据的检测请求。所述视频数据可以是从各种途径获取的现成的视频片段,也可以是在视频库中根据某种规则提取多个视频帧实时合成的视频片段,本申请实施例对视频数据的具体来源和类型不作限定。In the embodiment of the present application, a user interface may be provided; for example, an interactive interface is presented on the display screen of a terminal, through which a user may submit a detection request for one or more video data. The video data may be an off-the-shelf video segment obtained through various channels, or may be a video segment synthesized in real time by extracting a plurality of video frames from a video library according to a certain rule; the specific source and type of the video data are not limited in this embodiment of the present application.
步骤502,将所述一个或多个待检测的视频数据发送至服务器,所述服务器用于分别对所述一个或多个待检测的视频数据进行识别,以获得识别结果,所述识别结果包括一个或多个候选视频数据;Step 502: Send the one or more video data to be detected to a server, where the server is configured to separately identify the one or more video data to be detected to obtain a recognition result, and the recognition result includes one or more candidate video data;
当用户在提交针对视频数据的检测请求后,终端可以将一个或多个待检测的视频数据发送至服务器,由所述服务器完成对上述视频数据的识别,以获得相应的识别结果。After the user submits the detection request for the video data, the terminal may send one or more video data to be detected to the server, and the server completes the identification of the video data to obtain a corresponding recognition result.
在本申请实施例中,所述识别结果可以包括一个或多个候选视频数据,每个候选视频数据均包括有相应的质量分值。In this embodiment of the present application, the identification result may include one or more candidate video data, and each candidate video data includes a corresponding quality score.
在具体实现中,服务器对一个或多个待检测的视频数据进行识别的过程,与前述实施例中步骤201-步骤205类似,可以相互参照,本实施例对此不再赘述。In a specific implementation, the process of identifying the one or more video data to be detected by the server is similar to the step 201 to step 205 in the foregoing embodiment, and may be referred to each other.
步骤503,接收所述服务器返回的所述一个或多个候选视频数据;Step 503: Receive the one or more candidate video data returned by the server.
在本申请实施例中,服务器在完成对待检测视频数据的识别,获得识别结果后,可以将所述识别结果中包括的一个或多个候选视频数据返回给终端。In the embodiment of the present application, after the server completes the identification of the video data to be detected, and obtains the recognition result, the server may return one or more candidate video data included in the identification result to the terminal.
步骤504,在所述一个或多个候选视频数据中确定目标视频数据;Step 504: Determine target video data in the one or more candidate video data.
在本申请实施例中,由于候选视频数据具有相应的质量分值,因此,可以根据质量分值的高低,确定出目标视频数据。In the embodiment of the present application, since the candidate video data has a corresponding quality score, the target video data may be determined according to the level of the quality score.
在一种示例中,质量分值越高,可以认为对应的视频数据的质量越好,因此,可以以质量分值最高的视频数据作为目标视频数据;或者,可以从质量分值超过某一阈值的多个候选视频数据中确定出一筛选范围,然后进一步根据业务的实际需求,从该范围内的多个候选视频数据中确定出目标视频数据,本申请实施例对确定目标视频数据的具体方式不作限定。当然,目标视频数据可以不止一个,也可以有多个,本申请对此亦不作限定。In an example, the higher the quality score, the better the quality of the corresponding video data may be considered; therefore, the video data with the highest quality score may be used as the target video data. Alternatively, a screening range may be determined from the plurality of candidate video data whose quality scores exceed a certain threshold, and the target video data may then be further determined from the plurality of candidate video data within that range according to the actual requirements of the service; the specific manner of determining the target video data is not limited in this embodiment of the present application. Of course, there may be one target video data or there may be multiple, which is likewise not limited in this application.
需要说明的是,目标视频数据可以是由终端根据用户输入的信息自行确定的,可以是用户在多个候选视频数据中具体选定的,本申请实施例对此不作限定。It should be noted that the target video data may be determined by the terminal itself according to the information input by the user, or may be specifically selected by the user from the plurality of candidate video data, which is not limited in this embodiment of the present application.
步骤505,展现所述目标视频数据。 Step 505, presenting the target video data.
当确定出目标视频数据后,终端可以在交互界面上展现所述目标视频数据,例如,可以展现目标视频数据的具体信息,或者直接播放该目标视频数据,本申请实施例对此不作限定。After the target video data is determined, the terminal may display the target video data on the interaction interface, for example, the specific information of the target video data may be displayed, or the target video data may be directly played, which is not limited in this embodiment of the present application.
在本申请实施例中,通过在终端上提供一交互界面,从而用户可以通过该交互界面直接提交对视频数据的识别请求,并由服务器对该识别请求所针对的视频数据进行识别,使得用户可以根据实际需要完成对视频数据的检测,提高了用户对视频数据的质量的判断便捷性。In the embodiment of the present application, by providing an interaction interface on the terminal, the user can directly submit the identification request for the video data through the interaction interface, and the server identifies the video data targeted by the identification request, so that the user can The detection of the video data is completed according to actual needs, and the convenience of the user to judge the quality of the video data is improved.
需要说明的是,对于方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请实施例并不受所描述的动作顺序的限制,因为依据本申请实施例,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作并不一定是本申请实施例所必须的。It should be noted that, for the method embodiments, for the sake of simple description, they are all expressed as a series of action combinations; however, those skilled in the art should understand that the embodiments of the present application are not limited by the described action sequence, because according to the embodiments of the present application, certain steps may be performed in other sequences or concurrently. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.
参照图6,示出了本申请的一种视频数据的推荐装置实施例的结构框图,具体可以包括如下模块:Referring to FIG. 6, a structural block diagram of an embodiment of a recommendation apparatus for video data of the present application is shown, which may specifically include the following modules:
获取模块601,用于获取一个或多个待检测的视频数据;The obtaining module 601 is configured to acquire one or more video data to be detected;
提取模块602,用于分别提取每个待检测的视频数据的质量特征信息;The extracting module 602 is configured to separately extract quality feature information of each video data to be detected;
识别模块603,用于采用预设的视频数据检测模型对所述质量特征信息进行识别,以获得目标视频数据;The identification module 603 is configured to identify the quality feature information by using a preset video data detection model to obtain target video data.
推荐模块604,用于向用户推荐所述目标视频数据。The recommendation module 604 is configured to recommend the target video data to the user.
在本申请实施例中,所述预设的视频数据检测模型可以通过调用如下模块生成:In this embodiment of the present application, the preset video data detection model may be generated by calling the following module:
质量特征信息提取模块,用于分别提取多个样本视频数据的质量特征信息,所述多个样本视频数据可以包括多个正向样本视频数据和负向样本视频数据;a quality feature information extraction module, configured to separately extract the quality feature information of a plurality of sample video data, where the plurality of sample video data may include a plurality of positive sample video data and negative sample video data;
视频数据检测模型生成模块,用于采用所述多个正向样本视频数据和负向样本视频数据的质量特征信息进行训练,生成视频数据检测模型。a video data detection model generating module, configured to perform training by using the quality feature information of the plurality of positive sample video data and negative sample video data to generate a video data detection model.
在本申请实施例中,所述质量特征信息可以包括图像像素特征信息,连续帧图像物体迁移特征信息,连续帧图像动作特征信息,图像帧不同的频域特征信息,图像帧小波变换特征信息,和/或,图像旋转算子特征信息。In the embodiment of the present application, the quality feature information may include image pixel feature information, continuous-frame image object migration feature information, continuous-frame image motion feature information, frequency domain difference feature information of image frames, image frame wavelet transform feature information, and/or image rotation operator feature information.
在本申请实施例中,所述质量特征信息提取模块具体可以包括如下子模块:In the embodiment of the present application, the quality feature information extraction module may specifically include the following submodules:
像素信息提取子模块,用于提取每个样本视频数据的每一帧图像的像素信息;a pixel information extraction submodule, configured to extract pixel information of each frame image of each sample video data;
像素信息处理子模块,用于分别对所述像素信息进行卷积运算和池化处理,以获得图像像素特征信息。The pixel information processing sub-module is configured to perform convolution operation and pooling processing on the pixel information to obtain image pixel feature information.
在本申请实施例中,所述质量特征信息提取模块还可以包括如下子模块:In the embodiment of the present application, the quality feature information extraction module may further include the following sub-modules:
物体对象识别子模块,用于识别每个样本视频数据的每一帧图像中的物体对象;An object object recognition sub-module for identifying an object object in each frame image of each sample video data;
物体对象处理子模块,用于分别确定相邻两帧图像中的物体对象出现的次数和频率,以获得连续帧图像物体迁移特征信息。The object object processing sub-module is configured to respectively determine the number and frequency of occurrences of the object objects in the adjacent two frames of images to obtain continuous frame image object migration feature information.
在本申请实施例中,所述质量特征信息提取模块还可以包括如下子模块:In the embodiment of the present application, the quality feature information extraction module may further include the following sub-modules:
动作对象识别子模块,用于识别每个样本视频数据的每一帧图像中的动作对象的形状特征;a motion object recognition submodule, configured to identify a shape feature of the motion object in each frame image of each sample video data;
动作对象处理子模块,用于分别确定相邻两帧图像中的动作对象的形状特征的几何参数,以获得连续帧图像动作特征信息。The action object processing sub-module is configured to respectively determine geometric parameters of the shape features of the action objects in the adjacent two frames of images to obtain continuous frame image action feature information.
在本申请实施例中,所述质量特征信息提取模块还可以包括如下子模块:In the embodiment of the present application, the quality feature information extraction module may further include the following sub-modules:
幅值和相位确定子模块,用于确定每个样本视频数据的每一帧图像的幅值和相位;An amplitude and phase determination sub-module for determining a magnitude and a phase of each frame image of each sample video data;
幅值和相位处理子模块,用于分别确定相邻两帧图像的幅值差和相位差,以获得图像帧不同的频域特征信息。The amplitude and phase processing sub-module is configured to respectively determine the amplitude difference and the phase difference between two adjacent frames of images to obtain the frequency domain difference feature information of the image frames.
在本申请实施例中,所述质量特征信息提取模块还可以包括如下子模块:In the embodiment of the present application, the quality feature information extraction module may further include the following sub-modules:
小波系数确定子模块,用于确定每个样本视频数据的每一帧图像的小波系数;a wavelet coefficient determining submodule for determining a wavelet coefficient of each frame image of each sample video data;
小波系数处理子模块,用于分别确定相邻两帧图像的小波系数的变化值,以获得图像帧小波变换特征信息。The wavelet coefficient processing sub-module is configured to respectively determine the variation values of the wavelet coefficients of the adjacent two frames of images to obtain image frame wavelet transform feature information.
在本申请实施例中,所述质量特征信息提取模块还可以包括如下子模块:In the embodiment of the present application, the quality feature information extraction module may further include the following sub-modules:
旋转算子确定子模块,用于确定每个样本视频数据的每一帧图像的旋转算子;a rotation operator determining sub-module for determining a rotation operator of each frame image of each sample video data;
旋转算子处理子模块,用于分别确定相邻两帧图像的旋转算子的变化值,以获得图像旋转算子特征信息。The rotation operator processing sub-module is configured to respectively determine a variation value of a rotation operator of the adjacent two frames of images to obtain image rotation operator feature information.
在本申请实施例中,所述视频数据检测模型生成模块具体可以包括如下子模块:In the embodiment of the present application, the video data detection model generating module may specifically include the following submodules:
归一化处理子模块,用于对所述多个正向样本视频数据和负向样本视频数据的质量特征信息进行归一化处理,以获得归一化的质量特征信息;a normalization processing sub-module, configured to normalize the quality feature information of the plurality of positive sample video data and negative sample video data to obtain normalized quality feature information;
缺失值补全子模块,用于补全所述归一化的质量特征信息的缺失值;a missing value completion sub-module for complementing the missing value of the normalized quality feature information;
目标质量特征信息识别子模块,用于从所述归一化的质量特征信息中识别出目标质量特征信息;a target quality feature information identifying submodule, configured to identify target quality feature information from the normalized quality feature information;
视频数据检测模型生成子模块,用于采用所述目标质量特征信息进行神经网络模型训练,生成视频数据检测模型。The video data detection model generation submodule is configured to perform neural network model training by using the target quality feature information, and generate a video data detection model.
在本申请实施例中,所述目标质量特征信息识别子模块具体可以包括如下单元:In the embodiment of the present application, the target quality feature information identifying submodule may specifically include the following units:
信息熵确定单元,用于确定所述归一化的质量特征信息的信息熵;An information entropy determining unit, configured to determine an information entropy of the normalized quality feature information;
目标质量特征信息识别单元,用于识别所述信息熵超过第一预设阈值的质量特征信息为目标质量特征信息。The target quality feature information identifying unit is configured to identify the quality feature information that the information entropy exceeds the first preset threshold as the target quality feature information.
在本申请实施例中,生成所述预设的视频数据检测模型还可以调用如下模块:In the embodiment of the present application, generating the preset video data detection model may also invoke the following modules:
属性信息获取模块,用于获取多个用户的属性信息;An attribute information obtaining module, configured to acquire attribute information of multiple users;
用户群体聚类模块,用于根据所述属性信息,将所述多个用户聚类为多个用户群体,所述用户群体具有相应的用户标签。The user group clustering module is configured to cluster the plurality of users into a plurality of user groups according to the attribute information, where the user group has a corresponding user label.
在本申请实施例中,所述识别模块603具体可以包括如下子模块:In the embodiment of the present application, the identification module 603 may specifically include the following sub-modules:
质量特征信息识别子模块,用于采用预设的视频数据检测模型分别对所述一个或多个待检测的视频数据的质量特征信息进行识别,以获得所述一个或多个待检测的视频数据的质量分值;a quality feature information identifying sub-module, configured to separately identify, by using a preset video data detection model, the quality feature information of the one or more video data to be detected, to obtain the quality scores of the one or more video data to be detected;
目标视频数据提取子模块,用于提取所述质量分值超过第二预设阈值的视频数据为 目标视频数据。The target video data extraction sub-module is configured to extract video data whose quality score exceeds a second preset threshold as target video data.
In an embodiment of the present application, the recommendation module 604 may specifically include the following submodules:
a target user group determination submodule, configured to determine a target user group among the multiple user groups;
a target video data recommendation submodule, configured to recommend the target video data to the target user group.
In an embodiment of the present application, the target video data may have a corresponding video label, and the target user group determination submodule may specifically include the following unit:
a target user group determination unit, configured to determine, as the target user group, the user group whose user label is the same as the video label of the target video data.
Referring to FIG. 7, a structural block diagram of an embodiment of an apparatus for generating a video data detection model according to the present application is shown, which may specifically include the following modules:
a quality feature information extraction module 701, configured to separately extract quality feature information from multiple sample video data, where the multiple sample video data may include multiple positive sample video data and negative sample video data;
a video data detection model generation module 702, configured to perform training with the quality feature information of the multiple positive and negative sample video data to generate a video data detection model.
In an embodiment of the present application, the quality feature information may include image pixel feature information, object migration feature information across consecutive frames, motion feature information across consecutive frames, inter-frame frequency domain feature information, image frame wavelet transform feature information, and/or image rotation operator feature information.
Referring to FIG. 8, a structural block diagram of an embodiment of an apparatus for identifying video data according to the present application is shown, which may specifically include the following modules:
an acquisition module 801, configured to acquire one or more pieces of video data to be detected;
a sending module 802, configured to send the one or more pieces of video data to be detected to a server, where the server is configured to identify each piece of video data to be detected to obtain a recognition result, and the recognition result may include one or more candidate video data;
a receiving module 803, configured to receive the one or more candidate video data returned by the server;
a determination module 804, configured to determine target video data among the one or more candidate video data;
a presentation module 805, configured to present the target video data.
Since the apparatus embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the corresponding parts of the method embodiments.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that the embodiments have in common, reference may be made to one another.
Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Accordingly, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, so that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing; the instructions executed on the computer or other programmable terminal device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present application.
Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variants thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.
The video data recommendation method, the video data recommendation apparatus, the method for generating a video data detection model, the apparatus for generating a video data detection model, the video data identification method, and the video data identification apparatus provided by the present application have been described in detail above. Specific examples have been used herein to illustrate the principles and implementations of the present application; the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, for a person of ordinary skill in the art, there will be changes in the specific implementations and the scope of application in accordance with the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.
Claims (21)
- A method for recommending video data, comprising: acquiring one or more pieces of video data to be detected; separately extracting quality feature information from each piece of video data to be detected; identifying the quality feature information with a preset video data detection model to obtain target video data; and recommending the target video data to a user.
- The method according to claim 1, wherein the preset video data detection model is generated by: separately extracting quality feature information from multiple sample video data, the multiple sample video data including multiple positive sample video data and negative sample video data; and performing training with the quality feature information of the multiple positive and negative sample video data to generate the video data detection model.
- The method according to claim 2, wherein the quality feature information includes image pixel feature information, object migration feature information across consecutive frames, motion feature information across consecutive frames, inter-frame frequency domain feature information, image frame wavelet transform feature information, and/or image rotation operator feature information.
- The method according to claim 3, wherein the step of separately extracting quality feature information from the multiple sample video data includes: extracting pixel information from each frame image of each sample video data; and performing a convolution operation and pooling on the pixel information to obtain image pixel feature information.
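The convolution-and-pooling step of this claim can be sketched in plain Python. A real system would use a CNN library with learned kernels; the 2x2 kernel and the toy frame below are invented for illustration.

```python
def conv2d(image, kernel):
    """Valid-mode 2D convolution (strictly, cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

frame = [[0, 0, 0, 0, 0],
         [0, 9, 9, 9, 0],
         [0, 9, 9, 9, 0],
         [0, 9, 9, 9, 0],
         [0, 0, 0, 0, 0]]
edge_kernel = [[1, 0], [0, -1]]  # toy diagonal-difference kernel (assumed, not learned)
feature_map = max_pool(conv2d(frame, edge_kernel))
```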
- The method according to claim 3, wherein the step of separately extracting quality feature information from the multiple sample video data includes: identifying objects in each frame image of each sample video data; and determining the number and frequency of occurrences of the objects in each pair of adjacent frames, so as to obtain object migration feature information across consecutive frames.
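A sketch of the adjacent-frame counting, assuming an upstream object detector has already produced per-frame label lists; the labels and the exact rate definition are illustrative, not prescribed by the claim.

```python
from collections import Counter

def object_migration_features(frames):
    """For each pair of adjacent frames, count how many detected objects persist
    and the rate at which they do. `frames` is a list of per-frame object-label
    lists (assumed to come from an upstream object detector)."""
    features = []
    for prev, curr in zip(frames, frames[1:]):
        persisted = Counter(prev) & Counter(curr)  # objects present in both frames
        count = sum(persisted.values())
        rate = count / max(len(prev), 1)
        features.append((count, rate))
    return features

frames = [["cat", "ball"], ["cat", "ball"], ["cat"]]
migration = object_migration_features(frames)
```

A smooth, coherent video tends to show high persistence rates; abrupt or noisy footage shows objects flickering in and out, which is one signal of low quality.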
- The method according to claim 3, wherein the step of separately extracting quality feature information from the multiple sample video data includes: identifying the shape features of moving objects in each frame image of each sample video data; and determining the geometric parameters of the shape features of the moving objects in each pair of adjacent frames, so as to obtain motion feature information across consecutive frames.
- The method according to claim 3, wherein the step of separately extracting quality feature information from the multiple sample video data includes: determining the amplitude and phase of each frame image of each sample video data; and determining the amplitude difference and phase difference between each pair of adjacent frames, so as to obtain inter-frame frequency domain feature information.
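The amplitude and phase computation can be illustrated with a naive 1-D DFT. A real implementation would apply a 2-D FFT to each full frame; the toy "frames" below are single rows, and using the first non-DC coefficient is an assumed simplification.

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform (stand-in for a 2-D FFT of a frame)."""
    n = len(signal)
    return [sum(signal[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]

def amp_phase(frame_row):
    """Amplitude and phase of the first non-DC frequency component."""
    coeff = dft(frame_row)[1]
    return abs(coeff), cmath.phase(coeff)

def freq_domain_features(frames):
    """Amplitude and phase differences between each pair of adjacent frames."""
    aps = [amp_phase(f) for f in frames]
    return [(abs(a2 - a1), abs(p2 - p1))
            for (a1, p1), (a2, p2) in zip(aps, aps[1:])]

frames = [[0, 1, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0]]  # toy 1-D "frames"
diffs = freq_domain_features(frames)
```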
- The method according to claim 3, wherein the step of separately extracting quality feature information from the multiple sample video data includes: determining the wavelet coefficients of each frame image of each sample video data; and determining the change in the wavelet coefficients between each pair of adjacent frames, so as to obtain image frame wavelet transform feature information.
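A one-level Haar transform is the simplest way to illustrate this step; the patent does not name a wavelet family, so Haar (and the L1 distance between coefficient vectors) is an assumption.

```python
def haar_step(signal):
    """One level of the Haar wavelet transform: pairwise averages then pairwise differences."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx + detail

def wavelet_change(frames):
    """L1 change in Haar coefficients between each pair of adjacent frames."""
    coeffs = [haar_step(f) for f in frames]
    return [sum(abs(a - b) for a, b in zip(c1, c2))
            for c1, c2 in zip(coeffs, coeffs[1:])]

frames = [[4, 2, 6, 6], [4, 2, 8, 8], [4, 2, 8, 8]]  # toy 1-D "frames"
changes = wavelet_change(frames)
```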
- The method according to claim 3, wherein the step of separately extracting quality feature information from the multiple sample video data includes: determining the rotation operator of each frame image of each sample video data; and determining the change in the rotation operators between each pair of adjacent frames, so as to obtain image rotation operator feature information.
- The method according to any one of claims 2 to 9, wherein the step of performing training with the quality feature information of the multiple positive and negative sample video data to generate a video data detection model includes: normalizing the quality feature information of the multiple positive and negative sample video data to obtain normalized quality feature information; completing missing values of the normalized quality feature information; identifying target quality feature information from the normalized quality feature information; and training a neural network model with the target quality feature information to generate the video data detection model.
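The first two steps of this claim can be sketched as follows. Min-max scaling and mean imputation are common choices, but the claim fixes neither, so both are assumptions here.

```python
def normalize(values):
    """Min-max normalize a feature column to [0, 1], passing missing values through."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = (hi - lo) or 1.0
    return [None if v is None else (v - lo) / span for v in values]

def fill_missing(values):
    """Complete missing values with the column mean (one common completion strategy)."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

raw = [10.0, None, 30.0, 20.0]  # one quality feature across samples; None marks a gap
clean = fill_missing(normalize(raw))
```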
- The method according to claim 10, wherein the step of identifying target quality feature information from the normalized quality feature information includes: determining the information entropy of the normalized quality feature information; and identifying quality feature information whose information entropy exceeds a first preset threshold as the target quality feature information.
- The method according to claim 2, further comprising: acquiring attribute information of multiple users; and clustering the multiple users into multiple user groups according to the attribute information, each user group having a corresponding user label.
- The method according to claim 12, wherein the step of identifying the quality feature information with a preset video data detection model to obtain target video data includes: using the preset video data detection model to identify the quality feature information of each of the one or more pieces of video data to be detected, so as to obtain a quality score for each piece of video data to be detected; and extracting video data whose quality score exceeds a second preset threshold as the target video data.
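A sketch of the scoring-and-thresholding step, with a dictionary lookup standing in for the trained model's forward pass; the video names, scores, and threshold value are invented for illustration.

```python
def select_target_videos(videos, score_fn, threshold):
    """Score each candidate with the detection model and keep those whose
    quality score exceeds the second preset threshold."""
    scored = {vid: score_fn(vid) for vid in videos}
    return [vid for vid, score in scored.items() if score > threshold]

# stand-in for the trained model's forward pass (assumed scores)
fake_scores = {"v1": 0.92, "v2": 0.40, "v3": 0.75}
targets = select_target_videos(["v1", "v2", "v3"], fake_scores.get, threshold=0.7)
```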
- The method according to claim 13, wherein the step of recommending the target video data to a user includes: determining a target user group among the multiple user groups; and recommending the target video data to the target user group.
- The method according to claim 14, wherein the target video data has a corresponding video label, and the step of determining a target user group among the multiple user groups includes: determining, as the target user group, the user group whose user label is the same as the video label of the target video data.
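This label-matching rule reduces to a simple equality filter; the group names and labels below are invented for illustration.

```python
def target_user_groups(video_label, user_groups):
    """Pick the user groups whose user label equals the video's label."""
    return [group for group, label in user_groups.items() if label == video_label]

user_groups = {"g1": "sports", "g2": "news", "g3": "sports"}
matches = target_user_groups("sports", user_groups)
```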
- A method for generating a video data detection model, comprising: separately extracting quality feature information from multiple sample video data, the multiple sample video data including multiple positive sample video data and negative sample video data; and performing training with the quality feature information of the multiple positive and negative sample video data to generate a video data detection model.
- The method according to claim 16, wherein the quality feature information includes image pixel feature information, object migration feature information across consecutive frames, motion feature information across consecutive frames, inter-frame frequency domain feature information, image frame wavelet transform feature information, and/or image rotation operator feature information.
- A method for identifying video data, comprising: acquiring one or more pieces of video data to be detected; sending the one or more pieces of video data to be detected to a server, where the server is configured to identify each of them to obtain a recognition result, the recognition result including one or more candidate video data; receiving the one or more candidate video data returned by the server; determining target video data among the one or more candidate video data; and presenting the target video data.
- An apparatus for recommending video data, comprising: an acquisition module, configured to acquire one or more pieces of video data to be detected; an extraction module, configured to separately extract quality feature information from each piece of video data to be detected; an identification module, configured to identify the quality feature information with a preset video data detection model to obtain target video data; and a recommendation module, configured to recommend the target video data to a user.
- An apparatus for generating a video data detection model, comprising: a quality feature information extraction module, configured to separately extract quality feature information from multiple sample video data, the multiple sample video data including multiple positive sample video data and negative sample video data; and a video data detection model generation module, configured to perform training with the quality feature information of the multiple positive and negative sample video data to generate a video data detection model.
- An apparatus for identifying video data, comprising: an acquisition module, configured to acquire one or more pieces of video data to be detected; a sending module, configured to send the one or more pieces of video data to be detected to a server, where the server is configured to identify each of them to obtain a recognition result, the recognition result including one or more candidate video data; a receiving module, configured to receive the one or more candidate video data returned by the server; a determination module, configured to determine target video data among the one or more candidate video data; and a presentation module, configured to present the target video data.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710113741.4A CN108509457A (en) | 2017-02-28 | 2017-02-28 | A kind of recommendation method and apparatus of video data |
CN201710113741.4 | 2017-02-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018157746A1 true WO2018157746A1 (en) | 2018-09-07 |
Family
ID=63369778
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/076784 WO2018157746A1 (en) | 2017-02-28 | 2018-02-14 | Recommendation method and apparatus for video data |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN108509457A (en) |
TW (1) | TWI753044B (en) |
WO (1) | WO2018157746A1 (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109242030A (en) * | 2018-09-21 | 2019-01-18 | 京东方科技集团股份有限公司 | Draw single generation method and device, electronic equipment, computer readable storage medium |
CN109068180B (en) * | 2018-09-28 | 2021-02-02 | 武汉斗鱼网络科技有限公司 | Method for determining video fine selection set and related equipment |
CN109614537A (en) * | 2018-12-06 | 2019-04-12 | 北京百度网讯科技有限公司 | For generating the method, apparatus, equipment and storage medium of video |
CN109729395B (en) * | 2018-12-14 | 2022-02-08 | 广州市百果园信息技术有限公司 | Video quality evaluation method and device, storage medium and computer equipment |
CN111353597B (en) * | 2018-12-24 | 2023-12-05 | 杭州海康威视数字技术股份有限公司 | Target detection neural network training method and device |
CN111401100B (en) * | 2018-12-28 | 2021-02-09 | 广州市百果园信息技术有限公司 | Video quality evaluation method, device, equipment and storage medium |
CN109685631B (en) * | 2019-01-10 | 2021-06-01 | 博拉网络股份有限公司 | Personalized recommendation method based on big data user behavior analysis |
CN112464027A (en) * | 2019-09-06 | 2021-03-09 | 腾讯科技(深圳)有限公司 | Video detection method, device and storage medium |
CN111209897B (en) * | 2020-03-09 | 2023-06-20 | 深圳市雅阅科技有限公司 | Video processing method, device and storage medium |
CN111491187B (en) * | 2020-04-15 | 2023-10-31 | 腾讯科技(深圳)有限公司 | Video recommendation method, device, equipment and storage medium |
CN111683273A (en) * | 2020-06-02 | 2020-09-18 | 中国联合网络通信集团有限公司 | Method and device for determining video freeze information |
CN113837820B (en) * | 2020-06-23 | 2024-11-05 | 阿里巴巴集团控股有限公司 | Data processing method, device and equipment |
CN112069951B (en) * | 2020-08-25 | 2025-04-18 | 北京小米松果电子有限公司 | Video segment extraction method, video segment extraction device and storage medium |
CN112199582B (en) * | 2020-09-21 | 2023-07-18 | 聚好看科技股份有限公司 | A content recommendation method, device, equipment and medium |
CN114613000A (en) * | 2020-12-08 | 2022-06-10 | 阿里巴巴集团控股有限公司 | Behavior identification method based on video, computing equipment and user equipment |
CN116708725B (en) * | 2023-08-07 | 2023-10-31 | 清华大学 | Low-bandwidth crowd scene security monitoring method and system based on semantic encoding and decoding |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101282481A (en) * | 2008-05-09 | 2008-10-08 | 中国传媒大学 | A Method of Video Quality Evaluation Based on Artificial Neural Network |
US20110131595A1 (en) * | 2009-12-02 | 2011-06-02 | General Electric Company | Methods and systems for online recommendation |
CN104219575A (en) * | 2013-05-29 | 2014-12-17 | 酷盛(天津)科技有限公司 | Related video recommending method and system |
CN104915861A (en) * | 2015-06-15 | 2015-09-16 | 浙江经贸职业技术学院 | An electronic commerce recommendation method for a user group model constructed based on scores and labels |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI510064B (en) * | 2012-03-30 | 2015-11-21 | Inst Information Industry | Video recommendation system and method thereof |
CN104216960A (en) * | 2014-08-21 | 2014-12-17 | 北京奇艺世纪科技有限公司 | Method and device for recommending video |
- 2017-02-28: CN application CN201710113741.4A filed; published as CN108509457A (active, pending)
- 2017-11-07: TW application TW106138405A filed; published as TWI753044B (active)
- 2018-02-14: PCT application PCT/CN2018/076784 filed; published as WO2018157746A1 (active, application filing)
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020093914A1 (en) * | 2018-11-08 | 2020-05-14 | Alibaba Group Holding Limited | Content-weighted deep residual learning for video in-loop filtering |
CN110879851A (en) * | 2019-10-15 | 2020-03-13 | 北京三快在线科技有限公司 | Video dynamic cover generation method and device, electronic equipment and readable storage medium |
CN111753136A (en) * | 2019-11-14 | 2020-10-09 | 北京沃东天骏信息技术有限公司 | Article information processing method, article information processing device, medium, and electronic device |
CN111191054A (en) * | 2019-12-18 | 2020-05-22 | 腾讯科技(深圳)有限公司 | Recommendation method and device for media data |
CN111191054B (en) * | 2019-12-18 | 2024-02-13 | 腾讯科技(深圳)有限公司 | Media data recommendation method and device |
CN111126262A (en) * | 2019-12-24 | 2020-05-08 | 中国科学院自动化研究所 | Video highlight detection method and device based on graph neural network |
CN111126262B (en) * | 2019-12-24 | 2023-04-28 | 中国科学院自动化研究所 | Video highlights detection method and device based on graph neural network |
CN112749297A (en) * | 2020-03-03 | 2021-05-04 | 腾讯科技(深圳)有限公司 | Video recommendation method and device, computer equipment and computer-readable storage medium |
CN112749297B (en) * | 2020-03-03 | 2023-07-21 | 腾讯科技(深圳)有限公司 | Video recommendation method, device, computer equipment and computer readable storage medium |
CN111950360B (en) * | 2020-07-06 | 2023-08-18 | 北京奇艺世纪科技有限公司 | Method and device for identifying infringement user |
CN111950360A (en) * | 2020-07-06 | 2020-11-17 | 北京奇艺世纪科技有限公司 | Method and device for identifying infringing user |
CN112100441A (en) * | 2020-09-17 | 2020-12-18 | 咪咕文化科技有限公司 | Video recommendation method, electronic device and computer-readable storage medium |
CN112100441B (en) * | 2020-09-17 | 2024-04-09 | 咪咕文化科技有限公司 | Video recommendation method, electronic device and computer-readable storage medium |
CN112464083A (en) * | 2020-11-16 | 2021-03-09 | 北京达佳互联信息技术有限公司 | Model training method, work pushing method, device, electronic equipment and storage medium |
CN113761347A (en) * | 2021-02-25 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | Commodity recommendation method, commodity recommendation device, storage medium and commodity recommendation system |
CN114037941A (en) * | 2021-11-22 | 2022-02-11 | 南京启数智能系统有限公司 | Method and device for algorithmic multi-data cross-validation completion for video target attributes |
CN114519840A (en) * | 2022-02-25 | 2022-05-20 | 携程旅游信息技术(上海)有限公司 | Photo album video identification method and training method and device of photo album video identification model |
CN114780795A (en) * | 2022-05-07 | 2022-07-22 | 济南博观智能科技有限公司 | Video material screening method, device, equipment and medium |
WO2024057124A1 (en) * | 2022-09-14 | 2024-03-21 | Digit7 India Private Limited | System and method for automatically labelling media |
Also Published As
Publication number | Publication date |
---|---|
CN108509457A (en) | 2018-09-07 |
TWI753044B (en) | 2022-01-21 |
TW201834463A (en) | 2018-09-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018157746A1 (en) | Recommendation method and apparatus for video data |
CN108509465B (en) | Video data recommendation method and device and server | |
WO2022033199A1 (en) | Method for obtaining user portrait and related device | |
US11019017B2 (en) | Social media influence of geographic locations | |
US20140172643A1 (en) | System and method for categorizing an image | |
CN110019943B (en) | Video recommendation method and device, electronic equipment and storage medium | |
CN118628214B (en) | Personalized clothing recommendation method and system for electronic commerce platform based on artificial intelligence | |
KR20230087622A (en) | Methods and apparatus for detecting, filtering, and identifying objects in streaming video | |
CN111859149A (en) | Information recommendation method, device, electronic device and storage medium | |
CN104715023A (en) | Commodity recommendation method and system based on video content | |
US20190303499A1 (en) | Systems and methods for determining video content relevance | |
Jing et al. | A new method of printed fabric image retrieval based on color moments and gist feature description | |
JP5261493B2 (en) | Extended image identification | |
CN110363206B (en) | Clustering of data objects, data processing and data identification method | |
Sebyakin et al. | Spatio-temporal deepfake detection with deep neural networks | |
CN103793717A (en) | Methods for determining image-subject significance and training image-subject significance determining classifier and systems for same | |
CN112084954A (en) | Video target detection method and device, electronic equipment and storage medium | |
Angadi et al. | Multimodal sentiment analysis using reliefF feature selection and random forest classifier | |
Chang et al. | Human vision attention mechanism-inspired temporal-spatial feature pyramid for video saliency detection | |
Hu et al. | Improved YOLOv5-based image detection of cotton impurities | |
Ou et al. | An Intelligent Recommendation System for Real Estate Commodity. | |
Priadana et al. | An efficient face gender detector on a cpu with multi-perspective convolution | |
Tao et al. | A large-scale television advertising dataset for detailed impression analysis | |
CN105913427B (en) | A Noise Image Saliency Detection Method Based on Machine Learning | |
CN112580674B (en) | Image recognition method, computer device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18760885; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 18760885; Country of ref document: EP; Kind code of ref document: A1 |