Disclosure of Invention
In view of the foregoing, it is desirable to provide a visual information display method, apparatus, computer device, and storage medium capable of improving browsing efficiency.
A visual information display method, the method comprising:
acquiring a plurality of face images and a video set; the video set comprises video frames to be displayed;
clustering the face images and the video set respectively to obtain at least one face image group consisting of the face images and at least one video group consisting of video frames to be displayed;
when a first trigger operation is detected, determining a face image to be displayed from the face image group through the first trigger operation;
when a second trigger operation on a face in the face image pointed to by the first trigger operation is detected, screening a target video group from the video groups based on the second trigger operation;
and playing the target video group.
In one embodiment, the first trigger operation includes a first click operation and a second click operation; the step of determining the face image to be displayed from the face image group through the first trigger operation includes:
when a first click operation on the face image groups is detected, screening a first target face image group from the at least one face image group based on the first click operation;
determining the arrangement order of the face images in the first target face image group;
displaying each face image in the first target face image group according to the arrangement order;
and when a second click operation on a displayed face image is detected, displaying the face image pointed to by the second click operation.
In one embodiment, the face image group includes face images belonging to the same person; the determining the arrangement order of the face images in the first target face image group includes the following steps:
determining the faces of the same person contained in each face image in the first target face image group, and recording these faces as clustered faces;
determining the correlation between the other faces in each face image and the clustered faces;
determining the correlation of each face image based on the correlations between the other faces and the clustered faces;
and sorting the plurality of face images in the first target face image group according to the correlation of each face image.
In one embodiment, the determining the correlation between the other faces in each face image and the clustered faces includes:
determining a topological relation diagram between the other faces in each face image and the clustered faces by taking the clustered faces as a reference; the topological relation diagram comprises at least one edge; each edge in the topological relation diagram connects two faces appearing in the same face image;
determining the frequency with which the two faces connected by each edge appear together in the same face image;
determining the weight of each edge in the topological relation diagram based on this frequency;
and determining the correlation between the other faces in each face image and the clustered faces according to the weights and the connection relations between the faces in the topological relation diagram.
In one embodiment, the playing the target video group includes:
determining the video to which each video frame to be displayed in the target video group belongs, and acquiring the playing time of each video frame to be displayed within its video;
sorting the video frames to be displayed that belong to the same video based on their playing times to obtain at least one intermediate sequence result;
sorting all the intermediate sequence results according to a preset sorting rule to obtain a video sequence;
and playing the video sequence according to the arrangement order of the video frames to be displayed in the video sequence.
In one embodiment, the playing the video sequence according to the arrangement order of the video frames to be displayed in the video sequence includes:
when a click operation on the currently played video frame to be displayed is detected, determining the position coordinates pointed to by the click operation;
determining, based on the position coordinates, the position area to which the click operation belongs; the position area comprises a first area and a second area;
when the click operation belongs to the first area, determining and playing, based on the arrangement order of the video frames to be displayed in the video sequence, the previous video frame that is adjacent to and precedes the currently played video frame to be displayed;
and when the click operation belongs to the second area, determining and playing, based on the arrangement order of the video frames to be displayed in the video sequence, the next video frame that is adjacent to and follows the currently played video frame to be displayed.
In one embodiment, the method further comprises:
when a click operation on a face in the video frame to be displayed is detected, screening a second target face image group from the at least one face image group according to the click operation;
and displaying the second target face image group.
A visual information display apparatus, the apparatus comprising:
the clustering module is used for acquiring a plurality of face images and a video set, and for clustering the face images and the video set respectively to obtain at least one face image group consisting of the face images and at least one video group consisting of video frames to be displayed;
the image display module is used for determining, when a first trigger operation is detected, the face image to be displayed from the face image group through the first trigger operation;
and the video display module is used for screening, when a second trigger operation on a face in the face image pointed to by the first trigger operation is detected, a target video group from the video groups based on the second trigger operation, and for playing the target video group.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring a plurality of face images and a video set; the video set comprises video frames to be displayed;
clustering the face images and the video set respectively to obtain at least one face image group consisting of the face images and at least one video group consisting of video frames to be displayed;
when a first trigger operation is detected, determining a face image to be displayed from the face image group through the first trigger operation;
when a second trigger operation on a face in the face image pointed to by the first trigger operation is detected, screening a target video group from the video groups based on the second trigger operation;
and playing the target video group.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring a plurality of face images and a video set; the video set comprises video frames to be displayed;
clustering the face images and the video set respectively to obtain at least one face image group consisting of the face images and at least one video group consisting of video frames to be displayed;
when a first trigger operation is detected, determining a face image to be displayed from the face image group through the first trigger operation;
when a second trigger operation on a face in the face image pointed to by the first trigger operation is detected, screening a target video group from the video groups based on the second trigger operation;
and playing the target video group.
According to the above visual information display method, apparatus, computer device, and storage medium, a plurality of face images and a video set are acquired and clustered to obtain at least one face image group and at least one video group. When a first trigger operation is detected, the face image to be displayed is determined according to the first trigger operation; when a second trigger operation on a face in the face image to be displayed is detected, the target video group is screened out accordingly, and display automatically jumps from the face image to be displayed to playback of the target video group. Because the user can select a video group by taking a face in a face image as a reference and then browse the selected video group, video frames associated with a specific person can be found quickly.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It will be understood that the terms "first," "second," and the like, as used herein, may be used to describe various objects, but these objects are not limited by these terms. These terms are only used to distinguish a first object from another object. For example, a first trigger operation may be referred to as a second trigger operation, and similarly, a second trigger operation may be referred to as a first trigger operation, without departing from the scope of the present application. Both the first trigger operation and the second trigger operation are trigger operations, but they are not the same trigger operation.
The visual information display method provided by the application can be applied to the application environment shown in fig. 1. The terminal 102 acquires a plurality of face images and a video set, and clusters them to obtain face image groups and video groups. The terminal displays the face image to be displayed from a face image group according to a first trigger operation of the user, and determines and plays the corresponding target video group according to a second trigger operation of the user. The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smart phone, a tablet computer, or a portable wearable device.
In one embodiment, as shown in fig. 2, a visual information display method is provided. The method is described as applied to the terminal in fig. 1 by way of illustration, and includes the following steps:
s202, acquiring a plurality of face images and a video set.
The face image refers to an image containing faces, and the same face image can contain faces of a plurality of people. The video set includes at least one video stored in the terminal. A portion of the video frames in the video contain faces of at least one person. For convenience of description, the video frame including the face will be referred to as a video frame to be displayed hereinafter.
Specifically, the terminal may be provided with an image library and a video library; the image library stores face images of a plurality of persons, and the video library stores a plurality of videos in advance. A face image may be captured by an image acquisition application in the terminal or downloaded over a network; likewise, a video may be captured by a video acquisition application in the terminal or downloaded over a network.
In one embodiment, the terminal performs face detection on each video frame in a video based on a preset face detection algorithm, estimates the quality of each detected face based on metrics such as sharpness, size, angle, and degree of occlusion, and determines the video frames whose face quality is greater than a preset quality threshold as the video frames to be displayed.
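The quality-gating step above can be sketched as follows. This is a minimal illustration only: the individual quality metrics, the way they are combined, and the threshold value are assumptions made for the example, since the disclosure only requires some face-quality score compared against a preset threshold.

```python
# Illustrative sketch of quality-gated frame selection. The metrics and their
# combination are stand-ins; the disclosure only requires a quality score
# compared against a preset threshold.
from dataclasses import dataclass

QUALITY_THRESHOLD = 0.6  # hypothetical preset quality threshold

@dataclass
class FaceObservation:
    sharpness: float   # 0..1, higher is sharper
    size: float        # 0..1, relative face size in the frame
    frontality: float  # 0..1, 1 = frontal view
    visibility: float  # 0..1, 1 = no occlusion

def face_quality(face: FaceObservation) -> float:
    """Combine the per-face metrics into one score; a plain average is
    assumed here, since the patent does not fix the formula."""
    return (face.sharpness + face.size + face.frontality + face.visibility) / 4

def select_frames_to_display(frames):
    """frames: iterable of (frame_id, [FaceObservation, ...]).
    Keep frames whose best detected face exceeds the quality threshold."""
    selected = []
    for frame_id, faces in frames:
        if faces and max(face_quality(f) for f in faces) > QUALITY_THRESHOLD:
            selected.append(frame_id)
    return selected
```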
S204, clustering the face images and the video set respectively to obtain at least one face image group consisting of the face images and at least one video group consisting of video frames to be displayed.
Wherein each face image group includes face images belonging to the same person. It is easy to understand that if the same face image contains two faces, the face image appears in two face image groups at the same time. Each video group includes video frames to be displayed that belong to the same person. Likewise, if the same video frame to be displayed contains two faces, the video frame appears in two video groups at the same time.
Specifically, a trained feature extraction machine learning model is preset in the terminal. The feature extraction machine learning model acquires its face feature extraction capability through sample learning, and may employ a neural network model, a dual-path network (DPN) model, a support vector machine, a logistic regression model, or the like. The terminal inputs all face images in the image library and all video frames to be displayed in the video library into the feature extraction machine learning model, which extracts the face features in the face images and the video frames to be displayed; the face features are then clustered based on a preset clustering algorithm, such as the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, to obtain at least one face image group and at least one video group. The face features are data reflecting facial characteristics of a person, and may reflect one or more of the person's sex, facial contour, hairstyle, glasses, nose, mouth, and the distances between facial organs.
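As an illustration of this clustering step, the following sketch groups face feature vectors with DBSCAN. The embedding function extract_face_features, the cosine metric, and the eps/min_samples values are assumptions for the example; the disclosure only specifies that extracted face features are clustered by a preset algorithm such as DBSCAN.

```python
# Illustrative sketch of the clustering step: face embeddings extracted from
# images and video frames are grouped with DBSCAN. extract_face_features()
# stands in for the trained feature extraction model; eps/min_samples are
# example parameters, not values fixed by the disclosure.
import numpy as np
from sklearn.cluster import DBSCAN

def group_by_face(items, extract_face_features):
    """items: iterable of (item_id, image). Returns {label: [item_id, ...]}.
    An item whose image contains several faces can appear in several groups."""
    ids, features = [], []
    for item_id, image in items:
        for feature in extract_face_features(image):  # one vector per face
            ids.append(item_id)
            features.append(feature)
    if not features:
        return {}
    labels = DBSCAN(eps=0.5, min_samples=2, metric="cosine").fit_predict(
        np.asarray(features))
    groups = {}
    for item_id, label in zip(ids, labels):
        if label != -1:  # -1 marks noise: faces not assigned to any group
            groups.setdefault(label, []).append(item_id)
    return groups
```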
In one embodiment, the terminal takes each face image in the image library as a class and calculates the distances between classes; when the distance between two classes is smaller than a preset distance threshold, the two classes are merged. The distance between two classes may be the distance between the face images they contain, and the distance between two face images may be calculated from their image information (e.g., a multidimensional vector containing face features, shooting time, and the like). The terminal repeats the distance calculation and merging steps until no new class is generated, which completes the clustering of the face images and yields at least one face image group; a sketch of this procedure follows.
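A minimal sketch of this merge-until-stable procedure, assuming a pairwise distance function image_distance computed from the image information described above (taking the minimum member distance as the class distance is also an assumption, since the disclosure leaves the class distance open):

```python
# Sketch of the embodiment above: every face image starts as its own class,
# and two classes are merged whenever their distance falls below a preset
# threshold, repeating until no new class is generated. image_distance() is
# an assumed stand-in for the distance computed from image information.
def cluster_by_threshold(images, image_distance, threshold=0.4):
    classes = [[img] for img in images]           # one class per face image
    merged = True
    while merged:                                 # repeat until stable
        merged = False
        for i in range(len(classes)):
            for j in range(i + 1, len(classes)):
                # class distance: minimum distance between member images
                d = min(image_distance(a, b)
                        for a in classes[i] for b in classes[j])
                if d < threshold:
                    classes[i].extend(classes.pop(j))
                    merged = True
                    break
            if merged:
                break
    return classes
```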
In one embodiment, the feature extraction machine learning model may be obtained from a trained general-purpose feature extraction model. A general-purpose model is often less effective when used on a specific scene, so it is further trained and optimized with samples specific to that scene. In this embodiment, the terminal may obtain the model structure and model parameters of the general-purpose model and import the parameters into the feature extraction model structure; the imported parameters then serve as the initial parameters of the feature extraction machine learning model during training.
In one embodiment, the feature extraction machine learning model may be a complex network model formed of multiple interconnected layers. The model may include multiple feature extraction layers, each with its own (possibly numerous) model parameters; the parameters of each layer apply a linear or nonlinear transformation to the input and produce a feature map as the result. Each feature extraction layer receives the result of the previous layer and passes its own result to the next layer, until the last feature extraction layer completes its transformation; the face feature for the current input face image or video frame is obtained from the output of this last layer. The model parameters are the parameters of the model structure and reflect the correspondence between each layer's input and output.
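For illustration, a minimal layered feature extractor of this kind might look as follows in PyTorch. The specific layers, sizes, and input resolution are assumptions; the disclosure only requires stacked feature extraction layers whose parameters transform each layer's input and whose final output is taken as the face feature.

```python
# Minimal illustration of a layered feature extractor: each layer transforms
# the previous layer's output, and the last layer's output is taken as the
# face feature. The architecture itself is an assumption for the example.
import torch
import torch.nn as nn

feature_extractor = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # layer 1: 16-channel feature map
    nn.ReLU(),                                    # nonlinear transformation
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # layer 2: 32-channel feature map
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                      # collapse spatial dimensions
    nn.Flatten(),
    nn.Linear(32, 128),                           # final face feature vector
)

face_image = torch.randn(1, 3, 112, 112)  # one aligned face crop (assumed size)
face_feature = feature_extractor(face_image)  # shape: (1, 128)
```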
In one embodiment, the facial features may include facial texture features. The facial texture features may reflect pixel depths of facial organs, including the nose, ears, eyebrows, cheeks, or lips, etc. The facial texture features may include a color value distribution and a luminance value distribution of facial image pixels.
S206, when the first trigger operation is detected, determining the face image to be displayed from the face image group through the first trigger operation.
The trigger operation refers to a user's selection operation on content displayed in the terminal, and includes, but is not limited to, a click operation, a swipe operation, and a long-press operation. The specific form of the trigger operation can be set as required; for example, it may be set to a click operation in scene A and to a swipe operation in scene B.
The image library display interface in the terminal is provided with a person display control. When the user clicks the person display control, the terminal acquires the at least one face image group generated by the clustering operation and displays the acquired face image groups, as shown in fig. 3. Fig. 3 is a schematic diagram of face image groups in one embodiment. The terminal monitors the user's trigger operations, and when it determines that the user has triggered a first trigger operation on the face image groups, it determines a first target face image group from the at least one face image group based on the first trigger operation. For example, when it determines that the user has clicked one face image group, the terminal takes the clicked group as the first target face image group, acquires all face images in that group, and displays them to the user. When it detects that the user triggers the first trigger operation again on a displayed face image, the terminal acquires the face image pointed to by this operation, takes it as the face image to be displayed, and displays it to the user. For example, when it determines that the user has clicked one face image in the first target face image group, the terminal takes the clicked face image as the face image to be displayed and displays it in full screen.
In one embodiment, when it determines that the user swipes left on the face image to be displayed, the terminal determines the arrangement order of the face images in the first target face image group, determines the next face image that is adjacent to and follows the current one based on this order, and displays that next face image. Similarly, when it determines that the user swipes right, the terminal determines and displays the previous face image that is adjacent to and precedes the current one.
S208, when a second trigger operation on a face in the face image to be displayed is detected, screening a target video group from the video groups based on the second trigger operation.
S210, playing the target video group.
Specifically, the terminal identifies the face regions in the face image to be displayed, and when a second trigger operation by the user on a face region is detected, determines the face pointed to by the second trigger operation. For convenience of description, the face pointed to by the second trigger operation is hereinafter referred to as the video face. For example, when it is detected that the user long-presses a face in the face image to be displayed, that face is determined to be the video face.
After acquiring the video face, the terminal determines, according to the face features of the video face, the target video group whose video frames to be displayed contain the video face, and then plays each video frame to be displayed in the target video group.
Since all the video groups and face image groups can be regarded as one network in which each video group or face image group is a node and any two nodes can be connected by an edge, the user can start from any node and browse the contents of the image library and the video library along the edges. This not only greatly improves the user's browsing efficiency but also improves the browsing effect.
In one embodiment, as shown in fig. 4, when it determines that the user long-presses the video face, the terminal displays a control list containing a slow-motion control, a delay control, and a video control. When it determines that the user clicks the slow-motion control, the terminal plays the target video group in slow motion; when it determines that the user clicks the delay control, the terminal plays the target video group after a delay; and when the user clicks the video control, the terminal plays the target video group at standard playing speed. Fig. 4 is a schematic diagram of a jump from image playback to video playback in one embodiment.
It is easy to understand that the animated images stored in the terminal can also be clustered based on the above method to obtain animated image groups, so that the display can jump from the face image to be displayed, or from the video frame to be displayed, to an animated image group.
In the above visual information display method, a plurality of face images and a video set are acquired and clustered to obtain at least one face image group and at least one video group. When a first trigger operation is detected, the face image to be displayed is determined according to the first trigger operation; when a second trigger operation on a face in the face image to be displayed is detected, the target video group is screened out accordingly, and display automatically jumps from the face image to be displayed to playback of the target video group. Because the user can select a video group by taking a face in a face image as a reference and then browse the selected video group, video frames associated with a specific person can be found quickly.
In addition, since only a simple trigger operation is needed to switch from image display to video playback, compared with the traditional approach of first exiting image display and then entering the video library to play a video, the application not only improves jumping efficiency but also increases jumping flexibility, thereby enhancing the user experience.
In one embodiment, as shown in fig. 5, determining the face image to be displayed from the face image group through the first trigger operation includes:
s502, when a first clicking operation aiming at the face image grouping is detected, a first target face image grouping is screened from at least one face image grouping based on the first clicking operation.
S504, determining the arrangement order of the face images in the first target face image group.
S506, displaying each face image in the first target face image group according to the arrangement order.
S508, when a second click operation on a displayed face image is detected, displaying the face image pointed to by the second click operation.
Specifically, when the terminal detects a first click operation on the displayed face image groups, it determines the first target face image group based on the first click operation. The terminal acquires all face images in the first target face image group and sorts them based on a preset arrangement rule to obtain a face image sequence. The terminal then displays the face images according to their arrangement order in the face image sequence. When it determines a second click operation by the user on a displayed face image, the terminal takes the face image pointed to by the second click operation as the face image to be displayed and displays it in full screen.
In this embodiment, the user can view the face image to be displayed in full screen with a simple click, which greatly improves the user experience.
In one embodiment, determining the arrangement order of the face images in the first target face image group includes: determining the faces of the same person contained in each face image in the first target face image group, and recording these faces as clustered faces; determining the correlation between the other faces in each face image and the clustered faces; determining the correlation of each face image based on the correlations between the other faces and the clustered faces; and sorting the plurality of face images in the first target face image group according to the correlation of each face image.
The other faces refer to faces of people in the face image except for the people corresponding to the clustered faces.
Specifically, the terminal acquires the face features of each face in each face image, performs similarity matching on the face features, and determines, based on the matching results, the same face contained in every face image in the first target face image group. For convenience of description, this same face is hereinafter referred to as the clustered face. The terminal determines the correlation between the other faces in each face image and the clustered face, and linearly superimposes the correlations of the different faces in the same face image to obtain the correlation of that face image. The terminal sorts the face images in the first target face image group from high to low correlation, so that the face image with the highest correlation is placed at the front of the sequence and the one with the lowest correlation at the end. When several face images in the first target face image group have the same correlation, the terminal orders them randomly.
For example, the first target face image group includes three face images: a first face image, a second face image, and a third face image. The first face image contains three different faces, namely face 1, face 2, and face 3; the second face image contains four different faces, namely face 1, face 2, face 4, and face 5; the third face image contains two different faces, namely face 1 and face 3. Face 1 is the clustered face. When the correlation between face 2 and face 1 is 2, the correlation between face 3 and face 1 is 2, the correlation between face 4 and face 1 is 1, and the correlation between face 5 and face 1 is 1, the correlation of the first face image is 2+2=4, the correlation of the second face image is 2+1+1=4, and the correlation of the third face image is 2.
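This worked example can be reproduced directly; the following small sketch uses the same numbers (the dictionary encoding is illustrative):

```python
# Reproduces the worked example above: the correlation of a face image is the
# linear superposition of the correlations between its other faces and the
# clustered face (face 1); images are then sorted from high to low.
face_correlation = {"face2": 2, "face3": 2, "face4": 1, "face5": 1}

images = {
    "first":  ["face2", "face3"],           # other faces besides face 1
    "second": ["face2", "face4", "face5"],
    "third":  ["face3"],
}

image_correlation = {
    name: sum(face_correlation[f] for f in faces)
    for name, faces in images.items()
}
# {'first': 4, 'second': 4, 'third': 2}; the tie between 'first' and 'second'
# would be broken randomly, per the embodiment above
ordering = sorted(images, key=lambda n: image_correlation[n], reverse=True)
```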
In this embodiment, the correlation of a face image is obtained by superimposing the correlations of its faces, which makes the correlation of the face images more accurate; preferentially displaying the face images with high correlation improves browsing efficiency and the user experience.
In one embodiment, determining the correlation between the other faces in each face image and the clustered faces includes: determining a topological relation diagram between the other faces in each face image and the clustered faces by taking the clustered faces as a reference, wherein the topological relation diagram comprises at least one edge and each edge connects two faces appearing in the same face image; determining the frequency with which the two faces connected by each edge appear together in the same face image; determining the weight of each edge in the topological relation diagram based on this frequency; and determining the correlation between the other faces in each face image and the clustered faces according to the weights and the connection relations between the faces in the topological relation diagram.
Specifically, the terminal acquires one face image in the first target face image group and determines the different faces in it. Taking the clustered face as a reference, the terminal connects every two different faces in the acquired face image with an edge, and repeats this step until all face images in the first target face image group have been traversed. This completes the generation of the topological relation diagram, shown in fig. 6. Fig. 6 is a schematic diagram of a topological relation diagram in one embodiment.
Further, the terminal determines the number of times the two faces connected by each edge appear together in the same face image, and takes this frequency as the weight of the corresponding edge. For example, in the above example, the first face image contains face 1, face 2, and face 3; the second face image contains face 1, face 2, face 4, and face 5; and the third face image contains face 1 and face 3. Face 1 and face 2, which are connected by an edge, appear together in both the first and the second face image, so the weight of the edge connecting face 1 and face 2 is 2.
Further, the terminal acquires a preset correlation calculation algorithm and determines the correlation between each face and the clustered face according to the weights, the connection relations between the faces in the topological relation diagram, and the correlation calculation algorithm. The correlation calculation algorithm may be: correlation between face K and the clustered face = (maximum superimposed weight over all paths from the clustered face to face K) / (square of the number of edges on that path). For example, with the topological relation diagram shown in fig. 6, to calculate the correlation between face 2 and the clustered face (face 1), the terminal determines from the diagram that only one path exists between face 1 and face 2, and that this path consists of a single edge with a weight of 2. The terminal therefore divides 2 (the weight of the edge connecting face 1 and face 2) by 1 (the square of the number of edges between face 1 and face 2) to obtain a correlation of 2 between face 1 and face 2.
It is easy to understand that when there are several paths between face K and the clustered face, the terminal superimposes the weights of the edges on each path to obtain several superposition results, determines the target path with the largest superposition result, counts the edges on the target path, and then calculates the correlation between face K and the clustered face based on the correlation calculation algorithm, as sketched below.
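The graph encoding and function names in the following sketch are illustrative; the logic enumerates the simple paths from the clustered face to face K, picks the path with the largest summed weight, and divides that sum by the square of its edge count, as the formula above specifies.

```python
# Sketch of the correlation computation described above.
def correlation(graph, clustered_face, face_k):
    """graph[a][b] = weight of the edge between faces a and b."""
    paths = []  # (weight_sum, edge_count) for every simple path to face_k

    def walk(node, visited, weight_sum, edge_count):
        if node == face_k:
            paths.append((weight_sum, edge_count))
            return
        for neighbor, w in graph.get(node, {}).items():
            if neighbor not in visited:
                walk(neighbor, visited | {neighbor},
                     weight_sum + w, edge_count + 1)

    walk(clustered_face, {clustered_face}, 0.0, 0)
    if not paths:
        return 0.0
    weight_sum, edge_count = max(paths)  # path with the largest summed weight
    if edge_count == 0:                  # face_k is the clustered face itself
        return 0.0
    return weight_sum / (edge_count ** 2)

# Worked example from the text: one edge of weight 2 between face 1 and face 2.
g = {"face1": {"face2": 2}, "face2": {"face1": 2}}
assert correlation(g, "face1", "face2") == 2.0  # 2 / 1**2
```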
In this embodiment, by generating the topological relation diagram and determining the weight of each edge in it, the correlation between the other faces in each face image and the clustered face can be determined from the weights and the connection relations between the faces. Faces directly related to the clustered face are thus given a higher correlation, and more distantly related faces a relatively lower one, so the faces with the highest correlation can be displayed preferentially, which greatly improves the user experience.
In one embodiment, the playing the target video group includes: determining the video to which each video frame to be displayed in the target video group belongs, and acquiring the playing time of each video frame to be displayed within its video; sorting the video frames to be displayed that belong to the same video based on their playing times to obtain at least one intermediate sequence result; sorting all the intermediate sequence results according to a preset sorting rule to obtain a video sequence; and playing the video sequence according to the arrangement order of the video frames to be displayed in the video sequence.
The playing time of a video frame to be displayed refers to the time elapsed, at standard playing speed, from the first frame of the video to that video frame.
Specifically, when the terminal extracts a video frame to be displayed from a video, it determines the video identifier of the video and the playing time of the frame within the video, and embeds both in the video frame to be displayed. When the terminal obtains the target video group, it determines the video to which each video frame to be displayed belongs based on the embedded video identifier, and sorts the video frames to be displayed that belong to the same video based on their embedded playing times, obtaining at least one intermediate sequence result. For example, suppose video frames A and B to be displayed belong to video a, with playing times of 10 minutes and 11 minutes respectively, and video frames C and D to be displayed belong to video b, with playing times of 2 minutes and 7 minutes respectively. The terminal sorts frames A and B by playing time to obtain the intermediate sequence result AB, and sorts frames C and D by playing time to obtain the intermediate sequence result CD.
Further, the terminal sorts all the generated intermediate sequence results based on a preset sorting rule to obtain a video sequence, and plays the video sequence according to the order of the video frames to be displayed in it.
In one embodiment, the terminal may determine the correlation of each video frame to be displayed based on the correlation calculation algorithm described above, and superimpose the correlations of the video frames in the same intermediate sequence result to obtain the correlation of that intermediate sequence result. The terminal then sorts the intermediate sequence results by their correlations to obtain the video sequence, as sketched below.
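The ordering logic of this and the preceding paragraphs can be sketched as follows, using the worked example above. The dictionary frame encoding and the frame_correlation scoring hook are assumptions for the example; the disclosure only requires in-video ordering by playing time followed by a preset rule across intermediate results.

```python
# Sketch of the playback-ordering step: frames carry an embedded video id and
# playing time; frames from the same video are sorted by playing time into an
# intermediate result, and the intermediate results are then ordered (here by
# summed frame correlation, as one embodiment suggests).
from itertools import groupby

def build_video_sequence(frames, frame_correlation=lambda f: 0.0):
    """frames: list of dicts like {'id': 'A', 'video': 'a', 'time': 600}."""
    frames = sorted(frames, key=lambda f: f["video"])
    intermediates = []
    for _, group in groupby(frames, key=lambda f: f["video"]):
        ordered = sorted(group, key=lambda f: f["time"])  # order within a video
        score = sum(frame_correlation(f) for f in ordered)
        intermediates.append((score, ordered))
    # preset sorting rule: most relevant intermediate result first
    intermediates.sort(key=lambda item: item[0], reverse=True)
    return [f for _, ordered in intermediates for f in ordered]

# Worked example from the text: A (10 min) and B (11 min) in video a,
# C (2 min) and D (7 min) in video b -> intermediate results AB and CD.
seq = build_video_sequence([
    {"id": "A", "video": "a", "time": 600},
    {"id": "B", "video": "a", "time": 660},
    {"id": "D", "video": "b", "time": 420},
    {"id": "C", "video": "b", "time": 120},
])
```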
In this embodiment, by playing a sequence of video frames containing the same face, the user can quickly browse the target content without watching each entire video, which greatly improves browsing efficiency. In addition, because the video frames to be displayed are played in their arrangement order in the video sequence, the user sees the more important video frames first, improving the user experience.
In one embodiment, playing the video sequence according to the arrangement order of the video frames to be displayed in the video sequence includes: when a click operation on the currently played video frame to be displayed is detected, determining the position coordinates pointed to by the click operation; determining, based on the position coordinates, the position area to which the click operation belongs, the position area comprising a first area and a second area; when the click operation belongs to the first area, determining and playing, based on the arrangement order of the video frames to be displayed in the video sequence, the previous video frame that is adjacent to and precedes the currently played video frame to be displayed; and when the click operation belongs to the second area, determining and playing, based on the arrangement order of the video frames to be displayed in the video sequence, the next video frame that is adjacent to and follows the currently played video frame to be displayed.
The first area refers to the image area in the left half of the video frame to be displayed, and the second area refers to the image area in the right half.
Specifically, the terminal may take the central vertical axis of the video frame to be displayed as a boundary, with the image area to the left of the axis as the first area and the image area to the right as the second area. While the terminal plays the current video frame to be displayed, the user can click on it, and the terminal determines the position coordinates pointed to by the click operation. When the coordinates belong to the first area, the terminal obtains the position of the currently played video frame in the video sequence, obtains the previous video frame (the frame adjacent to and preceding the current one) based on that position, and displays it. Similarly, when the user clicks in the second area, the terminal obtains and displays the next video frame.
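A minimal sketch of this tap-to-navigate logic, assuming the click coordinate and screen width are available from the UI layer (clamping at the ends of the sequence is an added assumption; the disclosure does not specify the boundary behavior):

```python
# Sketch of the tap navigation above: a tap left of the vertical center axis
# steps back one frame in the video sequence, a tap right of it steps forward.
def navigate(sequence, current_index, tap_x, screen_width):
    """Return the index of the frame to show after a tap at x = tap_x."""
    if tap_x < screen_width / 2:      # first area: left of the center axis
        return max(current_index - 1, 0)                  # previous frame
    else:                             # second area: right of the center axis
        return min(current_index + 1, len(sequence) - 1)  # next frame
```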
In this embodiment, the user can control the playback of each video frame to be displayed with a simple click operation, which not only improves playback efficiency but also greatly improves the user experience.
In one embodiment, the visual information display method further includes: when a click operation on a face in a video frame to be displayed is detected, screening a second target face image group from the at least one face image group according to the click operation; and displaying the second target face image group.
Specifically, while playing the video sequence, the terminal monitors the user's click operations. When it detects a click operation on a face in the currently played video frame to be displayed, the terminal acquires the face features of the face pointed to by the click operation and determines the second target face image group from the face image groups based on those features; each face image in the second target face image group contains the face pointed to by the click operation. The terminal then sorts the face images in the second target face image group and displays them in their arrangement order.
In one embodiment, when the user performs a click operation on a face in a video frame to be displayed, the terminal displays a control list containing a photo control. When it determines that the user has clicked the photo control, the terminal randomly acquires a face image containing the face pointed to by the click operation and displays the acquired face image in full screen, as shown in fig. 7. Fig. 7 is a schematic diagram of a jump from video playback to image playback in one embodiment.
In this embodiment, the user can jump from a video frame to a face image group with a simple click operation. Compared with the traditional approach of first exiting video playback and then entering the image library, this reduces the steps required to jump from video to images, improving jumping efficiency. In addition, because the user can quickly switch, with faces as the reference, between the face image groups or video groups corresponding to different faces, the application further improves browsing efficiency compared with the traditional approach of browsing every image in an image library from start to finish or watching entire videos.
It should be understood that although the steps in the flowcharts of figs. 2 and 5 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in figs. 2 and 5 may include sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.
In one embodiment, as shown in fig. 8, a visual information display apparatus 800 is provided, comprising: a clustering module 802, an image display module 804, and a video display module 806, wherein:
a clustering module 802, configured to acquire a plurality of face images and a video set, and to cluster the face images and the video set respectively to obtain at least one face image group consisting of the face images and at least one video group consisting of video frames to be displayed;
an image display module 804, configured to determine, when a first trigger operation is detected, the face image to be displayed from the face image group through the first trigger operation; and
a video display module 806, configured to screen, when a second trigger operation on a face in the face image pointed to by the first trigger operation is detected, a target video group from the video groups based on the second trigger operation, and to play the target video group.
In one embodiment, the image display module 804 further includes a sorting module 8041, configured to: when a first click operation on the face image groups is detected, screen a first target face image group from the at least one face image group based on the first click operation; determine the arrangement order of the face images in the first target face image group; display each face image in the first target face image group according to the arrangement order; and when a second click operation on a displayed face image is detected, display the face image pointed to by the second click operation.
In one embodiment, the sorting module 8041 is further configured to: determine the faces of the same person contained in each face image in the first target face image group and record them as clustered faces; determine the correlation between the other faces in each face image and the clustered faces; determine the correlation of each face image based on the correlations between the other faces and the clustered faces; and sort the plurality of face images in the first target face image group according to the correlation of each face image.
In one embodiment, the sorting module 8041 is further configured to: determine a topological relation diagram between the other faces in each face image and the clustered faces by taking the clustered faces as a reference, the topological relation diagram comprising at least one edge and each edge connecting two faces appearing in the same face image; determine the frequency with which the two faces connected by each edge appear together in the same face image; determine the weight of each edge in the topological relation diagram based on this frequency; and determine the correlation between the other faces in each face image and the clustered faces according to the weights and the connection relations between the faces in the topological relation diagram.
In one embodiment, the video display module 806 further includes a video sequence generating module 8061, configured to: determine the video to which each video frame to be displayed in the target video group belongs, and acquire the playing time of each video frame to be displayed within its video; sort the video frames to be displayed that belong to the same video based on their playing times to obtain at least one intermediate sequence result; sort all the intermediate sequence results according to a preset sorting rule to obtain a video sequence; and play the video sequence according to the arrangement order of the video frames to be displayed in the video sequence.
In one embodiment, the video sequence generating module 8061 is further configured to: when a click operation on the currently played video frame to be displayed is detected, determine the position coordinates pointed to by the click operation; determine, based on the position coordinates, the position area to which the click operation belongs, the position area comprising a first area and a second area; when the click operation belongs to the first area, determine and play, based on the arrangement order of the video frames to be displayed in the video sequence, the previous video frame that is adjacent to and precedes the currently played video frame to be displayed; and when the click operation belongs to the second area, determine and play, based on the arrangement order of the video frames to be displayed in the video sequence, the next video frame that is adjacent to and follows the currently played video frame to be displayed.
In one embodiment, the visual information display apparatus 800 is further configured to: when a click operation on a face in a video frame to be displayed is detected, screen a second target face image group from the at least one face image group according to the click operation; and display the second target face image group.
For specific limitations of the visual information presentation apparatus, reference may be made to the above limitations of the visual information presentation method, and no further description is given here. The respective modules in the visual information presentation apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, and whose internal structure may be as shown in fig. 9. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal, where the wireless mode may be implemented through WIFI, an operator network, NFC (near field communication), or other technologies. The computer program is executed by the processor to implement a visual information display method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, keys, a trackball, or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, or mouse.
It will be appreciated by persons skilled in the art that the structure shown in fig. 9 is merely a block diagram of part of the structure relevant to the present application and does not limit the computer device to which the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
acquiring a plurality of face images and a video set; the video set comprises video frames to be displayed;
clustering the face images and the video set respectively to obtain at least one face image group consisting of the face images and at least one video group consisting of video frames to be displayed;
when a first trigger operation is detected, determining a face image to be displayed from the face image group through the first trigger operation;
when a second trigger operation on a face in the face image to be displayed is detected, screening a target video group from the video groups based on the second trigger operation;
and playing the target video group.
In one embodiment, the processor when executing the computer program further performs the steps of:
when a first click operation on the face image groups is detected, screening a first target face image group from the at least one face image group based on the first click operation;
determining the arrangement order of the face images in the first target face image group;
displaying each face image in the first target face image group according to the arrangement order;
and when a second click operation on a displayed face image is detected, displaying the face image pointed to by the second click operation.
In one embodiment, the face image group includes face images belonging to the same person; the processor, when executing the computer program, further implements the following steps:
determining the faces of the same person contained in each face image in the first target face image group, and recording these faces as clustered faces;
determining the correlation between the other faces in each face image and the clustered faces;
determining the correlation of each face image based on the correlations between the other faces and the clustered faces;
and sorting the plurality of face images in the first target face image group according to the correlation of each face image.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining a topological relation diagram between the other faces in each face image and the clustered faces by taking the clustered faces as a reference; the topological relation diagram comprises at least one edge; each edge in the topological relation diagram connects two faces appearing in the same face image;
determining the frequency with which the two faces connected by each edge appear together in the same face image;
determining the weight of each edge in the topological relation diagram based on this frequency;
and determining the correlation between the other faces in each face image and the clustered faces according to the weights and the connection relations between the faces in the topological relation diagram.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining the video to which each video frame to be displayed in the target video group belongs, and acquiring the playing time of each video frame to be displayed within its video;
sorting the video frames to be displayed that belong to the same video based on their playing times to obtain at least one intermediate sequence result;
sorting all the intermediate sequence results according to a preset sorting rule to obtain a video sequence;
and playing the video sequence according to the arrangement order of the video frames to be displayed in the video sequence.
In one embodiment, the processor when executing the computer program further performs the steps of:
when a click operation on the currently played video frame to be displayed is detected, determining the position coordinates pointed to by the click operation;
determining, based on the position coordinates, the position area to which the click operation belongs; the position area comprises a first area and a second area;
when the click operation belongs to the first area, determining and playing, based on the arrangement order of the video frames to be displayed in the video sequence, the previous video frame that is adjacent to and precedes the currently played video frame to be displayed;
and when the click operation belongs to the second area, determining and playing, based on the arrangement order of the video frames to be displayed in the video sequence, the next video frame that is adjacent to and follows the currently played video frame to be displayed.
In one embodiment, the processor when executing the computer program further performs the steps of:
when a click operation on a face in a video frame to be displayed is detected, screening a second target face image group from the at least one face image group according to the click operation;
and displaying the second target face image group.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a plurality of face images and a video set; the video set comprises video frames to be displayed;
clustering the face images and the video set respectively to obtain at least one face image group consisting of the face images and at least one video group consisting of video frames to be displayed;
when a first trigger operation is detected, determining a face image to be displayed from the face image group through the first trigger operation;
when a second trigger operation on a face in the face image to be displayed is detected, screening a target video group from the video groups based on the second trigger operation;
and playing the target video group.
In one embodiment, the computer program when executed by the processor further performs the steps of:
when a first click operation on the face image groups is detected, screening a first target face image group from the at least one face image group based on the first click operation;
determining the arrangement order of the face images in the first target face image group;
displaying each face image in the first target face image group according to the arrangement order;
and when a second click operation on a displayed face image is detected, displaying the face image pointed to by the second click operation.
In one embodiment, the face image group includes face images belonging to the same person; the computer program, when executed by the processor, further implements the following steps:
determining the faces of the same person contained in each face image in the first target face image group, and recording these faces as clustered faces;
determining the correlation between the other faces in each face image and the clustered faces;
determining the correlation of each face image based on the correlations between the other faces and the clustered faces;
and sorting the plurality of face images in the first target face image group according to the correlation of each face image.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining a topological relation diagram between the other faces in each face image and the clustered faces by taking the clustered faces as a reference; the topological relation diagram comprises at least one edge; each edge in the topological relation diagram connects two faces appearing in the same face image;
determining the frequency with which the two faces connected by each edge appear together in the same face image;
determining the weight of each edge in the topological relation diagram based on this frequency;
and determining the correlation between the other faces in each face image and the clustered faces according to the weights and the connection relations between the faces in the topological relation diagram.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining the video to which each video frame to be displayed in the target video group belongs, and acquiring the playing time of each video frame to be displayed within its video;
sorting the video frames to be displayed that belong to the same video based on their playing times to obtain at least one intermediate sequence result;
sorting all the intermediate sequence results according to a preset sorting rule to obtain a video sequence;
and playing the video sequence according to the arrangement order of the video frames to be displayed in the video sequence.
In one embodiment, the computer program when executed by the processor further performs the steps of:
when a click operation on the currently played video frame to be displayed is detected, determining the position coordinates pointed to by the click operation;
determining, based on the position coordinates, the position area to which the click operation belongs; the position area comprises a first area and a second area;
when the click operation belongs to the first area, determining and playing, based on the arrangement order of the video frames to be displayed in the video sequence, the previous video frame that is adjacent to and precedes the currently played video frame to be displayed;
and when the click operation belongs to the second area, determining and playing, based on the arrangement order of the video frames to be displayed in the video sequence, the next video frame that is adjacent to and follows the currently played video frame to be displayed.
In one embodiment, the computer program when executed by the processor further performs the steps of:
when a click operation on a face in a video frame to be displayed is detected, screening a second target face image group from the at least one face image group according to the click operation;
and displaying the second target face image group.
Those skilled in the art will appreciate that all or part of the processes of the above-described methods may be implemented by a computer program stored on a non-volatile computer-readable storage medium; when executed, the program may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, and the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the application; their description is specific and detailed but is not therefore to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and improvements without departing from the concept of the application, and these all fall within the protection scope of the application. Accordingly, the protection scope of the present application shall be determined by the appended claims.