
CN114513694B - Score determination method, device, electronic equipment and storage medium - Google Patents

Score determination method, device, electronic equipment and storage medium

Info

Publication number
CN114513694B
CN114513694B (granted from application CN202210145089.5A; published as CN114513694A)
Authority
CN
China
Prior art keywords
scoring
target
user
video
time point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210145089.5A
Other languages
Chinese (zh)
Other versions
CN114513694A (en)
Inventor
李锦华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd filed Critical Ping An International Smart City Technology Co Ltd
Priority to CN202210145089.5A priority Critical patent/CN114513694B/en
Publication of CN114513694A publication Critical patent/CN114513694A/en
Application granted granted Critical
Publication of CN114513694B publication Critical patent/CN114513694B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/4316 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
    • H04N21/47 End-user applications
    • H04N21/475 End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data
    • H04N21/4756 End-user interface for inputting end-user data for rating content, e.g. scoring a recommended movie
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)
  • Processing Or Creating Images (AREA)

Abstract

An embodiment of the present application provides a score determination method, a score determination device, an electronic device, and a storage medium. The method includes: while a target video clip of a target video is played on a first interface, playing a target animation matched with the target video clip on the first interface; displaying the acquired user image on the first interface; determining the scoring result corresponding to each scoring time point from the user images acquired in a first time range before that scoring time point and the video images matched with those user images, and displaying the scoring result corresponding to each scoring time point on the first interface when that point arrives; and, after the target video finishes playing, displaying a comprehensive scoring result on the first interface. This approach saves device resources. The present application also relates to blockchain technology, for example writing target animations into a blockchain so that the animation matched with a target video clip can be retrieved from it.

Description

Score determination method, device, electronic equipment and storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a score determining method, a score determining device, an electronic device, and a storage medium.
Background
At present, with the development of internet technology, human-computer interaction is becoming increasingly mature. For example, a user can follow the prompt actions shown in a video, which helps the user exercise while making the activity engaging. However, while following the video, the user cannot know how well their actions match the prompt actions, and so may quickly lose interest. The user's actions therefore need to be scored as the user follows the prompt actions. In the prior art, however, user images and the corresponding video images must be continuously collected, compared, and scored, which imposes high power consumption on the device and requires substantial resources for image processing.
Disclosure of Invention
The embodiments of the present application provide a score determining method, a score determining device, an electronic device, and a storage medium, which can reduce the power consumption of the device and save the resources required for processing images.
In one aspect, an embodiment of the present application provides a score determining method, including:
playing a target video on a first interface;
playing, during playback of a target video clip of the target video, a target animation matched with the target video clip on the first interface, wherein the target video clip is any one of a plurality of video clips included in the target video, and the target animation is configured with at least one scoring time point;
invoking a camera device to collect user images during playback of the target video clip, and displaying the user images on the first interface, wherein the frame rate of the target video is consistent with the frame rate at which the camera device collects the user images;
determining the scoring result corresponding to each scoring time point according to the user images collected in a first time range before that scoring time point and the video images matched with those user images, and displaying the scoring result corresponding to each scoring time point on the first interface when that scoring time point arrives, wherein the video images are contained in the target video clip;
after the target video finishes playing, determining a comprehensive scoring result according to the scoring results corresponding to the scoring time points configured for each of a plurality of animations corresponding to the target video, and displaying the comprehensive scoring result on the first interface, wherein the plurality of animations include the animation matched with each of the plurality of video clips.
In a second aspect, an embodiment of the present application provides a score determining device, including:
The processing unit is used for playing the target video on the first interface;
The processing unit is further configured to play, during playing a target video segment of the target video, a target animation matched with the target video segment on the first interface, where the target video segment is any one video segment of a plurality of video segments included in the target video, and the target animation is configured with at least one scoring time point;
The processing unit is further used for calling a camera device to collect user images during the process of playing the target video clip, and displaying the user images on the first interface;
the processing unit is further used for determining a scoring result corresponding to each scoring time point according to the user images acquired in a first time range before the scoring time point and the video images matched with the user images acquired in the first time range;
The display unit is used for displaying a scoring result corresponding to each scoring time point on the first interface when the scoring time point arrives; the video image is contained in the target video clip;
The processing unit is further configured to determine a comprehensive scoring result according to scoring results corresponding to each scoring time point configured for each of a plurality of animations corresponding to the target video after the target video is played, where the plurality of animations corresponding to the target video include animations matched with each of the plurality of video clips;
And the display unit is also used for displaying the comprehensive scoring result on the first interface.
In a third aspect, an embodiment of the present application provides an electronic device including a processor and a memory connected to each other, where the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the score determining method described above.
In another aspect, an embodiment of the present application provides a computer-readable storage medium storing program instructions that, when executed, implement the score determining method described above.
In another aspect, an embodiment of the present application provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium; when the computer instructions are executed by a processor of an electronic device, the score determining method described above is performed.
In the embodiments of the present application, the electronic device can play a target video on a first interface. While a target video clip of the target video is played, it plays a target animation matched with the clip on the first interface, where the target video clip is any one of a plurality of video clips included in the target video and the target animation is configured with at least one scoring time point. During playback of the target video clip it invokes a camera device to collect user images and displays them on the first interface. For each scoring time point, it determines the corresponding scoring result from the user images collected in a first time range before that point and the video images (contained in the target video clip) matched with those user images, and displays that scoring result on the first interface when the point arrives. After the target video finishes playing, it determines a comprehensive scoring result from the scoring results of the scoring time points configured for each of the animations corresponding to the target video (one animation matched with each video clip) and displays it on the first interface. With this score determination method, the scoring result is shown as each scoring time point in the animation arrives, giving the user prompt feedback and keeping the experience engaging. Because scoring is based on the user images collected within the first time range and their matched video images, the device does not have to continuously collect user images and compare them against matched video images, which reduces power consumption and saves the resources needed for image processing.
Drawings
To describe the technical solutions in the embodiments of the present application or the prior art more clearly, the drawings required for the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present application; a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of a scoring system according to an embodiment of the present application;
Fig. 2 is a flowchart of a score determining method according to an embodiment of the present application;
Fig. 3a is a schematic diagram of a first interface displaying a target video clip and a target animation according to an embodiment of the present application;
Fig. 3b is a schematic flowchart of displaying the corresponding scoring result at a scoring time point according to an embodiment of the present application;
Fig. 3c is a schematic diagram of the next video clip after the target video clip displayed on the first interface according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a score determining device according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the scope of the present application.
The embodiments of the present application can process user images, video images, and the like based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
With the development of internet technology, human-computer interaction is becoming increasingly mature. A user can select a favorite video and mimic the prompt actions in it, which serves as a form of exercise. However, existing approaches are largely limited to the user moving along with the prompt actions: the user's actions cannot be evaluated promptly and no evaluation result is displayed. In view of these shortcomings, an embodiment of the present application provides a score display scheme based on the following general principle. When the user wants to move along with a video, the user selects a favorite video; the electronic device detects the triggering operation on the target video and plays the target video on a first interface. While a target video clip of the target video is played, a target animation matched with the clip is played simultaneously on the first interface, the target animation being configured with at least one scoring time point. The target animation controls when the user's actions are scored, and also synchronizes the display of the scoring result and its sound effect precisely on the beat at each scoring time point. As the target video clip plays, the user moves along with its prompt actions while a camera device collects images of the user, which are displayed on the first interface. The electronic device acquires the video image displayed on the first interface at the moment each user image is collected. Optionally, the frame rate of the target video and the frame rate at which the camera device collects user images can be dynamically configured so that each collected user image and its matched video image correspond to the same time point; the scoring result for the user's action at that time point, determined from the user image and video image of the same moment, is then more accurate.
Before any scoring time point arrives, the electronic device analyzes the user images collected in a first time range before that point together with the video images displayed in that range, determines the user's scoring result over the first time range, and displays that scoring result on the first interface when the scoring time point arrives. The analysis consists of evaluating the user's posture from the user images collected at each time point in the first time range against the displayed video images. After the target video clip finishes, the next video clip and its matched animation continue playing on the first interface until the whole target video has been played; a comprehensive scoring result is then displayed on the first interface or a second interface, determined from the scoring results of the scoring time points configured for each of the animations, each animation being matched with one of the video clips.
The score display scheme provided by the embodiments of the present application has the following beneficial effects. (1) When a scoring time point configured in the animation corresponding to a video clip arrives, the scoring result is displayed at that point, so the user learns the result promptly and stays engaged. (2) Because the score over a period of time is obtained from the user images collected within the first time range and their matched video images, user images do not have to be continuously collected and compared against matched video images, which reduces the device's power consumption and saves the resources needed for image processing. (3) Keeping the frame rate of the target video consistent with the frame rate at which the camera device collects user images removes the mismatch between the two, so that the user images collected in the first time range and the displayed video images correspond to the same time points, making the resulting score more accurate; moreover, once the camera device's image processing time is reduced as far as possible, the video image played at a scoring time point and the displayed user image appear visually synchronized. The image processing time can be reduced, for example, by shortening the saturation and color processing steps.
Based on the score display scheme, an embodiment of the present application provides a score display system. Referring to Fig. 1, the system may include at least one terminal device 101 and at least one server 102. The terminal device 101 is provided with a camera device through which user images can be collected; the camera device may be the terminal device's own camera module. Optionally, the terminal device 101 may provide the user with a video playing interface, in which videos composed of multiple video clips are played together with the animation corresponding to each clip. The terminal device 101 may display the collected user images in the video playing interface and display the scoring result at each scoring time point. The server 102 may store various types of videos, each of which may include multiple video clips, as well as the animation corresponding to each clip, scoring results, and so on. The terminal device 101 and the server 102 may be connected directly or indirectly through wired or wireless communication. The terminal device 101 may be a smart device such as a mobile phone, tablet computer, notebook computer, palmtop computer, mobile internet device (MID), or wearable device. The server 102 may comprise multiple servers (also referred to as nodes), which may be independent physical servers or cloud servers providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and basic cloud computing services such as big data and artificial intelligence platforms.
Based on the above score display scheme, referring to Fig. 2, Fig. 2 is a flowchart of a score determining method provided by an embodiment of the present application. The score determining method may be performed by an electronic device, which may be the terminal device 101 or the server 102 in the score display system described above, and may include the following steps S201 to S205:
S201: play the target video on a first interface. The target video can be a dance video, fitness video, or the like, and may include a plurality of video clips.
In one embodiment, the electronic device may provide a video playing selection interface, which may be a built-in interface of the electronic device for selecting videos to play, or, where different applications run on the electronic device, a selection interface within a target application such as a fitness APP (application program) or a dance APP. Multiple videos are displayed in the video playing selection interface, and the user can select a favorite target video there. When the electronic device detects that one of the videos is selected, it plays that target video on the first interface. The user may select the target video by single click, double click, voice, and so on. For example, when the user says "play target video", the electronic device may perform semantic analysis on the voice input and select the target video based on the analysis result. The first interface may be the video playing selection interface itself, or another video playing interface; in the latter case, upon detecting that the target video is selected, the electronic device switches from the video playing selection interface to the first interface and plays the target video there.
In one embodiment, each of the videos in the video playing selection interface has a corresponding scoring result, and after entering the interface the electronic device may automatically select the video with the highest score and play it on the first interface. Playing the highest-scoring video makes it convenient for the user to challenge that score, which adds interest.
S202: during playback of a target video clip of the target video, play a target animation matched with the target video clip on the first interface, where the target video clip is any one of the plurality of video clips included in the target video, the target animation is configured with at least one scoring time point, and each video clip corresponds to one animation. The target video clip contains a standard prompt action used to prompt the user to move along with it.
In a specific implementation, each video may be divided into a plurality of video clips. After step S201 is performed, the first interface includes a shooting button; when the user clicks it, the electronic device detects the click operation and, while playing the video clips of the target video in chronological order, obtains from local storage the target animation matched with the target video clip. Alternatively, the user may select which video clip to play: when the electronic device detects that a target video clip is selected from the plurality of video clips, it obtains the target animation matched with that clip and plays the animation on the first interface while the target video clip plays.
In one embodiment, the target video clip may be played at a first position in the first interface, which may be any position of the interface; in Fig. 3a, for example, the target video clip is played in the upper-left corner. The target animation may be played at a second position, which likewise may be anywhere in the first interface (upper-left, middle, right, and so on); in Fig. 3a, the target animation is played at the bottom of the first interface.
In one embodiment, the electronic device may divide the target video into a plurality of video clips of equal play duration according to the total play duration of the target video. Alternatively, the electronic device may divide the target video randomly into a plurality of clips according to its play duration; or, where the target video contains a plurality of standard prompt actions, divide it into clips according to those actions so that each clip contains one standard prompt action. A minimal sketch of the equal-duration strategy follows.
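The Python sketch below illustrates only the equal-duration strategy; the names (VideoClip, split_equal_duration) are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class VideoClip:
    start_s: float  # clip start within the target video, in seconds
    end_s: float    # clip end, in seconds

def split_equal_duration(total_duration_s: float, num_clips: int) -> list[VideoClip]:
    """Divide a target video into clips with identical play durations."""
    clip_len = total_duration_s / num_clips
    return [VideoClip(i * clip_len, (i + 1) * clip_len) for i in range(num_clips)]

# Example: a 60-second target video split into 4 clips of 15 seconds each.
clips = split_equal_duration(60.0, 4)
```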
After dividing the target video into the plurality of video clips, the electronic device may display a second interface that includes those clips. In response to an animation configuration operation on a target video clip among them, a third interface is displayed containing a candidate animation set of one or more animations. The target video clip may correspond to an animation configuration button, in which case the animation configuration operation is a single click, double click, or similar operation on that button; alternatively, the animation configuration of each video clip may correspond to a fixed gesture, in which case the operation is the input of the corresponding gesture, for example an M gesture or an OK gesture, and the electronic device responds when the user inputs it. When an animation selection operation on the candidate animation set is detected, the selected animation is determined as the target animation, and at least one scoring time point is configured for the target animation in response to a scoring time configuration operation on it. The animation selection operation may be a single or double click on an animation. Each animation may correspond to a scoring time configuration button, and the scoring time configuration operation may be a single or double click on that button. The play duration of the target animation may be consistent with that of the target video clip, i.e., the two share the same playing life cycle. Specifically, one or more scoring time points can be configured in the target animation according to scoring requirements, so that a scoring result can later be displayed at each of them. When the configuration of the video clips is detected to be complete, each clip is stored in association with its animation, either in the local storage space of the electronic device or in a blockchain network. Setting the corresponding animation for any other video clip can follow the same implementation as setting the target animation for the target video clip. The animation content of the target animation may be the standard prompt action of the target video clip, or something else such as scenery or an animal.
In one embodiment, when an animation selection operation on the candidate animation set is detected, the corresponding animation is selected from the set by that operation and determined as the target animation. Each animation in the candidate set may have a play duration and animation content, and the animation selection operation may be made based on the play duration and content of the target video clip; that is, both are taken into account when choosing the animation for the clip.
In one embodiment, a second interface is displayed comprising the plurality of video clips of the target video; in response to an animation configuration operation on a target video clip among them, a standard user action (i.e., the standard prompt action) in the clip is identified and a corresponding target animation is configured for the clip based on that action; at least one scoring time point is then configured for the target animation in response to a scoring time configuration operation on it.
In response to the scoring time configuration operation on the target animation, one or more scoring time points may be set randomly according to the play duration of the target animation, or spread evenly over it. For example, if the target animation plays for 1 minute, the 5th and 10th seconds can be set as scoring time points; or single points can be set directly, e.g., the 30th second or the 60th second. In one embodiment, each video clip corresponds to one animation, and the scoring time points set for different animations may be the same or different. Note that the play duration of each video clip can be understood as its playing life cycle; when the life cycle of one clip ends, the next clip can be played. A sketch of both configuration strategies is given below.
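Both strategies described above (random placement and even spacing over the play duration) can be sketched as follows; configure_scoring_points is a hypothetical helper, and the concrete durations mirror the examples in the text.

```python
import random

def configure_scoring_points(play_duration_s: float, num_points: int,
                             strategy: str = "average") -> list[float]:
    """Configure scoring time points for a target animation.

    'average' spreads the points evenly over the play duration;
    'random' places them anywhere within it.
    """
    if strategy == "average":
        step = play_duration_s / num_points
        return [step * (i + 1) for i in range(num_points)]
    return sorted(random.uniform(0.0, play_duration_s) for _ in range(num_points))

# A 60-second animation with two evenly spaced points yields [30.0, 60.0].
points = configure_scoring_points(60.0, 2)
```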
S203: invoke the camera device to collect user images during playback of the target video clip, and display the user images on the first interface.
In one embodiment, the frame rate of the target video can be dynamically adjusted to be consistent with the frame rate at which the camera device collects user images. This removes the mismatch between the two rates, so that the user images collected in the first time range and their matched video images correspond to the same time points and the scoring result determined from them is more accurate; moreover, once the camera device's processing time per user image is reduced as far as possible, the video image played at a scoring time point and the displayed user image appear visually synchronized. Here the frame rate of the target video means its playing frame rate, i.e., the number of frames played per second, and the camera frame rate means the number of user image frames collected per second; for instance, if the camera collects 30 frames per second, it collects roughly one frame every 33 milliseconds. Different types of camera device may collect user images at different frame rates, in which case the frame rate of the target video changes with the camera's rate. The target video clip may be composed of multiple frames of video images.
In one embodiment, the electronic device determines the time taken from collecting a first user image to displaying it, and can then automatically adjust internal parameters of the camera device as required. Adjusting these parameters can shorten the camera device's processing of image saturation, color, de-dithering, denoising, and the like, which shortens the processing of each collected user image and effectively ensures that the user image currently displayed on the first interface is temporally and visually consistent with its matched video image.
In one embodiment, suppose the target video plays at 1 frame per second, so its target video clip also plays at 1 frame per second, and suppose the user image collected at the 4th second must be compared with the video image displayed at the 4th second: the displayed video image at that second is acquired, the camera collects the user's image at that second, and the two are compared and scored. Clearly, if the video's playing frame rate of 1 frame per second does not match the camera's frame rate (30 frames per second), the video image displayed at the 4th second and the collected user image cannot be matched; likewise, the standard prompt action in the target video clip will visibly fail to correspond to the scoring animation and sound effect shown with the scoring result. The best way to solve this is to refine the granularity of the video player's playback progress callbacks and set the frame rate of the target video to the camera's frame rate: for example, if the camera runs at 30 frames per second, i.e., one frame every 33 milliseconds, the target video is also set to one frame every 33 milliseconds, so that each displayed video image and the user image collected by the camera differ by at most 1 frame (33 milliseconds) and a more accurate comparison result is obtained. This also guarantees that the standard prompt action and the action in the user image visually look like the same action, and that scoring and sound effects stay synchronized, improving the experience. In a specific implementation the frame rate of the target video may be adjusted to match whatever frame rate the particular camera device collects at. Before playing the target video on the first interface, the electronic device can dynamically adjust the target video's frame rate to the camera's frame rate. Specifically, before playback a configuration request for the target video's frame rate may be received; based on it, the electronic device obtains the target video's initial frame rate and the camera's collection frame rate, and checks whether they are consistent. If they are not, the electronic device updates the target video's initial frame rate to the camera's frame rate and uses the updated value as the frame rate of the target video; if they are already consistent, no modification is needed. A minimal sketch of this check follows.
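A minimal sketch of the consistency check, under the assumption that both rates are available as frames per second (the function name is hypothetical):

```python
def sync_video_frame_rate(video_fps: float, camera_fps: float) -> float:
    """Return the frame rate the target video should play at.

    If the video's initial frame rate differs from the camera's collection
    frame rate, the video's rate is updated to the camera's so that each
    collected user image lines up with a displayed video image.
    """
    if video_fps != camera_fps:
        return camera_fps  # update to the camera's frame rate
    return video_fps       # already consistent; no modification needed

# Example: a 1 fps video and a 30 fps camera (one frame every ~33 ms)
# are reconciled by playing the video at 30 fps.
playback_fps = sync_video_frame_rate(1.0, 30.0)
```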
In one embodiment, the user performs the corresponding action following the standard prompt action in the target video clip, and the electronic device invokes the camera device to collect a user image containing the user's posture as they follow the prompt action. The camera device may be a camera component of the electronic device itself, or a dedicated imaging device such as a video camera. After collection, the electronic device may display the user image at a target position of the first interface, which may be any position, e.g., the middle or lower part of the interface, or the position shown in Fig. 3a. In one embodiment, the electronic device collects user images in real time while the target video clip plays, and displays each collected image in the first interface as it arrives. Optionally, the joint points contained in the user image may also be displayed, as indicated by the black dots in Fig. 3a. In one embodiment, when the user image is displayed, the joint points it contains can be shown together with the connecting lines between them and a first angle between adjacent connecting lines; at the same time, the joint points of the matched video image, their connecting lines, and a second angle between adjacent lines are shown. Comparing the first and second angles lets the user see intuitively, via angle matching, how their action differs from the standard prompt action in the target video clip; if the angles differ, the user can quickly adjust their action toward the second angle. For example, in the user image the hip joint point and the foot joint point form connecting line 1, and the hip joint point and the shoulder joint point form connecting line 2, with a first angle between lines 1 and 2; in the matched video image, the hip and foot joint points form connecting line 3 and the hip and shoulder joint points form connecting line 4, with a second angle between lines 3 and 4. Whether the two angles are the same is then judged, and if they differ the user adjusts the action. A sketch of this angle comparison is given below.
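The angle comparison can be sketched as follows; the joint coordinates are made-up sample values and the 10-degree tolerance is an assumption, not a figure from the patent.

```python
import math

Point = tuple[float, float]  # (x, y) joint position, e.g. in normalized image coordinates

def angle_at(vertex: Point, a: Point, b: Point) -> float:
    """Angle in degrees at `vertex` between the lines vertex->a and vertex->b."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

# Sample joints: hip->foot is connecting line 1/3, hip->shoulder is line 2/4.
user_hip, user_foot, user_shoulder = (0.50, 0.60), (0.50, 0.95), (0.50, 0.30)
video_hip, video_foot, video_shoulder = (0.50, 0.60), (0.58, 0.95), (0.50, 0.30)

first_angle = angle_at(user_hip, user_foot, user_shoulder)      # user image
second_angle = angle_at(video_hip, video_foot, video_shoulder)  # matched video image
needs_adjustment = abs(first_angle - second_angle) > 10.0  # tolerance is an assumption
```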
In one embodiment, before step S204 is performed, playback of the target animation may be paused due to external influences, understood as follows: when the target animation is paused, the target video clip and the rest of the playback are paused as well. External influences here mean a manual pause, video buffering, and the like. In that case, when an animation interrupt signal is received, playback of the target animation is paused and its already-played duration is recorded; when an animation start signal is received, the play position is determined from the recorded duration and playback of the target animation resumes from that position. A sketch of this bookkeeping follows.
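A sketch of the interrupt/resume bookkeeping; the class and signal-handler names are hypothetical.

```python
import time

class AnimationPlayback:
    """Tracks the played duration so an interrupted target animation can resume."""

    def __init__(self) -> None:
        self.played_s = 0.0      # recorded played duration of the target animation
        self._started_at = None  # wall-clock moment the current run began

    def start(self) -> None:
        self._started_at = time.monotonic()

    def on_interrupt(self) -> float:
        """Animation interrupt signal: pause and record the played duration."""
        if self._started_at is not None:
            self.played_s += time.monotonic() - self._started_at
            self._started_at = None
        return self.played_s

    def on_start_signal(self) -> float:
        """Animation start signal: resume playback from the recorded position."""
        self.start()
        return self.played_s  # used as the starting point for playback
```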
S204: determine the scoring result corresponding to each scoring time point according to the user images collected in a first time range before that scoring time point and the video images matched with them, and display the scoring result corresponding to each scoring time point on the first interface when that point arrives. The video images are contained in the target video clip; that is, the target video clip may be composed of multiple frames of video images. "Matched" here may mean matched in time or matched in action. For example, matched in time: a user image collected at the 4th second may be matched with the video image displayed at the 4th second. Matched in action: a user image collected in the first time range contains a squat, and its matched video image also contains a squat. In one embodiment, the user image collected by the camera device at each time point may also be matched with the video image displayed on the first interface at that same time point.
In a specific implementation, user posture analysis is performed on the user images collected in the first time range before each scoring time point and the video images matched with them, yielding the scoring result for that scoring time point. The electronic device then continuously obtains the current playing time of the target animation and judges whether it is one of the at least one scoring time points; if so, the scoring result corresponding to that point is displayed on the first interface, at any position of the interface. For example, as shown in Fig. 3b, while the small figure in the target animation moves from right to left, the electronic device continuously checks whether the current playing time is a scoring time point, and when it is (the moment when the playing time reaches the scoring time point in Fig. 3b), displays 95 points in the first interface. A sketch of this check is shown below.
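A sketch of the per-callback check; the one-frame tolerance reflects the 33 ms callback granularity discussed above and is an assumption.

```python
def scoring_point_due(current_play_time_s: float, scoring_points_s: list,
                      frame_interval_s: float = 1 / 30):
    """Return the scoring time point reached at the current playing time, or None.

    The player's progress callback fires once per frame, so a point counts
    as 'arrived' when the current time is within one frame interval of it.
    """
    for point in scoring_points_s:
        if 0.0 <= current_play_time_s - point < frame_interval_s:
            return point
    return None

# Called from the playback loop: when a point is due, the scoring result
# precomputed for that point is displayed on the first interface.
```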
In one embodiment, the target video clip is composed of multiple frames of video images, and playing the clip can be understood as displaying each frame on the first interface. Before a given scoring time point (e.g., the target scoring time point) arrives, the electronic device determines its scoring result from the user images collected in the first time range and the matched video images, then displays it on the first interface when the point arrives. The first time range may lie between the scoring time point and the previous scoring time point; that is, it may be part or all of the interval between them. For example, if the scoring time point is the 4th second and the previous one is the 1st second, the first time range may be the 2nd and 3rd seconds, or any 1-second span between the 1st and 4th seconds, and so on.
In one embodiment, the user images collected in the first time range and their matched video images may differ in time point, or the user action in the user image and the standard prompt action in the video image may not be the same action, making the resulting score unreasonable. To ensure that the scoring result displayed at each scoring time point is relatively accurate, for a target scoring time point (any one of the at least one scoring time points) a first scoring result is determined from the user images collected in a first time range before it and the matched video images. Whether the first scoring result is greater than or equal to a scoring result threshold is then judged. If it is below the threshold, the user images collected in a second time range before the target scoring time point and their matched video images are used to determine a second scoring result, the second time range containing the first; the scoring result for the target scoring time point is then determined from the second and first scoring results. If the first scoring result is greater than or equal to the threshold, it is used directly as the scoring result for the target scoring time point.
In one embodiment, after the second scoring result is determined, the difference between the first and second scoring results may be computed. If the difference exceeds a difference threshold, it indicates that the two results may stem from mismatched time points or from the user action and the standard prompt action not being the same action; the electronic device may then re-determine the first and second time ranges as a new time range and determine the scoring result for the scoring time point from the user images collected in that new range and their matched video images. This effectively improves the accuracy of the scoring result. If the difference is less than or equal to the threshold, no new time range is needed.
The statement that the second time range includes the first means just that: when the first scoring result is below the threshold, the second time range may add K further time points beyond the first range. For example, if the scoring time point is the 5th second and the first time range covers the 1st, 2nd, and 3rd seconds, the second time range may add one more time point, the 4th second. The electronic device then determines the scoring result for the scoring time point from the second and first scoring results: specifically, the two may be summed, which effectively improves the accuracy of the result for the target scoring time point; or the larger of the two may be taken as the result, which can boost the user's enthusiasm. A sketch of this two-stage scoring follows.
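A sketch of the two-stage evaluation; the threshold value and the choice between summing and taking the maximum are configurable, and the concrete numbers here are assumptions.

```python
from typing import Callable

def score_at_target_point(first_score: float,
                          rescore_in_second_range: Callable[[], float],
                          score_threshold: float = 60.0,
                          combine: str = "max") -> float:
    """Determine the scoring result for a target scoring time point.

    If the first scoring result reaches the threshold it is used directly.
    Otherwise a second scoring result is computed over the wider second
    time range and the two are combined: 'sum' to evaluate both together,
    or 'max' to keep the user's best score.
    """
    if first_score >= score_threshold:
        return first_score
    second_score = rescore_in_second_range()
    if combine == "sum":
        return first_score + second_score
    return max(first_score, second_score)
```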
The second scoring result for the target scoring time point (from the user images collected in the second time range and their matched video images) can be determined in the same way as the first scoring result (from the user images collected in the first time range and their matched video images); the following therefore describes only how the first scoring result is determined.
The specific implementation by which the electronic device determines the first scoring result for the target scoring time point from the user images collected in the first time range before it and their matched video images may include steps s11-s13:
s11: perform human joint point detection on the user images collected in the first time range to obtain a first human joint point detection result, and perform human joint point detection on the matched video images to obtain a second human joint point detection result. Since the camera device collects images of the user, human joint points can be detected in those images.
In a specific implementation, the electronic device preprocesses the user images collected in the first time range and then applies a preset joint point recognition model to the preprocessed images to obtain the first human joint point detection result. The preset model is trained on a plurality of training user images with a joint point label for each. The preprocessing may include denoising the target frame image and unifying its size. The preset joint point recognition model may be a human pose estimation model such as OpenPose, which can detect all of the user's joint points in a user image.
Note that the second human joint point detection result for the matched video images can be obtained in the same way as the first human joint point detection result for the user images, so the details are not repeated here. A sketch of the preprocessing and detection step follows.
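A sketch of s11 under stated assumptions: OpenCV is used for the denoising and resizing, the input size is a placeholder, and keypoint_model.predict stands in for whatever interface the chosen pose estimator (e.g. an OpenPose-style model) actually exposes.

```python
import cv2
import numpy as np

TARGET_SIZE = (368, 368)  # unified image size; the concrete value is an assumption

def preprocess(image: np.ndarray) -> np.ndarray:
    """Preprocessing from s11: denoise the frame and unify its size."""
    denoised = cv2.fastNlMeansDenoisingColored(image, None, 10, 10, 7, 21)
    return cv2.resize(denoised, TARGET_SIZE)

def detect_joint_points(image: np.ndarray, keypoint_model) -> list:
    """Run a preset joint point recognition model on the preprocessed image
    and return the (x, y) positions of the detected human joint points."""
    return keypoint_model.predict(preprocess(image))  # hypothetical interface
```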
s12: construct a user posture evaluation parameter set from the first and second human joint point detection results, the set including at least one posture evaluation parameter.
In one embodiment, the user images acquired over the first time range may include a target user image, and the video images matched with them include a target video image. The first time range includes one or more time points, and the target scoring time point is any one of them. The first human body joint point detection result may include a plurality of first joint points in the target user image and the position information of each first joint point; these first joint points belong to the user captured by the camera device. The second human body joint point detection result includes a plurality of second joint points in the target video image and the position information of each second joint point; these second joint points belong to the person shown in the target video image.

The electronic device may construct the user posture evaluation parameter set from these two detection results as follows. First, count the first joint points and the second joint points, and compare the two counts to obtain a first comparison result that indicates the degree of matching between them. Specifically, the electronic device may check whether the two counts are the same: if they are, which indicates that the user's posture in the target user image matches the posture shown in the target video image, the matching degree is set to a first preset value; if they differ, which indicates that the two postures differ, the matching degree is set to a second preset value. The counts may differ because the user's action is not standard, leaving some joint points unrecognizable. For example, suppose the user image acquired at the 5th second contains a bending action while the video image displayed at the 5th second contains a stretching action; the user image may then contain only 3 detectable joint points while the video image contains 6, showing that the user's action deviates from the standard prompt action in the video. The first and second preset values can be set as required. Alternatively, first, second, third, and further preset values may be assigned according to the difference between the two counts.
For example, if the count difference is 1, the matching degree is the first preset value; if it is 2, the second preset value; and if it is 3, the third preset value.
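A minimal sketch of this count-based first comparison; the preset values and the difference-based table below are assumed configurations, not values from the disclosure:

```python
# Minimal sketch of the count-based first comparison result.
MATCH_IF_EQUAL = 100                          # first preset value (assumed)
MATCH_BY_DIFFERENCE = {1: 80, 2: 60, 3: 40}   # assumed difference-based presets
MATCH_OTHERWISE = 0                           # fallback when counts differ badly

def count_match_degree(num_first_joints: int, num_second_joints: int) -> int:
    """Match degree between the numbers of first and second joint points."""
    diff = abs(num_first_joints - num_second_joints)
    if diff == 0:
        return MATCH_IF_EQUAL
    return MATCH_BY_DIFFERENCE.get(diff, MATCH_OTHERWISE)

# The bending-vs-stretching example: 3 joint points detected in the user
# image against 6 in the video image yields a low match degree.
print(count_match_degree(3, 6))  # -> 40
```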
The electronic device may also compare the position information of each first joint point with that of its corresponding second joint point to obtain a second comparison result, which indicates the degree of matching between each first joint point and its corresponding second joint point. In a specific implementation, the position information may be position coordinates: for a target joint point among the plurality of first joint points, the electronic device compares its position coordinates with those of the corresponding second joint point, where "corresponding" means the two joint points are of the same type (for example, if the target joint point is a wrist joint point, the corresponding second joint point is also a wrist joint point). If the two position coordinates are the same, the positions of the two joint points match, and their matching degree is set to a fourth preset value; if the coordinates differ, the positions differ, and the matching degree is set to a fifth preset value. Optionally, a target distance may instead be calculated from the two position coordinates: the smaller the distance, the higher the matching degree, and the matching degree for a given distance is looked up from a correspondence between reference matching degrees and reference distances. Any first joint point can be processed in the same way as the target joint point. Once the matching degrees of all first joint points with their corresponding second joint points are obtained, they can be averaged, or weighted and then averaged, to give the second comparison result.
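A minimal sketch of this position-based second comparison; the reference distance-to-matching-degree table and the averaging rule are assumed configurations:

```python
# Minimal sketch of the position-based second comparison result.
import math

REFERENCE_TABLE = [(5.0, 100), (15.0, 80), (30.0, 50)]  # (max distance, degree)

def joint_match_degree(p1, p2) -> int:
    """Smaller distance between a target joint point and its corresponding
    second joint point means a higher matching degree."""
    d = math.dist(p1, p2)
    for max_dist, degree in REFERENCE_TABLE:
        if d <= max_dist:
            return degree
    return 0

def second_comparison_result(first_joints: dict, second_joints: dict,
                             weights: dict = None) -> float:
    """Average (or weighted-average) matching degree over the joint types
    present in both detection results."""
    common = first_joints.keys() & second_joints.keys()
    if not common:
        return 0.0
    degrees = {j: joint_match_degree(first_joints[j], second_joints[j])
               for j in common}
    if weights:
        total = sum(weights.get(j, 1.0) for j in common)
        return sum(degrees[j] * weights.get(j, 1.0) for j in common) / total
    return sum(degrees.values()) / len(degrees)
```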
After the first comparison result and the second comparison result are determined, the user gesture standard is determined from them. The first comparison result may include the matching degree between the counts of first and second joint points, and the second comparison result the matching degree between each first joint point and its corresponding second joint point. In one implementation, the electronic device averages the first and second comparison results to obtain the user gesture standard. In another, it applies a weighted sum to the first comparison result to obtain a first weighted value, applies a weighted sum to the second comparison result to obtain a second weighted value, and averages the two weighted values to obtain the user gesture standard. A user posture evaluation parameter set containing the user gesture standard is then constructed.
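A minimal sketch of combining the two comparison results into the user gesture standard; the 0.4/0.6 weights are illustrative assumptions:

```python
# Minimal sketch of the user gesture standard computation.
def user_gesture_standard(first_result: float, second_result: float,
                          w_first: float = 0.4, w_second: float = 0.6) -> float:
    """Weighted combination of the count-based and position-based results;
    a plain average is the unweighted special case (w_first == w_second)."""
    return (w_first * first_result + w_second * second_result) / (w_first + w_second)
```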
In one embodiment, the user images acquired in the first time range include a plurality of user images, and the video images matched with them include multiple frames of video images: the user images are acquired at several time points within the first time range, the video frames are displayed at the same time points, and each user image is matched to one of the video frames. For example, if the first time range is "5th second to 6th second", the camera device acquires 2 user images (at the 5th and 6th seconds) and 2 video frames are displayed (at the 5th and 6th seconds); the frame displayed at the 5th second matches the user image acquired at the 5th second, and the frame displayed at the 6th second matches the user image acquired at the 6th second. In this case, the electronic device may construct the user posture evaluation parameter set as follows: draw a first motion trajectory for each kind of joint point according to the first human body joint point detection result of each user image; draw a second motion trajectory for each kind of joint point according to the second human body joint point detection result of each video frame; match the first motion trajectory of each kind of joint point against the corresponding second motion trajectory to obtain a motion trajectory similarity; determine the user gesture grace from that similarity; and construct a posture evaluation parameter set containing the user gesture grace. Here each human body joint point detection result includes the position coordinates of the joint points.
In one implementation, the number of user images equals the number of video frames and they contain the same kinds of joint points. For example, there are 2 user images (acquired at the 5th and 6th seconds) and 2 video frames (displayed at the 5th and 6th seconds), and the user image acquired at the 5th second contains the same number and kinds of joint points as the video frame displayed at the 5th second. In this case, for a target kind of joint point, the electronic device connects its position coordinates across the user images to obtain its first motion trajectory, and likewise draws its second motion trajectory from the second human body joint point detection results of the video frames. That is, connecting the coordinates of the target joint point in the 5th-second and 6th-second user images yields the first motion trajectory, and connecting its coordinates in the 5th-second and 6th-second video frames yields the second motion trajectory. The first and second motion trajectories of every kind of joint point are obtained in this way. The first motion trajectories reveal whether the user's movement is discontinuous, from which the user's gesture grace can be determined, as the sketch below illustrates.
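A minimal sketch of this trajectory construction, assuming each per-frame detection result is a mapping from joint type to position coordinates:

```python
# Minimal sketch of trajectory drawing: for each joint type, connect its
# position coordinates across the per-frame detection results, taken in
# time order (e.g. the 5th-second frame, then the 6th-second frame).
def motion_trajectories(per_frame_joints: list) -> dict:
    """per_frame_joints[i] maps joint type -> (x, y) for frame i.
    Returns joint type -> ordered coordinate list (the trajectory)."""
    trajectories = {}
    for joints in per_frame_joints:
        for joint_type, coord in joints.items():
            trajectories.setdefault(joint_type, []).append(coord)
    return trajectories

# first trajectories from the user images, second from the video frames:
# first_tracks  = motion_trajectories(user_detection_results)
# second_tracks = motion_trajectories(video_detection_results)
```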
In another implementation, the number of user images equals the number of video frames, but the kinds of joint points they contain differ. This can happen when the user's actions do not match the standard prompt actions in the video. For example, there are 2 user images (acquired at the 5th and 6th seconds) and 2 video frames (displayed at the 5th and 6th seconds); both user images contain a bending action, while the 5th-second video frame contains a bending action and the 6th-second video frame a stretching action. The user images may then each contain 3 kinds of joint points, the 5th-second video frame 3 kinds, and the 6th-second video frame 6 kinds, the 3 kinds belonging to the 6. In this case the electronic device may, in the manner above, draw the first motion trajectory of each of the 3 kinds of joint points found in the user images and the second motion trajectory of the same 3 kinds from the video frames; the remaining 3 kinds cannot form corresponding trajectories, so their similarity is set directly to 0. In this embodiment, whether or not the kinds of joint points are the same across the plurality of user images, the position coordinates of each relevant joint point are connected across the user images, and likewise across the frames of the video.
The electronic device then matches the first motion trajectory of each kind of joint point in the user images against the second motion trajectory of the same kind of joint point in the video frames to obtain the motion trajectory similarity. Specifically, it computes a reference motion trajectory similarity between the first and second motion trajectories of each kind of joint point, and determines the overall motion trajectory similarity from these per-joint reference similarities, either by averaging them directly or by weighting them first and then averaging. In one embodiment, the per-joint similarity may be computed with a trajectory similarity algorithm such as LCSS or DTW. The user gesture grace is then determined from the motion trajectory similarity through a preset correspondence, for example: a similarity of 10-29 maps to a gesture grace of 20, a similarity of 30-49 to a gesture grace of 60, and so on. Finally, a posture evaluation parameter set containing the user gesture grace is constructed.
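A minimal sketch of this trajectory matching step. It uses a plain DTW distance (LCSS would serve equally); the distance-to-similarity conversion and the similarity-to-grace table are assumed configurations in the spirit of the correspondence above:

```python
# Minimal sketch of trajectory matching and gesture grace lookup.
import math

def dtw_distance(a: list, b: list) -> float:
    """Classic O(len(a) * len(b)) dynamic-time-warping distance."""
    inf = float("inf")
    dp = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    dp[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[-1][-1]

GRACE_TABLE = [(10, 29, 20), (30, 49, 60), (50, 100, 90)]  # (lo, hi, grace)

def user_gesture_grace(first_tracks: dict, second_tracks: dict) -> float:
    """Average per-joint reference similarities; joint types with no
    corresponding trajectory contribute 0, as in the embodiment."""
    sims = []
    for joint_type, track2 in second_tracks.items():
        track1 = first_tracks.get(joint_type)
        if not track1:
            sims.append(0.0)
            continue
        sims.append(100.0 / (1.0 + dtw_distance(track1, track2)))  # assumed
    similarity = sum(sims) / len(sims) if sims else 0.0
    for lo, hi, grace in GRACE_TABLE:
        if lo <= similarity <= hi:
            return grace
    return 0.0
```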
It should be noted that the way any other video clip is played on the first interface, the way its matching animation is played, and the way the scoring result for each of its scoring time points is determined may all follow the corresponding implementation described for the target video clip, and are not repeated here.
In one embodiment, as shown in fig. 3c, after the target video clip finishes playing, the electronic device may display the next video clip on the first interface; during playback of that clip it obtains the animation matched with the clip and plays it on the first interface, and when each scoring time point configured in that animation arrives, the corresponding scoring result is displayed on the first interface. The next video clip may be the clip that follows in time order, or a clip selected by the user.
s13: generating a first scoring result corresponding to the target scoring time point according to the posture evaluation parameter set. The user posture evaluation parameter set may comprise at least one of: the user gesture standard, the user skill mastery, and the user gesture grace.
After the user posture evaluation parameter set is constructed, the electronic device may generate the first scoring result corresponding to the target scoring time point by averaging the posture evaluation parameters in the set. Specifically, a posture evaluation parameter set is constructed for the target scoring time point, and the electronic device averages the values of its parameters to obtain the first scoring result; the target scoring time point is any one of the at least one scoring time point. For example, if the set constructed for scoring time point 1 includes a user gesture grace of 95 and a user gesture standard of 95, the first scoring result for scoring time point 1 is (95+95)/2=95. Optionally, the electronic device may instead weight the posture evaluation parameters and then average the weighted values to obtain the first scoring result.
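A minimal sketch of step s13; the optional weights are an assumed configuration:

```python
# Minimal sketch of generating the first scoring result from the set.
def first_scoring_result(params: dict, weights: dict = None) -> float:
    """params e.g. {"standard": 95, "grace": 95} -> (95 + 95) / 2 = 95."""
    if weights:
        total = sum(weights.get(k, 1.0) for k in params)
        return sum(v * weights.get(k, 1.0) for k, v in params.items()) / total
    return sum(params.values()) / len(params)

print(first_scoring_result({"standard": 95, "grace": 95}))  # -> 95.0
```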
S205: after the target video is played, determining a comprehensive scoring result according to scoring results corresponding to each scoring time point configured for each animation in a plurality of animations corresponding to the target video; and displaying the comprehensive scoring result on the first interface, wherein the plurality of animations corresponding to the target video comprise animations matched with each video clip in the plurality of video clips.
In one embodiment, as described above, the target video includes a plurality of video clips and each clip has a matching animation. After the target video finishes playing, the electronic device may therefore determine a comprehensive scoring result for the user's movement along with the target video from the scoring results of every scoring time point configured in every animation. In one implementation, the electronic device averages all these scoring results to obtain the comprehensive scoring result; in another, it weights them first and then averages the weighted results.
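A minimal sketch of this composite scoring step; the optional weights are again an assumed configuration:

```python
# Minimal sketch: gather the scoring results of every scoring time point
# of every animation, then average (optionally weighted).
def composite_score(per_animation_scores: list, weights: list = None) -> float:
    flat = [s for anim in per_animation_scores for s in anim]
    if weights:
        flat_w = [w for anim in weights for w in anim]
        return sum(s * w for s, w in zip(flat, flat_w)) / sum(flat_w)
    return sum(flat) / len(flat)

# e.g. two animations with two scoring time points each:
print(composite_score([[90, 80], [70, 100]]))  # -> 85.0
```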
After the comprehensive scoring result is obtained, the electronic device may display it on the first interface or on a second interface. On the first interface, the result may be displayed at any position, for example in the middle of the interface, or in a floating window. To display it on the second interface, the electronic device switches from the first interface to the second interface after the comprehensive scoring result is obtained and then displays the result there, again at any position or in a floating window.
In this embodiment, the electronic device plays the target video on the first interface. While a target video clip of the target video plays, it plays on the first interface a target animation matched with that clip, where the target video clip is any one of the plurality of video clips included in the target video and the target animation is configured with at least one scoring time point. It invokes the camera device to acquire user images during playback and displays them on the first interface. It determines the scoring result for each scoring time point from the user images acquired in the first time range before that time point and the video images matched with those user images, the video images being contained in the target video clip, and displays the result on the first interface when the time point arrives. After the target video finishes playing, it determines a comprehensive scoring result from the scoring results of every scoring time point configured in every animation corresponding to the target video and displays it on the first interface. With the score determination method provided by the embodiment of the present application, a scoring result is shown the moment each scoring time point in the animation arrives, which makes the experience more engaging for the user. Moreover, because scoring compares only the user images acquired within the first time range against their matching video images, the device need not continuously acquire and compare images, which reduces power consumption and saves the resources needed for image processing.
Based on the description of the above embodiments of the score determination method, an embodiment of the present application further discloses a score determination device, which may be a computer program (including program code) running in the electronic device mentioned above. The score determination device may perform the method shown in fig. 2. Referring to fig. 4, the score determination device may operate as follows:
a processing unit 401, configured to play a target video on a first interface;
The processing unit 401 is further configured to play, during playing of a target video segment of the target video, a target animation matching the target video segment on the first interface, where the target video segment is any video segment of a plurality of video segments included in the target video, and the target animation is configured with at least one scoring time point;
the processing unit 401 is further configured to invoke a camera device to collect a user image during playing the target video clip, and display the user image on the first interface;
The processing unit 401 is further configured to determine a scoring result corresponding to each scoring time point according to a user image acquired in a first time range before the scoring time point and a video image matched with the user image acquired in the first time range;
A display unit 402, configured to display, on the first interface, the scoring result corresponding to each scoring time point when that scoring time point arrives; the video image is contained in the target video clip;
The processing unit 401 is further configured to determine, after the playing of the target video is finished, a comprehensive scoring result according to scoring results corresponding to each scoring time point configured for each of a plurality of animations corresponding to the target video; the plurality of animations corresponding to the target video include animations matching each of the plurality of video clips;
the display unit 402 is further configured to display a composite score result on the first interface.
In one embodiment, when determining the scoring result corresponding to each scoring time point according to the user image acquired in the first time range before each scoring time point and the video image matched with the user image acquired in the first time range, the processing unit 401 may be specifically configured to:
Determining a first scoring result corresponding to a target scoring time point in the at least one scoring time point according to the user images acquired in a first time range before the target scoring time point and the video images matched with the user images acquired in the first time range;
If the first scoring result is less than or equal to a scoring result threshold, determining a second scoring result corresponding to the target scoring time point according to the user image acquired in a second time range before the target scoring time point and the video image matched with the user image acquired in the second time range, wherein the second time range contains the first time range;
And determining a scoring result corresponding to the target scoring time point based on the second scoring result and the first scoring result.
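A minimal sketch of this two-stage scoring flow. The threshold, the time-range widths, and the combination rule are all assumed values; `score_over(start, end)` stands in for the scoring pipeline (joint point detection plus comparison) described above:

```python
# Minimal sketch of the two-stage scoring flow implemented by the unit.
SCORE_THRESHOLD = 60  # assumed scoring result threshold

def score_at_time_point(t: float, score_over) -> float:
    first = score_over(t - 1.0, t)      # first time range (assumed 1 s wide)
    if first > SCORE_THRESHOLD:
        return first
    second = score_over(t - 3.0, t)     # second time range contains the first
    return (first + second) / 2         # assumed combination rule
```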
In one embodiment, before the playing of the target video on the first interface, the processing unit 401 is further configured to:
acquiring an initial frame rate of the target video and a frame rate of a user image acquired by the camera device;
When the initial frame rate of the target video is inconsistent with the frame rate of the user image acquired by the camera device, updating the initial frame rate of the target video to the frame rate of the user image acquired by the camera device, and obtaining the updated frame rate of the target video;
and determining the updated frame rate of the target video as the frame rate of the target video.
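A minimal sketch of this frame-rate alignment step:

```python
# If the target video's initial frame rate differs from the camera's
# capture frame rate, adopt the camera's rate so that video frames and
# user images stay in one-to-one step.
def aligned_frame_rate(video_fps: float, camera_fps: float) -> float:
    """Frame rate at which the target video should be played."""
    return camera_fps if video_fps != camera_fps else video_fps
```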
In one embodiment, when determining, for a target scoring time point in the at least one scoring time point, a first scoring result corresponding to the target scoring time point according to a user image acquired in a first time range before the target scoring time point and a video image matched with the user image acquired in the first time range, the processing unit 401 may be specifically configured to:
For a target scoring time point in the at least one scoring time point, detecting human body joint points of the user image acquired in a first time range before the target scoring time point, and obtaining a first human body joint point detection result of the user image;
Performing human body joint point detection on the video image matched with the user image acquired in the first time range to obtain a second human body joint point detection result of the video image matched with the user image acquired in the first time range;
Constructing a user posture evaluation parameter set according to the first human body joint detection result and the second human body joint detection result, wherein the user posture evaluation parameter set comprises at least one posture evaluation parameter;
And generating a first scoring result corresponding to the target scoring time point according to the gesture evaluation parameter set.
In one embodiment, the user images acquired within the first time range include a target user image, and the video images matching the user images acquired within the first time range include a target video image matching the target user image; the first human body joint point detection result includes a plurality of first joint points in the target user image and position information of each first joint point, and the second human body joint point detection result includes a plurality of second joint points in the target video image and position information of each second joint point. When constructing the user posture evaluation parameter set according to the first human body joint point detection result and the second human body joint point detection result, the processing unit 401 may be specifically configured to:
Counting the number of the plurality of first joint points and the number of the plurality of second joint points;
comparing the number of the plurality of first joint points with the number of the plurality of second joint points to obtain a first comparison result, wherein the first comparison result indicates the matching degree between the number of the plurality of first joint points and the number of the plurality of second joint points;
Comparing the position information of each first joint point with the position information of a second joint point corresponding to the first joint point to obtain a second comparison result, wherein the second comparison result indicates the matching degree between the first joint point and the second joint point corresponding to the first joint point;
determining a user gesture standard according to the first comparison result and the second comparison result;
and constructing a user gesture evaluation parameter set comprising the user gesture standard.
In one embodiment, the user image acquired in the first time range includes a plurality of user images, the video image matched with the user image acquired in the first time range includes a plurality of frames of video images, and the processing unit 401 may be specifically configured to, when constructing the user posture evaluation parameter set according to the first human body node detection result and the second human body node detection result:
Drawing a first motion trail of each of a plurality of kinds of joints in the plurality of user images according to a first human joint detection result of each user image in the plurality of user images;
Drawing a second motion track of each joint point in a plurality of joint points in the multi-frame video image according to a second human joint point detection result of each frame of video image in the multi-frame video image;
Matching the first motion trail of each of the plurality of types of joints in each user image with the second motion trail of each of the plurality of types of joints in each frame of video image to obtain motion trail similarity;
Determining the user gesture grace according to the motion trail similarity;
And constructing a gesture evaluation parameter set comprising the gesture grace degree of the user.
In one embodiment, the display unit 402 is further configured to: displaying a second interface, the second interface comprising a plurality of video segments of a target video; in response to an animation configuration operation on a target video clip of the plurality of video clips, displaying a fourth interface, the fourth interface comprising a candidate animation set;
The processing unit 401 is further configured to determine, when an animation selection operation for the candidate animation set is detected, the selected animation as the target animation, and to configure at least one scoring time point for the target animation in response to a scoring time configuration operation for the target animation.
In one embodiment, when each scoring time point of the at least one scoring time point arrives, before the first interface displays the scoring result corresponding to the each scoring time point, the processing unit 401 is further configured to:
when an animation interrupt signal is received, pausing playing the target animation, and recording the played time length of the target animation;
and when an animation starting signal is received, determining a playing position based on the played time length, and continuing to play the target animation by taking the playing position as a starting point.
It can be understood that each functional unit of the score determining apparatus of the present embodiment may be specifically implemented according to the method in fig. 2 of the above method embodiment, and the specific implementation process thereof may refer to the related description in fig. 2 of the above method embodiment, which is not repeated herein.
Further, referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device in the embodiment corresponding to fig. 2 may be the electronic device shown in fig. 5. As shown in fig. 5, the electronic device may include a processor 501 and a memory 504, and may optionally further include an input device 502, an output device 503, and a camera device. The processor 501, the input device 502, the output device 503, and the memory 504 are connected via a bus 505. The memory 504 is used to store a computer program comprising program instructions, and the processor 501 is used to execute the program instructions stored in the memory 504.
In an embodiment of the present application, the processor 501 performs the following operations by executing executable program code in the memory 504:
playing a target video on a first interface;
During the process of playing the target video clips of the target video, playing a target animation matched with the target video clips on the first interface, wherein the target video clips are any video clip in a plurality of video clips included in the target video, and the target animation is configured with at least one scoring time point;
invoking a camera device to collect a user image during the playing of the target video clip, and displaying the user image on the first interface; the frame rate of the target video is consistent with the frame rate of the user image acquired by the camera device;
Determining a scoring result corresponding to each scoring time point according to a user image acquired in a first time range before each scoring time point and a video image matched with the user image acquired in the first time range, and displaying the scoring result corresponding to each scoring time point on the first interface when each scoring time point arrives; the video image is contained in the target video clip;
after the target video is played, determining a comprehensive scoring result according to the scoring result corresponding to each scoring time point configured for each of a plurality of animations corresponding to the target video, and displaying the comprehensive scoring result on the first interface, wherein the plurality of animations corresponding to the target video comprise animations matched with each of the plurality of video clips.
In one embodiment, when determining the scoring result corresponding to each scoring time point according to the user image acquired in the first time range before each scoring time point and the video image matched with the user image acquired in the first time range, the processor 501 may be specifically configured to:
Determining a first scoring result corresponding to a target scoring time point in the at least one scoring time point according to the user images acquired in a first time range before the target scoring time point and the video images matched with the user images acquired in the first time range;
If the first scoring result is less than or equal to a scoring result threshold, determining a second scoring result corresponding to the target scoring time point according to the user image acquired in a second time range before the target scoring time point and the video image matched with the user image acquired in the second time range, wherein the second time range contains the first time range;
And determining a scoring result corresponding to the target scoring time point based on the second scoring result and the first scoring result.
In one embodiment, before the first interface plays the target video, the processor 501 is further configured to:
acquiring an initial frame rate of the target video and a frame rate of a user image acquired by the camera device;
When the initial frame rate of the target video is inconsistent with the frame rate of the user image acquired by the camera device, updating the initial frame rate of the target video to the frame rate of the user image acquired by the camera device, and obtaining the updated frame rate of the target video;
and determining the updated frame rate of the target video as the frame rate of the target video.
In one embodiment, when determining, for a target scoring time point in the at least one scoring time point, a first scoring result corresponding to the target scoring time point according to a user image acquired in a first time range before the target scoring time point and a video image matched with the user image acquired in the first time range, the processor 501 may be specifically configured to:
For a target scoring time point in the at least one scoring time point, detecting human body joint points of the user images acquired in a first time range before the target scoring time point, and obtaining a first human body joint point detection result of the user images acquired in the first time range;
Detecting human body joint points of the video images matched with the user images acquired in the first time range, and obtaining a second human body joint point detection result of the video images matched with the user images acquired in the first time range;
Constructing a user posture evaluation parameter set according to the first human body joint detection result and the second human body joint detection result, wherein the user posture evaluation parameter set comprises at least one posture evaluation parameter;
And generating a first scoring result corresponding to the target scoring time point according to the gesture evaluation parameter set.
In one embodiment, the user images acquired within the first time range include a target user image, and the video images matching the user images acquired within the first time range include a target video image matching the target user image; the first human body joint point detection result includes a plurality of first joint points in the target user image and position information of each first joint point, and the second human body joint point detection result includes a plurality of second joint points in the target video image and position information of each second joint point. When constructing the user posture evaluation parameter set according to the first human body joint point detection result and the second human body joint point detection result, the processor 501 may be specifically configured to:
Counting the number of the plurality of first joint points and the number of the plurality of second joint points;
comparing the number of the plurality of first joint points with the number of the plurality of second joint points to obtain a first comparison result, wherein the first comparison result indicates the matching degree between the number of the plurality of first joint points and the number of the plurality of second joint points;
Comparing the position information of each first joint point with the position information of a second joint point corresponding to the first joint point to obtain a second comparison result, wherein the second comparison result indicates the matching degree between the first joint point and the second joint point corresponding to the first joint point;
determining a user gesture standard according to the first comparison result and the second comparison result;
and constructing a user gesture evaluation parameter set comprising the user gesture standard.
In one embodiment, the user image acquired in the first time range includes a plurality of user images, the video image matched with the user image acquired in the first time range includes a plurality of frames of video images, and the processor 501 may be specifically configured to, when constructing the user posture evaluation parameter set according to the first human body node detection result and the second human body node detection result:
Drawing a first motion trail of each of a plurality of kinds of joints in the plurality of user images according to a first human joint detection result of each user image in the plurality of user images;
Drawing a second motion track of each joint point in a plurality of joint points in the multi-frame video image according to a second human joint point detection result of each frame of video image in the multi-frame video image;
Matching the first motion trail of each of the plurality of types of joints in each user image with the second motion trail of each of the plurality of types of joints in each frame of video image to obtain motion trail similarity;
Determining the user gesture grace according to the motion trail similarity;
And constructing a gesture evaluation parameter set comprising the gesture grace degree of the user.
In one embodiment, the processor 501 is further configured to:
Displaying a second interface, the second interface comprising a plurality of video segments of a target video;
In response to an animation configuration operation on a target video clip of the plurality of video clips, displaying a fourth interface, the fourth interface comprising a candidate animation set;
When an animation selection operation for the candidate animation set is detected, determining the selected animation as a target animation;
At least one scoring time point is configured for the target animation in response to a scoring time configuration operation for the target animation.
In one embodiment, when each scoring time point of the at least one scoring time point arrives, before the first interface displays the scoring result corresponding to each scoring time point, the processor 501 is further configured to:
when an animation interrupt signal is received, pausing playing the target animation, and recording the played time length of the target animation;
and when an animation starting signal is received, determining a playing position based on the played time length, and continuing to play the target animation by taking the playing position as a starting point.
It should be appreciated that, in embodiments of the present application, the processor 501 may be a central processing unit (CPU), and the processor 501 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 504 may include read only memory and random access memory and provide instructions and data to the processor 501. A portion of memory 504 may also include non-volatile random access memory.
In specific implementation, the processor 501, the input device 502, the output device 503, and the memory 504 described in the embodiments of the present application may perform the implementation described in all the embodiments above, or may perform the implementation described in the apparatus above, which is not described herein again.
Embodiments of the present application provide a computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, perform the steps performed in all the embodiments described above.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium, which, when executed by a processor of an electronic device, perform the method of all embodiments described above.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a computer-readable storage medium, which, when executed, may include the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. It is emphasized that, to further ensure the privacy and security of the data, the scoring results referred to above may also be stored in nodes of a blockchain. A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, each block containing a batch of network transaction information used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
The above disclosure is only a preferred embodiment of the present application, and it should be understood that the scope of the application is not limited thereto; equivalent changes made according to the claims by those skilled in the art still fall within the scope of the present application.

Claims (7)

1. A score determining method, comprising:
Determining the time length consumed from invoking the camera device to acquire a first user image to displaying the first user image, and adjusting internal parameters of the camera device according to that time length;
playing a target video on a first interface;
During the process of playing the target video clips of the target video, playing a target animation matched with the target video clips on the first interface, wherein the target video clips are any video clip in a plurality of video clips included in the target video, and the target animation is configured with at least one scoring time point;
Calling the camera device with the adjusted internal parameters to collect user images during the process of playing the target video clip, and displaying the user images on the first interface;
Determining a first scoring result corresponding to a target scoring time point in the at least one scoring time point according to the user images acquired in a first time range before the target scoring time point and the video images matched with the user images acquired in the first time range;
If the first scoring result is less than or equal to a scoring result threshold, determining a second scoring result corresponding to the target scoring time point according to the user image acquired in a second time range before the target scoring time point and the video image matched with the user image acquired in the second time range, wherein the second time range contains the first time range;
determining a scoring result corresponding to the target scoring time point based on the second scoring result and the first scoring result;
When each scoring time point arrives, displaying a scoring result corresponding to each scoring time point on the first interface, wherein the video image is contained in the target video clip;
After the target video is played, determining a comprehensive scoring result according to scoring results corresponding to scoring time points configured for each of a plurality of animations corresponding to the target video, and displaying the comprehensive scoring result on the first interface, wherein the plurality of animations corresponding to the target video comprise animations matched with each of the plurality of video clips;
wherein the determining, for a target scoring time point in the at least one scoring time point, a first scoring result corresponding to the target scoring time point according to the user image acquired in the first time range before the target scoring time point and the video image matched with the user image acquired in the first time range comprises:
for a target scoring time point in the at least one scoring time point, detecting human body joint points of the user image acquired in a first time range before the target scoring time point, and obtaining a first human body joint point detection result of the user image acquired in the first time range;
Performing human body joint point detection on the video image matched with the user image acquired in the first time range to obtain a second human body joint point detection result of the video image matched with the user image acquired in the first time range;
Constructing a user posture evaluation parameter set according to the first human body joint detection result and the second human body joint detection result, wherein the user posture evaluation parameter set comprises at least one posture evaluation parameter;
Generating a first scoring result corresponding to the target scoring time point according to the gesture evaluation parameter set;
The user images collected in the first time range include a plurality of user images, and the video images matched with the user images collected in the first time range include a plurality of frames of video images; the constructing a user posture evaluation parameter set according to the first human body joint point detection result and the second human body joint point detection result comprises:
Drawing a first motion trail of each of a plurality of kinds of joints in the plurality of user images according to a first human joint detection result of each user image in the plurality of user images;
Drawing a second motion track of each joint point in a plurality of joint points in the multi-frame video image according to a second human joint point detection result of each frame of video image in the multi-frame video image;
Matching the first motion trail of each of the plurality of types of joints in each user image with the second motion trail of each of the plurality of types of joints in each frame of video image to obtain motion trail similarity;
Determining the user gesture grace according to the motion trail similarity;
And constructing a gesture evaluation parameter set comprising the gesture grace degree of the user.
2. The method of claim 1, wherein prior to playing the target video at the first interface, the method further comprises:
acquiring an initial frame rate of the target video and a frame rate of a user image acquired by the camera device;
When the initial frame rate of the target video is inconsistent with the frame rate of the user image acquired by the camera device, updating the initial frame rate of the target video to the frame rate of the user image acquired by the camera device, and obtaining the updated frame rate of the target video;
and determining the updated frame rate of the target video as the frame rate of the target video.
3. The method of claim 1, wherein the user images acquired within the first time range comprise a target user image, and the video images that match the user images acquired within the first time range comprise a target video image that matches the target user image; the first human body joint point detection result includes a plurality of first joint points in the target user image and position information of each first joint point, and the second human body joint point detection result includes a plurality of second joint points in the target video image and position information of each second joint point; and the constructing a user posture evaluation parameter set according to the first human body joint point detection result and the second human body joint point detection result comprises:
Counting the number of the plurality of first joint points and the number of the plurality of second joint points;
comparing the number of the plurality of first joint points with the number of the plurality of second joint points to obtain a first comparison result, wherein the first comparison result indicates the matching degree between the number of the plurality of first joint points and the number of the plurality of second joint points;
Comparing the position information of each first joint point with the position information of a second joint point corresponding to the first joint point to obtain a second comparison result, wherein the second comparison result indicates the matching degree between the first joint point and the second joint point corresponding to the first joint point;
determining a user gesture standard according to the first comparison result and the second comparison result;
and constructing a user gesture evaluation parameter set comprising the user gesture standard.
4. The method according to claim 1, wherein the method further comprises:
Displaying a second interface, the second interface comprising a plurality of video segments of a target video;
In response to an animation configuration operation on a target video clip of the plurality of video clips, displaying a fourth interface, the fourth interface comprising a candidate animation set;
When an animation selection operation for the candidate animation set is detected, determining the selected animation as a target animation;
at least one scoring time point is configured for the target animation in response to a scoring time configuration operation for the target animation.
5. A score determining device, characterized by comprising:
a processing unit, configured to determine the time length consumed from invoking the camera device to acquire a first user image to displaying the first user image, and to adjust internal parameters of the camera device according to that time length;
The processing unit is further used for playing the target video on the first interface;
The processing unit is further configured to play, during playing a target video segment of the target video, a target animation matched with the target video segment on the first interface, where the target video segment is any one video segment of a plurality of video segments included in the target video, and the target animation is configured with at least one scoring time point;
the processing unit is further used for calling the camera device with the adjusted internal parameters to collect user images during the process of playing the target video clip, and displaying the user images on the first interface;
The processing unit is further configured to determine, for a target scoring time point in the at least one scoring time point, a first scoring result corresponding to the target scoring time point according to a user image acquired in a first time range before the target scoring time point and a video image matched with the user image acquired in the first time range;
The processing unit is further configured to determine a second scoring result corresponding to the target scoring time point according to the user image acquired in a second time range before the target scoring time point and the video image matched with the user image acquired in the second time range if the first scoring result is less than or equal to a scoring result threshold, where the second time range includes the first time range; determining a scoring result corresponding to the target scoring time point based on the second scoring result and the first scoring result;
The display unit is used for displaying a scoring result corresponding to each scoring time point on the first interface when the scoring time point arrives; the video image is contained in the target video clip;
The processing unit is further configured to determine a comprehensive scoring result according to scoring results corresponding to each scoring time point configured for each of a plurality of animations corresponding to the target video after the target video is played, where the plurality of animations corresponding to the target video include animations matched with each of the plurality of video clips;
the display unit is further used for displaying the comprehensive scoring result on the first interface;
wherein the processing unit, when determining, for a target scoring time point of the at least one scoring time point, the first scoring result corresponding to the target scoring time point according to the user images acquired within the first time range before the target scoring time point and the video images matched with those user images, is specifically configured to:
perform human body joint point detection on the user images acquired within the first time range before the target scoring time point, to obtain a first human body joint point detection result of the user images acquired within the first time range;
perform human body joint point detection on the video images matched with the user images acquired within the first time range, to obtain a second human body joint point detection result of those video images;
construct a user posture evaluation parameter set according to the first human body joint point detection result and the second human body joint point detection result, wherein the user posture evaluation parameter set comprises at least one posture evaluation parameter; and
generate the first scoring result corresponding to the target scoring time point according to the posture evaluation parameter set;
and wherein the user images acquired within the first time range comprise a plurality of user images, the video images matched with the user images acquired within the first time range comprise multiple frames of video images, and the processing unit, when constructing the user posture evaluation parameter set according to the first human body joint point detection result and the second human body joint point detection result, is specifically configured to:
draw a first motion trail of each of a plurality of kinds of joint points across the plurality of user images according to the first human body joint point detection result of each of the plurality of user images;
draw a second motion trail of each of the plurality of kinds of joint points across the multiple frames of video images according to the second human body joint point detection result of each frame of the video images;
match the first motion trail of each kind of joint point with the second motion trail of the same kind of joint point to obtain a motion trail similarity;
determine a user posture gracefulness according to the motion trail similarity; and
construct a posture evaluation parameter set comprising the user posture gracefulness.
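A sketch of the trajectory matching in the final elements of claim 5: each kind of joint point traces a motion trail across the image sequence, the per-joint trails from the user and from the reference video are matched, and the similarities are folded into a gracefulness value. The resampling step, the 1-minus-mean-distance measure, and the plain average are all assumptions; the patent does not pin down the similarity computation.

```python
import numpy as np

def trajectory_similarity(user_trail, video_trail, samples=32):
    """Compare two motion trails (sequences of (x, y) image positions,
    normalised to [0, 1]).  Both trails are resampled to a common
    length; similarity is 1 minus the mean point-to-point distance,
    clamped to [0, 1].  The measure itself is an assumption."""
    def resample(trail):
        pts = np.asarray(trail, dtype=float)        # shape (n, 2)
        t_old = np.linspace(0.0, 1.0, len(pts))
        t_new = np.linspace(0.0, 1.0, samples)
        return np.stack(
            [np.interp(t_new, t_old, pts[:, k]) for k in (0, 1)], axis=1)

    u, v = resample(user_trail), resample(video_trail)
    mean_dist = float(np.linalg.norm(u - v, axis=1).mean())
    return max(0.0, 1.0 - mean_dist)

def user_posture_gracefulness(user_trails, video_trails):
    """Average the per-joint trail similarities over the joint kinds both
    dicts share -- one plausible reading of the claimed 'gracefulness'."""
    common = user_trails.keys() & video_trails.keys()
    if not common:
        return 0.0
    return sum(trajectory_similarity(user_trails[k], video_trails[k])
               for k in common) / len(common)
```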
6. An electronic device, comprising:
a memory for storing a computer program;
a processor, configured to invoke the computer program in the memory to perform the score determination method according to any one of claims 1 to 4.
7. A computer storage medium, storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the score determination method according to any one of claims 1 to 4.
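The conditional second scoring pass referenced in claim 5 above reads as: score a short window of frames before the scoring time point; only if that result is at or below a threshold, re-score a longer window containing the first and combine the two results. A hedged Python sketch; `score_window`, the threshold of 60, the window spans, and the simple averaging are all assumptions:

```python
def score_at_time_point(frames, t, score_window,
                        threshold=60.0, short_span=1.0, long_span=3.0):
    """Two-stage scoring for one scoring time point `t`, as read from
    claim 5.  `frames` maps capture timestamps (seconds) to
    (user_image, matched_video_image) pairs; `score_window` is an
    assumed external function that scores a list of such pairs."""
    def window(span):
        # All frame pairs within `span` seconds before the scoring time point.
        return [pair for ts, pair in sorted(frames.items()) if t - span <= ts < t]

    first = score_window(window(short_span))   # first scoring result (first time range)
    if first > threshold:
        return first                           # above threshold: no second pass needed
    # Second time range contains the first, per the claim.
    second = score_window(window(long_span))
    return (first + second) / 2.0              # the combining step is an assumption
```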
Application CN202210145089.5A, filed 2022-02-17, granted as CN114513694B (en). Status: Active.

Priority Applications (1)

Application Number: CN202210145089.5A
Priority Date: 2022-02-17
Filing Date: 2022-02-17
Title: Score determination method, device, electronic equipment and storage medium

Publications (2)

CN114513694A (en), published 2022-05-17
CN114513694B (en), published 2024-09-20

Family ID: 81551798

Family Applications (1)

Application Number: CN202210145089.5A (CN114513694B, Active)
Title: Score determination method, device, electronic equipment and storage medium

Country Status (1)

Country: CN; Publication: CN114513694B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114638831B (en) * 2022-05-18 2022-10-21 合肥宏晶半导体科技有限公司 Image analysis method and device
CN115273222B (en) * 2022-06-23 2024-01-26 广东园众教育信息化服务有限公司 Multimedia interaction analysis control management system based on artificial intelligence

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107968921A (en) * 2017-11-23 2018-04-27 乐蜜有限公司 Video generation method, device and electronic equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107920269A (en) * 2017-11-23 2018-04-17 乐蜜有限公司 Video generation method, device and electronic equipment
CN113678137B (en) * 2019-08-18 2024-03-12 聚好看科技股份有限公司 Display apparatus
CN112399234B (en) * 2019-08-18 2022-12-16 聚好看科技股份有限公司 Interface display method and display device
CN113808446B (en) * 2020-06-11 2022-11-18 华为技术有限公司 Fitness course interaction method and related device
CN112487940B (en) * 2020-11-26 2023-02-28 腾讯音乐娱乐科技(深圳)有限公司 Video classification method and device

Similar Documents

Publication Title
US20240363146A1 (en) Determining high-interest durations of gameplay sessions from user inputs
CN108197589B (en) Semantic understanding method, apparatus, equipment and the storage medium of dynamic human body posture
WO2021098616A1 (en) Motion posture recognition method, motion posture recognition apparatus, terminal device and medium
CN114513694B (en) Score determination method, device, electronic equipment and storage medium
CN111027419B (en) Method, device, equipment and medium for detecting video irrelevant content
CN112995757B (en) Video clipping method and device
WO2020114157A1 (en) Interaction control method and apparatus, storage medium and electronic apparatus
CN112527113A (en) Method and apparatus for training gesture recognition and gesture recognition network, medium, and device
JPWO2015098304A1 (en) Analysis device, recording medium, and analysis method
CN116152289A (en) Target object tracking method, related device, equipment and storage medium
CN111223549B (en) A mobile terminal system and method for disease prevention based on posture correction
CN117122890A (en) Method and system for activating selective navigation or magnification of screen content
US20160310788A1 (en) Analysis device, recording medium, and analysis method
CN113723306B (en) Push-up detection method, push-up detection device and computer readable medium
CN115565247A (en) Method, device, equipment and medium for determining action similarity score
CN115454313A (en) Touch animation display method, device, equipment and medium
CN115665356B (en) Audio import method, device and electronic device
CN119937798B (en) Virtual reality interactive training system and method based on multimodal feedback
CN113723307B (en) Social sharing method, device and computer-readable medium based on push-up detection
KR20220053021A (en) video game overlay
CN114979745B (en) Video processing method, device, electronic device and readable storage medium
US20250021166A1 (en) Controller use by hand-tracked communicator and gesture predictor
CN119516423A (en) A sports strengthening training method, device, equipment and storage medium thereof
CN119345674A (en) Scene simulation treadmill control method, device and storage medium
CN113873162A (en) Shooting method, apparatus, electronic device and readable storage medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant