CN119204085B - Multi-source video data robot skill learning method and system - Google Patents
- Publication number
- CN119204085B (application CN202411699867.0A)
- Authority
- CN
- China
- Prior art keywords
- robot
- camera
- strategy
- video
- virtual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/008—Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a multi-source video data robot skill learning method and system. An example video collecting module automatically collects example videos related to a skill according to a motor skill text description and performs data expansion to obtain motion example video data. A virtual robot and a virtual camera are constructed and instantiated and, together with a robot control strategy and a camera mirror-moving strategy, combined into an agent that generates and records robot motion video recording data. A motor skill video scoring module constructs an intelligent video scoring model and generates a scoring result for the robot motion video recording data. An agent learning module uses the reward feedback of the neural network scoring model to collaboratively optimize the robot control strategy and the camera mirror-moving strategy, and updates them into the agent.
Description
Technical Field
The invention relates to the technical field of machine learning, intelligent iterative data processing, and information transmission control, and in particular to a multi-source video data robot skill learning method and system.
Background
Skill learning for virtual robots aims to equip them with a variety of motor skills (e.g., walking, running, grabbing objects) that can be used to generate robot character animations conforming to human motion patterns and physical laws. Because robot motion typically requires the joint control of multiple joints, and collisions with the environment during motion are generally non-differentiable, existing methods usually adopt reinforcement learning to learn the robot's control strategy. The training signal of a reinforcement learning method comes from the reward feedback obtained after the robot executes an action; the control strategy is gradually adjusted according to this feedback to obtain a higher expected reward. Imitation learning provides a simple and effective way to compute this reward: it is calculated from the similarity between the robot's action sequence and an example action sequence (usually from a human demonstration). The higher the similarity, the larger the reward; conversely, the lower the similarity, the smaller the reward.
Two methods are commonly used to compute the similarity between the robot motion sequence and the example motion sequence: (1) tracking error, i.e., the average L2 distance between each joint of the robot and the corresponding joint of the example individual at corresponding moments; and (2) distance between distributions, i.e., the distance between the state-transition distribution generated by the robot's motion and that generated by the example individual, which can be quantitatively estimated by the discriminator of an adversarial learning method. Both rewards presuppose the three-dimensional motion trajectories of each joint of the example individual, which are typically recovered by a motion capture device or a motion reconstruction algorithm. Existing video-based skill learning methods therefore usually involve two steps: first reconstructing the motion pose sequence of the example individual from the example video, and then computing the robot's reward signal from tracking errors or inter-distribution distances. Because motion capture and motion reconstruction place high demands on how the training video is acquired, most existing video skill learning methods are only applicable to videos recorded from a fixed viewpoint with little character movement.
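As a concrete illustration, the tracking-error reward described above can be sketched as follows. The exponential mapping from mean joint distance to reward and the scale factor `k` are common choices in imitation-based motion control, not values specified here, so treat them as assumptions:

```python
import numpy as np

def tracking_error_reward(robot_joints, demo_joints, k=2.0):
    """Tracking-error reward: mean L2 distance between corresponding
    joints of the robot and the example individual at one time step,
    mapped to a reward in (0, 1]. The exponential mapping and the
    scale k are illustrative choices, not taken from the source."""
    robot_joints = np.asarray(robot_joints, dtype=float)  # shape (J, 3)
    demo_joints = np.asarray(demo_joints, dtype=float)    # shape (J, 3)
    err = np.linalg.norm(robot_joints - demo_joints, axis=1).mean()
    return float(np.exp(-k * err))
```

A perfect match yields the maximum reward, and the reward decays smoothly as the per-joint error grows, which keeps the signal informative even far from the demonstration.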
Improving existing video skill learning methods to better exploit massive internet data is therefore an important research challenge. Existing methods generally rely on a three-dimensional pose sequence for training, i.e., the reward signal is computed in three-dimensional space to assist the robot's learning. However, the three-dimensional pose sequence is usually recovered with a complex motion capture device or a motion reconstruction algorithm, and both approaches place high demands on the acquisition of the training video. This limitation makes it difficult for existing algorithms to effectively exploit the massive motor-skill demonstration videos on the internet, so the range of skills a robot can learn remains limited. It is therefore necessary to provide a multi-source video data robot skill learning system and method to at least partially solve the problems in the prior art.
Disclosure of Invention
This summary introduces a series of concepts in simplified form that are described in further detail in the detailed description. It is not intended to identify key or essential features of the claimed subject matter, nor is it intended to limit the scope of the claimed subject matter.
To at least partially solve the above problems, the present invention provides a multi-source video data robot skill learning method, comprising:
S100, automatically collecting, through an example video collecting module, example videos related to the skill according to a motor skill text description, and performing data expansion to obtain motion example video data;
S200, constructing and instantiating a virtual robot and a virtual camera and, in cooperation with a robot control strategy and a camera mirror-moving strategy, combining them into an agent that generates and records robot motion video recording data;
S300, constructing an intelligent video scoring model through a motor skill video scoring module, and generating a scoring result for the robot motion video recording data;
S400, collaboratively optimizing the robot control strategy and the camera mirror-moving strategy through the reward feedback of the neural network scoring model, set through an agent learning module, and updating them into the agent.
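The four steps S100 to S400 can be sketched as a single training loop. Every callable below is a hypothetical stand-in for the corresponding module, not part of the disclosed system:

```python
# Hypothetical driver for steps S100-S400; each argument is a
# placeholder standing in for a module described in the text.
def learn_skill(skill_text, collect, build_agent, score, update, n_iters=3):
    demos = collect(skill_text)            # S100: gather example videos
    agent = build_agent()                  # S200: robot + camera agent
    for _ in range(n_iters):
        recording = agent["rollout"]()     # S200: record robot motion video
        reward = score(recording, demos)   # S300: scoring model -> reward
        agent = update(agent, reward)      # S400: co-optimize both policies
    return agent
```

The loop makes the data flow explicit: the scoring result of each recorded rollout is the only training signal that reaches the two policies.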
Preferably, S100 includes:
S101, setting a motor skill text description, and analyzing and extracting the motor skill keywords in it with a keyword extraction algorithm combined with a large language model;
S102, performing tag content aggregation on the motor skill keywords to obtain motor skill tags;
S103, collecting motion example videos according to the motor skill keywords and the motor skill tags and performing data expansion to obtain motion example video data and motion video expansion data.
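A minimal sketch of the keyword extraction (S101) and tag aggregation (S102) steps. A real implementation would combine a keyword extraction algorithm with a large language model, so the vocabulary match and the `tag_map` lookup here are purely illustrative assumptions:

```python
def extract_skill_keywords(description, vocab):
    """S101 stand-in: a real system would combine a keyword-extraction
    algorithm with a large language model; here we simply match a known
    skill vocabulary against the text (purely illustrative)."""
    text = description.lower()
    return [w for w in vocab if w in text]

def aggregate_tags(keywords, tag_map):
    """S102 stand-in: aggregate keywords into coarser motor-skill tags.
    tag_map is a hypothetical keyword -> tag lookup."""
    return sorted({tag_map[k] for k in keywords if k in tag_map})
```
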
Preferably, S200 includes:
S201, performing virtual modeling of the robot according to robot parameter information on a simulation experiment platform, constructing a virtual robot whose virtual structure is a combination of multiple rigid bodies;
S202, performing camera modeling according to camera parameter information, constructing a pinhole-model camera without collision attributes to obtain a virtual camera;
S203, instantiating the virtual robot and the virtual camera, combining them with the robot control strategy and the camera mirror-moving strategy into an agent, performing virtual robot control and motion video recording, generating and recording the robot motion video captured by the camera in the simulation environment, obtaining robot motion video recording data, and transmitting it to the motor skill video scoring module.
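The combined-agent recording of S203 can be sketched as a rollout loop in which the robot control strategy and the camera mirror-moving strategy act at every simulation step. All callables are hypothetical stand-ins for the simulator and the two policy networks:

```python
def rollout(robot_policy, camera_policy, sim_step, render, horizon=8):
    """Combined-agent rollout (S203 sketch): at each step the robot
    policy emits joint torques and the camera policy emits a camera
    move; the simulator advances and the virtual camera records a
    frame. All callables are hypothetical stand-ins."""
    frames = []
    state, cam_pose = "s0", "c0"  # placeholder initial states
    for _ in range(horizon):
        torques = robot_policy(state)
        cam_cmd = camera_policy(state, cam_pose)
        state, cam_pose = sim_step(state, torques, cam_pose, cam_cmd)
        frames.append(render(state, cam_pose))
    return frames
```

The returned frame sequence is the "robot motion video recording data" handed to the scoring module.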
Preferably, S300 includes:
S301, extracting and storing features of the robot motion video recording data, the motion example video data, and the motion video expansion data;
S302, constructing a neural network scoring model, comparing the feature similarity of the robot motion video recording data with the motion example video data and the motion video expansion data, and generating a scoring result for the motion video recording data.
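A minimal sketch of the feature-similarity comparison in S302. The disclosed scoring model is a trained neural network, so plain cosine similarity over precomputed feature vectors is only an illustrative proxy:

```python
import numpy as np

def cosine_score(recording_feat, example_feats):
    """S302 sketch: score a recorded robot video by the best cosine
    similarity between its feature vector and the example/expansion
    video features. The actual scoring model is a trained neural
    network; cosine similarity is an illustrative stand-in."""
    v = np.asarray(recording_feat, dtype=float)
    best = -1.0
    for e in example_feats:
        e = np.asarray(e, dtype=float)
        sim = float(v @ e / (np.linalg.norm(v) * np.linalg.norm(e)))
        best = max(best, sim)
    return best
```
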
Preferably, S400 includes:
S401, based on the scoring-result feedback of the neural network scoring model on the motion video recording data, the agent learning module collaboratively optimizes the robot control strategy and the camera mirror-moving strategy through the reward feedback of the scoring model, obtaining an optimized robot control strategy and an optimized camera mirror-moving strategy;
S402, iteratively optimizing the agent strategy according to the optimized robot control strategy and the optimized camera mirror-moving strategy, and updating it into the combined virtual robot and virtual camera agent.
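The collaborative optimization of S401 and S402 can be sketched as a loop that perturbs both strategies and keeps a change only when the shared scoring reward improves. This random-search stand-in replaces the actual reinforcement-learning update, which is not detailed here:

```python
import random

def co_optimize(theta_robot, theta_cam, reward_fn, iters=500, step=0.3, seed=0):
    """S401-S402 sketch: both the robot control parameters and the
    camera parameters are nudged and kept only when the shared scoring
    reward improves (a simple random-search stand-in for the actual
    reinforcement-learning update)."""
    rng = random.Random(seed)
    best = reward_fn(theta_robot, theta_cam)
    for _ in range(iters):
        cand_r = theta_robot + rng.uniform(-step, step)
        cand_c = theta_cam + rng.uniform(-step, step)
        r = reward_fn(cand_r, cand_c)
        if r > best:
            theta_robot, theta_cam, best = cand_r, cand_c, r
    return theta_robot, theta_cam, best
```

The essential point it illustrates is that one scalar reward drives both strategies, so the camera learns to frame the robot in whatever way raises the score of the recorded video.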
The invention further provides a multi-source video data robot skill learning system, comprising:
The video collection data expansion subsystem is used for automatically collecting example videos related to the skills according to the motor skill text description through the example video collection module and carrying out data expansion to obtain motor example video data;
The virtual construction control strategy subsystem is used for constructing and instantiating a virtual robot and a virtual camera and, in cooperation with a robot control strategy and a camera mirror-moving strategy, combining them into an agent that generates and records robot motion video recording data;
The motor skill intelligent scoring subsystem is used for constructing an intelligent video scoring model through a motor skill video scoring module to generate a scoring result for the robot motion video recording data;
The strategy optimization agent updating subsystem is used for collaboratively optimizing the robot control strategy and the camera mirror-moving strategy through the reward feedback of the neural network scoring model, set through an agent learning module, and updating them into the agent.
Preferably, the video gathering data expansion subsystem comprises:
The keyword extraction analysis unit is used for setting a motor skill text description and analyzing and extracting the motor skill keywords in it with a keyword extraction algorithm combined with a large language model;
The tag content aggregation unit is used for performing tag content aggregation on the motor skill keywords to obtain motor skill tags;
The video tag expansion unit is used for collecting motion example videos according to the motor skill keywords and the motor skill tags and performing data expansion to obtain the motion example video data and the motion video expansion data.
Preferably, the virtual construction control strategy subsystem comprises:
The robot structure modeling unit is used for performing virtual modeling on the robot according to the robot parameter information by using a simulation experiment platform, and constructing a virtual robot with a virtual structure formed by combining a plurality of rigid bodies;
The virtual camera modeling unit is used for carrying out camera modeling according to the camera parameter information, constructing a pinhole model camera without collision attribute and obtaining a virtual camera;
The agent instantiation recording unit is used for instantiating the virtual robot and the virtual camera, combining them with the robot control strategy and the camera mirror-moving strategy into an agent, performing virtual robot control and motion video recording, generating and recording the robot motion video captured by the camera in the simulation environment, obtaining robot motion video recording data, and transmitting it to the motor skill video scoring module.
Preferably, the motor skills intelligence scoring subsystem comprises:
The feature extraction storage unit is used for extracting and storing features of robot motion video recording data, robot motion example video data and motion video expansion data;
And the scoring model feature scoring unit is used for constructing a neural network scoring model, comparing the feature similarity of the robot motion video recording data, the robot motion example video data and the motion video expansion data, and generating a motion video recording data scoring result.
Preferably, the policy optimization agent update subsystem comprises:
The agent learning optimization unit is used for collaboratively optimizing the robot control strategy and the camera mirror-moving strategy through the reward feedback of the neural network scoring model, based on its scoring results for the motion video recording data, to obtain an optimized robot control strategy and an optimized camera mirror-moving strategy;
and the agent strategy iteration updating unit is used for iteratively optimizing the agent strategy according to the optimized robot control strategy and the optimized camera lens-operating strategy, and updating the agent strategy into the virtual robot and virtual camera combined agent.
Compared with the prior art, the invention at least comprises the following beneficial effects:
The invention discloses a multi-source video data robot skill learning method and system. An example video collecting module automatically collects example videos related to a skill according to a motor skill text description and performs data expansion to obtain motion example video data; a virtual robot and a virtual camera are constructed and instantiated and, combined with a robot control strategy and a camera mirror-moving strategy, generate and record robot motion video recording data; an intelligent video scoring model is constructed to generate scoring results for the robot motion video recording data; and an agent learning module uses the reward feedback of the neural network scoring model to collaboratively optimize the robot control strategy and the camera mirror-moving strategy and update them into the agent. The resulting skill learning method based on multi-source video data aims to enable the virtual robot to learn a unified motion control strategy from multi-source video examples.
The method simplifies the complex flow of traditional methods in the training-data acquisition and preprocessing stages and improves training-data processing efficiency. It models skill learning from multi-source video data as a multi-agent reinforcement learning problem: while the robot motion control strategy is trained, a camera mirror-moving strategy is trained to assist the robot in completing skill learning. The trained camera strategy can adjust its recording trajectory to the robot's motion without manual control, naturally displaying the virtual robot's skill learning result. Because the method learns unified motor skills from multi-source video data by combining robot motion control with the camera mirror-moving strategy, and supervises the virtual robot to acquire new motor skills through imitation learning without requiring training videos to be acquired in any specific way, it can exploit internet video data of many kinds while naturally presenting the robot's learned skills.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
Fig. 1 is a diagram of a multi-source video data robot skill learning system according to an embodiment of the invention.
Fig. 2 is a diagram of an embodiment of a multi-source video data robot skill learning method according to the present invention.
Fig. 3 is a diagram illustrating an exemplary embodiment of a multi-source video data robot skill learning method and system according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and embodiments, to enable those skilled in the art to practice it. As shown in the drawings, the invention provides a multi-source video data robot skill learning method, comprising:
S100, automatically collecting, through an example video collecting module, example videos related to the skill according to a motor skill text description, and performing data expansion to obtain motion example video data;
S200, constructing and instantiating a virtual robot and a virtual camera and, in cooperation with a robot control strategy and a camera mirror-moving strategy, combining them into an agent that generates and records robot motion video recording data;
S300, constructing an intelligent video scoring model through a motor skill video scoring module, and generating a scoring result for the robot motion video recording data;
S400, collaboratively optimizing the robot control strategy and the camera mirror-moving strategy through the reward feedback of the neural network scoring model, set through an agent learning module, and updating them into the agent.
The principle and effect of this technical scheme are as follows. The multi-source video data robot skill learning method automatically collects example videos related to the skill according to the skill text description through the example video collecting module and performs data expansion to obtain motion example video data; constructs and instantiates a virtual robot and a virtual camera and combines them with the robot control strategy and camera mirror-moving strategy into an agent that generates and records robot motion video recording data; constructs an intelligent video scoring model through the skill video scoring module to score the robot motion video recording data; and, through the agent learning module, uses the reward feedback of the neural network scoring model to collaboratively optimize the robot control strategy and the camera mirror-moving strategy and update them into the agent. The skill learning method based on multi-source video data aims to enable the virtual robot to learn a unified motion control strategy from multi-source video examples: the camera mirror-moving strategy is trained to record the robot, the recorded video is scored against the example videos, and the score serves as the reward. Because the reward is computed from video rather than from reconstructed three-dimensional poses, the method avoids the complex acquisition and preprocessing that traditional training pipelines require and can exploit massive internet video data.
The method simplifies the complex flow of traditional methods in the training-data acquisition and preprocessing stages and improves training-data processing efficiency. It models skill learning from multi-source video data as a multi-agent reinforcement learning problem: while training the robot motion control strategy, it trains a camera mirror-moving strategy to assist the robot in completing skill learning. The trained camera strategy can adjust its recording trajectory according to the robot's motion trajectory without manual control and naturally display the virtual robot's skill learning result. The method learns unified motor skills from multi-source video data by jointly training robot motion control and the camera mirror-moving strategy, and supervises the virtual robot to acquire new motor skills through imitation learning; since it does not require training videos to be acquired in any specific way, it can exploit network data of many kinds.
In one embodiment, the example video collecting module analyzes the motion keyword information of a motion example video, which includes a motion description keyword and a motion action characteristic keyword. The motion description keyword undergoes video text description analysis using the description text and subtitle text of the motion example video, yielding a video text description analysis result; this result is compared with the motion action characteristic keyword to judge whether the video text description is consistent with the motion action characteristics. The motion example video data are transmitted to the skill video scoring module, which in parallel receives the robot motion video recording data recorded by the skill video recording module in the 3D simulation environment system: the virtual robot and the virtual camera are combined into a combined agent, the combined agent records a video of the virtual robot to obtain the robot motion video recording data, and these data are transmitted to the skill video scoring module. The skill video scoring module scores the motion example video data and the robot motion video recording data, the scoring information is transmitted to the agent learning module, and the agent is continuously trained while the training environment is continuously optimized.
In one embodiment, S100 comprises:
s101, setting a motor skill text description, and analyzing and extracting motor skill keywords in the motor skill text description from the motor skill text description by combining a keyword extraction algorithm with a large language model;
s102, performing label content aggregation on the motor skill keywords to obtain motor skill labels;
And S103, according to the motor skill keywords and the motor skill labels, gathering the motor example video and performing data expansion to obtain motor example video data and motor video expansion data.
The principle and effect of this technical scheme are as follows. A motor skill text description is set; the motor skill keywords in it are analyzed and extracted by a keyword extraction algorithm combined with a large language model; tag content aggregation is performed on the keywords to obtain motor skill tags; and, according to the keywords and tags, motion example videos are collected and data expansion is performed to obtain motion example video data and motion video expansion data. Specifically, example videos are gathered according to the motor skill keywords and tags through data capture; the motor skill tags are matched against the reference motion tags of existing motion datasets, and motion videos whose matching degree exceeds a set reference matching degree are taken out for data expansion, yielding the motion example video data and motion video expansion data, which are transmitted to the skill video scoring module. The overall pipeline is: parse the skill text description, build the simulation environment and instantiate the robot, initialize or update the motion control and camera mirror-moving strategies, and optimize both strategies through training with the intelligent scoring model.
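The matching-degree filter used in the expansion step can be sketched as follows. The text does not fix the matching measure, so the Jaccard overlap and the 0.5 threshold are assumptions:

```python
def match_degree(tags, reference_tags):
    """Illustrative matching degree: Jaccard overlap between a video's
    motor-skill tags and the reference motion tags of an existing
    motion dataset (the exact measure is not specified in the source)."""
    a, b = set(tags), set(reference_tags)
    return len(a & b) / len(a | b) if a | b else 0.0

def expand_dataset(candidates, reference_tags, threshold=0.5):
    """Keep candidate videos whose matching degree exceeds the set
    reference matching degree (S103 expansion step, sketched).
    candidates is a list of (video_id, tags) pairs."""
    return [v for v, tags in candidates if match_degree(tags, reference_tags) > threshold]
```
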
In one embodiment, S200 includes:
S201, performing virtual modeling of the robot according to robot parameter information on a simulation experiment platform, constructing a virtual robot whose virtual structure is a combination of multiple rigid bodies;
S202, performing camera modeling according to camera parameter information, constructing a pinhole-model camera without collision attributes to obtain a virtual camera;
S203, instantiating the virtual robot and the virtual camera, combining them with the robot control strategy and the camera mirror-moving strategy into an agent, performing virtual robot control and motion video recording, generating and recording the robot motion video captured by the camera in the simulation environment, obtaining robot motion video recording data, and transmitting it to the motor skill video scoring module.
The principle and effect of this technical scheme are as follows. A simulation experiment platform performs virtual modeling of the robot according to the robot parameter information, constructing a virtual robot 211 whose virtual structure is a combination of multiple rigid bodies; camera modeling is performed according to the camera parameter information, constructing a pinhole-model camera without collision attributes to obtain a virtual camera; the virtual robot and virtual camera are instantiated and, in cooperation with the robot control strategy and the camera lens-operation strategy, combined into an agent; virtual robot control and motion video recording are performed, a robot motion video captured by the camera in the simulation environment is generated and recorded, and the resulting robot motion video recording data are transmitted to the motor skill video scoring module.
Specifically, the parts of the virtual structure are connected through virtual joints: the knee and elbow joints use revolute joints with 1 degree of freedom, and all other joints are set as spherical joints with 3 degrees of freedom. The virtual association parameters of the robot entity include a total robot mass of 45 kg, a robot height of 1.62 m, a robot state-space dimension of 197, and a robot action-space dimension of 36. A robot control strategy network model is constructed that takes the robot state information as input and outputs the torque values for controlling each joint; the control strategy is modeled by a fully connected neural network with input and output dimensions of 197 and 36 respectively, yielding the robot control strategy network model. The training scene is a randomly initialized ground surface on which the robot performs different actions; at initialization the ground mesh is given multiple mesh types and physical parameters to improve the robustness of the robot control strategy;
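The control strategy network described above can be sketched as a plain fully connected mapping from the 197-dimensional robot state to 36 joint torques. The hidden-layer sizes below are assumptions for illustration; the patent fixes only the input and output dimensions (197 and 36).

```python
import numpy as np

# Hypothetical sketch of the robot control strategy network: a fully connected
# network mapping the 197-D robot state (joint positions and motion speeds) to
# 36-D joint torques. Hidden sizes (512, 256) are illustrative assumptions.
rng = np.random.default_rng(0)

def mlp_layer(n_in, n_out):
    return rng.normal(0, 0.1, (n_in, n_out)), np.zeros(n_out)

W1, b1 = mlp_layer(197, 512)
W2, b2 = mlp_layer(512, 256)
W3, b3 = mlp_layer(256, 36)

def control_policy(state):
    """Map a 197-D robot state vector to 36 joint-torque values."""
    h = np.maximum(state @ W1 + b1, 0.0)   # ReLU
    h = np.maximum(h @ W2 + b2, 0.0)
    return h @ W3 + b3                     # raw torque outputs, one per actuated DOF

torques = control_policy(rng.normal(size=197))
print(torques.shape)  # (36,)
```

In training, the weights would be updated by the reinforcement learning algorithm described later, with rewards supplied by the scoring model.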
According to the camera parameter information, camera modeling is performed and a pinhole-model camera without collision attributes is constructed, obtaining the virtual camera 212. The camera intrinsic parameters are randomly sampled within a certain range so as to cover the intrinsic-parameter range of the cameras used when the multi-source video data were collected. A camera lens-operation strategy network model is constructed whose input comprises the world poses of the camera and the robot over several past frames, the relative pose between the camera and the robot, and the projected pixel coordinates of the robot joints in the image; its output is the camera's next operation command, which is decomposed into a lens rotation command and a lens displacement command. The virtual robot and the virtual camera are instantiated and, in cooperation with the robot control strategy and the camera lens-operation strategy, combined into an agent for virtual robot control and motion video recording. Through joint optimization of the robot control strategy network model and the camera lens-operation strategy network model, the robot accurately learns motor skills from the example videos while the camera learns to keep the robot properly framed during recording.
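The camera lens-operation strategy interface described above can be sketched as follows. The number of past frames K and the pose encoding (position plus orientation as a 6-vector) are assumptions not fixed by the patent; only the observation contents (world poses, relative pose, projected joint pixels) and the rotation/displacement decomposition of the output follow the text.

```python
import numpy as np

# Hedged sketch of the camera lens-operation strategy's input/output interface.
# K (past-frame count) and the 6-D pose encoding are illustrative assumptions.
K = 4          # number of past frames (assumption)
N_JOINTS = 13  # joint count stated elsewhere in the patent

def build_observation(cam_poses, robot_poses, joint_pixels):
    """cam_poses, robot_poses: (K, 6) world poses; joint_pixels: (K, 13, 2)."""
    rel = robot_poses - cam_poses  # relative pose of robot w.r.t. camera, per frame
    return np.concatenate([cam_poses.ravel(), robot_poses.ravel(),
                           rel.ravel(), joint_pixels.ravel()])

def split_command(raw_output):
    """Decompose the policy output into lens rotation and displacement commands."""
    return {"rotate": raw_output[:3], "translate": raw_output[3:6]}

obs = build_observation(np.zeros((K, 6)), np.ones((K, 6)),
                        np.zeros((K, N_JOINTS, 2)))
cmd = split_command(np.arange(6.0))
print(obs.shape)  # (176,) with K=4: 3 pose blocks of 24 values + 104 pixel values
```

A policy network would consume `obs` and emit the 6 raw values that `split_command` decomposes.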
In one embodiment, S300 includes:
S301, performing feature extraction and storage on the robot motion video recording data, the robot motion example video data, and the motion video expansion data;
S302, constructing a neural network scoring model, comparing the feature similarity of the robot motion video recording data with the robot motion example video data and the motion video expansion data, and generating a motion video recording data scoring result.
The principle and the effect of the technical scheme are that the characteristic extraction and storage are carried out on the robot motion video recording data, the robot motion example video data and the motion video expansion data;
A neural network scoring model is constructed to compare the feature similarity of the robot motion video recording data with the robot motion example video data and the motion video expansion data. The input of the neural network scoring model is the projection feature, on the image plane, of the robot's state transition between two consecutive frames, and the output is a score between 0 and 1: the model is trained to output 1 for state transitions sampled from the example data distribution and to output 0 for state transitions recorded during the robot's own motion. The scoring model makes its judgment in image space, using the projected coordinates of the robot joints on the image plane and optical-flow information extracted from the images. The scoring model is implemented with a fully connected neural network whose input is a 52-dimensional observation vector comprising the image projection coordinates of each robot joint (13 × 2) and the optical-flow information at the projected positions (13 × 2); after several fully connected layers a 1-dimensional tensor is produced, and a sigmoid activation function outputs the score between 0 and 1.
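The scoring model above can be sketched as a small discriminator over the 52-dimensional observation. The hidden size is an assumption; the 52-dimensional input layout and the sigmoid output follow the text.

```python
import numpy as np

# Minimal sketch of the scoring model: a fully connected network taking a 52-D
# observation (13 joint image-projection coordinates, 13 x 2, plus optical flow
# at those positions, 13 x 2) and emitting a score in (0, 1) via a sigmoid.
# The hidden size (128) is an assumption. Training would push example-data
# transitions toward 1 and the robot's own recorded transitions toward 0.
rng = np.random.default_rng(1)
W1 = rng.normal(0, 0.1, (52, 128)); b1 = np.zeros(128)
W2 = rng.normal(0, 0.1, (128, 1));  b2 = np.zeros(1)

def score(obs_52d):
    h = np.maximum(obs_52d @ W1 + b1, 0.0)   # ReLU hidden layer
    logit = h @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logit))      # sigmoid -> score in (0, 1)

joints = rng.uniform(0, 1, (13, 2))  # projected joint coordinates
flow = rng.normal(0, 0.01, (13, 2))  # optical flow at those pixels
s = score(np.concatenate([joints.ravel(), flow.ravel()]))
print(s[0])  # a value strictly between 0 and 1
```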
In one embodiment, S400 includes:
S401, based on the scoring-result feedback of the neural network scoring model on the motion video recording data, the agent learning module cooperatively optimizes the robot control strategy and the camera lens-operation strategy through the reward feedback of the scoring model, obtaining an optimized robot control strategy and an optimized camera lens-operation strategy;
S402, iteratively optimizing the agent strategy according to the optimized robot control strategy and the optimized camera lens-operation strategy, and updating the agent strategy into the virtual robot and virtual camera combined agent.
The principle and effect of this technical scheme are that, based on the scoring-result feedback of the neural network scoring model on the motion video recording data, the agent learning module cooperatively optimizes the robot control strategy and the camera lens-operation strategy through the reward feedback of the scoring model, obtaining an optimized robot control strategy and an optimized camera lens-operation strategy;
in the initial stage of optimizing the robot control strategy and the camera lens-operation strategy, an auxiliary reward signal is introduced to assist algorithm convergence and achieve a more robust training effect. A strategy network and a reward function are defined, and the strategy network is updated with a reinforcement learning algorithm; the updated camera lens-operation strategy and robot control strategy are then used for a new round of data collection and iterative training, so that the agent strategy is iteratively optimized and updated into the virtual robot and virtual camera combined agent. The defined reward function includes rewards for preventing the robot from falling, limiting the robot's action amplitude, and ensuring that the camera always keeps the robot's whole body or part of its torso in the recorded frame.
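The auxiliary reward terms can be sketched as below. The weights, thresholds, and functional forms are illustrative assumptions; the patent only names the three auxiliary objectives (no falling, limited action amplitude, robot kept in frame) and the scoring-model reward.

```python
import numpy as np

# Hedged sketch of a combined reward. All coefficients and thresholds here
# (min_height, 1e-4, w_score, w_aux) are assumptions for illustration.
def auxiliary_reward(root_height, torques, in_frame_frac, min_height=0.8):
    r_upright = 1.0 if root_height > min_height else 0.0   # prevent falling
    r_smooth = -1e-4 * float(np.sum(np.square(torques)))   # limit action amplitude
    r_framed = in_frame_frac                               # body/torso kept in frame
    return r_upright + r_smooth + r_framed

def total_reward(score_model_output, root_height, torques, in_frame_frac,
                 w_score=1.0, w_aux=0.5):
    # score_model_output in (0, 1) comes from the neural network scoring model
    return (w_score * score_model_output
            + w_aux * auxiliary_reward(root_height, torques, in_frame_frac))

r = total_reward(0.7, root_height=1.0, torques=np.zeros(36), in_frame_frac=1.0)
print(r)  # 0.7 + 0.5 * (1.0 + 0.0 + 1.0) = 1.7
```

A reinforcement learning algorithm such as PPO would maximize the expectation of this combined reward when updating both strategy networks.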
The invention further provides a multi-source video data robot skill learning system, comprising:
The video collection data expansion subsystem, which automatically collects example videos related to the skill according to the motor skill text description through the example video collection module and performs data expansion to obtain motion example video data;
The virtual construction control strategy subsystem, which constructs and instantiates a virtual robot and a virtual camera, cooperates the robot control strategy with the camera lens-operation strategy, combines them into an agent, and generates and records robot motion video recording data;
The motor skill intelligent scoring subsystem, which constructs a video intelligent scoring model through the motor skill video scoring module to generate scoring results for the robot motion video recording data;
And the strategy optimization agent updating subsystem, which sets the reward feedback of the neural network scoring model through the agent learning module to cooperatively optimize the robot control strategy and the camera lens-operation strategy and updates them into the agent.
The principle and effect of this technical scheme are as follows. The invention provides a multi-source video data robot skill learning system comprising a video collection data expansion subsystem, a virtual construction control strategy subsystem, a motor skill intelligent scoring subsystem, and a strategy optimization agent updating subsystem. The video collection data expansion subsystem automatically collects example videos related to the skill according to the motor skill text description through the example video collection module and expands the data; the virtual construction control strategy subsystem constructs and instantiates a virtual robot and a virtual camera, cooperates the robot control strategy with the camera lens-operation strategy, combines them into an agent, and generates and records robot motion video recording data; the motor skill intelligent scoring subsystem constructs a video intelligent scoring model through the motor skill video scoring module and generates scoring results for the robot motion video recording data; the strategy optimization agent updating subsystem sets the reward feedback of the neural network scoring model to cooperatively optimize the robot control strategy and the camera lens-operation strategy and updates them into the agent.
The system enables a virtual robot to learn a unified motion control strategy from multi-source video examples. Unlike traditional methods, it does not require the training videos to be collected in a specific modality, nor does it require three-dimensional motion data to be captured and preprocessed, enabling the virtual robot to learn unified motor skills from massive example videos of different sources. This simplifies the complex flow of traditional methods in the training-data acquisition and preprocessing stage and improves the processing efficiency of multi-source training data. The skill learning problem based on multi-source video data is modeled as a multi-agent reinforcement learning problem: while the robot motion control strategy is trained, a camera lens-operation strategy is trained to assist the robot in completing skill learning. The camera lens-operation strategy trained in this way can adjust the recording viewpoint according to the robot's motion trajectory without manual control, naturally displaying the virtual robot's skill learning results, and the virtual robot is supervised to acquire new motor skills through imitation learning.
Furthermore, the example video collection module analyzes the motion keyword information of the motion example videos, where the motion keyword information comprises motion description keywords and motion action feature keywords. Video text description analysis is performed through the text description content and subtitle text description content of the motion example video to obtain a video text description analysis result; this result is compared with the motion action feature keywords to judge whether the video text description is consistent with the motion action features, and the motion example video data are transmitted to the skill video scoring module. In parallel, the skill video scoring module receives the robot motion video recording data recorded by the skill video recording module in the 3D simulation environment system: the virtual robot and the virtual camera are combined to construct a combined agent, the combined agent records the virtual robot's video to obtain the robot motion video recording data, and these data are transmitted to the skill video scoring module. The skill video scoring module scores the robot motion video recording data against the motion example video data, the scoring information is transmitted to the agent learning module, and the agent is continuously trained and optimized in the training environment accordingly.
In one embodiment, a video gathering data expansion subsystem, comprising:
the keyword extraction analysis unit is used for setting a motor skill text description, and analyzing and extracting motor skill keywords in the motor skill text description from the motor skill text description by combining a keyword extraction algorithm with a large language model;
the label content aggregation unit is used for carrying out label content aggregation on the motor skill keywords to obtain motor skill labels;
and the video tag expansion unit is used for collecting the motion example video and carrying out data expansion according to the motion skill keywords and the motion skill tags to obtain the motion example video data and the motion video expansion data.
The principle and effect of this technical scheme are as follows. The video collection data expansion subsystem comprises a keyword extraction analysis unit, which sets the motor skill text description and, combining a keyword extraction algorithm with a large language model, analyzes and extracts the motor skill keywords from the motor skill text description; a tag content aggregation unit, which performs tag content aggregation on the motor skill keywords to obtain motor skill tags; and a video tag expansion unit, which collects motion example videos and performs data expansion according to the motor skill keywords and motor skill tags to obtain the motion example video data and the motion video expansion data. Specifically, the example videos are collected through data capture to obtain the collected motion example videos; the motor skill tags are matched against reference motion tags in existing motion data sets, and data expansion is performed on motion videos whose matching degree exceeds a set reference matching degree; the motion example video data and the motion video expansion data are then transmitted to the skill video scoring module. By analyzing the text of the skill description, example video data conforming to the description are collected; the simulation environment is constructed, the robot and camera are initialized, the robot control strategy and the camera lens-operation strategy are trained and optimized based on the scoring model built from the motion example videos, and the optimized strategies are updated into the simulation environment.
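The tag-matching step above can be sketched as follows. The Jaccard-style matching degree and the 0.5 threshold are illustrative assumptions; the patent specifies only that videos with a matching degree above a set reference value are used for data expansion.

```python
# Illustrative sketch of matching motor skill tags against reference motion
# tags from an existing motion data set. The matching-degree function and the
# threshold value are assumptions for illustration.
def matching_degree(skill_tags, reference_tags):
    a, b = set(skill_tags), set(reference_tags)
    return len(a & b) / len(a | b) if a | b else 0.0

def select_for_expansion(candidate_videos, skill_tags, threshold=0.5):
    """candidate_videos: list of (video_id, reference_tags) pairs."""
    return [vid for vid, ref in candidate_videos
            if matching_degree(skill_tags, ref) > threshold]

videos = [("v1", ["backflip", "gymnastics"]),
          ("v2", ["walking", "outdoor"]),
          ("v3", ["backflip", "tumbling", "gymnastics"])]
picked = select_for_expansion(videos, ["backflip", "gymnastics"])
print(picked)  # ['v1', 'v3']
```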
In one embodiment, the virtual construction control strategy subsystem comprises:
The robot structure modeling unit is used for performing virtual modeling on the robot according to the robot parameter information by using a simulation experiment platform, and constructing a virtual robot with a virtual structure formed by combining a plurality of rigid bodies;
The virtual camera modeling unit is used for carrying out camera modeling according to the camera parameter information, constructing a pinhole model camera without collision attribute and obtaining a virtual camera;
The instantiation agent combination unit is used for instantiating the virtual robot and the virtual camera, cooperating the robot control strategy with the camera lens-operation strategy, combining the instantiated virtual robot and virtual camera into an agent, performing virtual robot control and motion video recording, generating and recording the robot motion video captured by the camera in the simulation environment, obtaining robot motion video recording data, and transmitting the robot motion video recording data to the motor skill video scoring module.
The principle and effect of this technical scheme are as follows. The virtual construction control strategy subsystem comprises a robot structure modeling unit, a virtual camera modeling unit, and an instantiation agent combination unit. The robot structure modeling unit performs virtual modeling of the robot according to the robot parameter information using the simulation experiment platform, constructing a virtual robot 211 whose virtual structure is a combination of multiple rigid bodies; the virtual camera modeling unit performs camera modeling according to the camera parameter information, constructing a pinhole-model camera without collision attributes to obtain a virtual camera; the instantiation agent combination unit instantiates the virtual robot and the virtual camera, cooperates the robot control strategy with the camera lens-operation strategy, combines them into an agent, performs virtual robot control and motion video recording, generates and records the robot motion video captured by the camera in the simulation environment, and transmits the resulting robot motion video recording data to the motor skill video scoring module.
The parts of the virtual structure are connected through virtual joints: the knee and elbow joints use revolute joints with 1 degree of freedom, and all other joints are set as spherical joints with 3 degrees of freedom, so that each robot character comprises 13 joints and 34 degrees of freedom. The virtual association parameters of the robot entity include a total robot mass of 45 kg, a robot height of 1.62 m, a robot state-space dimension of 197, and a robot action-space dimension of 36. A robot control strategy network model is constructed that takes the robot state information, comprising the spatial position and motion speed of each joint, as input and outputs the torque values for controlling each joint; the control strategy is modeled by a fully connected neural network with input and output dimensions of 197 and 36 respectively. The training scene is a randomly initialized ground surface on which the robot performs different actions; at initialization the ground mesh is given multiple mesh types and physical parameters, including flatness and friction coefficient, to improve the robustness of the robot control strategy;
According to the camera parameter information, camera modeling is performed and a pinhole-model camera without collision attributes is constructed, obtaining the virtual camera 212. The camera intrinsic parameters are randomly sampled within a certain range so as to cover the intrinsic-parameter range of the cameras used when the multi-source video data were collected. A camera lens-operation strategy network model is constructed whose input comprises the world poses of the camera and the robot over several past frames, the relative pose between the camera and the robot, and the projected pixel coordinates of the robot joints in the image; its output is the camera's next operation command, which is decomposed into a lens rotation command and a lens displacement command. The virtual robot and the virtual camera are instantiated and, in cooperation with the robot control strategy and the camera lens-operation strategy, combined into an agent for virtual robot control and motion video recording. Through joint optimization of the robot control strategy network model and the camera lens-operation strategy network model, the robot accurately learns motor skills from the example videos while the camera learns to keep the robot properly framed during recording.
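The pinhole-model camera with randomly sampled intrinsics can be sketched as below. The sampling range for the focal length and the image size are illustrative assumptions; the patent states only that the intrinsics are sampled within a range covering the source cameras.

```python
import numpy as np

# Sketch of a pinhole camera: intrinsics are randomly sampled (ranges are
# assumptions), and 3-D joint positions expressed in the camera frame are
# projected to pixel coordinates via the standard pinhole projection.
rng = np.random.default_rng(2)

def sample_intrinsics(width=640, height=480):
    f = rng.uniform(400.0, 800.0)              # focal length in pixels (assumed range)
    return np.array([[f, 0.0, width / 2],
                     [0.0, f, height / 2],
                     [0.0, 0.0, 1.0]])

def project(K, points_cam):
    """Project (N, 3) camera-frame points to (N, 2) pixel coordinates."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]            # perspective divide

K = sample_intrinsics()
joints_cam = np.array([[0.0, 0.0, 2.0],        # a joint on the optical axis
                       [0.3, -0.2, 2.5]])
pixels = project(K, joints_cam)
print(pixels[0])  # the on-axis point projects to the image center (320, 240)
```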
In one embodiment, the motor skill intelligent scoring subsystem comprises:
The feature extraction storage unit is used for extracting and storing features of robot motion video recording data, robot motion example video data and motion video expansion data;
And the scoring model feature scoring unit is used for constructing a neural network scoring model, comparing the feature similarity of the robot motion video recording data, the robot motion example video data and the motion video expansion data, and generating a motion video recording data scoring result.
The principle and effect of this technical scheme are as follows. The motor skill intelligent scoring subsystem comprises a feature extraction storage unit, which performs feature extraction and storage on the robot motion video recording data, the robot motion example video data, and the motion video expansion data; and a scoring model feature scoring unit, which constructs a neural network scoring model, compares the feature similarity of the robot motion video recording data with the robot motion example video data and the motion video expansion data, and generates the motion video recording data scoring result. The input of the neural network scoring model is the projection feature, on the image plane, of the robot's state transition between two consecutive frames, and its output is a score between 0 and 1: the model is trained to output 1 for state transitions sampled from the example data distribution and to output 0 for state transitions recorded during the robot's own motion. The scoring model makes its judgment in image space and is implemented with a fully connected neural network whose input is a 52-dimensional observation vector comprising the image projection coordinates of each robot joint (13 × 2) and the optical-flow information at the projected positions (13 × 2); after several fully connected layers a 1-dimensional tensor is produced, and a sigmoid activation function outputs the score between 0 and 1.
In one embodiment, a policy optimization agent update subsystem includes:
The agent learning optimization unit is used for cooperatively optimizing the robot control strategy and the camera lens-operation strategy through the reward feedback of the neural network scoring model, based on the scoring-result feedback of the scoring model on the motion video recording data, to obtain an optimized robot control strategy and an optimized camera lens-operation strategy;
and the agent strategy iteration updating unit is used for iteratively optimizing the agent strategy according to the optimized robot control strategy and the optimized camera lens-operating strategy, and updating the agent strategy into the virtual robot and virtual camera combined agent.
The principle and effect of this technical scheme are as follows. The strategy optimization agent updating subsystem comprises an agent learning optimization unit and an agent strategy iteration updating unit. The agent learning optimization unit cooperatively optimizes the robot control strategy and the camera lens-operation strategy through the reward feedback of the neural network scoring model, obtaining the optimized strategies; the agent strategy iteration updating unit iteratively optimizes the agent strategy according to the optimized robot control strategy and the optimized camera lens-operation strategy and updates it into the virtual robot and virtual camera combined agent. In the initial stage of strategy optimization, an auxiliary reward signal is introduced to assist algorithm convergence and achieve a more robust training effect: a strategy network and a reward function are defined, the strategy network is updated with a reinforcement learning algorithm, and the updated camera lens-operation strategy and robot control strategy are used for a new round of data collection and iterative training. The defined reward function includes rewards for preventing the robot from falling, limiting the robot's action amplitude, and ensuring that the camera always keeps the robot's whole body or part of its torso in the recorded frame.
Although embodiments of the present invention have been disclosed above, the invention is not limited to the applications set forth in the description and embodiments; it is suited to various fields of use, and further modifications will be readily apparent to those skilled in the art. Accordingly, the invention is not limited to the specific details shown and described herein, provided such modifications do not depart from the general concepts defined by the claims and their equivalents.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411699867.0A CN119204085B (en) | 2024-11-26 | 2024-11-26 | Multi-source video data robot skill learning method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN119204085A CN119204085A (en) | 2024-12-27 |
CN119204085B true CN119204085B (en) | 2025-02-18 |
Family
ID=94042393
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202411699867.0A Active CN119204085B (en) | 2024-11-26 | 2024-11-26 | Multi-source video data robot skill learning method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN119204085B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119830993B (en) * | 2025-03-14 | 2025-06-20 | 北京通用人工智能研究院 | Multi-view video reward mechanism learning system and its construction method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114748169A (en) * | 2022-03-31 | 2022-07-15 | 华中科技大学 | Autonomous endoscope moving method of laparoscopic surgery robot based on image experience |
CN115396595A (en) * | 2022-08-04 | 2022-11-25 | 北京通用人工智能研究院 | Video generation method and device, electronic equipment and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10766136B1 (en) * | 2017-11-03 | 2020-09-08 | Amazon Technologies, Inc. | Artificial intelligence system for modeling and evaluating robotic success at task performance |
KR102619004B1 (en) * | 2018-12-14 | 2023-12-29 | 삼성전자 주식회사 | Robot control apparatus and method for learning task skill of the robot |
US11524402B2 (en) * | 2020-05-21 | 2022-12-13 | Intrinsic Innovation Llc | User feedback for robotic demonstration learning |
CN115442519B (en) * | 2022-08-08 | 2023-12-15 | 珠海普罗米修斯视觉技术有限公司 | Video processing method, device and computer-readable storage medium |
CN118003321A (en) * | 2023-07-31 | 2024-05-10 | 重庆越千创新科技有限公司 | Real-time control method and system for photographic robot |
US20240342557A1 (en) * | 2024-06-18 | 2024-10-17 | Archana Balkrishna Yadav | AI-Powered Robotic Defender for Performance Analysis and Advanced Sports Training |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||