CN119204085B - Multi-source video data robot skill learning method and system - Google Patents
- Publication number
- CN119204085B (application CN202411699867.0A)
- Authority
- CN
- China
- Prior art keywords
- robot
- camera
- strategy
- video
- virtual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/008—Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a multi-source video data robot skill learning method and system. An example video collecting module automatically collects example videos related to a skill according to a motor skill text description and performs data expansion to obtain motion example video data. A virtual robot and a virtual camera are constructed and instantiated and, together with a robot control strategy and a camera mirror-moving strategy, combined into an agent that generates and records robot motion video recording data. A motor skill video scoring module constructs an intelligent video scoring model and generates a scoring result for the robot motion video recording data. An agent learning module uses the reward feedback of the neural network scoring model to collaboratively optimize the robot control strategy and the camera mirror-moving strategy, and updates them into the agent.
Description
Technical Field
The invention relates to the technical field of machine learning, intelligent iterative data processing, and information transmission control, and in particular to a multi-source video data robot skill learning method and system.
Background
Skill learning for virtual robots aims to equip them with a variety of motor skills (e.g., walking, running, grabbing objects) that can be used to generate robot character animations conforming to human motion patterns and physical laws. Because robot motion typically requires the joint control of multiple joints, and collisions with the environment during motion are generally non-differentiable, existing methods usually adopt reinforcement learning to learn the robot's control strategy. The training signal of a reinforcement learning method comes from the reward feedback obtained after the robot executes an action; the control strategy is gradually adjusted according to this feedback to obtain a higher expected reward. Imitation learning provides a simple and effective way to compute this reward: it is calculated from the similarity between the robot's action sequence and an example action sequence (usually from a human demonstration). The higher the similarity, the larger the reward; conversely, the lower the similarity, the smaller the reward.
Two methods are commonly used to compute the similarity between the robot motion sequence and the example motion sequence: (1) tracking error, i.e., the average L2 distance between each joint of the robot and the corresponding joint of the example individual at corresponding moments; and (2) distance between distributions, i.e., the distance between the state-transition distribution generated by the robot's motion and that generated by the example individual, which can be quantitatively estimated by the discriminator of an adversarial learning method. Both rewards presuppose the three-dimensional motion trajectories of each joint of the example individual, which are typically recovered by a motion capture device or a motion reconstruction algorithm. Existing video-based skill learning methods therefore usually involve two steps: first reconstructing the motion pose sequence of the example individual from the example video, and then computing the robot's reward signal from tracking errors or inter-distribution distances. Because motion capture and motion reconstruction place high demands on how the training video is acquired, most existing video skill learning methods are only applicable to videos recorded from a fixed viewpoint with little character movement.
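As a concrete illustration, the tracking-error reward described above can be sketched as follows. The exponential mapping from mean joint distance to reward and the scale factor `k` are common choices in imitation-based motion control, not values specified here, so treat them as assumptions:

```python
import numpy as np

def tracking_error_reward(robot_joints, demo_joints, k=2.0):
    """Tracking-error reward: mean L2 distance between corresponding
    joints of the robot and the example individual at one time step,
    mapped to a reward in (0, 1]. The exponential mapping and the
    scale k are illustrative choices, not taken from the source."""
    robot_joints = np.asarray(robot_joints, dtype=float)  # shape (J, 3)
    demo_joints = np.asarray(demo_joints, dtype=float)    # shape (J, 3)
    err = np.linalg.norm(robot_joints - demo_joints, axis=1).mean()
    return float(np.exp(-k * err))
```

A perfect match yields the maximum reward, and the reward decays smoothly as the per-joint error grows, which keeps the signal informative even far from the demonstration.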
Improving existing video skill learning methods to better exploit massive internet data is therefore an important research challenge. Existing methods generally rely on a three-dimensional pose sequence for training, i.e., the reward signal is computed in three-dimensional space to assist the robot's learning. However, the three-dimensional pose sequence is usually recovered with a complex motion capture device or a motion reconstruction algorithm, and both approaches place high demands on the acquisition of the training video. This limitation makes it difficult for existing algorithms to effectively exploit the massive motor-skill demonstration videos on the internet, so the range of skills a robot can learn remains limited. It is therefore necessary to provide a multi-source video data robot skill learning system and method to at least partially solve the problems in the prior art.
Disclosure of Invention
This summary introduces a series of concepts in simplified form that are described in further detail in the detailed description. It is not intended to identify key or essential features of the claimed subject matter, nor is it intended to limit the scope of the claimed subject matter.
To at least partially solve the above problems, the present invention provides a multi-source video data robot skill learning method, comprising:
S100, automatically collecting, through an example video collecting module, example videos related to the skill according to a motor skill text description, and performing data expansion to obtain motion example video data;
S200, constructing and instantiating a virtual robot and a virtual camera and, in cooperation with a robot control strategy and a camera mirror-moving strategy, combining them into an agent that generates and records robot motion video recording data;
S300, constructing an intelligent video scoring model through a motor skill video scoring module, and generating a scoring result for the robot motion video recording data;
S400, collaboratively optimizing the robot control strategy and the camera mirror-moving strategy through the reward feedback of the neural network scoring model, set through an agent learning module, and updating them into the agent.
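The four steps S100 to S400 can be sketched as a single training loop. Every callable below is a hypothetical stand-in for the corresponding module, not part of the disclosed system:

```python
# Hypothetical driver for steps S100-S400; each argument is a
# placeholder standing in for a module described in the text.
def learn_skill(skill_text, collect, build_agent, score, update, n_iters=3):
    demos = collect(skill_text)            # S100: gather example videos
    agent = build_agent()                  # S200: robot + camera agent
    for _ in range(n_iters):
        recording = agent["rollout"]()     # S200: record robot motion video
        reward = score(recording, demos)   # S300: scoring model -> reward
        agent = update(agent, reward)      # S400: co-optimize both policies
    return agent
```

The loop makes the data flow explicit: the scoring result of each recorded rollout is the only training signal that reaches the two policies.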
Preferably, S100 includes:
S101, setting a motor skill text description, and analyzing and extracting the motor skill keywords in it with a keyword extraction algorithm combined with a large language model;
S102, performing tag content aggregation on the motor skill keywords to obtain motor skill tags;
S103, collecting motion example videos according to the motor skill keywords and the motor skill tags and performing data expansion to obtain motion example video data and motion video expansion data.
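A minimal sketch of the keyword extraction (S101) and tag aggregation (S102) steps. A real implementation would combine a keyword extraction algorithm with a large language model, so the vocabulary match and the `tag_map` lookup here are purely illustrative assumptions:

```python
def extract_skill_keywords(description, vocab):
    """S101 stand-in: a real system would combine a keyword-extraction
    algorithm with a large language model; here we simply match a known
    skill vocabulary against the text (purely illustrative)."""
    text = description.lower()
    return [w for w in vocab if w in text]

def aggregate_tags(keywords, tag_map):
    """S102 stand-in: aggregate keywords into coarser motor-skill tags.
    tag_map is a hypothetical keyword -> tag lookup."""
    return sorted({tag_map[k] for k in keywords if k in tag_map})
```
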
Preferably, S200 includes:
S201, performing virtual modeling of the robot according to robot parameter information on a simulation experiment platform, constructing a virtual robot whose virtual structure is a combination of multiple rigid bodies;
S202, performing camera modeling according to camera parameter information, constructing a pinhole-model camera without collision attributes to obtain a virtual camera;
S203, instantiating the virtual robot and the virtual camera, combining them with the robot control strategy and the camera mirror-moving strategy into an agent, performing virtual robot control and motion video recording, generating and recording the robot motion video captured by the camera in the simulation environment, obtaining robot motion video recording data, and transmitting it to the motor skill video scoring module.
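The combined-agent recording of S203 can be sketched as a rollout loop in which the robot control strategy and the camera mirror-moving strategy act at every simulation step. All callables are hypothetical stand-ins for the simulator and the two policy networks:

```python
def rollout(robot_policy, camera_policy, sim_step, render, horizon=8):
    """Combined-agent rollout (S203 sketch): at each step the robot
    policy emits joint torques and the camera policy emits a camera
    move; the simulator advances and the virtual camera records a
    frame. All callables are hypothetical stand-ins."""
    frames = []
    state, cam_pose = "s0", "c0"  # placeholder initial states
    for _ in range(horizon):
        torques = robot_policy(state)
        cam_cmd = camera_policy(state, cam_pose)
        state, cam_pose = sim_step(state, torques, cam_pose, cam_cmd)
        frames.append(render(state, cam_pose))
    return frames
```

The returned frame sequence is the "robot motion video recording data" handed to the scoring module.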
Preferably, S300 includes:
S301, extracting and storing features of the robot motion video recording data, the motion example video data, and the motion video expansion data;
S302, constructing a neural network scoring model, comparing the feature similarity of the robot motion video recording data with the motion example video data and the motion video expansion data, and generating a scoring result for the motion video recording data.
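A minimal sketch of the feature-similarity comparison in S302. The disclosed scoring model is a trained neural network, so plain cosine similarity over precomputed feature vectors is only an illustrative proxy:

```python
import numpy as np

def cosine_score(recording_feat, example_feats):
    """S302 sketch: score a recorded robot video by the best cosine
    similarity between its feature vector and the example/expansion
    video features. The actual scoring model is a trained neural
    network; cosine similarity is an illustrative stand-in."""
    v = np.asarray(recording_feat, dtype=float)
    best = -1.0
    for e in example_feats:
        e = np.asarray(e, dtype=float)
        sim = float(v @ e / (np.linalg.norm(v) * np.linalg.norm(e)))
        best = max(best, sim)
    return best
```
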
Preferably, S400 includes:
S401, based on the scoring-result feedback of the neural network scoring model on the motion video recording data, the agent learning module collaboratively optimizes the robot control strategy and the camera mirror-moving strategy through the reward feedback of the scoring model, obtaining an optimized robot control strategy and an optimized camera mirror-moving strategy;
S402, iteratively optimizing the agent strategy according to the optimized robot control strategy and the optimized camera mirror-moving strategy, and updating it into the combined virtual robot and virtual camera agent.
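The collaborative optimization of S401 and S402 can be sketched as a loop that perturbs both strategies and keeps a change only when the shared scoring reward improves. This random-search stand-in replaces the actual reinforcement-learning update, which is not detailed here:

```python
import random

def co_optimize(theta_robot, theta_cam, reward_fn, iters=500, step=0.3, seed=0):
    """S401-S402 sketch: both the robot control parameters and the
    camera parameters are nudged and kept only when the shared scoring
    reward improves (a simple random-search stand-in for the actual
    reinforcement-learning update)."""
    rng = random.Random(seed)
    best = reward_fn(theta_robot, theta_cam)
    for _ in range(iters):
        cand_r = theta_robot + rng.uniform(-step, step)
        cand_c = theta_cam + rng.uniform(-step, step)
        r = reward_fn(cand_r, cand_c)
        if r > best:
            theta_robot, theta_cam, best = cand_r, cand_c, r
    return theta_robot, theta_cam, best
```

The essential point it illustrates is that one scalar reward drives both strategies, so the camera learns to frame the robot in whatever way raises the score of the recorded video.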
The invention further provides a multi-source video data robot skill learning system, comprising:
The video collection data expansion subsystem is used for automatically collecting example videos related to the skills according to the motor skill text description through the example video collection module and carrying out data expansion to obtain motor example video data;
The virtual construction control strategy subsystem is used for constructing and instantiating a virtual robot and a virtual camera and, in cooperation with a robot control strategy and a camera mirror-moving strategy, combining them into an agent that generates and records robot motion video recording data;
The motor skill intelligent scoring subsystem is used for constructing an intelligent video scoring model through a motor skill video scoring module to generate a scoring result for the robot motion video recording data;
The strategy optimization agent updating subsystem is used for collaboratively optimizing the robot control strategy and the camera mirror-moving strategy through the reward feedback of the neural network scoring model, set through an agent learning module, and updating them into the agent.
Preferably, the video gathering data expansion subsystem comprises:
The keyword extraction analysis unit is used for setting a motor skill text description and analyzing and extracting the motor skill keywords in it with a keyword extraction algorithm combined with a large language model;
The tag content aggregation unit is used for performing tag content aggregation on the motor skill keywords to obtain motor skill tags;
The video tag expansion unit is used for collecting motion example videos according to the motor skill keywords and the motor skill tags and performing data expansion to obtain the motion example video data and the motion video expansion data.
Preferably, the virtual construction control strategy subsystem comprises:
The robot structure modeling unit is used for performing virtual modeling on the robot according to the robot parameter information by using a simulation experiment platform, and constructing a virtual robot with a virtual structure formed by combining a plurality of rigid bodies;
The virtual camera modeling unit is used for carrying out camera modeling according to the camera parameter information, constructing a pinhole model camera without collision attribute and obtaining a virtual camera;
The agent instantiation recording unit is used for instantiating the virtual robot and the virtual camera, combining them with the robot control strategy and the camera mirror-moving strategy into an agent, performing virtual robot control and motion video recording, generating and recording the robot motion video captured by the camera in the simulation environment, obtaining robot motion video recording data, and transmitting it to the motor skill video scoring module.
Preferably, the motor skills intelligence scoring subsystem comprises:
The feature extraction storage unit is used for extracting and storing features of robot motion video recording data, robot motion example video data and motion video expansion data;
And the scoring model feature scoring unit is used for constructing a neural network scoring model, comparing the feature similarity of the robot motion video recording data, the robot motion example video data and the motion video expansion data, and generating a motion video recording data scoring result.
Preferably, the policy optimization agent update subsystem comprises:
The agent learning optimization unit is used for collaboratively optimizing the robot control strategy and the camera mirror-moving strategy through the reward feedback of the neural network scoring model, based on its scoring results for the motion video recording data, to obtain an optimized robot control strategy and an optimized camera mirror-moving strategy;
and the agent strategy iteration updating unit is used for iteratively optimizing the agent strategy according to the optimized robot control strategy and the optimized camera lens-operating strategy, and updating the agent strategy into the virtual robot and virtual camera combined agent.
Compared with the prior art, the invention at least comprises the following beneficial effects:
The invention discloses a multi-source video data robot skill learning method and system. An example video collecting module automatically collects example videos related to a skill according to a motor skill text description and performs data expansion to obtain motion example video data; a virtual robot and a virtual camera are constructed and instantiated and, combined with a robot control strategy and a camera mirror-moving strategy, generate and record robot motion video recording data; an intelligent video scoring model is constructed to generate scoring results for the robot motion video recording data; and an agent learning module uses the reward feedback of the neural network scoring model to collaboratively optimize the robot control strategy and the camera mirror-moving strategy and update them into the agent. The resulting skill learning method based on multi-source video data aims to enable the virtual robot to learn a unified motion control strategy from multi-source video examples.
The method simplifies the complex flow of traditional methods in the training-data acquisition and preprocessing stages and improves training-data processing efficiency. It models skill learning from multi-source video data as a multi-agent reinforcement learning problem: while the robot motion control strategy is trained, a camera mirror-moving strategy is trained to assist the robot in completing skill learning. The trained camera strategy can adjust its recording trajectory to the robot's motion without manual control, naturally displaying the virtual robot's skill learning result. Because the method learns unified motor skills from multi-source video data by combining robot motion control with the camera mirror-moving strategy, and supervises the virtual robot to acquire new motor skills through imitation learning without requiring training videos to be acquired in any specific way, it can exploit internet video data of many kinds while naturally presenting the robot's learned skills.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
Fig. 1 is a diagram of a multi-source video data robot skill learning system according to an embodiment of the invention.
Fig. 2 is a diagram of an embodiment of a multi-source video data robot skill learning method according to the present invention.
Fig. 3 is a diagram illustrating an exemplary embodiment of a multi-source video data robot skill learning method and system according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and embodiments, to enable those skilled in the art to practice it. As shown in the drawings, the invention provides a multi-source video data robot skill learning method, comprising:
S100, automatically collecting, through an example video collecting module, example videos related to the skill according to a motor skill text description, and performing data expansion to obtain motion example video data;
S200, constructing and instantiating a virtual robot and a virtual camera and, in cooperation with a robot control strategy and a camera mirror-moving strategy, combining them into an agent that generates and records robot motion video recording data;
S300, constructing an intelligent video scoring model through a motor skill video scoring module, and generating a scoring result for the robot motion video recording data;
S400, collaboratively optimizing the robot control strategy and the camera mirror-moving strategy through the reward feedback of the neural network scoring model, set through an agent learning module, and updating them into the agent.
The principle and effect of this technical scheme are as follows. The multi-source video data robot skill learning method automatically collects example videos related to the skill according to the skill text description through the example video collecting module and performs data expansion to obtain motion example video data; constructs and instantiates a virtual robot and a virtual camera and combines them with the robot control strategy and camera mirror-moving strategy into an agent that generates and records robot motion video recording data; constructs an intelligent video scoring model through the skill video scoring module to score the robot motion video recording data; and, through the agent learning module, uses the reward feedback of the neural network scoring model to collaboratively optimize the robot control strategy and the camera mirror-moving strategy and update them into the agent. The skill learning method based on multi-source video data aims to enable the virtual robot to learn a unified motion control strategy from multi-source video examples: the camera mirror-moving strategy is trained to record the robot, the recorded video is scored against the example videos, and the score serves as the reward. Because the reward is computed from video rather than from reconstructed three-dimensional poses, the method avoids the complex acquisition and preprocessing that traditional training pipelines require and can exploit massive internet video data.
The method simplifies the complex flow of traditional methods in the training-data acquisition and preprocessing stages and improves training-data processing efficiency. It models skill learning from multi-source video data as a multi-agent reinforcement learning problem: while training the robot motion control strategy, it trains a camera mirror-moving strategy to assist the robot in completing skill learning. The trained camera strategy can adjust its recording trajectory according to the robot's motion trajectory without manual control and naturally display the virtual robot's skill learning result. The method learns unified motor skills from multi-source video data by jointly training robot motion control and the camera mirror-moving strategy, and supervises the virtual robot to acquire new motor skills through imitation learning; since it does not require training videos to be acquired in any specific way, it can exploit network data of many kinds.
In one embodiment, the example video collecting module analyzes the motion keyword information of a motion example video, which includes a motion description keyword and a motion action characteristic keyword. The motion description keyword undergoes video text description analysis using the description text and subtitle text of the motion example video, yielding a video text description analysis result; this result is compared with the motion action characteristic keyword to judge whether the video text description is consistent with the motion action characteristics. The motion example video data are transmitted to the skill video scoring module, which in parallel receives the robot motion video recording data recorded by the skill video recording module in the 3D simulation environment system: the virtual robot and the virtual camera are combined into a combined agent, the combined agent records a video of the virtual robot to obtain the robot motion video recording data, and these data are transmitted to the skill video scoring module. The skill video scoring module scores the motion example video data and the robot motion video recording data, the scoring information is transmitted to the agent learning module, and the agent is continuously trained while the training environment is continuously optimized.
In one embodiment, S100 comprises:
s101, setting a motor skill text description, and analyzing and extracting motor skill keywords in the motor skill text description from the motor skill text description by combining a keyword extraction algorithm with a large language model;
s102, performing label content aggregation on the motor skill keywords to obtain motor skill labels;
And S103, according to the motor skill keywords and the motor skill labels, gathering the motor example video and performing data expansion to obtain motor example video data and motor video expansion data.
The principle and effect of this technical scheme are as follows. A motor skill text description is set; the motor skill keywords in it are analyzed and extracted by a keyword extraction algorithm combined with a large language model; tag content aggregation is performed on the keywords to obtain motor skill tags; and, according to the keywords and tags, motion example videos are collected and data expansion is performed to obtain motion example video data and motion video expansion data. Specifically, example videos are gathered according to the motor skill keywords and tags through data capture; the motor skill tags are matched against the reference motion tags of existing motion datasets, and motion videos whose matching degree exceeds a set reference matching degree are taken out for data expansion, yielding the motion example video data and motion video expansion data, which are transmitted to the skill video scoring module. The overall pipeline is: parse the skill text description, build the simulation environment and instantiate the robot, initialize or update the motion control and camera mirror-moving strategies, and optimize both strategies through training with the intelligent scoring model.
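The matching-degree filter used in the expansion step can be sketched as follows. The text does not fix the matching measure, so the Jaccard overlap and the 0.5 threshold are assumptions:

```python
def match_degree(tags, reference_tags):
    """Illustrative matching degree: Jaccard overlap between a video's
    motor-skill tags and the reference motion tags of an existing
    motion dataset (the exact measure is not specified in the source)."""
    a, b = set(tags), set(reference_tags)
    return len(a & b) / len(a | b) if a | b else 0.0

def expand_dataset(candidates, reference_tags, threshold=0.5):
    """Keep candidate videos whose matching degree exceeds the set
    reference matching degree (S103 expansion step, sketched).
    candidates is a list of (video_id, tags) pairs."""
    return [v for v, tags in candidates if match_degree(tags, reference_tags) > threshold]
```
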
In one embodiment, S200 includes:
S201, performing virtual modeling of the robot according to robot parameter information on a simulation experiment platform, constructing a virtual robot whose virtual structure is a combination of multiple rigid bodies;
S202, performing camera modeling according to camera parameter information, constructing a pinhole-model camera without collision attributes to obtain a virtual camera;
S203, instantiating the virtual robot and the virtual camera, combining them with the robot control strategy and the camera mirror-moving strategy into an agent, performing virtual robot control and motion video recording, generating and recording the robot motion video captured by the camera in the simulation environment, obtaining robot motion video recording data, and transmitting it to the motor skill video scoring module.
The principle and effect of this technical scheme are as follows. A simulation experiment platform performs virtual modeling of the robot according to the robot parameter information, constructing a virtual robot 211 whose virtual structure is a combination of multiple rigid bodies; camera modeling is performed according to the camera parameter information, constructing a pinhole-model camera without collision attributes to obtain a virtual camera; the virtual robot and virtual camera are instantiated and, in cooperation with the robot control strategy and the camera lens-operation strategy, combined into an agent; virtual robot control and motion video recording are performed, a robot motion video captured by the camera in the simulation environment is generated and recorded, and the resulting robot motion video recording data are transmitted to the motor skill video scoring module.
Specifically, the parts of the virtual structure are connected through virtual joints: the knee and elbow joints use revolute joints with 1 degree of freedom, and all other joints are set as spherical joints with 3 degrees of freedom. The virtual association parameters of the robot entity include a total robot mass of 45 kg, a robot height of 1.62 m, a robot state-space dimension of 197, and a robot action-space dimension of 36. A robot control strategy network model is constructed that takes the robot state information as input and outputs the torque values for controlling each joint; the control strategy is modeled by a fully connected neural network with input and output dimensions of 197 and 36 respectively, yielding the robot control strategy network model. The training scene is a randomly initialized ground surface on which the robot performs different actions; at initialization the ground mesh is given multiple mesh types and physical parameters to improve the robustness of the robot control strategy;
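The control strategy network described above can be sketched as a plain fully connected mapping from the 197-dimensional robot state to 36 joint torques. The hidden-layer sizes below are assumptions for illustration; the patent fixes only the input and output dimensions (197 and 36).

```python
import numpy as np

# Hypothetical sketch of the robot control strategy network: a fully connected
# network mapping the 197-D robot state (joint positions and motion speeds) to
# 36-D joint torques. Hidden sizes (512, 256) are illustrative assumptions.
rng = np.random.default_rng(0)

def mlp_layer(n_in, n_out):
    return rng.normal(0, 0.1, (n_in, n_out)), np.zeros(n_out)

W1, b1 = mlp_layer(197, 512)
W2, b2 = mlp_layer(512, 256)
W3, b3 = mlp_layer(256, 36)

def control_policy(state):
    """Map a 197-D robot state vector to 36 joint-torque values."""
    h = np.maximum(state @ W1 + b1, 0.0)   # ReLU
    h = np.maximum(h @ W2 + b2, 0.0)
    return h @ W3 + b3                     # raw torque outputs, one per actuated DOF

torques = control_policy(rng.normal(size=197))
print(torques.shape)  # (36,)
```

In training, the weights would be updated by the reinforcement learning algorithm described later, with rewards supplied by the scoring model.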
According to the camera parameter information, camera modeling is performed and a pinhole-model camera without collision attributes is constructed, obtaining the virtual camera 212. The camera intrinsic parameters are randomly sampled within a certain range so as to cover the intrinsic-parameter range of the cameras used when the multi-source video data were collected. A camera lens-operation strategy network model is constructed whose input comprises the world poses of the camera and the robot over several past frames, the relative pose between the camera and the robot, and the projected pixel coordinates of the robot joints in the image; its output is the camera's next operation command, which is decomposed into a lens rotation command and a lens displacement command. The virtual robot and the virtual camera are instantiated and, in cooperation with the robot control strategy and the camera lens-operation strategy, combined into an agent for virtual robot control and motion video recording. Through joint optimization of the robot control strategy network model and the camera lens-operation strategy network model, the robot accurately learns motor skills from the example videos while the camera learns to keep the robot properly framed during recording.
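The camera lens-operation strategy interface described above can be sketched as follows. The number of past frames K and the pose encoding (position plus orientation as a 6-vector) are assumptions not fixed by the patent; only the observation contents (world poses, relative pose, projected joint pixels) and the rotation/displacement decomposition of the output follow the text.

```python
import numpy as np

# Hedged sketch of the camera lens-operation strategy's input/output interface.
# K (past-frame count) and the 6-D pose encoding are illustrative assumptions.
K = 4          # number of past frames (assumption)
N_JOINTS = 13  # joint count stated elsewhere in the patent

def build_observation(cam_poses, robot_poses, joint_pixels):
    """cam_poses, robot_poses: (K, 6) world poses; joint_pixels: (K, 13, 2)."""
    rel = robot_poses - cam_poses  # relative pose of robot w.r.t. camera, per frame
    return np.concatenate([cam_poses.ravel(), robot_poses.ravel(),
                           rel.ravel(), joint_pixels.ravel()])

def split_command(raw_output):
    """Decompose the policy output into lens rotation and displacement commands."""
    return {"rotate": raw_output[:3], "translate": raw_output[3:6]}

obs = build_observation(np.zeros((K, 6)), np.ones((K, 6)),
                        np.zeros((K, N_JOINTS, 2)))
cmd = split_command(np.arange(6.0))
print(obs.shape)  # (176,) with K=4: 3 pose blocks of 24 values + 104 pixel values
```

A policy network would consume `obs` and emit the 6 raw values that `split_command` decomposes.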
In one embodiment, S300 includes:
S301, performing feature extraction and storage on the robot motion video recording data, the robot motion example video data, and the motion video expansion data;
S302, constructing a neural network scoring model, comparing the feature similarity of the robot motion video recording data with the robot motion example video data and the motion video expansion data, and generating a motion video recording data scoring result.
The principle and the effect of the technical scheme are that the characteristic extraction and storage are carried out on the robot motion video recording data, the robot motion example video data and the motion video expansion data;
A neural network scoring model is constructed to compare the feature similarity of the robot motion video recording data with the robot motion example video data and the motion video expansion data. The input of the neural network scoring model is the projection feature, on the image plane, of the robot's state transition between two consecutive frames, and the output is a score between 0 and 1: the model is trained to output 1 for state transitions sampled from the example data distribution and to output 0 for state transitions recorded during the robot's own motion. The scoring model makes its judgment in image space, using the projected coordinates of the robot joints on the image plane and optical-flow information extracted from the images. The scoring model is implemented with a fully connected neural network whose input is a 52-dimensional observation vector comprising the image projection coordinates of each robot joint (13 × 2) and the optical-flow information at the projected positions (13 × 2); after several fully connected layers a 1-dimensional tensor is produced, and a sigmoid activation function outputs the score between 0 and 1.
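The scoring model above can be sketched as a small discriminator over the 52-dimensional observation. The hidden size is an assumption; the 52-dimensional input layout and the sigmoid output follow the text.

```python
import numpy as np

# Minimal sketch of the scoring model: a fully connected network taking a 52-D
# observation (13 joint image-projection coordinates, 13 x 2, plus optical flow
# at those positions, 13 x 2) and emitting a score in (0, 1) via a sigmoid.
# The hidden size (128) is an assumption. Training would push example-data
# transitions toward 1 and the robot's own recorded transitions toward 0.
rng = np.random.default_rng(1)
W1 = rng.normal(0, 0.1, (52, 128)); b1 = np.zeros(128)
W2 = rng.normal(0, 0.1, (128, 1));  b2 = np.zeros(1)

def score(obs_52d):
    h = np.maximum(obs_52d @ W1 + b1, 0.0)   # ReLU hidden layer
    logit = h @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logit))      # sigmoid -> score in (0, 1)

joints = rng.uniform(0, 1, (13, 2))  # projected joint coordinates
flow = rng.normal(0, 0.01, (13, 2))  # optical flow at those pixels
s = score(np.concatenate([joints.ravel(), flow.ravel()]))
print(s[0])  # a value strictly between 0 and 1
```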
In one embodiment, S400 includes:
S401, based on the scoring-result feedback of the neural network scoring model on the motion video recording data, the agent learning module cooperatively optimizes the robot control strategy and the camera lens-operation strategy through the reward feedback of the scoring model, obtaining an optimized robot control strategy and an optimized camera lens-operation strategy;
S402, iteratively optimizing the agent strategy according to the optimized robot control strategy and the optimized camera lens-operation strategy, and updating the agent strategy into the virtual robot and virtual camera combined agent.
The principle and effect of this technical scheme are that, based on the scoring-result feedback of the neural network scoring model on the motion video recording data, the agent learning module cooperatively optimizes the robot control strategy and the camera lens-operation strategy through the reward feedback of the scoring model, obtaining an optimized robot control strategy and an optimized camera lens-operation strategy;
in the initial stage of optimizing the robot control strategy and the camera lens-operation strategy, an auxiliary reward signal is introduced to assist algorithm convergence and achieve a more robust training effect. A strategy network and a reward function are defined, and the strategy network is updated with a reinforcement learning algorithm; the updated camera lens-operation strategy and robot control strategy are then used for a new round of data collection and iterative training, so that the agent strategy is iteratively optimized and updated into the virtual robot and virtual camera combined agent. The defined reward function includes rewards for preventing the robot from falling, limiting the robot's action amplitude, and ensuring that the camera always keeps the robot's whole body or part of its torso in the recorded frame.
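The auxiliary reward terms can be sketched as below. The weights, thresholds, and functional forms are illustrative assumptions; the patent only names the three auxiliary objectives (no falling, limited action amplitude, robot kept in frame) and the scoring-model reward.

```python
import numpy as np

# Hedged sketch of a combined reward. All coefficients and thresholds here
# (min_height, 1e-4, w_score, w_aux) are assumptions for illustration.
def auxiliary_reward(root_height, torques, in_frame_frac, min_height=0.8):
    r_upright = 1.0 if root_height > min_height else 0.0   # prevent falling
    r_smooth = -1e-4 * float(np.sum(np.square(torques)))   # limit action amplitude
    r_framed = in_frame_frac                               # body/torso kept in frame
    return r_upright + r_smooth + r_framed

def total_reward(score_model_output, root_height, torques, in_frame_frac,
                 w_score=1.0, w_aux=0.5):
    # score_model_output in (0, 1) comes from the neural network scoring model
    return (w_score * score_model_output
            + w_aux * auxiliary_reward(root_height, torques, in_frame_frac))

r = total_reward(0.7, root_height=1.0, torques=np.zeros(36), in_frame_frac=1.0)
print(r)  # 0.7 + 0.5 * (1.0 + 0.0 + 1.0) = 1.7
```

A reinforcement learning algorithm such as PPO would maximize the expectation of this combined reward when updating both strategy networks.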
The invention further provides a multi-source video data robot skill learning system, comprising:
The video collection data expansion subsystem, which automatically collects example videos related to the skill according to the motor skill text description through the example video collection module and performs data expansion to obtain motion example video data;
The virtual construction control strategy subsystem, which constructs and instantiates a virtual robot and a virtual camera, cooperates the robot control strategy with the camera lens-operation strategy, combines them into an agent, and generates and records robot motion video recording data;
The motor skill intelligent scoring subsystem, which constructs a video intelligent scoring model through the motor skill video scoring module to generate scoring results for the robot motion video recording data;
And the strategy optimization agent updating subsystem, which sets the reward feedback of the neural network scoring model through the agent learning module to cooperatively optimize the robot control strategy and the camera lens-operation strategy and updates them into the agent.
The principle and effect of this technical scheme are as follows. The invention provides a multi-source video data robot skill learning system comprising a video collection data expansion subsystem, a virtual construction control strategy subsystem, a motor skill intelligent scoring subsystem, and a strategy optimization agent updating subsystem. The video collection data expansion subsystem automatically collects example videos related to the skill according to the motor skill text description through the example video collection module and expands the data; the virtual construction control strategy subsystem constructs and instantiates a virtual robot and a virtual camera, cooperates the robot control strategy with the camera lens-operation strategy, combines them into an agent, and generates and records robot motion video recording data; the motor skill intelligent scoring subsystem constructs a video intelligent scoring model through the motor skill video scoring module and generates scoring results for the robot motion video recording data; the strategy optimization agent updating subsystem sets the reward feedback of the neural network scoring model to cooperatively optimize the robot control strategy and the camera lens-operation strategy and updates them into the agent.
The system enables a virtual robot to learn a unified motion control strategy from multi-source video examples. Unlike traditional methods, it does not require the training videos to be collected in a specific modality, nor does it require three-dimensional motion data to be captured and preprocessed, enabling the virtual robot to learn unified motor skills from massive example videos of different sources. This simplifies the complex flow of traditional methods in the training-data acquisition and preprocessing stage and improves the processing efficiency of multi-source training data. The skill learning problem based on multi-source video data is modeled as a multi-agent reinforcement learning problem: while the robot motion control strategy is trained, a camera lens-operation strategy is trained to assist the robot in completing skill learning. The camera lens-operation strategy trained in this way can adjust the recording viewpoint according to the robot's motion trajectory without manual control, naturally displaying the virtual robot's skill learning results, and the virtual robot is supervised to acquire new motor skills through imitation learning.
Furthermore, the example video collection module analyzes the motion keyword information of the motion example videos, where the motion keyword information comprises motion description keywords and motion action feature keywords. Video text description analysis is performed through the text description content and subtitle text description content of the motion example video to obtain a video text description analysis result; this result is compared with the motion action feature keywords to judge whether the video text description is consistent with the motion action features, and the motion example video data are transmitted to the skill video scoring module. In parallel, the skill video scoring module receives the robot motion video recording data recorded by the skill video recording module in the 3D simulation environment system: the virtual robot and the virtual camera are combined to construct a combined agent, the combined agent records the virtual robot's video to obtain the robot motion video recording data, and these data are transmitted to the skill video scoring module. The skill video scoring module scores the robot motion video recording data against the motion example video data, the scoring information is transmitted to the agent learning module, and the agent is continuously trained and optimized in the training environment accordingly.
In one embodiment, a video gathering data expansion subsystem, comprising:
the keyword extraction analysis unit is used for setting a motor skill text description, and analyzing and extracting motor skill keywords in the motor skill text description from the motor skill text description by combining a keyword extraction algorithm with a large language model;
the label content aggregation unit is used for carrying out label content aggregation on the motor skill keywords to obtain motor skill labels;
and the video tag expansion unit is used for collecting the motion example video and carrying out data expansion according to the motion skill keywords and the motion skill tags to obtain the motion example video data and the motion video expansion data.
The principle and effect of this technical scheme are as follows. The video collection data expansion subsystem comprises a keyword extraction analysis unit, which sets the motor skill text description and, combining a keyword extraction algorithm with a large language model, analyzes and extracts the motor skill keywords from the motor skill text description; a tag content aggregation unit, which performs tag content aggregation on the motor skill keywords to obtain motor skill tags; and a video tag expansion unit, which collects motion example videos and performs data expansion according to the motor skill keywords and motor skill tags to obtain the motion example video data and the motion video expansion data. Specifically, the example videos are collected through data capture to obtain the collected motion example videos; the motor skill tags are matched against reference motion tags in existing motion data sets, and data expansion is performed on motion videos whose matching degree exceeds a set reference matching degree; the motion example video data and the motion video expansion data are then transmitted to the skill video scoring module. By analyzing the text of the skill description, example video data conforming to the description are collected; the simulation environment is constructed, the robot and camera are initialized, the robot control strategy and the camera lens-operation strategy are trained and optimized based on the scoring model built from the motion example videos, and the optimized strategies are updated into the simulation environment.
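The tag-matching step above can be sketched as follows. The Jaccard-style matching degree and the 0.5 threshold are illustrative assumptions; the patent specifies only that videos with a matching degree above a set reference value are used for data expansion.

```python
# Illustrative sketch of matching motor skill tags against reference motion
# tags from an existing motion data set. The matching-degree function and the
# threshold value are assumptions for illustration.
def matching_degree(skill_tags, reference_tags):
    a, b = set(skill_tags), set(reference_tags)
    return len(a & b) / len(a | b) if a | b else 0.0

def select_for_expansion(candidate_videos, skill_tags, threshold=0.5):
    """candidate_videos: list of (video_id, reference_tags) pairs."""
    return [vid for vid, ref in candidate_videos
            if matching_degree(skill_tags, ref) > threshold]

videos = [("v1", ["backflip", "gymnastics"]),
          ("v2", ["walking", "outdoor"]),
          ("v3", ["backflip", "tumbling", "gymnastics"])]
picked = select_for_expansion(videos, ["backflip", "gymnastics"])
print(picked)  # ['v1', 'v3']
```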
In one embodiment, the virtual construction control strategy subsystem comprises:
The robot structure modeling unit is used for performing virtual modeling on the robot according to the robot parameter information by using a simulation experiment platform, and constructing a virtual robot with a virtual structure formed by combining a plurality of rigid bodies;
The virtual camera modeling unit is used for carrying out camera modeling according to the camera parameter information, constructing a pinhole model camera without collision attribute and obtaining a virtual camera;
The instantiation agent combination unit is used for instantiating the virtual robot and the virtual camera, cooperating the robot control strategy with the camera lens-operation strategy, combining the instantiated virtual robot and virtual camera into an agent, performing virtual robot control and motion video recording, generating and recording the robot motion video captured by the camera in the simulation environment, obtaining robot motion video recording data, and transmitting the robot motion video recording data to the motor skill video scoring module.
The principle and effect of this technical scheme are as follows. The virtual construction control strategy subsystem comprises a robot structure modeling unit, a virtual camera modeling unit, and an instantiation agent combination unit. The robot structure modeling unit performs virtual modeling of the robot according to the robot parameter information using the simulation experiment platform, constructing a virtual robot 211 whose virtual structure is a combination of multiple rigid bodies; the virtual camera modeling unit performs camera modeling according to the camera parameter information, constructing a pinhole-model camera without collision attributes to obtain a virtual camera; the instantiation agent combination unit instantiates the virtual robot and the virtual camera, cooperates the robot control strategy with the camera lens-operation strategy, combines them into an agent, performs virtual robot control and motion video recording, generates and records the robot motion video captured by the camera in the simulation environment, and transmits the resulting robot motion video recording data to the motor skill video scoring module.
The parts of the virtual structure are connected through virtual joints: the knee and elbow joints use revolute joints with 1 degree of freedom, and all other joints are set as spherical joints with 3 degrees of freedom, so that each robot character comprises 13 joints and 34 degrees of freedom. The virtual association parameters of the robot entity include a total robot mass of 45 kg, a robot height of 1.62 m, a robot state-space dimension of 197, and a robot action-space dimension of 36. A robot control strategy network model is constructed that takes the robot state information, comprising the spatial position and motion speed of each joint, as input and outputs the torque values for controlling each joint; the control strategy is modeled by a fully connected neural network with input and output dimensions of 197 and 36 respectively. The training scene is a randomly initialized ground surface on which the robot performs different actions; at initialization the ground mesh is given multiple mesh types and physical parameters, including flatness and friction coefficient, to improve the robustness of the robot control strategy;
According to the camera parameter information, camera modeling is performed and a pinhole-model camera without collision attributes is constructed, obtaining the virtual camera 212. The camera intrinsic parameters are randomly sampled within a certain range so as to cover the intrinsic-parameter range of the cameras used when the multi-source video data were collected. A camera lens-operation strategy network model is constructed whose input comprises the world poses of the camera and the robot over several past frames, the relative pose between the camera and the robot, and the projected pixel coordinates of the robot joints in the image; its output is the camera's next operation command, which is decomposed into a lens rotation command and a lens displacement command. The virtual robot and the virtual camera are instantiated and, in cooperation with the robot control strategy and the camera lens-operation strategy, combined into an agent for virtual robot control and motion video recording. Through joint optimization of the robot control strategy network model and the camera lens-operation strategy network model, the robot accurately learns motor skills from the example videos while the camera learns to keep the robot properly framed during recording.
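The pinhole-model camera with randomly sampled intrinsics can be sketched as below. The sampling range for the focal length and the image size are illustrative assumptions; the patent states only that the intrinsics are sampled within a range covering the source cameras.

```python
import numpy as np

# Sketch of a pinhole camera: intrinsics are randomly sampled (ranges are
# assumptions), and 3-D joint positions expressed in the camera frame are
# projected to pixel coordinates via the standard pinhole projection.
rng = np.random.default_rng(2)

def sample_intrinsics(width=640, height=480):
    f = rng.uniform(400.0, 800.0)              # focal length in pixels (assumed range)
    return np.array([[f, 0.0, width / 2],
                     [0.0, f, height / 2],
                     [0.0, 0.0, 1.0]])

def project(K, points_cam):
    """Project (N, 3) camera-frame points to (N, 2) pixel coordinates."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]            # perspective divide

K = sample_intrinsics()
joints_cam = np.array([[0.0, 0.0, 2.0],        # a joint on the optical axis
                       [0.3, -0.2, 2.5]])
pixels = project(K, joints_cam)
print(pixels[0])  # the on-axis point projects to the image center (320, 240)
```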
In one embodiment, the motor skill intelligent scoring subsystem comprises:
The feature extraction storage unit is used for extracting and storing features of robot motion video recording data, robot motion example video data and motion video expansion data;
And the scoring model feature scoring unit is used for constructing a neural network scoring model, comparing the feature similarity of the robot motion video recording data, the robot motion example video data and the motion video expansion data, and generating a motion video recording data scoring result.
The principle and effect of this technical scheme are as follows. The motor skill intelligent scoring subsystem comprises a feature extraction storage unit, which performs feature extraction and storage on the robot motion video recording data, the robot motion example video data, and the motion video expansion data; and a scoring model feature scoring unit, which constructs a neural network scoring model, compares the feature similarity of the robot motion video recording data with the robot motion example video data and the motion video expansion data, and generates the motion video recording data scoring result. The input of the neural network scoring model is the projection feature, on the image plane, of the robot's state transition between two consecutive frames, and its output is a score between 0 and 1: the model is trained to output 1 for state transitions sampled from the example data distribution and to output 0 for state transitions recorded during the robot's own motion. The scoring model makes its judgment in image space and is implemented with a fully connected neural network whose input is a 52-dimensional observation vector comprising the image projection coordinates of each robot joint (13 × 2) and the optical-flow information at the projected positions (13 × 2); after several fully connected layers a 1-dimensional tensor is produced, and a sigmoid activation function outputs the score between 0 and 1.
In one embodiment, a policy optimization agent update subsystem includes:
The agent learning optimization unit is used for cooperatively optimizing the robot control strategy and the camera lens-operation strategy through the reward feedback of the neural network scoring model, based on the scoring-result feedback of the scoring model on the motion video recording data, to obtain an optimized robot control strategy and an optimized camera lens-operation strategy;
and the agent strategy iteration updating unit is used for iteratively optimizing the agent strategy according to the optimized robot control strategy and the optimized camera lens-operating strategy, and updating the agent strategy into the virtual robot and virtual camera combined agent.
The principle and effect of this technical scheme are as follows. The strategy optimization agent updating subsystem comprises an agent learning optimization unit and an agent strategy iteration updating unit. The agent learning optimization unit cooperatively optimizes the robot control strategy and the camera lens-operation strategy through the reward feedback of the neural network scoring model, obtaining the optimized strategies; the agent strategy iteration updating unit iteratively optimizes the agent strategy according to the optimized robot control strategy and the optimized camera lens-operation strategy and updates it into the virtual robot and virtual camera combined agent. In the initial stage of strategy optimization, an auxiliary reward signal is introduced to assist algorithm convergence and achieve a more robust training effect: a strategy network and a reward function are defined, the strategy network is updated with a reinforcement learning algorithm, and the updated camera lens-operation strategy and robot control strategy are used for a new round of data collection and iterative training. The defined reward function includes rewards for preventing the robot from falling, limiting the robot's action amplitude, and ensuring that the camera always keeps the robot's whole body or part of its torso in the recorded frame.
Although embodiments of the present invention have been disclosed above, the invention is not limited to the applications set forth in the description and embodiments; it is suited to various fields of use, and further modifications will be readily apparent to those skilled in the art. Accordingly, the invention is not limited to the specific details shown and described herein, provided such modifications do not depart from the general concepts defined by the claims and their equivalents.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411699867.0A CN119204085B (en) | 2024-11-26 | 2024-11-26 | Multi-source video data robot skill learning method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN119204085A CN119204085A (en) | 2024-12-27 |
CN119204085B true CN119204085B (en) | 2025-02-18 |
Family
ID=94042393
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202411699867.0A Active CN119204085B (en) | 2024-11-26 | 2024-11-26 | Multi-source video data robot skill learning method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN119204085B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119830993B (en) * | 2025-03-14 | 2025-06-20 | 北京通用人工智能研究院 | Multi-view video reward mechanism learning system and its construction method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114748169A (en) * | 2022-03-31 | 2022-07-15 | 华中科技大学 | Autonomous endoscope moving method of laparoscopic surgery robot based on image experience |
CN115396595A (en) * | 2022-08-04 | 2022-11-25 | 北京通用人工智能研究院 | Video generation method and device, electronic equipment and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10766136B1 (en) * | 2017-11-03 | 2020-09-08 | Amazon Technologies, Inc. | Artificial intelligence system for modeling and evaluating robotic success at task performance |
KR102619004B1 (en) * | 2018-12-14 | 2023-12-29 | 삼성전자 주식회사 | Robot control apparatus and method for learning task skill of the robot |
US11524402B2 (en) * | 2020-05-21 | 2022-12-13 | Intrinsic Innovation Llc | User feedback for robotic demonstration learning |
CN115442519B (en) * | 2022-08-08 | 2023-12-15 | 珠海普罗米修斯视觉技术有限公司 | Video processing method, device and computer-readable storage medium |
CN118003321A (en) * | 2023-07-31 | 2024-05-10 | 重庆越千创新科技有限公司 | Real-time control method and system for photographic robot |
US20240342557A1 (en) * | 2024-06-18 | 2024-10-17 | Archana Balkrishna Yadav | AI-Powered Robotic Defender for Performance Analysis and Advanced Sports Training |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||