
US11062617B2 - Training system for autonomous driving control policy - Google Patents

Training system for autonomous driving control policy

Info

Publication number
US11062617B2
Authority
US
United States
Prior art keywords
control policy
simulator
policy
driving
unmanned vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US16/968,608
Other versions
US20200372822A1 (en)
Inventor
Rongjun QIN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Polixir Technologies Ltd
Original Assignee
Polixir Technologies Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Polixir Technologies Ltd filed Critical Polixir Technologies Ltd
Assigned to POLIXIR TECHNOLOGIES LIMITED reassignment POLIXIR TECHNOLOGIES LIMITED ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: QIN, Rongjun
Publication of US20200372822A1 publication Critical patent/US20200372822A1/en
Application granted granted Critical
Publication of US11062617B2 publication Critical patent/US11062617B2/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00Programme-control systems
    • G05B19/02Programme-control systems electric
    • G05B19/04Programme control other than numerical control, i.e. in sequence controllers or logic controllers
    • G05B19/042Programme control other than numerical control, i.e. in sequence controllers or logic controllers using digital processors
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/12Computing arrangements based on biological models using genetic models
    • G06N3/126Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition
    • G06N5/025Extracting rules from data
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00Teaching not covered by other main groups of this subclass
    • G09B19/16Control of vehicles or other craft
    • G09B19/167Control of land vehicles
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B9/00Simulators for teaching or training purposes
    • G09B9/02Simulators for teaching or training purposes for teaching control of vehicles or other craft
    • G09B9/04Simulators for teaching or training purposes for teaching control of vehicles or other craft for teaching control of land vehicles
    • G09B9/05Simulators for teaching or training purposes for teaching control of vehicles or other craft for teaching control of land vehicles the view from a vehicle being simulated

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Automation & Control Theory (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Feedback Control In General (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a training system for an autonomous driving control policy, which comprises a machine-learning-based simulator construction module, an adversarial-learning-based driving control policy search module, and a driving control policy model transfer module.

Description

CROSS-REFERENCE TO RELATED APPLICATION
This application is a 371 application of international PCT application serial no. PCT/CN2019/095711, filed on Jul. 12, 2019, which claims the priority benefit of China application no. 201910030302.6, filed on Jan. 14, 2019. The entirety of each of the above-mentioned patent applications is hereby incorporated by reference herein and made a part of this specification.
BACKGROUND OF THE INVENTION 1. Technical Field
The present invention relates to a training system for an autonomous driving control policy, which is used to control unmanned devices such as unmanned vehicles, robots, and UAVs, and belongs to the technical field of autonomous driving.
2. Description of Related Art
Autonomous driving aims to progress from assisting drivers to eventually replacing them, so as to realize safe, compliant, and convenient personal autonomous traffic systems. In existing autonomous driving systems, most driving control policies are based on manual rule schemes or real-time planning schemes. These existing schemes are not intelligent and have serious defects with respect to safe driving, and an autonomous driving control policy that covers all scenes, especially extreme scenes, has not yet been designed.
Recently, machine learning has been introduced into some autonomous driving schemes. Driving data of human drivers are acquired to train a model by supervised learning, so that the outputs of the model resemble human driving habits. However, such an approach requires a large amount of driving data to be collected for model training, which involves substantial human participation; moreover, the collected driving data contain very little extreme-scene data, so the model still cannot cover all driving scenes. Consequently, a model trained by supervised learning has blind scene areas and cannot complete driving tasks smoothly when used in unseen scenes.
Reinforcement learning can improve the decision-making capacity of intelligent agents through interactive trial and error between the agents and the environment, so that the agents gradually learn the optimal control policy in the environment and perform control autonomously. However, the reinforcement learning process requires a great deal of interactive trial and error between the agents and the environment, and in an actual autonomous driving scene this means unmanned vehicles would have to perform a large number of independent explorations in the physical world. Clearly, such an approach is extremely dangerous and costly.
Thus, in autonomous driving tasks, a novel training solution for an autonomous driving policy is urgently needed to solve this problem.
BRIEF SUMMARY OF THE INVENTION
Objective: the present invention provides a training system for generating a safe autonomous driving control policy, so as to solve the problems and overcome the shortcomings of the prior art.
Technical solution: a training system for an autonomous driving control policy comprises three modules: construction of a simulator, a policy search, and a policy transfer;
Construction of the simulator: this module involves the simulation of static factors, such as the power systems of vehicles and the driving roads, and the simulation of dynamic factors, such as pedestrians, non-motor vehicles, and surrounding vehicles;
Policy search: an objective function is set in the constructed simulator, and a driving control policy that optimizes the objective function is then searched for by means of a machine learning method. The objective function includes a destination determination value for determining whether or not the vehicle has arrived at its destination, a compliance determination value for determining whether or not the vehicle has violated traffic regulations during driving, a safety determination value for determining whether or not the vehicle has collided during driving, and a comfort determination value for determining whether or not the vehicle has accelerated excessively during driving, and it is obtained by a weighted summation of all the determination values; and
Policy transfer: the policy searched out in the simulator is retrained on data acquired by an unmanned vehicle entity to obtain a driving control policy suited to the unmanned vehicle entity.
The dynamic factors are simulated in the simulator through the following solution:
Firstly, road videos are captured;
Secondly, the dynamic factors in the road videos are detected by means of a manual annotation method or an object detection algorithm;
Thirdly, the surrounding information S(o,t) and position information L(o,t) of each dynamic factor o are extracted at all times t; the surrounding information S(o,t) is paired with the position movement information L(o,t)−L(o,t−1), that is, S(o,t) is labeled with L(o,t)−L(o,t−1), and a labeled data set covering all the dynamic factors at all the times is constructed;
Fourthly, a prediction model H, which takes S(o,t) as input and outputs a prediction of L(o,t)−L(o,t−1), is trained from the labeled data set by means of a supervised learning method such as a deep neural network learning algorithm or a decision tree learning algorithm; and
Finally, in the simulator, surrounding information S(o) and position information L(o) of each dynamic factor o are extracted, a prediction model H(S(o)) is called to obtain a value v, and accordingly, L(o)+v is the next position of the dynamic factor.
In this solution, a prediction model is generated for each dynamic factor and can predict the difference between the current position and the next position of the dynamic factor according to an input state. Accordingly, the dynamic factors acquire the capability to respond to the environment, and it is unnecessary to keep the road scenes in the simulator completely consistent with the scenes captured in the videos.
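As a concrete illustration of the steps above, the following sketch pairs the surrounding information S(o,t) with the position movement L(o,t)−L(o,t−1) and fits a prediction model H by supervised learning. The data layout, function names, and the choice of a decision-tree regressor are illustrative assumptions rather than part of the described system.

```python
# Minimal sketch (hypothetical data layout): build the labeled set
# {S(o,t) -> L(o,t) - L(o,t-1)} and fit a supervised prediction model H.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def build_labeled_set(tracks):
    """tracks maps each dynamic factor o to a time-ordered list of
    (S, L) pairs: surrounding-information features and 2-D positions."""
    X, Y = [], []
    for observations in tracks.values():
        for t in range(1, len(observations)):
            S_t, L_t = observations[t]
            _, L_prev = observations[t - 1]
            X.append(S_t)                                   # S(o, t)
            Y.append(np.asarray(L_t) - np.asarray(L_prev))  # L(o, t) - L(o, t-1)
    return np.asarray(X), np.asarray(Y)

def train_prediction_model(tracks):
    X, Y = build_labeled_set(tracks)
    H = DecisionTreeRegressor(max_depth=8)  # any supervised learner could be used here
    H.fit(X, Y)
    return H
```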
Policy Search:
An autonomous driving control policy aims to perform continuous control according to continuously input perceptual information to form a driving process.
Firstly, an objective function is designed according to the system user's requirements for the driving policy, for example, that the safety, compliance, and comfort of the vehicle must be guaranteed while the vehicle travels to its driving destination;
Secondly, the parameters of a policy model are designed; for example, a multi-layer feedforward neural network, a convolutional neural network, or a residual network is used as the implementation model of the control policy, in which case the control policy parameters are the connection weights among the units of the neural network and are determined through training; and
Thirdly, for the objective function, the policy model parameters that maximize the evaluation value are searched for by means of an evolutionary algorithm or a reinforcement learning algorithm in the space defined by the policy model parameters. The search process generally comprises the following steps:
1. Setting k=0;
2. Generating random control policy parameters to obtain an initial control policy πk;
3. Running the control policy πk in the simulator to obtain a motion trajectory of an unmanned vehicle in the simulator, evaluating the destination determination value, safety determination value, compliance determination value, and comfort determination value of the motion trajectory, and adding these values together to obtain the evaluation index of the control policy;
4. Updating a population by means of the evolutionary algorithm according to the result obtained in Step 3; or, updating a driving policy model by means of a reinforcement learning method;
5. After the update, obtaining a driving policy model to be executed next time, and setting k=k+1; and
6. Repeating Step 2 until all cycles are completed.
Policy Transfer:
[Solution 1] Initialization of a transfer model: A control policy model is run on the unmanned vehicle entity, taking the autonomous driving control policy model obtained through training in the simulator as its starting point, and is updated by means of the obtained data.
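One simple way to realize this scheme is sketched below: the search continues around the simulator-trained parameters, but candidates are scored with data obtained on the vehicle. The (1+1) hill-climbing update and the `evaluate_on_vehicle` callback are assumptions made for illustration; the description only specifies that the model is updated by means of the obtained data.

```python
# Sketch of [Solution 1]: keep searching near the simulator-trained parameters,
# scoring candidates with data obtained from the unmanned vehicle entity.
# The (1+1) hill-climbing rule and `evaluate_on_vehicle` are illustrative assumptions.
import numpy as np

def transfer_by_finetuning(sim_trained_params, evaluate_on_vehicle, n_iters=20, sigma=0.02):
    """evaluate_on_vehicle(params) is assumed to return the evaluation index
    of a candidate parameter vector computed from data logged on the vehicle."""
    best = np.asarray(sim_trained_params, dtype=float)
    best_score = evaluate_on_vehicle(best)
    for _ in range(n_iters):
        candidate = best + sigma * np.random.randn(best.size)  # small local perturbation
        score = evaluate_on_vehicle(candidate)
        if score > best_score:                                  # keep only improvements
            best, best_score = candidate, score
    return best
```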
[Solution 2] Simulator transition correction and transfer:
Firstly, a control action sequence (a1, a2, a3, . . . , an) is executed on the unmanned vehicle entity, and the perception states (s0, s1, s2, s3, . . . , sn) of all executed actions are collected;
Secondly, in the simulator, the initial state is set to s0, the same action sequence (a1, a2, a3, . . . , an) is executed, and the perception states (s0, u1, u2, u3, . . . , un) are acquired;
Thirdly, a transition correction function g is constructed in the simulator; the action a given by a control policy π in the current state s is input to g, and a corrected action a′ that replaces the action a is output by g and is actually executed in the environment, that is, a′=g(s, π(s)); and
Fourthly, g is trained by means of the evolutionary algorithm or the reinforcement learning method so that the data from the unmanned vehicle entity is as similar as possible to the data from the simulator, that is, Σi(si−ui)² is minimized.
After the above-mentioned correction, the control policy π obtained through training in the simulator is directly used for the unmanned vehicle entity.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
FIG. 1 is a block diagram of main modules of a training system for an autonomous driving control policy.
DETAILED DESCRIPTION OF THE INVENTION
The present invention is further expounded below in combination with specific embodiments, which are only used to explain the present invention and are not intended to limit its scope. After reading the present disclosure, those skilled in the art can make various equivalent modifications of the present invention, and all such equivalent modifications shall also fall within the scope defined by the appended claims of the present application.
As shown in FIG. 1, a training system for an autonomous driving control policy mainly comprises, and is technically characterized by, three modules: construction of a simulator, a policy search, and a policy transfer.
The simulator construction module covers the simulation of static factors, such as the power systems of vehicles and the driving roads, as well as the simulation of dynamic factors, such as pedestrians, non-motor vehicles, and surrounding vehicles.
The policy search module sets an objective function in the constructed simulator and then searches for the driving control policy that optimizes the objective function by means of a machine learning method. The objective function includes a destination determination value for determining whether or not the vehicle has arrived at its destination, a compliance determination value for determining whether or not the vehicle has violated traffic regulations during driving, a safety determination value for determining whether or not the vehicle has collided during driving, and a comfort determination value for determining whether or not the vehicle has accelerated excessively during driving, and it is obtained by a weighted summation of all the determination values.
The policy transfer module retrains the policy searched out in the simulator on data acquired by an unmanned vehicle entity to obtain a driving control policy suited to the unmanned vehicle entity.
Construction of the static factors in the simulator, including the dynamical models of the vehicles, the road models, and so on, is mature in the field; the difficulty of simulator construction lies in constructing the dynamic factors, which include the behavioral models of the pedestrians, the non-motor vehicles, and the surrounding vehicles. Specific implementations for simulating the dynamic factors are as follows:
Embodiment 1
Firstly, videos of vehicles, pedestrians, and non-motor vehicles on roads in different scenes are captured by a traffic camera, a high-altitude camera, a UAV, or other devices;
Secondly, the dynamic factors in the road videos are detected by means of a manual annotation method or an object detection algorithm, and the position sequence of each dynamic factor is constructed; and
Thirdly, the position sequences of the dynamic factors are replayed in the simulator to generate the motion trajectories of the dynamic factors.
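A minimal sketch of this replay scheme is given below; the simulator interface (`set_position`, `step`) and the data layout are assumptions made for illustration.

```python
# Sketch of Embodiment 1: replay the recorded position sequences of the dynamic
# factors in the simulator (the simulator accessors shown here are assumed).
def replay_dynamic_factors(simulator, position_sequences, n_steps):
    """position_sequences maps each dynamic factor id to its time-ordered list
    of positions extracted from the road videos."""
    for t in range(n_steps):
        for factor_id, seq in position_sequences.items():
            if t < len(seq):
                simulator.set_position(factor_id, seq[t])  # place the factor at its recorded position
        simulator.step()                                   # advance the simulated world by one tick
```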
Embodiment 2
In Embodiment 1, the motion trajectories of the captured dynamic factors are simply replayed in the simulator, and such an approach has the following defects: first, the road scenes in the simulator must be consistent with the scenes captured in the videos; and second, the dynamic factors have no capability to respond to the environment and are merely replayed. An improved solution based on a machine learning method is described below.
Firstly, the road videos are captured by the traffic camera, the high-altitude camera, the UAV, or other devices;
Secondly, the dynamic factors in the road videos are detected by means of the manual annotation method or the object detection algorithm;
Thirdly, the surrounding information S(o,t) (including the static factors visible at 360° around the dynamic factor, the other dynamic factors, and the like) and the position information L(o,t) of each dynamic factor o are extracted at all times t; the surrounding information S(o,t) is paired with the position movement information L(o,t)−L(o,t−1), that is, S(o,t) is labeled with L(o,t)−L(o,t−1), and a labeled data set covering all the dynamic factors at all the times is constructed;
Fourthly, a prediction model H, which takes S(o,t) as input and outputs a prediction of L(o,t)−L(o,t−1), is trained from the labeled data set by means of a supervised learning method such as a deep neural network learning algorithm or a decision tree learning algorithm; and
Finally, in the simulator, surrounding information S(o) and position information L(o) of each dynamic factor o are extracted, a prediction model H(S(o)) is called to obtain a value v, and accordingly, L(o)+v is the next position of the dynamic factor.
In this solution, a prediction model is generated for each dynamic factor and can predict the difference between the current position and the next position of the dynamic factor according to an input state. Accordingly, the dynamic factors acquire the capability to respond to the environment, and it is unnecessary to keep the road scenes in the simulator completely consistent with the scenes captured in the videos.
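The stepping of the dynamic factors with the trained prediction model can be sketched as follows; the simulator accessors (`get_surrounding`, `get_position`, `set_position`) are assumptions, and H is a model trained as sketched earlier.

```python
# Sketch of the final step: advance each dynamic factor with the prediction
# model H; L(o) + v becomes the next position (simulator accessors are assumed).
import numpy as np

def step_dynamic_factors(simulator, H, factor_ids):
    for o in factor_ids:
        S_o = np.asarray(simulator.get_surrounding(o))  # surrounding information S(o)
        L_o = np.asarray(simulator.get_position(o))     # current position L(o)
        v = H.predict(S_o.reshape(1, -1))[0]            # predicted movement
        simulator.set_position(o, L_o + v)              # next position L(o) + v
```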
Policy Search:
An autonomous driving control policy aims to perform continuous control according to continuously input perceptual information to form a driving process.
Firstly, an objective function is designed according to the system user's requirements for the driving policy, for example, that the safety, compliance, and comfort of the vehicle must be guaranteed while the vehicle travels to its driving destination. The objective function is a weighted sum of a destination determination value for determining whether or not the vehicle has arrived at the destination, a compliance determination value for determining whether or not the vehicle has violated traffic regulations, a safety determination value for determining whether or not the vehicle has collided during driving, and a comfort determination value for determining whether or not the vehicle has accelerated excessively during driving. For example, if the vehicle has arrived at the destination within a given time during the driving process, the destination determination value is equal to 1; each time the vehicle collides, −100 is added to the safety determination value; each time the vehicle violates traffic regulations, −1 is added to the compliance determination value; and each time the vehicle accelerates or decelerates excessively, or drives at a large angular speed, −0.01 is added to the comfort determination value. Finally, these values are added together to obtain an evaluation index for scoring each driving process.
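Using the example weights above, the evaluation index of one driving process can be sketched as follows; the trajectory statistics (arrival flag and event counts) are assumed to be reported by the simulator.

```python
# Sketch of the evaluation index with the example weights stated above:
# +1 for arrival, -100 per collision, -1 per violation, -0.01 per comfort event.
def evaluation_index(arrived_within_time, num_collisions, num_violations, num_harsh_events):
    destination = 1.0 if arrived_within_time else 0.0
    safety = -100.0 * num_collisions
    compliance = -1.0 * num_violations
    comfort = -0.01 * num_harsh_events  # excessive accel/decel or large angular speed
    return destination + safety + compliance + comfort
```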
Secondly, the parameters of a control policy model are designed; for example, a multi-layer feedforward neural network, a convolutional neural network, or a residual network is used as the implementation model of the control policy, in which case the control policy parameters are the connection weights among the units of the neural network and must be further determined through training.
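For instance, a small feedforward policy can be written so that its connection weights form a single flat parameter vector, which then defines the space searched in the next step; the layer sizes and interface below are illustrative assumptions.

```python
# Sketch of a feedforward control policy whose connection weights form the flat
# parameter vector that is searched over (layer sizes are illustrative).
import numpy as np

class FeedforwardPolicy:
    def __init__(self, state_dim, action_dim, hidden=32):
        self.shapes = [(state_dim, hidden), (hidden,), (hidden, action_dim), (action_dim,)]
        self.n_params = sum(int(np.prod(s)) for s in self.shapes)

    def act(self, state, params):
        """Map a perception state to a control action using the flattened weights."""
        W1, b1, W2, b2 = self._unflatten(params)
        h = np.tanh(np.asarray(state) @ W1 + b1)
        return h @ W2 + b2

    def _unflatten(self, params):
        pieces, i = [], 0
        for s in self.shapes:
            size = int(np.prod(s))
            pieces.append(np.asarray(params[i:i + size]).reshape(s))
            i += size
        return pieces
```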
Thirdly, for the objective function, the policy model parameters that maximize the evaluation value are searched for by means of an evolutionary algorithm or a reinforcement learning algorithm in the space defined by the policy model parameters. The search process generally comprises the following steps:
1. k=0 is set;
2. Random control policy parameters are generated to obtain an initial control policy πk;
3. The control policy πk is run in the simulator to obtain a motion trajectory of an unmanned vehicle in the simulator; the destination determination value, safety determination value, compliance determination value, and comfort determination value of the motion trajectory are evaluated, and these values are added together to obtain the evaluation index of the control policy;
4. A population is updated by means of the evolutionary algorithm according to the result obtained in Step 3; or, a driving policy model is updated by means of a reinforcement learning method;
5. After the update, a driving policy model to be executed next time is obtained, and k=k+1 is set; and
6. Step 2 is repeated until all cycles are completed.
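The loop above can be sketched as a simple population-based evolutionary search over the flat policy parameters; `run_in_simulator` is assumed to roll a candidate policy out in the simulator and return its evaluation index, and the population scheme and hyper-parameters are illustrative assumptions, not part of the described system.

```python
# Sketch of Steps 1-6 as a simple evolutionary search over policy parameters;
# run_in_simulator(policy, params) is assumed to return the evaluation index.
import numpy as np

def evolutionary_policy_search(policy, run_in_simulator,
                               population=32, elite=8, cycles=100, sigma=0.1):
    mean = sigma * np.random.randn(policy.n_params)          # Steps 1-2: random initial policy
    for k in range(cycles):                                  # Step 6: repeat for all cycles
        candidates = mean + sigma * np.random.randn(population, policy.n_params)
        scores = np.array([run_in_simulator(policy, c) for c in candidates])  # Step 3
        elites = candidates[np.argsort(scores)[-elite:]]     # Step 4: keep the best individuals
        mean = elites.mean(axis=0)                           # Step 5: policy for the next cycle
    return mean
```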
Policy Transfer:
[Solution 1] Initialization of a transfer model: A control policy model is run on the unmanned vehicle entity, taking the autonomous driving control policy model obtained through training in the simulator as its starting point, and is updated by means of the obtained data.
[Solution 2] Simulator transition correction and transfer:
Firstly, a control action sequence (a1, a2, a3, . . . , an) is executed on the unmanned vehicle entity, and the perception states (s0, s1, s2, s3, . . . , sn) of all executed actions are collected;
Secondly, in the simulator, the initial state is set to s0, the same action sequence (a1, a2, a3, . . . , an) is executed, and the perception states (s0, u1, u2, u3, . . . , un) are collected;
Thirdly, a function g is constructed to correct the deviation of the simulator; the action a=π(s) given by the control policy π in the current state s is input to the function g, and a corrected action a′ that replaces the action a is output by the function g and is actually executed in the environment, that is, a′=g(s, a); and
Fourthly, g is trained by means of the evolutionary algorithm or the reinforcement learning method so that the data from the unmanned vehicle entity is as similar as possible to the data from the simulator, that is, Σi(si−ui)² is minimized.
After the above-mentioned correction, the control policy π obtained through training in the simulator is directly used for the unmanned vehicle entity.
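A minimal sketch of this correction-and-transfer procedure is given below. The linear form of g, the hill-climbing minimization, and the simulator interface (`reset`, `step`) are assumptions made for illustration; the description only requires that g be trained so that Σi(si−ui)² is minimized.

```python
# Sketch of [Solution 2]: learn a transition correction g(s, a) so that states
# produced in the simulator match the states logged on the vehicle; the linear
# correction, hill-climbing search, and simulator interface are assumed.
import numpy as np

def make_g(theta, state_dim, action_dim):
    W = theta.reshape(state_dim + action_dim, action_dim)
    return lambda s, a: a + np.concatenate([s, a]) @ W  # a' = g(s, a)

def transition_gap(theta, simulator, s0, actions, real_states, state_dim, action_dim):
    """Execute the corrected action sequence in the simulator from s0 and
    return sum_i (s_i - u_i)^2 against the states recorded on the vehicle."""
    g = make_g(theta, state_dim, action_dim)
    simulator.reset(s0)
    state, gap = np.asarray(s0, dtype=float), 0.0
    for a, s_real in zip(actions, real_states[1:]):  # pairs (a_i, s_i)
        state = np.asarray(simulator.step(g(state, np.asarray(a, dtype=float))))  # u_i
        gap += float(np.sum((np.asarray(s_real) - state) ** 2))
    return gap

def train_g(simulator, s0, actions, real_states, state_dim, action_dim,
            iters=200, sigma=0.05):
    theta = np.zeros((state_dim + action_dim) * action_dim)
    best = transition_gap(theta, simulator, s0, actions, real_states, state_dim, action_dim)
    for _ in range(iters):  # simple evolutionary minimization of the gap
        cand = theta + sigma * np.random.randn(theta.size)
        gap = transition_gap(cand, simulator, s0, actions, real_states, state_dim, action_dim)
        if gap < best:
            theta, best = cand, gap
    return make_g(theta, state_dim, action_dim)
```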

Claims (4)

What is claimed is:
1. An unmanned vehicle, comprising a processor configured to:
construct a simulator to simulate static factors of power systems of vehicles and driving roads, and dynamic factors of pedestrians, non-motor vehicles, and surrounding vehicles,
wherein road videos are captured by a road camera, and the dynamic factors are detected in the road videos;
set a driving objective function in the simulator which is constructed, and a driving control policy of an optimal objective function is searched by using a machine learning algorithm; and
execute a policy transfer to retrain the driving control policy searched out in the simulator according to data acquired by the unmanned vehicle to obtain a retrained driving control policy used for the unmanned vehicle, and
control movement of the unmanned vehicle based on the retrained driving control policy, wherein
an execution of the policy transfer comprises:
running a control policy model in the unmanned vehicle with an autonomous driving control policy model obtained through training in the simulator as a starting point, and updating the control policy model by obtained data;
executing a control action sequence (a1, a2, a3, . . . , an) on the unmanned vehicle, and collecting perception states (s0, s1, s2, s3, . . . , sn) of all executed actions;
setting an initial state in the simulator as s0, and executing a same action sequence (a1, a2, a3, . . . , an); and collecting perception states (s0, u1, u2, u3, . . . , un);
constructing a transition correction function g in the simulator, inputting an action a from a current state s and the driving control policy π to the g, and outputting a correction action a′ replacing the action a from the g and executing the a′ in an environment, wherein a′=g(s, π(s)); and
training the g by using an evolutionary algorithm or a reinforcement learning algorithm to minimize a difference between data from the unmanned vehicle and data from the simulator, wherein Σi(si−ui)2 is minimized; and
after training the g, obtaining the retrained driving control policy π through training in the simulator, and the retrained driving control policy π is directly used for controlling the movement of the unmanned vehicle.
2. The unmanned vehicle according to claim 1, wherein the processor is further configured to:
extract surrounding information S(o,t) and position information L(o,t) of each dynamic factor o at all times t, pair the surrounding information S(o,t) and position movement information L(o,t)−L(o,t−1), wherein the S(o,t) is marked as the L(o,t)−L(o,t−1), and construct a labeled data set including all the dynamic factors at all the times;
train a prediction model H which inputs a prediction value of the S(o,t) and outputs a prediction value of the L(o,t)−L(o,t−1) from the labeled data set by using a supervised learning algorithm; and
extract surrounding information S(o) and position information L(o) of each said dynamic factor o in the simulator, call a prediction model H(S(o)) to obtain a value v, and accordingly, L(o)+v is a next position of the dynamic factor.
3. The unmanned vehicle according to claim 1, wherein an autonomous driving control policy aims to perform continuous control according to continuously input perceptual information to form a driving process, and the processor is further configured to:
design, according to a requirement of a system user for a driving policy, an objective function;
design parameters of a policy model, use a multi-layer feedforward neural network, a convolution neural network, or a residual network as an implementation model of a control policy, and determine control policy parameters as connection weights among units of the neural network through training; and
as for the objective function, search for the parameters of the policy model having a maximum evaluation value by using an evolutionary algorithm or a reinforcement learning algorithm in a space defined by the parameters of the policy model.
4. The unmanned vehicle according to claim 3, wherein a search process comprises the following steps:
(1) setting k=0;
(2) generating random control policy parameters to obtain an initial control policy πk;
(3) running the initial control policy πk in the simulator to obtain a motion trajectory of an unmanned vehicle in the simulator and to respectively evaluate a destination determination value, a safety determination value, a compliance determination value, and a comfort determination value of the motion trajectory, and adding these values together to obtain a result of an evaluation index after running the control policy;
(4) updating a population by using the evolutionary algorithm according to the result obtained in the Step (3); or, updating a driving policy model by using a reinforcement learning algorithm;
(5) after the update, obtaining the driving policy model to be executed next time, and setting k=k+1; and
(6) repeating the Step (2) until all cycles are completed.
US16/968,608 2019-01-14 2019-07-12 Training system for autonomous driving control policy Active US11062617B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201910030302.6 2019-01-14
CN201910030302.6A CN109765820B (en) 2019-01-14 2019-01-14 A kind of training system for automatic Pilot control strategy
PCT/CN2019/095711 WO2020147276A1 (en) 2019-01-14 2019-07-12 Training system for automatic driving control strategy

Publications (2)

Publication Number Publication Date
US20200372822A1 US20200372822A1 (en) 2020-11-26
US11062617B2 true US11062617B2 (en) 2021-07-13

Family

ID=66453751

Family Applications (1)

Application Number Title Priority Date Filing Date
US16/968,608 Active US11062617B2 (en) 2019-01-14 2019-07-12 Training system for autonomous driving control policy

Country Status (3)

Country Link
US (1) US11062617B2 (en)
CN (1) CN109765820B (en)
WO (1) WO2020147276A1 (en)

Families Citing this family (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109765820B (en) * 2019-01-14 2019-08-09 南栖仙策(南京)科技有限公司 A kind of training system for automatic Pilot control strategy
CN110322017A (en) * 2019-08-13 2019-10-11 吉林大学 Automatic Pilot intelligent vehicle Trajectory Tracking Control strategy based on deeply study
CN111222630B (en) * 2020-01-17 2023-07-25 北京工业大学 A Learning Method for Autonomous Driving Rules Based on Deep Reinforcement Learning
CN111258314B (en) * 2020-01-20 2022-07-15 中国科学院深圳先进技术研究院 Collaborative evolution-based decision-making emergence method for automatic driving vehicle
CN111310919B (en) * 2020-02-08 2020-10-16 南栖仙策(南京)科技有限公司 Driving control strategy training method based on scene segmentation and local path planning
CN111324358B (en) * 2020-02-14 2020-10-16 南栖仙策(南京)科技有限公司 Training method for automatic operation and maintenance strategy of information system
CN111339675B (en) * 2020-03-10 2020-12-01 南栖仙策(南京)科技有限公司 Training method for intelligent marketing strategy based on machine learning simulation environment
CN112700642B (en) * 2020-12-19 2022-09-23 北京工业大学 A method for improving traffic efficiency by using intelligent networked vehicles
CN112650240B (en) * 2020-12-21 2024-08-20 深圳大学 Automatic driving method for training multi-agent multi-scene data set
CN112651446B (en) * 2020-12-29 2023-04-14 杭州趣链科技有限公司 Unmanned automobile training method based on alliance chain
CN112906126B (en) * 2021-01-15 2023-04-07 北京航空航天大学 Vehicle hardware in-loop simulation training system and method based on deep reinforcement learning
CN112395777B (en) * 2021-01-21 2021-04-16 南栖仙策(南京)科技有限公司 Engine calibration parameter optimization method based on automobile exhaust emission simulation environment
CN113110592B (en) * 2021-04-23 2022-09-23 南京大学 Unmanned aerial vehicle obstacle avoidance and path planning method
CN113276883B (en) * 2021-04-28 2023-04-21 南京大学 Driving strategy planning method and implementation device for unmanned vehicles based on dynamic generation environment
CN113050433B (en) * 2021-05-31 2021-09-14 中国科学院自动化研究所 Robot control strategy migration method, device and system
CN117441174A (en) * 2021-05-31 2024-01-23 罗伯特·博世有限公司 Method and apparatus for training a neural network for imitating the behavior of a presenter
CN113741420B (en) * 2021-07-28 2023-12-19 浙江工业大学 A data-driven sampling search method and system
CN113743469B (en) * 2021-08-04 2024-05-28 北京理工大学 Automatic driving decision method integrating multi-source data and comprehensive multi-dimensional indexes
CN113885491A (en) * 2021-08-29 2022-01-04 北京工业大学 Unmanned decision-making and control method based on federal deep reinforcement learning
CN113934966B (en) * 2021-09-17 2024-07-26 北京理工大学 Method for using graph convolution reinforcement learning to minimize information age in group perception
CN113848913B (en) * 2021-09-28 2023-01-06 北京三快在线科技有限公司 Control method and control device of unmanned equipment
CN113837063B (en) * 2021-10-15 2024-05-10 中国石油大学(华东) Reinforcement learning-based curling motion field analysis and auxiliary decision-making method
CN114489712A (en) * 2021-12-22 2022-05-13 中智行(苏州)科技有限公司 A production method for training data of unmanned automatic model
CN114179835B (en) * 2021-12-30 2024-01-05 清华大学苏州汽车研究院(吴江) Automatic driving vehicle decision training method based on reinforcement learning in real scene
CN114384901B (en) * 2022-01-12 2022-09-06 浙江中智达科技有限公司 Reinforced learning aided driving decision-making method oriented to dynamic traffic environment
CN114117829B (en) * 2022-01-24 2022-04-22 清华大学 Dynamic modeling method and system for man-vehicle-road closed loop system under limit working condition
CN114104005B (en) * 2022-01-26 2022-04-19 苏州浪潮智能科技有限公司 Decision-making method, device and equipment of automatic driving equipment and readable storage medium
CN114510012B (en) * 2022-02-16 2024-11-29 中国电子科技集团公司第五十四研究所 Unmanned cluster evolution system and method based on meta-action sequence reinforcement learning
CN114580302A (en) * 2022-03-16 2022-06-03 重庆大学 Decision planning method for automatic driving automobile based on maximum entropy reinforcement learning
CN114771561B (en) * 2022-03-31 2025-05-30 中国人民解放军陆军工程大学 A method, device and storage medium for generating a strategy for autonomous driving
CN115437924B (en) * 2022-08-17 2025-07-22 电子科技大学 Uncertainty estimation method of end-to-end automatic driving decision algorithm
CN115512554B (en) * 2022-09-02 2023-07-28 北京百度网讯科技有限公司 Parameter model training and traffic signal control method, device, equipment and medium
CN115761144B (en) * 2022-12-08 2024-06-04 上海人工智能创新中心 Automatic driving strategy pre-training method based on self-supervision geometric modeling
CN116842698B (en) * 2023-05-31 2024-08-09 华能伊敏煤电有限责任公司 Unmanned transportation simulation test method
CN118323198B (en) * 2024-06-13 2024-08-27 新石器慧通(北京)科技有限公司 Training and using method and device of decision model in automatic driving vehicle and vehicle


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605285A (en) * 2013-11-21 2014-02-26 南京理工大学 Fuzzy nerve network control method for automobile driving robot system
CN104049640B (en) * 2014-06-27 2016-06-15 金陵科技学院 Unmanned vehicle attitude robust fault tolerant control method based on Neural Network Observer
CN104199437A (en) * 2014-08-15 2014-12-10 上海交通大学 Parameter Optimization Method of Fractional Order PIλDμ Controller Based on Regional Pole Index
CN105488528B (en) * 2015-11-26 2019-06-07 北京工业大学 Neural network image classification method based on improving expert inquiry method
CN107506830A (en) * 2017-06-20 2017-12-22 同济大学 Towards the artificial intelligence training platform of intelligent automobile programmed decision-making module

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170262790A1 (en) * 2016-03-11 2017-09-14 Route4Me, Inc. Complex dynamic route sequencing for multi-vehicle fleets using traffic and real-world constraints
US20170372431A1 (en) * 2016-06-24 2017-12-28 Swiss Reinsurance Company Ltd. Autonomous or partially autonomous motor vehicles with automated risk-controlled systems and corresponding method thereof
US20190317499A1 (en) * 2016-08-08 2019-10-17 Hitachi Automotive Systems, Ltd. Automatic Driving Device
US20180048801A1 (en) * 2016-08-09 2018-02-15 Contrast, Inc. Real-time hdr video for vehicle control
US20200026283A1 (en) * 2016-09-21 2020-01-23 Oxford University Innovation Limited Autonomous route determination
US20180164825A1 (en) * 2016-12-09 2018-06-14 Zendrive, Inc. Method and system for risk modeling in autonomous vehicles
US20180293514A1 (en) * 2017-04-11 2018-10-11 International Business Machines Corporation New rule creation using mdp and inverse reinforcement learning
US20190122378A1 (en) * 2017-04-17 2019-04-25 The United States Of America, As Represented By The Secretary Of The Navy Apparatuses and methods for machine vision systems including creation of a point cloud model and/or three dimensional model based on multiple images from different perspectives and combination of depth cues from camera motion and defocus with various applications including navigation systems, and pattern matching systems as well as estimating relative blur between images for use in depth from defocus or autofocusing applications
CN107609633A (en) 2017-05-03 2018-01-19 同济大学 The position prediction model construction method of vehicle traveling influence factor based on deep learning in car networking complex network
US20180373997A1 (en) * 2017-06-21 2018-12-27 International Business Machines Corporation Automatically state adjustment in reinforcement learning
US20190146508A1 (en) * 2017-11-14 2019-05-16 Uber Technologies, Inc. Dynamic vehicle routing using annotated maps and profiles
US20190163176A1 (en) * 2017-11-30 2019-05-30 drive.ai Inc. Method for transferring control of an autonomous vehicle to a remote operator
CN107862346A (en) 2017-12-01 2018-03-30 驭势科技(北京)有限公司 A kind of method and apparatus for carrying out driving strategy model training
US20190212749A1 (en) * 2018-01-07 2019-07-11 Nvidia Corporation Guiding vehicles through vehicle maneuvers using machine learning models
US20190266418A1 (en) * 2018-02-27 2019-08-29 Nvidia Corporation Real-time detection of lanes and boundaries by autonomous vehicles
CN108447076A (en) 2018-03-16 2018-08-24 清华大学 Multi-object tracking method based on depth enhancing study
CN109765820A (en) 2019-01-14 2019-05-17 南栖仙策(南京)科技有限公司 A kind of training system for automatic Pilot control strategy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"International Search Report (Form PCT/ISA/210) of PCT/CN2019/095711", dated Oct. 14, 2019, with English translation thereof, pp. 1-4.

Also Published As

Publication number Publication date
WO2020147276A1 (en) 2020-07-23
US20200372822A1 (en) 2020-11-26
CN109765820A (en) 2019-05-17
CN109765820B (en) 2019-08-09

Similar Documents

Publication Publication Date Title
US11062617B2 (en) Training system for autonomous driving control policy
CN112668235B (en) Robot control method based on DDPG algorithm of offline model pre-training learning
Salvato et al. Crossing the reality gap: A survey on sim-to-real transferability of robot controllers in reinforcement learning
US11429854B2 (en) Method and device for a computerized mechanical device
Beliaev et al. Imitation learning by estimating expertise of demonstrators
WO2021103834A1 (en) Method for generating lane changing decision model, lane changing decision method for driverless vehicle, and device
Li et al. Infogail: Interpretable imitation learning from visual demonstrations
CN108819948B (en) Driver behavior modeling method based on reverse reinforcement learning
Rehder et al. Lane change intention awareness for assisted and automated driving on highways
CN113826051A (en) Generating digital twins of interactions between solid system parts
EP4150426A2 (en) Tools for performance testing and/or training autonomous vehicle planners
Gopalan et al. Simultaneously learning transferable symbols and language groundings from perceptual data for instruction following
CN106096729A (en) A kind of towards the depth-size strategy learning method of complex task in extensive environment
CN108791302B (en) Driver behavior modeling system
CN112106060A (en) Control strategy determination method and system
Levine Motor skill learning with local trajectory methods
CN116353623A (en) Driving control method based on self-supervision imitation learning
Hilleli et al. Toward deep reinforcement learning without a simulator: An autonomous steering example
Arbabi et al. Planning for autonomous driving via interaction-aware probabilistic action policies
CN113627249A (en) Navigation system training method and device based on confrontation contrast learning and navigation system
Yılmaz et al. Deep deterministic policy gradient reinforcement learning for collision-free navigation of mobile robots in unknown environments
Dewantara Building a socially acceptable navigation and behavior of a mobile robot using Q-learning
Floyd et al. Building learning by observation agents using jloaf
Zhang et al. Traversability-aware legged navigation by learning from real-world visual data
Geiger et al. Experimental and causal view on information integration in autonomous agents

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

AS Assignment

Owner name: POLIXIR TECHNOLOGIES LIMITED, CHINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:QIN, RONGJUN;REEL/FRAME:053480/0230

Effective date: 20200714

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Year of fee payment: 4