US11062617B2 - Training system for autonomous driving control policy - Google Patents
Training system for autonomous driving control policy
- Publication number
- US11062617B2 (application US16/968,608)
- Authority
- US
- United States
- Prior art keywords
- control policy
- simulator
- policy
- driving
- unmanned vehicle
- Prior art date
- 2019-01-14
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06N 3/08 — Computing arrangements based on biological models; neural networks; learning methods
- G06N 3/04 — Neural networks; architecture, e.g. interconnection topology
- G06N 3/006 — Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
- G06N 3/126 — Evolutionary algorithms, e.g. genetic algorithms or genetic programming
- G06N 5/025 — Knowledge representation; knowledge engineering; extracting rules from data
- G05B 19/042 — Programme control other than numerical control, i.e. in sequence controllers or logic controllers, using digital processors
- G05D 1/02 — Control of position or course in two dimensions of land, water, air or space vehicles, e.g. using automatic pilots
- G09B 19/167 — Teaching control of land vehicles
- G09B 9/05 — Simulators for teaching control of land vehicles, the view from a vehicle being simulated
Definitions
- The present invention relates to a training system for an autonomous driving control policy, used to control unmanned devices such as unmanned vehicles, robots, and UAVs, and belongs to the technical field of autonomous driving.
- Autonomous driving aims to progress from assisting drivers to eventually replacing them, realizing safe, compliant, and convenient personal autonomous traffic systems.
- At present, most driving control policies are based on manually designed rules or real-time planning. These existing schemes are not intelligent and have serious defects with respect to safe driving, and no autonomous driving control policy has yet been designed that covers all scenes, especially extreme scenes.
- In supervised learning approaches, driving data are acquired from human drivers to train a model so that the model's outputs resemble human driving behavior.
- However, a large amount of driving data must be collected for model training, which involves extensive human participation; and the collected driving data contain little extreme-scene data, so the model still cannot cover all driving scenes.
- As a result, a model trained by supervised learning has blind scene areas and cannot complete driving tasks smoothly when used in unseen scenes.
- Reinforcement learning can improve the decision-making capacity of intelligent agents through interactive trial-and-error between the agents and the environment, so that the agents gradually learn the optimal control policy and perform control autonomously.
- However, this requires a great deal of interactive trial-and-error between the agents and the environment; in an actual autonomous driving scene, unmanned vehicles would be required to perform a large quantity of independent explorations in the physical world, which is extremely dangerous and costly.
- To solve these problems and overcome the shortcomings of the prior art, the present invention provides a training system for generating a safe autonomous driving control policy.
- A training system for an autonomous driving control policy comprises three modules: simulator construction, policy search, and policy transfer;
- Simulator construction: involves simulating static factors, such as vehicle power systems and driving roads, as well as dynamic factors, such as pedestrians, non-motor vehicles, and surrounding vehicles;
- Policy search: an objective function is set in the constructed simulator and a driving control policy optimizing it is searched for. The objective function comprises a destination determination value for determining whether or not the vehicle has arrived at its destination, a compliance determination value for determining whether or not the vehicle has violated traffic regulations during driving, a safety determination value for determining whether or not the vehicle has collided during driving, and a comfort determination value for determining whether or not the vehicle has accelerated excessively during driving, and is obtained by weighted summation of all the determination values; and
- Policy transfer: the policy found in the simulator is retrained on data acquired from a physical unmanned vehicle to obtain a driving control policy for that vehicle.
- The dynamic factors in the road videos are detected by means of a manual annotation method or an object detection algorithm;
- surrounding information S(o,t) and position information L(o,t) of each dynamic factor o at all times t are extracted; the surrounding information S(o,t) is paired with the position movement information L(o,t) − L(o,t−1), that is, S(o,t) is labeled with L(o,t) − L(o,t−1), and a labeled data set covering all the dynamic factors at all times is constructed;
- a prediction model H, which takes S(o,t) as input and outputs a predicted value of L(o,t) − L(o,t−1), is trained on the labeled data set by means of a supervised learning method such as a deep neural network learning algorithm or a decision tree learning algorithm; and
- a prediction model is generated for each dynamic factor and can predict the difference between its current position and its next position from an input state; accordingly, the dynamic factors gain the capability to respond to the environment, and it is unnecessary to keep the road scenes in the simulator completely consistent with the scenes captured in the videos.
- An autonomous driving control policy aims to perform continuous control according to continuously input perceptual information to form a driving process.
- Parameters of a policy model are designed: for example, a multi-layer feedforward neural network, a convolutional neural network, or a residual network is used as the implementation model of the control policy, and the control policy parameters to be determined through training are the connection weights among the units of the neural network; and
- the policy model parameters that maximize the evaluation value are searched for, by means of an evolutionary algorithm or a reinforcement learning algorithm, in the space defined by the policy model parameters.
- The search process generally comprises the following steps:
- Step 1: Generating random control policy parameters to obtain an initial control policy;
- Step 2: Running the current control policy in the simulator to obtain a motion trajectory of the unmanned vehicle;
- Step 3: Evaluating the destination, safety, compliance, and comfort determination values of the trajectory and adding them together to obtain an evaluation index for the policy;
- Step 4: Updating a population by means of the evolutionary algorithm according to the result obtained in Step 3, or updating the driving policy model by means of a reinforcement learning method; and
- Step 5: Repeating from Step 2 until all cycles are completed.
- For policy transfer, a control action sequence (a1, a2, a3, …, an) is executed on the physical unmanned vehicle, and the perception states (s0, s1, s2, s3, …, sn) following each executed action are collected;
- in the simulator, the initial state is set to s0 and the same action sequence (a1, a2, a3, …, an) is executed, and the perception states (s0, u1, u2, u3, …, un) are acquired; and
- a correction function g is trained by means of the evolutionary algorithm or the reinforcement learning method so that the data from the unmanned vehicle is as similar as possible to the data from the simulator, that is, Σi (si − ui)² is minimized.
- The control policy π obtained through training in the simulator thus corrected is then used directly on the unmanned vehicle.
- FIG. 1 is a block diagram of main modules of a training system for an autonomous driving control policy.
- As shown in FIG. 1, a training system for an autonomous driving control policy mainly comprises, and is technically characterized by, three modules: simulator construction, policy search, and policy transfer.
- The simulator construction module simulates static factors, such as vehicle power systems and driving roads, as well as dynamic factors, such as pedestrians, non-motor vehicles, and surrounding vehicles.
- The policy search module sets an objective function in the constructed simulator and then searches for the driving control policy that optimizes it by means of a machine learning method. The objective function comprises a destination determination value for determining whether or not the vehicle has arrived at its destination, a compliance determination value for determining whether or not the vehicle has violated traffic regulations during driving, a safety determination value for determining whether or not the vehicle has collided during driving, and a comfort determination value for determining whether or not the vehicle has accelerated excessively during driving, and is obtained by weighted summation of all the determination values.
- The policy transfer module retrains the policy found in the simulator on data acquired from a physical unmanned vehicle to obtain a driving control policy for that vehicle.
- videos of vehicles, pedestrians, and non-motor vehicles on roads in different scenes are captured by a traffic camera, a high-altitude camera, a UAV, or other devices;
- the dynamic factors in the road videos are detected by means of a manual annotation method or an object detection algorithm, and the position sequence of each dynamic factor is constructed;
- the position sequences of the dynamic factors are replayed in the simulator to generate motion trajectories of the dynamic factors.
- In Embodiment 1, the motion trajectories of the captured dynamic factors are simply replayed in the simulator, which has two defects: first, the road scenes in the simulator must be consistent with the scenes captured in the videos; and second, the dynamic factors have no capability to respond to the environment and are merely replayed.
- An improved solution based on a machine learning method is described below.
- the road videos are captured by the traffic camera, the high-altitude camera, the UAV, or other devices;
- the dynamic factors in the road videos are detected by means of the manual annotation method or the object detection algorithm;
- surrounding information S(o,t), including information on the static factors visible at 360° around the dynamic factor, information on the other dynamic factors, and the like, and position information L(o,t) of each dynamic factor o at all times t are extracted; the surrounding information S(o,t) is paired with the position movement information L(o,t) − L(o,t−1), that is, S(o,t) is labeled with L(o,t) − L(o,t−1), and a labeled data set covering all the dynamic factors at all times is constructed;
- a prediction model H, which takes S(o,t) as input and outputs a predicted value of L(o,t) − L(o,t−1), is trained on the labeled data set by means of a supervised learning method such as a deep neural network learning algorithm or a decision tree learning algorithm; and
- a prediction model is generated for each dynamic factor and can predict the difference between its current position and its next position from an input state; accordingly, the dynamic factors gain the capability to respond to the environment, and it is unnecessary to keep the road scenes in the simulator completely consistent with the scenes captured in the videos.
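- As an illustration only, the following minimal Python sketch builds the labeled data set pairing S(o,t) with L(o,t) − L(o,t−1) and fits a prediction model H with scikit-learn; the toy data, array shapes, feature dimension, and network size are assumptions for the sketch, not details given in the patent:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy stand-ins for video-derived data (hypothetical; in practice these come
# from manual annotation or an object detection algorithm run on road videos):
# positions[o] is the track L(o, t) of dynamic factor o, one 2-D point per frame;
# features[o][t] is the 360-degree surrounding description S(o, t).
positions = {o: np.cumsum(rng.normal(size=(50, 2)), axis=0) for o in range(5)}
features = {o: rng.normal(size=(50, 16)) for o in range(5)}

X, y = [], []
for o in positions:
    for t in range(1, len(positions[o])):
        X.append(features[o][t])                         # S(o, t)
        y.append(positions[o][t] - positions[o][t - 1])  # L(o, t) - L(o, t-1)

# H takes S(o, t) as input and predicts the position movement L(o, t) - L(o, t-1).
H = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500).fit(np.array(X), np.array(y))

# In the simulator, a dynamic factor is advanced by adding the predicted
# movement to its current position, so it can react to the surrounding state.
next_position = positions[0][-1] + H.predict(features[0][-1:])[0]
```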
- An autonomous driving control policy aims to perform continuous control according to continuously input perceptual information to form a driving process.
- An objective function is designed as the weighted sum of a destination determination value for determining whether or not the vehicle has arrived at the destination, a compliance determination value for determining whether or not the vehicle has violated traffic regulations, a safety determination value for determining whether or not the vehicle has collided during driving, and a comfort determination value for determining whether or not the vehicle has accelerated excessively during driving.
- When the vehicle arrives at the destination, the destination determination value is set to 1; when the vehicle collides, −100 is added to the safety determination value; if the vehicle violates traffic regulations, −1 is added to the compliance determination value; if the vehicle accelerates or decelerates excessively, or drives at a large angular speed, −0.01 is added to the comfort determination value; finally, these values are added together to obtain an evaluation index scoring each driving process.
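- A minimal sketch of this evaluation index as code, assuming the simulator reports arrival, collision, violation, and harsh-motion counts for a trajectory (the dictionary field names are hypothetical):

```python
def evaluation_index(trajectory):
    """Score one simulated driving process by summing the four determination
    values with the weights given above."""
    destination = 1.0 if trajectory["arrived"] else 0.0  # reached the destination
    safety = -100.0 * trajectory["collisions"]           # -100 per collision
    compliance = -1.0 * trajectory["violations"]         # -1 per traffic violation
    comfort = -0.01 * trajectory["harsh_motions"]        # -0.01 per harsh accel/
    return destination + safety + compliance + comfort   # decel/large angular speed

# Example: arrived, no collision, 2 violations, 30 harsh motions -> 1 - 2 - 0.3 = -1.3
score = evaluation_index({"arrived": True, "collisions": 0,
                          "violations": 2, "harsh_motions": 30})
```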
- A control policy model, for example a multi-layer feedforward neural network, a convolutional neural network, or a residual network, is used as the implementation model of the control policy; it is then necessary to determine, through training, the control policy parameters, namely the connection weights among the units of the neural network.
- The policy model parameters that maximize the evaluation value are searched for, by means of an evolutionary algorithm or a reinforcement learning algorithm, in the space defined by the policy model parameters.
- the search process generally comprises the following steps:
- Step 1: Random control policy parameters are generated to obtain an initial control policy π0;
- Step 2: The current control policy πk is run in the simulator to obtain a motion trajectory of the unmanned vehicle;
- Step 3: The destination determination value, safety determination value, compliance determination value, and comfort determination value of the motion trajectory are evaluated and added together to obtain the evaluation index of the control policy;
- Step 4: A population is updated by means of the evolutionary algorithm according to the result obtained in Step 3, or the driving policy model is updated by means of a reinforcement learning method; and
- Step 5: Steps 2 to 4 are repeated until all cycles are completed.
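- The following is a minimal sketch of such an evolutionary search, a simple mutate-and-select loop over the network weights; the population size, mutation scale, and the stand-in rollout_score function are assumptions, since the patent does not fix a particular evolutionary algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
N_WEIGHTS = 257  # connection weights of a small policy network (assumed size)

def rollout_score(theta):
    """Stand-in for Steps 2-3: run the policy with weights theta in the
    simulator and sum the destination, safety, compliance, and comfort
    determination values (e.g., with evaluation_index above).
    Here: a toy quadratic objective, for illustration only."""
    return -float(np.sum((theta - 0.5) ** 2))

# Step 1: random parameters form the initial population of control policies.
population = [rng.normal(size=N_WEIGHTS) for _ in range(20)]

for generation in range(100):                                  # Step 5: repeat
    scores = [rollout_score(t) for t in population]            # Steps 2-3
    elite = [population[i] for i in np.argsort(scores)[-5:]]   # best 5 policies
    # Step 4: update the population by mutating the elite policies.
    population = [e + 0.05 * rng.normal(size=N_WEIGHTS)
                  for e in elite for _ in range(4)]

best_policy = max(population, key=rollout_score)
```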
- For policy transfer, a control action sequence (a1, a2, a3, …, an) is executed on the physical unmanned vehicle, and the perception states (s0, s1, s2, s3, …, sn) following each executed action are collected;
- in the simulator, the initial state is set to s0 and the same action sequence (a1, a2, a3, …, an) is executed, and the perception states (s0, u1, u2, u3, …, un) are collected; and
- a correction function g is trained by means of the evolutionary algorithm or the reinforcement learning method so that the data from the unmanned vehicle is as similar as possible to the data from the simulator, that is, Σi (si − ui)² is minimized.
- The control policy π obtained through training in the simulator thus corrected is then used directly on the physical unmanned vehicle.
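- A minimal sketch of this transfer step, assuming an additive linear correction g applied to the simulator transition and a simple (1+1) evolutionary search; the toy simulator, state and action dimensions, and step sizes are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)
S_DIM, A_DIM, N = 4, 2, 30  # state/action sizes and sequence length (assumed)

def sim_step(u, a):
    """Toy stand-in for one transition of the constructed simulator."""
    return 0.9 * u + np.pad(a, (0, S_DIM - A_DIM))

# Assume the action sequence (a1..an) was executed on the real vehicle, giving
# states s0..sn; here reality is faked as the simulator plus a small mismatch.
actions = rng.normal(size=(N, A_DIM))
real_states = [rng.normal(size=S_DIM)]
for a in actions:
    real_states.append(sim_step(real_states[-1], a) + 0.05 * real_states[-1])

def transfer_loss(phi):
    """Sigma_i (s_i - u_i)^2: replay the same actions in the simulator from s0,
    with the transition corrected by g_phi(u, a) = phi @ [u; a]."""
    u, total = real_states[0], 0.0
    for a, s in zip(actions, real_states[1:]):
        u = sim_step(u, a) + phi @ np.concatenate([u, a])  # corrected transition
        total += float(np.sum((s - u) ** 2))
    return total

# (1+1) evolutionary search over the correction parameters phi.
phi = np.zeros((S_DIM, S_DIM + A_DIM))
for _ in range(500):
    candidate = phi + 0.01 * rng.normal(size=phi.shape)
    if transfer_loss(candidate) < transfer_loss(phi):
        phi = candidate  # keep mutations that better align simulator and reality
```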
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910030302.6 | 2019-01-14 | ||
CN201910030302.6A CN109765820B (en) | 2019-01-14 | 2019-01-14 | A kind of training system for automatic Pilot control strategy |
PCT/CN2019/095711 WO2020147276A1 (en) | 2019-01-14 | 2019-07-12 | Training system for automatic driving control strategy |
Publications (2)
Publication Number | Publication Date |
---|---|
US20200372822A1 (en) | 2020-11-26 |
US11062617B2 (en) | 2021-07-13 |
Family
ID=66453751
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/968,608 Active US11062617B2 (en) | 2019-01-14 | 2019-07-12 | Training system for autonomous driving control policy |
Country Status (3)
Country | Link |
---|---|
US (1) | US11062617B2 (en) |
CN (1) | CN109765820B (en) |
WO (1) | WO2020147276A1 (en) |
Patent Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170262790A1 (en) * | 2016-03-11 | 2017-09-14 | Route4Me, Inc. | Complex dynamic route sequencing for multi-vehicle fleets using traffic and real-world constraints |
US20170372431A1 (en) * | 2016-06-24 | 2017-12-28 | Swiss Reinsurance Company Ltd. | Autonomous or partially autonomous motor vehicles with automated risk-controlled systems and corresponding method thereof |
US20190317499A1 (en) * | 2016-08-08 | 2019-10-17 | Hitachi Automotive Systems, Ltd. | Automatic Driving Device |
US20180048801A1 (en) * | 2016-08-09 | 2018-02-15 | Contrast, Inc. | Real-time hdr video for vehicle control |
US20200026283A1 (en) * | 2016-09-21 | 2020-01-23 | Oxford University Innovation Limited | Autonomous route determination |
US20180164825A1 (en) * | 2016-12-09 | 2018-06-14 | Zendrive, Inc. | Method and system for risk modeling in autonomous vehicles |
US20180293514A1 (en) * | 2017-04-11 | 2018-10-11 | International Business Machines Corporation | New rule creation using mdp and inverse reinforcement learning |
US20190122378A1 (en) * | 2017-04-17 | 2019-04-25 | The United States Of America, As Represented By The Secretary Of The Navy | Apparatuses and methods for machine vision systems including creation of a point cloud model and/or three dimensional model based on multiple images from different perspectives and combination of depth cues from camera motion and defocus with various applications including navigation systems, and pattern matching systems as well as estimating relative blur between images for use in depth from defocus or autofocusing applications |
CN107609633A (en) | 2017-05-03 | 2018-01-19 | 同济大学 | The position prediction model construction method of vehicle traveling influence factor based on deep learning in car networking complex network |
US20180373997A1 (en) * | 2017-06-21 | 2018-12-27 | International Business Machines Corporation | Automatically state adjustment in reinforcement learning |
US20190146508A1 (en) * | 2017-11-14 | 2019-05-16 | Uber Technologies, Inc. | Dynamic vehicle routing using annotated maps and profiles |
US20190163176A1 (en) * | 2017-11-30 | 2019-05-30 | drive.ai Inc. | Method for transferring control of an autonomous vehicle to a remote operator |
CN107862346A (en) | 2017-12-01 | 2018-03-30 | 驭势科技(北京)有限公司 | A kind of method and apparatus for carrying out driving strategy model training |
US20190212749A1 (en) * | 2018-01-07 | 2019-07-11 | Nvidia Corporation | Guiding vehicles through vehicle maneuvers using machine learning models |
US20190266418A1 (en) * | 2018-02-27 | 2019-08-29 | Nvidia Corporation | Real-time detection of lanes and boundaries by autonomous vehicles |
CN108447076A (en) | 2018-03-16 | 2018-08-24 | 清华大学 | Multi-object tracking method based on depth enhancing study |
CN109765820A (en) | 2019-01-14 | 2019-05-17 | 南栖仙策(南京)科技有限公司 | A kind of training system for automatic Pilot control strategy |
Non-Patent Citations (1)
Title |
---|
"International Search Report (Form PCT/ISA/210) of PCT/CN2019/095711", dated Oct. 14, 2019, with English translation thereof, pp. 1-4. |
Also Published As
Publication number | Publication date |
---|---|
WO2020147276A1 (en) | 2020-07-23 |
US20200372822A1 (en) | 2020-11-26 |
CN109765820A (en) | 2019-05-17 |
CN109765820B (en) | 2019-08-09 |
Legal Events
- FEPP (Fee payment procedure): ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY
- FEPP (Fee payment procedure): ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY
- AS (Assignment): Owner name: POLIXIR TECHNOLOGIES LIMITED, CHINA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: QIN, RONGJUN; REEL/FRAME: 053480/0230; Effective date: 20200714
- STPP (Information on status: patent application and granting procedure in general): NON FINAL ACTION MAILED
- STPP: FINAL REJECTION MAILED
- STPP: DOCKETED NEW CASE - READY FOR EXAMINATION
- STPP: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS
- STPP: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED
- STPP: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED
- STCF (Information on status: patent grant): PATENTED CASE
- MAFP (Maintenance fee payment): PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY; Year of fee payment: 4