CN114523990A - Automatic driving decision-making method and device based on hierarchical reinforcement learning - Google Patents
Automatic driving decision-making method and device based on hierarchical reinforcement learning
- Publication number
- CN114523990A (application CN202210304345.0A)
- Authority
- CN
- China
- Prior art keywords
- prediction
- reinforcement learning
- data
- track
- vehicle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W60/00—Drive control systems specially adapted for autonomous road vehicles
- B60W60/001—Planning or execution of driving tasks
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W50/00—Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
- B60W50/0097—Predicting future conditions
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2552/00—Input parameters relating to infrastructure
- B60W2552/50—Barriers
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2554/00—Input parameters relating to objects
- B60W2554/40—Dynamic objects, e.g. animals, windblown objects
- B60W2554/404—Characteristics
- B60W2554/4041—Position
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2555/00—Input parameters relating to exterior conditions, not covered by groups B60W2552/00, B60W2554/00
- B60W2555/60—Traffic rules, e.g. speed limits or right of way
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Human Computer Interaction (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention provides an automatic driving decision-making method and device based on hierarchical reinforcement learning, comprising the following steps: acquiring upstream data, where the upstream data comprises perception fusion data, positioning data and control data; inputting the upstream data into a deep learning model, and outputting a receptive field model and a first predicted trajectory; inputting the receptive field model and the first predicted trajectory into a reinforcement learning algorithm, and outputting a first planned trajectory; and controlling the vehicle to execute the corresponding operation according to the first planned trajectory. By combining the deep learning model with the reinforcement learning algorithm, the stability and foresight of decision-making are ensured and the computational cost of prediction is reduced.
Description
Technical Field
The invention relates to the field of automatic driving, in particular to an automatic driving decision method and device based on hierarchical reinforcement learning.
Background
The decision-making technology currently applied to automatic driving is mainly state-machine based. Such a method uses logically inferred state transitions to move from a given upper-layer logic into lower-layer decisions, for example: first deciding whether to eat, and only then deciding what to eat. State-machine decisions are stable and reliable, but the state machine keeps expanding and growing more complex as the number of scenarios it must handle increases. In automatic driving the traffic scenarios to be covered are highly diverse; state machines generalize poorly in both depth and breadth, and the amount of code and the maintenance burden grow exponentially with the number of automatic driving tasks. If an upper-layer decision changes, the lower-layer decisions have to be revised one by one. Applicability and versatility are therefore limited.
Emerging technologies such as deep learning and reinforcement learning make the model more generalizable, and models based on deep reinforcement learning can produce better decisions. In automatic driving, however, the decision module cannot obtain very accurate inferences because of the limitations of its upstream modules. At the same time, the neural network, the optimization and solving tool most frequently used in artificial intelligence, loses interpretability through its series of transformations, so problems in the decision module cannot be improved in a well-targeted way. Hierarchical reinforcement learning can expose the semantic-level interpretation of the data inside the model more directly, which greatly increases the interpretability of the agent and further improves the stability and foresight of the decision module.
The problem is especially obvious where prediction meets decision-making: artificial intelligence techniques such as reinforcement learning are used only in the decision layer, and in practice the decision layer still diverges from its upstream and downstream modules. The prediction module currently predicts all vehicles within a certain range around the ego vehicle. Although its receptive field is designed to some extent, the decision module only uses the subset of vehicles it actually cares about, so marginal vehicles receive invalid predictions that waste computation and disturb the prediction result. Meanwhile, obstacles that are important to the decision often require more accurate prediction, yet most prediction methods treat every obstacle with the same method within one operating cycle and therefore do not achieve a better effect. For decisions in complex scenarios this creates a bottleneck at the prediction layer and affects the proactive decisions the decision module needs to make.
Disclosure of Invention
In view of the above, the present invention provides an automatic driving decision-making method and device based on hierarchical reinforcement learning, which combine a deep learning model with a reinforcement learning algorithm to ensure the stability and foresight of decisions and to reduce the computational cost of prediction.
In a first aspect, an embodiment of the present invention provides an automatic driving decision method based on hierarchical reinforcement learning, where the method includes:
acquiring upstream data, wherein the upstream data comprises perception fusion data, positioning data and control data;
inputting the upstream data into a deep learning model, and outputting a receptive field model and a first predicted trajectory;
inputting the receptive field model and the first predicted trajectory into a reinforcement learning algorithm, and outputting a first planned trajectory;
and controlling the vehicle to execute the corresponding operation according to the first planned trajectory.
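The four steps above form one closed decision cycle. The following minimal Python sketch illustrates this data flow under the assumptions of the description; all names (UpstreamData, run_decision_cycle, prediction_model, rl_policy) are illustrative placeholders, not identifiers defined by the invention.

```python
# Minimal sketch of the four-step decision flow; names are hypothetical placeholders.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class UpstreamData:
    perception_fusion: dict   # obstacle coordinates, traffic-light and vehicle categories
    localization: dict        # ego pose and surrounding-vehicle positions
    control: dict             # ego state and steering-wheel angle

def run_decision_cycle(upstream: UpstreamData,
                       prediction_model,       # deep learning model (manager layer)
                       rl_policy) -> List[Tuple[float, float]]:
    # Steps 1-2: the deep learning model outputs the receptive field model
    # (e.g. ellipse parameters) and a first predicted trajectory per obstacle.
    receptive_field, predicted_trajectories = prediction_model(upstream)

    # Step 3: the reinforcement learning algorithm (controller layer) turns the
    # receptive field and the predictions into a first planned trajectory.
    planned_trajectory = rl_policy(receptive_field, predicted_trajectories)

    # Step 4: the planned trajectory is handed to the control module.
    return planned_trajectory
```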
Further, the perception fusion data comprises obstacle coordinate information, traffic light type information and vehicle type information; the positioning data comprises position information of the vehicle and position information of surrounding vehicles; the control data includes state information of the own vehicle and steering wheel angle information.
Further, the method further comprises:
inputting the upstream data into a vehicle prediction algorithm for pre-training, and constructing a prediction model;
acquiring current upstream data;
inputting the current upstream data into the prediction model, and outputting the predicted trajectory;
wherein the vehicle prediction algorithm is a grid method, an LSTM (Long Short-Term Memory) method, or an anchor point method.
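Any of the three candidate predictors can be pre-trained on the upstream data. As one possibility, the sketch below shows a minimal LSTM-style trajectory predictor in PyTorch that outputs k candidate trajectories with probabilities per vehicle; the input features, hidden size, 30-step horizon (roughly 3 s at an assumed 10 Hz) and number of modes are illustrative assumptions, not values fixed by this description.

```python
# Minimal PyTorch sketch of an LSTM-style multi-modal trajectory predictor (assumed shapes).
import torch
import torch.nn as nn

class LSTMTrajectoryPredictor(nn.Module):
    def __init__(self, input_dim=4, hidden_dim=64, horizon=30, num_modes=6):
        super().__init__()
        self.encoder = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.horizon = horizon
        self.num_modes = num_modes
        # Each mode is a full (x, y) trajectory; a separate head scores the modes.
        self.traj_head = nn.Linear(hidden_dim, num_modes * horizon * 2)
        self.prob_head = nn.Linear(hidden_dim, num_modes)

    def forward(self, history):                       # history: (B, T_obs, input_dim)
        _, (h, _) = self.encoder(history)
        h = h[-1]                                      # (B, hidden_dim)
        trajs = self.traj_head(h).view(-1, self.num_modes, self.horizon, 2)
        probs = torch.softmax(self.prob_head(h), dim=-1)
        return trajs, probs                            # k trajectories + probabilities
```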
Further, the method further comprises:
dynamically arranging and combining the receptive field model and the predicted trajectory to obtain lower-layer input data;
training on the lower-layer input data through the reinforcement learning algorithm to obtain a comprehensive evaluation;
and mapping the comprehensive evaluation through a mathematical formula to obtain the evaluation of the decision module and the evaluation of the prediction module.
Further, the method further comprises:
dynamically adjusting the receptive field model in real time according to the environmental information and the decision of the previous moment to obtain an adjusted receptive field model;
predicting, at the current moment, within the adjusted receptive field model to obtain a dynamically changing second predicted trajectory;
and generating a second planned trajectory according to the adjusted receptive field model and the second predicted trajectory.
In a second aspect, an embodiment of the present invention provides an automatic driving decision apparatus based on hierarchical reinforcement learning, where the apparatus includes:
the prediction module is used for acquiring upstream data, the upstream data comprising perception fusion data, positioning data and control data, inputting the upstream data into a deep learning model, and outputting a receptive field model and a first predicted trajectory;
the decision planning module is used for inputting the receptive field model and the first predicted trajectory into a reinforcement learning algorithm and outputting a first planned trajectory;
and the control module is used for controlling the vehicle to execute the corresponding operation according to the first planned trajectory.
Further, the perception fusion data comprises obstacle coordinate information, traffic light type information and vehicle type information; the positioning data comprises position information of the vehicle and position information of surrounding vehicles; the control data includes state information of the own vehicle and steering wheel angle information.
Further, the apparatus further comprises:
the pre-training module is used for inputting the upstream data into a vehicle prediction algorithm for pre-training and constructing a prediction model;
the acquisition module is used for acquiring current upstream data;
the input module is used for inputting the current upstream data into the prediction model and outputting the predicted trajectory;
the vehicle prediction algorithm is a grid method, an LSTM method or an anchor point method.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor, where the memory stores a computer program operable on the processor, and the processor implements the method described above when executing the computer program.
In a fourth aspect, embodiments of the invention provide a computer readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the method as described above.
The embodiment of the invention provides an automatic driving decision-making method and device based on hierarchical reinforcement learning, comprising the following steps: acquiring upstream data, where the upstream data comprises perception fusion data, positioning data and control data; inputting the upstream data into a deep learning model, and outputting a receptive field model and a first predicted trajectory; inputting the receptive field model and the first predicted trajectory into a reinforcement learning algorithm, and outputting a first planned trajectory; and controlling the vehicle to execute the corresponding operation according to the first planned trajectory. By combining the deep learning model with the reinforcement learning algorithm, the stability and foresight of decision-making are ensured and the computational cost of prediction is reduced.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a diagram illustrating an original receptive field model according to a first embodiment of the present invention;
FIG. 2 is a view of a dynamically changing receptive field model according to an embodiment of the present invention;
fig. 3 is a flowchart of an automatic driving decision method based on hierarchical reinforcement learning according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a prediction and decision process according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a pre-training process and an inference process according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a reinforcement learning algorithm process according to an embodiment of the present invention;
fig. 7 is a schematic diagram of an automatic driving decision device based on hierarchical reinforcement learning according to an embodiment of the present invention.
Icon:
1 - prediction module; 2 - decision planning module; 3 - control module.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, as equipment and requirements iterate, automatic driving technology at the L2/L3 level is gradually entering everyday life, and stable perception and localization technology is no longer rare. The difficulties of automatic driving are instead increasingly exposed in the prediction and decision modules: how to make the automatic driving agent go beyond human thinking under uncertainty, and how to build redundancy upstream of the prediction and decision modules, are the bottlenecks still to be broken through.
With the development of deep learning, the prediction module has obtained sufficient evidence on public data sets: the better prediction algorithms can now accurately predict a 3 s trajectory, the final displacement error and average displacement error can be kept stable at around 1 m, and actual tests show no obvious deficiency. In use, however, prediction is often constrained by its input form, and predicting edge obstacles also affects, to some extent, the prediction accuracy for obstacles near the ego vehicle. For the driver of the ego vehicle, the obstacles of interest are often only a fraction of the perceived range. Data for obstacles over a large range, especially edge obstacles, introduce relatively more noise, all of which degrades the prediction accuracy for the vehicles that actually need attention; the biggest problem is that, from the prediction module's own viewpoint, it cannot know which obstacles matter most to the decision module.
The decision module is still in an exploratory stage and lacks a unified metric, so the industry currently relies mainly on hand-crafted decision makers built around state machines, together with some exploration of reinforcement learning decision methods. Considered from the standpoint of engineering reliability, only state-machine-based decision making has been put into practical use; reinforcement learning methods are still largely in the transition phase from simulation to practice.
Reinforcement learning is a special way of solving optimization problems: a large number of simulation tests can be run in a laboratory environment to solve the problem, which is of great value for the field of automatic driving. Hierarchical Reinforcement Learning (HRL) is a mainstream reinforcement learning technique that can remedy the shortcomings of basic reinforcement learning algorithms in practical applications, and it has important research and application value for automatic driving technology.
Although a decision model based on deep reinforcement learning can replace a state machine for decision making, it must be coordinated with its upstream modules: it has to guarantee the stability and foresight of the decision, and it also has to cooperate with the prediction module during the decision-making process.
To this end, the prediction module and the decision module are coupled: the prediction module serves as the management layer (manager) of the reinforcement learning, and the decision module serves as the execution layer (controller). By combining the two modules within the HRL-based framework, a prediction-and-decision framework with interaction capability between the modules is obtained.
Meanwhile, because the decision module places different emphasis on different obstacles, dynamic adjustment of the prediction receptive field directly affects the foresight of the decision module. For example, when the vehicle chooses to merge onto a ramp, it should dynamically focus on the adjacent lane rather than on all vehicles in the surrounding receptive field, and within a certain range the predicted focus along that single adjacent lane should extend as far as possible, rather than keeping the original receptive field unchanged in size. In a traffic jam, the nearby vehicles ahead deserve more attention and distant lanes need not be watched closely. This reduces the computational cost of prediction on the one hand and improves the decision module's ability to look ahead and hold its plan on the other. As shown in fig. 1 and fig. 2, the circular range in fig. 1 is the original receptive field model, centered on the ego vehicle with radius r as the sensing range (its two-tuple degenerates into a circle, and it is combined with the lower-layer decision planning module); the elliptical range in fig. 2 is the dynamically changing receptive field.
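To make the receptive field of figs. 1 and 2 concrete, the sketch below, a simplified assumption about how the two-tuple might be applied, checks whether an obstacle lies inside the circular or elliptical field and filters the obstacle list before prediction. The alignment of the ellipse with the ego heading and the dictionary layout of obstacles are illustrative choices, not details fixed by this description.

```python
# Sketch of circular vs. elliptical receptive fields (assumed parameterisation).
import math

def in_circular_field(obstacle_xy, ego_xy, r):
    dx, dy = obstacle_xy[0] - ego_xy[0], obstacle_xy[1] - ego_xy[1]
    return math.hypot(dx, dy) <= r

def in_elliptical_field(obstacle_xy, ego_xy, ego_heading, a, b):
    # Rotate the obstacle into the ego frame so the ellipse axes align with the
    # driving direction, then apply the standard ellipse inequality.
    dx, dy = obstacle_xy[0] - ego_xy[0], obstacle_xy[1] - ego_xy[1]
    lon = dx * math.cos(ego_heading) + dy * math.sin(ego_heading)
    lat = -dx * math.sin(ego_heading) + dy * math.cos(ego_heading)
    return (lon / a) ** 2 + (lat / b) ** 2 <= 1.0

def filter_obstacles(obstacles, ego_xy, ego_heading, a, b):
    # Only obstacles inside the current receptive field are passed to the
    # predictor, which is how invalid predictions of edge obstacles are avoided.
    return [o for o in obstacles
            if in_elliptical_field(o["xy"], ego_xy, ego_heading, a, b)]
```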
For the understanding of the present embodiment, the following detailed description will be given of the embodiments of the present invention.
The first embodiment is as follows:
fig. 3 is a flowchart of an automatic driving decision method based on hierarchical reinforcement learning according to an embodiment of the present invention.
Referring to fig. 3, the method includes the steps of:
step S101, obtaining upstream data, wherein the upstream data comprises perception fusion data, positioning data and control data;
here, after the upstream data is acquired, the upstream data may be subjected to data playback in the vehicle simulator. The vehicle simulator includes cara, sumo, and the like.
Step S102, inputting the upstream data into the deep learning model, and outputting a receptive field model and a first predicted trajectory;
Specifically, after the upstream data is input into the deep learning model for training, the prediction module is updated and thereby acquires prior knowledge. The receptive field model is parameterized by a two-tuple used to model the elliptical receptive field; a non-convex receptive field with multiple parameter groups may be adopted later. The first predicted trajectory may be a 3 s predicted trajectory. When each vehicle outputs k trajectories with corresponding probabilities, the trajectories whose probability exceeds 30% are taken as the first predicted trajectory, 30% being the dividing line.
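A minimal sketch of this 30% cut is given below; the per-vehicle data layout and the fallback to the single most likely mode when no trajectory clears the threshold are assumptions added for illustration.

```python
# Sketch of selecting per-vehicle predicted trajectories by probability threshold.
def select_predicted_trajectories(per_vehicle_modes, prob_threshold=0.30):
    selected = {}
    for vehicle_id, modes in per_vehicle_modes.items():
        # modes: list of (trajectory, probability) pairs produced by the predictor
        kept = [traj for traj, p in modes if p > prob_threshold]
        if not kept:
            # Assumed fallback (not spelled out in the description): keep the
            # single most likely mode so every vehicle has at least one trajectory.
            kept = [max(modes, key=lambda m: m[1])[0]]
        selected[vehicle_id] = kept
    return selected
```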
Step S103, inputting the receptive field model and the first predicted trajectory into the reinforcement learning algorithm, and outputting a first planned trajectory;
Specifically, given the receptive field model and the first predicted trajectory, decision inference is carried out over the combinations of the predicted trajectories of the obstacles (assuming the decision of the ego vehicle is executed; the decision planning module adopts an existing planner with good performance, such as MPC (Model Predictive Control) or LQR (Linear Quadratic Regulator), and the control module can execute the corresponding trajectory), so that the decision planning module outputs the first planned trajectory. At the same time, the decision planning module feeds back the weights of the receptive field model so as to update the parameters used for the next prediction.
Step S104, controlling the vehicle to execute the corresponding operation according to the first planned trajectory.
Specifically, referring to the schematic diagram of the prediction and decision process shown in fig. 4: in the prediction module, the upstream data is used as the input of the deep learning model, which outputs the receptive field model and the first predicted trajectory; in the decision planning module, the receptive field model and the first predicted trajectory are used as the input of the reinforcement learning algorithm, which outputs the first planned trajectory; the first planned trajectory is sent to the control module, and the control module controls the vehicle to execute the corresponding operation accordingly.
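The decision inference over combinations of obstacle trajectories described for step S103 can be sketched as follows; plan_fn (standing in for an MPC/LQR-style planner) and score_fn (standing in for the reinforcement learning evaluation) are hypothetical placeholders, and exhaustive enumeration is used only to keep the illustration short.

```python
# Sketch of decision inference over combinations of obstacle predicted trajectories.
import itertools

def decide(selected_trajs, score_fn, plan_fn):
    vehicle_ids = list(selected_trajs.keys())
    best_plan, best_score = None, float("-inf")
    # Enumerate one predicted trajectory per surrounding vehicle.
    for combo in itertools.product(*(selected_trajs[v] for v in vehicle_ids)):
        scenario = dict(zip(vehicle_ids, combo))
        ego_plan = plan_fn(scenario)          # candidate planned trajectory
        score = score_fn(scenario, ego_plan)  # RL evaluation of the outcome
        if score > best_score:
            best_plan, best_score = ego_plan, score
    return best_plan
```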
Further, the perception fusion data comprises obstacle coordinate information, traffic light type information and vehicle type information; the positioning data comprises position information of the vehicle and position information of surrounding vehicles; the control data includes state information of the own vehicle and steering wheel angle information.
Further, the method comprises the following steps:
step S201, inputting upstream data into a vehicle prediction algorithm for pre-training, and constructing a prediction model;
step S202, acquiring current upstream data;
step S203, inputting the current upstream data into a prediction model, and outputting to obtain a prediction track;
the vehicle prediction algorithm is a grid method, an LSTM (Long Short-Term Memory network) or an anchor point method.
Here, in the pre-training of the vehicle prediction algorithm, the parameters learned at the upper layer are the trajectory ADE (Average Displacement Error) and FDE (Final Displacement Error), weighted according to the degree of attention paid to each vehicle. These learning parameters are designed to make the trajectories as accurate as possible and to ensure the reliability of the dynamic prediction.
Specifically, referring to fig. 5, in the pre-training process the upstream data is input into the vehicle prediction algorithm for pre-training and the prediction model is constructed; the trained prediction model is then applied in the inference process, that is, the current upstream data is input into the trained prediction model, which outputs the predicted trajectory.
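A minimal sketch of the attention-weighted ADE/FDE objective is shown below; the use of one scalar weight per vehicle and the equal weighting of the ADE and FDE terms are assumptions about how the "degree of attention" might be applied.

```python
# Sketch of an attention-weighted ADE/FDE training loss (assumed weighting scheme).
import torch

def weighted_ade_fde(pred, gt, attention_weights, fde_coeff=1.0):
    """
    pred, gt: (N_vehicles, horizon, 2) predicted and ground-truth trajectories
    attention_weights: (N_vehicles,) weights reflecting decision-layer attention
    """
    dists = torch.norm(pred - gt, dim=-1)                 # (N, horizon) point errors
    ade = (dists.mean(dim=-1) * attention_weights).mean()  # weighted average displacement error
    fde = (dists[:, -1] * attention_weights).mean()        # weighted final displacement error
    return ade + fde_coeff * fde
```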
Further, referring to fig. 6, the method further includes the steps of:
step S301, dynamically arranging and combining the reception field model and the predicted track in the simulation environment to obtain lower-layer input data;
step S302, training the lower-layer input data through a reinforcement learning algorithm to obtain comprehensive evaluation;
here, the reinforcement learning algorithm performs related evaluation through a set evaluation index during training (for example, triggering a fallback mechanism of the decision planning module to give a larger negative value reward, triggering different decelerations to give a smaller negative value reward, and giving a certain positive value reward after completing an action).
Step S303, carrying out mathematical formula mapping on the comprehensive evaluation to obtain the evaluation of the decision module and the evaluation of the prediction module.
Specifically, as shown in fig. 6, a receptive field regulatory loss function is calculated according to the adjusted receptive field model, a motion loss function is calculated according to the comprehensive evaluation, the receptive field regulatory loss function and the motion loss function are added to obtain an updated model, the prediction model is updated through the updated model, and the updated model is sent to the decision planning module, so that the decision planning module continues to explore.
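The sketch below illustrates this reward shaping and the two-part loss; the concrete reward magnitudes and the simple additive combination are assumptions, since the description fixes only the signs of the rewards and the fact that the two losses are summed.

```python
# Sketch of the assumed reward shaping and the combined loss of fig. 6.
def step_reward(triggered_fallback, deceleration_events, action_completed):
    reward = 0.0
    if triggered_fallback:
        reward -= 10.0                       # large negative: decision fallback was triggered
    reward -= 1.0 * deceleration_events      # smaller negative per triggered deceleration
    if action_completed:
        reward += 5.0                        # positive reward for completing the action
    return reward

def total_loss(receptive_field_regulatory_loss, motion_loss):
    # The two losses are added; the sum drives the joint update of the prediction
    # model (manager) and the decision planning module (controller).
    return receptive_field_regulatory_loss + motion_loss
```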
Further, the method comprises the following steps:
step S401, dynamically adjusting the receptive field model in real time according to the environmental information and the decision of the previous moment to obtain an adjusted receptive field model;
here, the dynamically adjusted prediction will reduce the computation consumption of the prediction module to a certain extent, and for the current open-source framework, the related computation overhead can be greatly reduced. HRL (Hierarchical Reinforcement Learning) is adopted to complete the upstream and downstream interaction of the prediction module and the decision planning module, so that the interpretability of the decision is increased, and the decision problem (unprotected left turn or jamming and the like) of urban roads is solved.
Step S402, predicting, at the current moment, within the adjusted receptive field model to obtain a dynamically changing second predicted trajectory;
Step S403, generating a second planned trajectory according to the adjusted receptive field model and the second predicted trajectory.
Here, the decision planning module performs decision planning to generate the second planned trajectory and sends it to the control module. (In the simulation environment it is assumed that the control module can follow the decision planning module without lag.) Through the interaction between the prediction module and the decision planning module, the adjusted receptive field model is learned from the reward values fed back by the decision planning module; this optimizes long-range decisions and improves the foresight and reliability of decision-making. Carrying out reinforcement learning in the simulation environment over multiple predicted trajectories increases the randomness of the samples, completes the data augmentation work, and improves the generalization of the reinforcement learning agent.
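The real-time adjustment of steps S401 to S403 can be sketched as a simple update of the ellipse two-tuple driven by the previous decision and the traffic state, as in the ramp-merge and congestion examples given earlier; the scenario labels and scaling factors below are illustrative assumptions, not values from this description.

```python
# Sketch of dynamically adjusting the receptive field from the previous decision (assumed factors).
def adjust_receptive_field(a, b, last_decision, traffic_state):
    if last_decision == "merge_to_ramp":
        # Stretch the ellipse along the adjacent lane so the predictor looks
        # farther ahead in the direction that matters for the merge.
        a, b = a * 1.5, b * 0.8
    elif traffic_state == "congested":
        # Only nearby vehicles matter, so shrink the field and save computation.
        a, b = a * 0.6, b * 0.6
    return a, b
```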
In simulation tests, the trained prediction model ensures that conservative long-horizon decisions are made in complex traffic. The prediction module has already been verified on public data sets, and related closed-course tests have been carried out on the downstream decision planning module; the accuracy of the prediction module reaches a high level, and the feasibility of the complete system, in terms of code computational cost and other implicit indicators (such as smooth riding experience and observation of the surroundings), is better than open-source framework implementations and some current algorithms (such as automatic-driving open-source frameworks and partially decoupled combinations of prediction and decision algorithms).
The embodiment of the invention provides an automatic driving decision-making method based on hierarchical reinforcement learning, which comprises: acquiring upstream data, where the upstream data comprises perception fusion data, positioning data and control data; inputting the upstream data into a deep learning model, and outputting a receptive field model and a first predicted trajectory; inputting the receptive field model and the first predicted trajectory into a reinforcement learning algorithm, and outputting a first planned trajectory; and controlling the vehicle to execute the corresponding operation according to the first planned trajectory. By combining the deep learning model with the reinforcement learning algorithm, the stability and foresight of decision-making are ensured and the computational cost of prediction is reduced.
Example two:
fig. 7 is a schematic diagram of an automatic driving decision device based on hierarchical reinforcement learning according to an embodiment of the present invention.
Referring to fig. 7, the apparatus includes:
the system comprises a prediction module 1, a data processing module and a data processing module, wherein the prediction module is used for acquiring upstream data, and the upstream data comprises perception fusion data, positioning data and control data; inputting the upstream data into a deep learning model, and outputting to obtain a receptive field model and a first prediction track;
the decision planning module 2 is used for inputting the receptive field model and the first predicted trajectory into a reinforcement learning algorithm and outputting to obtain a first planned trajectory;
and the control module 3 is used for controlling the vehicle to execute corresponding operation according to the first planned track.
Here, by contrast, the conventional approach of decoupling prediction from decision planning cannot solve the problem of a dynamic prediction receptive field at all: the downstream decision planning module, limited in its long-horizon dynamic considerations, cannot make long-term decisions, and invalid predictions increase the maintenance burden of the decision planning module.
The reinforcement learning algorithm used by this method achieves better generalization and lower maintenance cost than a state machine. At the same time, using the HRL method better handles the sparse-reward difficulty that automatic driving poses for ordinary reinforcement learning methods.
Further, the perception fusion data comprises obstacle coordinate information, traffic light type information and vehicle type information; the positioning data comprises position information of the vehicle and position information of surrounding vehicles; the control data includes state information of the own vehicle and steering wheel angle information.
Further, the apparatus further comprises:
a pre-training module (not shown) for inputting the upstream data into a vehicle prediction algorithm for pre-training and constructing a prediction model;
an acquisition module (not shown) for acquiring current upstream data;
and an input module (not shown) for inputting the current upstream data into the prediction model and outputting the predicted trajectory;
wherein the vehicle prediction algorithm is a grid method, an LSTM method, or an anchor point method.
The embodiment of the invention provides an automatic driving decision-making device based on hierarchical reinforcement learning which acquires upstream data, where the upstream data comprises perception fusion data, positioning data and control data; inputs the upstream data into a deep learning model and outputs a receptive field model and a first predicted trajectory; inputs the receptive field model and the first predicted trajectory into a reinforcement learning algorithm and outputs a first planned trajectory; and controls the vehicle to execute the corresponding operation according to the first planned trajectory. By combining the deep learning model with the reinforcement learning algorithm, the stability and foresight of decision-making are ensured and the computational cost of prediction is reduced.
The embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, the steps of the hierarchical reinforcement learning-based automatic driving decision method provided by the above embodiments are implemented.
Embodiments of the present invention further provide a computer-readable medium having non-volatile program codes executable by a processor, where the computer-readable medium stores a computer program, and the computer program is executed by the processor to perform the steps of the hierarchical reinforcement learning-based automatic driving decision method according to the above embodiments.
The computer program product provided in the embodiment of the present invention includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, which is not described herein again.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting the same, and the protection scope of the present invention is not limited thereto, although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (10)
1. An automatic driving decision method based on hierarchical reinforcement learning, characterized in that the method comprises:
acquiring upstream data, wherein the upstream data comprises perception fusion data, positioning data and control data;
inputting the upstream data into a deep learning model, and outputting a receptive field model and a first predicted trajectory;
inputting the receptive field model and the first predicted trajectory into a reinforcement learning algorithm, and outputting a first planned trajectory;
and controlling the vehicle to execute the corresponding operation according to the first planned trajectory.
2. The automated driving decision method based on hierarchical reinforcement learning according to claim 1, characterized in that the perception fusion data comprises obstacle coordinate information, traffic light category information and vehicle category information; the positioning data comprises position information of the vehicle and position information of surrounding vehicles; the control data includes state information of the own vehicle and steering wheel angle information.
3. The hierarchical reinforcement learning-based automatic driving decision method according to claim 1, characterized in that the method further comprises:
inputting the upstream data into a vehicle prediction algorithm for pre-training, and constructing a prediction model;
acquiring current upstream data;
inputting the current upstream data into the prediction model, and outputting the predicted trajectory;
wherein the vehicle prediction algorithm is a grid method, an LSTM method, or an anchor point method.
4. The hierarchical reinforcement learning-based automatic driving decision method according to claim 1, characterized in that the method further comprises:
dynamically arranging and combining the receptive field model and the predicted trajectory to obtain lower-layer input data;
training the lower-layer input data through the reinforcement learning algorithm to obtain comprehensive evaluation;
and mapping the comprehensive evaluation by a mathematical formula to obtain the evaluation of a decision module and the evaluation of a prediction module.
5. The hierarchical reinforcement learning-based automatic driving decision method according to claim 1, characterized in that the method further comprises:
dynamically adjusting the receptive field model in real time according to the environmental information and the decision of the previous moment to obtain an adjusted receptive field model;
predicting, at the current moment, within the adjusted receptive field model to obtain a dynamically changing second predicted trajectory;
and generating a second planned trajectory according to the adjusted receptive field model and the second predicted trajectory.
6. An automatic driving decision device based on layered reinforcement learning, characterized in that the device comprises:
the prediction module is used for acquiring upstream data, the upstream data comprising perception fusion data, positioning data and control data, inputting the upstream data into a deep learning model, and outputting a receptive field model and a first predicted trajectory;
the decision planning module is used for inputting the receptive field model and the first predicted trajectory into a reinforcement learning algorithm and outputting a first planned trajectory;
and the control module is used for controlling the vehicle to execute the corresponding operation according to the first planned trajectory.
7. The automated driving decision device based on hierarchical reinforcement learning according to claim 6, characterized in that the perception fusion data comprises obstacle coordinate information, traffic light category information and vehicle category information; the positioning data comprises position information of the vehicle and position information of surrounding vehicles; the control data includes state information of the own vehicle and steering wheel angle information.
8. The automated driving decision device based on hierarchical reinforcement learning according to claim 6, characterized in that the device further comprises:
the pre-training module is used for inputting the upstream data into a vehicle prediction algorithm for pre-training and constructing a prediction model;
the acquisition module is used for acquiring current upstream data;
the input module is used for inputting the current upstream data into the prediction model and outputting the predicted trajectory;
the vehicle prediction algorithm is a grid method, an LSTM method or an anchor point method.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 5 when executing the computer program.
10. A computer-readable medium having non-volatile program code executable by a processor, wherein the program code causes the processor to perform the method of any of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210304345.0A CN114523990A (en) | 2022-03-25 | 2022-03-25 | Automatic driving decision-making method and device based on hierarchical reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210304345.0A CN114523990A (en) | 2022-03-25 | 2022-03-25 | Automatic driving decision-making method and device based on hierarchical reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114523990A true CN114523990A (en) | 2022-05-24 |
Family
ID=81626609
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210304345.0A Pending CN114523990A (en) | 2022-03-25 | 2022-03-25 | Automatic driving decision-making method and device based on hierarchical reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114523990A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117035032A (en) * | 2023-04-14 | 2023-11-10 | 北京百度网讯科技有限公司 | Method for model training by fusing text data and automatic driving data and vehicle |
CN118618404A (en) * | 2024-07-09 | 2024-09-10 | 长春工业大学 | An intelligent driving optimization control method integrating global optimization and safety protection |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190072966A1 (en) * | 2017-09-07 | 2019-03-07 | TuSimple | Prediction-based system and method for trajectory planning of autonomous vehicles |
CN110568841A (en) * | 2019-08-05 | 2019-12-13 | 西藏宁算科技集团有限公司 | Automatic driving decision method and system |
US20200363813A1 (en) * | 2019-05-15 | 2020-11-19 | Baidu Usa Llc | Online agent using reinforcement learning to plan an open space trajectory for autonomous vehicles |
US20210200212A1 (en) * | 2019-12-31 | 2021-07-01 | Uatc, Llc | Jointly Learnable Behavior and Trajectory Planning for Autonomous Vehicles |
US20210229678A1 (en) * | 2020-01-23 | 2021-07-29 | Baidu Usa Llc | Cross-platform control profiling tool for autonomous vehicle control |
- 2022-03-25: application CN202210304345.0A filed in China (CN); published as CN114523990A; status Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190072966A1 (en) * | 2017-09-07 | 2019-03-07 | TuSimple | Prediction-based system and method for trajectory planning of autonomous vehicles |
US20200363813A1 (en) * | 2019-05-15 | 2020-11-19 | Baidu Usa Llc | Online agent using reinforcement learning to plan an open space trajectory for autonomous vehicles |
CN110568841A (en) * | 2019-08-05 | 2019-12-13 | 西藏宁算科技集团有限公司 | Automatic driving decision method and system |
US20210200212A1 (en) * | 2019-12-31 | 2021-07-01 | Uatc, Llc | Jointly Learnable Behavior and Trajectory Planning for Autonomous Vehicles |
US20210229678A1 (en) * | 2020-01-23 | 2021-07-29 | Baidu Usa Llc | Cross-platform control profiling tool for autonomous vehicle control |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117035032A (en) * | 2023-04-14 | 2023-11-10 | 北京百度网讯科技有限公司 | Method for model training by fusing text data and automatic driving data and vehicle |
CN118618404A (en) * | 2024-07-09 | 2024-09-10 | 长春工业大学 | An intelligent driving optimization control method integrating global optimization and safety protection |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220363259A1 (en) | Method for generating lane changing decision-making model, method for lane changing decision-making of unmanned vehicle and electronic device | |
Walraven et al. | Traffic flow optimization: A reinforcement learning approach | |
US20200192393A1 (en) | Self-Modification of an Autonomous Driving System | |
CN107479547B (en) | Behavioral Decision Algorithm of Decision Tree Based on Teaching Learning | |
CN118739948A (en) | Control method and system for motor controller | |
EP4330107B1 (en) | Motion planning | |
CN114523990A (en) | Automatic driving decision-making method and device based on hierarchical reinforcement learning | |
Choo et al. | Adaptive multi-scale prognostics and health management for smart manufacturing systems | |
Dong et al. | Facilitating connected autonomous vehicle operations using space-weighted information fusion and deep reinforcement learning based control | |
CN112990485A (en) | Knowledge strategy selection method and device based on reinforcement learning | |
Rais et al. | Decision making for autonomous vehicles in highway scenarios using Harmonic SK Deep SARSA | |
Zhu et al. | Motion forecasting with unlikelihood training in continuous space | |
Hua et al. | Multi-agent reinforcement learning for connected and automated vehicles control: Recent advancements and future prospects | |
Hu et al. | Safety-aware human-lead vehicle platooning by proactively reacting to uncertain human behaving | |
Faqir et al. | Combined extreme learning machine and max pressure algorithms for traffic signal control | |
Bang et al. | Safe merging in mixed traffic with confidence | |
Chen et al. | Risk-Anticipatory Autonomous Driving Strategies Considering Vehicles’ Weights Based on Hierarchical Deep Reinforcement Learning | |
Caballero et al. | Some statistical challenges in automated driving systems | |
Ozkan et al. | Trust-aware control of automated vehicles in car-following interactions with human drivers | |
Ma et al. | Evolving testing scenario generation method and intelligence evaluation framework for automated vehicles | |
Mazouchi et al. | A risk-averse preview-based q-learning algorithm: Application to highway driving of autonomous vehicles | |
Mandalis et al. | A transformer-based method for vessel traffic flow forecasting | |
Klöpper et al. | Planning for mechatronics systems—Architecture, methods and case study | |
Rathore et al. | Intelligent decision making in autonomous vehicles using cognition aided reinforcement learning | |
Wu et al. | CenLight: Centralized traffic grid signal optimization via action and state decomposition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||