CN113485103A - Aircraft conflict resolution method based on deep reinforcement learning - Google Patents
Aircraft conflict resolution method based on deep reinforcement learning
- Publication number
- CN113485103A (application CN202110729530.XA)
- Authority
- CN
- China
- Prior art keywords
- module
- conflict
- aircraft
- environment
- reinforcement learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention provides an aircraft conflict resolution method based on deep reinforcement learning. Built on the deep deterministic policy gradient (DDPG) algorithm, the method constructs the components of the agent and the conflict scenarios through Gym, the OpenAI open-source reinforcement learning environment interface, and uses the DDPG algorithm to learn a resolution strategy. The conflict resolution actions of the aircraft agent involve adjustments of heading angle, flight speed and altitude, and its state mainly comprises multi-dimensional descriptions such as position information and speed. The proposed algorithm helps substantially in resolving aircraft conflicts in air traffic control and can reduce the controller's workload.
Description
Technical Field
The invention relates to the technical field of civil aviation intelligent air traffic control, in particular to an aircraft conflict resolution method based on deep reinforcement learning.
Background
In 2019, annual passenger throughput at Chinese airports exceeded 1.3 billion, reaching 1,351.629 million passengers, an increase of 6.9% over the previous year. According to forecasts by the International Air Transport Association (IATA), the number of global air passengers will reach 8.2 billion by 2037, including 1.6 billion Chinese passengers. To relieve this enormous traffic pressure, various air traffic flow management aids and technologies have been put into service, such as airport collaborative decision-making systems, AMAN/DMAN systems, remote towers, and conflict detection and resolution technologies. Efficient conflict detection and resolution is the primary task in guaranteeing flight safety, and is particularly important in complex, high-density airspace environments. It is of great significance for maintaining flight order, preventing aircraft collisions, relieving air traffic pressure and ensuring air traffic safety.
Disclosure of Invention
In order to solve the problems of the prior art, such as over-simplified models, poor algorithm adaptivity and low efficiency, the invention provides an aircraft conflict resolution method based on deep reinforcement learning. A conflict scenario model is built on the open-source air traffic control simulation platform OpenScope, communication with the aircraft agent is implemented through the Gym interface, and a deep reinforcement learning algorithm, the deep deterministic policy gradient (DDPG), is used to train the aircraft agent to complete the conflict resolution task. Compared with existing heuristic algorithms, the invention takes the uncertainty of the environment into account, such as errors in the manoeuvres taken during conflict resolution, and builds the simulation environment on the OpenAI reinforcement learning interface Gym, which makes the training process simpler and more efficient.
In order to achieve this purpose, the invention adopts the following technical scheme: an aircraft conflict resolution method based on deep reinforcement learning comprises a conflict environment generation module, an agent communication module and a DDPG reinforcement learning algorithm module. The conflict environment generation module comprises an environment modeling submodule and a conflict scenario design submodule; the agent communication module comprises a Gym interface communication submodule and an OpenScope air traffic control submodule; and the DDPG reinforcement learning algorithm module comprises a policy network submodule (Actor), a value network submodule (Critic) and a historical data experience pool submodule.
The environment modeling submodule is used for modeling the reinforcement learning environment, including the setting and management of parameters such as the airspace range, flight start point, target point, flight speed and traffic density.
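Purely as an illustration (the names and default values below are not taken from the patent), these parameters might be grouped as follows:

```python
from dataclasses import dataclass

@dataclass
class AirspaceConfig:
    """Hypothetical grouping of the environment-modelling parameters listed above."""
    range_nm: float = 60.0            # radius of the simulated approach airspace, in nautical miles
    start_fix: tuple = (0.0, 0.0)     # flight start point (x, y) in projected plane coordinates
    target_fix: tuple = (40.0, 0.0)   # target point (x, y)
    speed_kt: float = 250.0           # initial flight speed, in knots
    traffic_density: int = 2          # number of aircraft placed in the scenario
```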
The conflict scenario design submodule can design different types of preset conflict scenarios for the aircraft agent, including head-on conflicts between oncoming aircraft and lateral crossing conflicts. A head-on conflict is in fact a special case of a crossing conflict, namely the case in which the heading-angle difference is a straight angle, and the submodule can design various crossing conflicts according to different heading-angle differences. The heading-angle difference refers to the angle between the heading angles of the two aircraft, where the heading angle is defined as the angle between the projection of the aircraft's longitudinal axis onto the horizontal plane and the geographic meridian, measured from the geographic meridian with the east direction taken as positive, with values ranging between plus and minus 180 degrees.
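A minimal sketch of how the heading-angle difference could be computed and used to distinguish the two preset scenario types; the function names and the 5-degree tolerance are assumptions, not taken from the patent:

```python
def crossing_angle(heading_a_deg: float, heading_b_deg: float) -> float:
    """Smallest angle between two heading angles, in degrees (0..180)."""
    diff = abs(heading_a_deg - heading_b_deg) % 360.0
    return 360.0 - diff if diff > 180.0 else diff

def conflict_type(heading_a_deg: float, heading_b_deg: float, tol_deg: float = 5.0) -> str:
    """Head-on when the heading-angle difference is (close to) a straight angle, otherwise crossing."""
    angle = crossing_angle(heading_a_deg, heading_b_deg)
    return "head-on" if abs(angle - 180.0) <= tol_deg else "crossing"
```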
The Gym interface communication submodule completes the communication between the aircraft agent and other aircraft, including position information and heading information. It also handles communication with the DDPG reinforcement learning algorithm module: from a global (God's-eye) viewpoint it sends the state information of all aircraft to the algorithm module, so that the agent can be better trained to learn conflict-avoidance actions.
The OpenScope air traffic control submodule provides a human-computer interaction interface and a control interface, and implements flight control of the aircraft agent, such as control of heading, speed and altitude. The air traffic control environment is mainly an approach control airspace, and each aircraft is constrained by the flight performance of its aircraft type. The agent continuously interacts with the environment to obtain the environment's feedback value, and learns the resolution strategy through the DDPG algorithm.
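The interaction described above follows the standard Gym loop. The sketch below assumes the classic Gym step API and a hypothetical environment id (see the registration sketch in the detailed description), with a random action standing in for the DDPG actor's output:

```python
import gym  # OpenAI Gym interface; the environment id below is hypothetical

env = gym.make("ConflictResolution-v0")
state = env.reset()
done, episode_return = False, 0.0
while not done:
    action = env.action_space.sample()              # stand-in for the action chosen by the DDPG actor
    state, reward, done, info = env.step(action)    # feedback value returned by the environment
    episode_return += reward
```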
The policy network submodule is part of the DDPG algorithm. It mainly carries out the learning of the network that maps the agent's state to its action, often called the policy network; specifically, the output of the network is the optimal action for the state currently fed to the network. In addition, to make the learning process more stable and to update the network weights more steadily in an iterative manner, a Target policy network (Target) is introduced and the original policy network is called the Online policy network (Online); a fixed number of updates is set, and after that many updates the weights of the Online policy network are copied to the Target network.
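A minimal sketch of this periodic weight copy, assuming the networks are PyTorch nn.Module instances; the function names and the copy interval are illustrative:

```python
import copy

def make_target(online_net):
    """Create the Target network as a frozen copy of the Online network."""
    target_net = copy.deepcopy(online_net)
    for p in target_net.parameters():
        p.requires_grad = False
    return target_net

def hard_update(target_net, online_net):
    """Copy the Online network's weights into the Target network."""
    target_net.load_state_dict(online_net.state_dict())

# Inside the training loop (sketch): copy the weights after a fixed number of updates.
# if update_step % copy_interval == 0:
#     hard_update(target_actor, online_actor)
```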
The value network submodule mainly evaluates the actions generated by the policy network: it learns the mapping from state and action to the Q value, and the policy network then learns the resolution strategy according to this Q value. Consistent with the policy network submodule, a Target evaluation network and an Online evaluation network are also introduced.
The historical data experience pool submodule mainly performs two functions, storage and sampling. Storage refers to storing the agent's historical trajectories, i.e. the state, the action, the reward and the next state; a flag indicating whether the current task has been completed can be added as needed. Sampling refers to the agent drawing historical trajectories in batches of a certain size as input during the learning process.
The agent learns the optimal resolution strategy by maximizing the accumulated reward R_t at time t, formulated as follows:

R_t = Σ_{i=t}^{T} γ^(i−t) · r(s_i, a_i)
where s_i and a_i respectively denote the state and the action, r(s_i, a_i) is a single reward value, and γ is the discount factor indicating how important future rewards are. In the policy network module, the policy is a deterministic behaviour policy: the action a_t at time t is obtained directly from the function, namely:
a_t = μ(s_t | θ^μ)
where μ denotes the mapping from state to action, s_t denotes the state at time t, and θ^μ is the weight parameter of the policy network. The model is solved by iterative optimization using the Bellman equation to finally obtain the optimal strategy; the equation is expressed as follows:
Q^μ(s_t, a_t) = E[ r(s_t, a_t) + γ · Q^μ(s_{t+1}, μ(s_{t+1})) ]
where Q^μ(s_t, a_t) denotes the action-value function under policy μ, i.e. an evaluation of how good it is to take action a_t in state s_t, and r(s_t, a_t) denotes the immediate reward at time t. The value network further evaluates the overall strategy through the expected Q value over the states s_t, formulated as follows:

J_β(μ) = E_{s∼ρ^β}[ Q^μ(s, μ(s)) ]
where ρ^β is the distribution function of the state s, representing the probability of the agent's state at a given moment, and the evaluation of the strategy is denoted by the function J_β. The parameters of the value network are updated by gradient optimization, where θ^Q and θ^μ are the parameters of the value network and the policy network respectively, and N is the number of samples generated by the agent's experience; the updates are as follows:

y_i = r(s_i, a_i) + γ · Q'(s_{i+1}, μ'(s_{i+1} | θ^{μ'}) | θ^{Q'})

L(θ^Q) = (1/N) · Σ_i ( y_i − Q(s_i, a_i | θ^Q) )²

∇_{θ^μ} J ≈ (1/N) · Σ_i ∇_a Q(s, a | θ^Q) |_{s=s_i, a=μ(s_i)} · ∇_{θ^μ} μ(s | θ^μ) |_{s=s_i}
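By way of illustration only, the following sketch shows one such update step in PyTorch; the actor/critic modules, optimizers and batch layout are assumptions and not specified in the patent:

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, gamma=0.99):
    """One DDPG update over a sampled mini-batch, mirroring the equations above."""
    s, a, r, s_next, done = batch   # tensors sampled from the experience pool; done is a 0/1 float tensor

    # Critic: build the target y_i = r + gamma * Q'(s', mu'(s')) and minimise the squared error
    with torch.no_grad():
        y = r + gamma * (1.0 - done) * target_critic(s_next, target_actor(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: ascend the Q value of the actions produced by the online policy network
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```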
Compared with the prior art, the invention has the following beneficial effects:
1. The deep reinforcement learning technique combines the feature-fitting capability of deep learning with the autonomous decision-making capability of reinforcement learning, and can well solve the problems of existing models such as a single, rigid resolution mode and low efficiency.
2. The deep reinforcement learning technique considers conflicts in both the horizontal and vertical directions as well as adjustments of heading and speed, designs various practical reward values, and extends the resolution strategy to a variety of conflict scenarios; an illustrative reward sketch is given after this list.
3. The deep reinforcement learning technique does not depend on an accurate aircraft dynamics model; the conflict resolution method with multi-dimensional states and actions better matches the actual command habits of air traffic controllers and effectively copes with uncertainty in external conditions and in the aircraft's operating state.
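As referenced in point 2 above, an illustrative sketch of a shaped reward follows; the specific terms and magnitudes are assumptions (the patent only states that conflicts and exceeding sector altitude limits are penalised), not the actual reward design:

```python
def shaped_reward(sep_nm: float, min_sep_nm: float,
                  alt_ft: float, alt_lo_ft: float, alt_hi_ft: float,
                  reached_target: bool) -> float:
    """Illustrative reward: penalise loss of separation and altitude-limit violations, reward arrival."""
    r = 0.0
    if sep_nm < min_sep_nm:                      # horizontal conflict (loss of separation)
        r -= 1.0
    if not (alt_lo_ft <= alt_ft <= alt_hi_ft):   # sector altitude limit exceeded
        r -= 0.5
    if reached_target:                           # conflict resolved and target point reached
        r += 1.0
    return r
```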
Drawings
FIG. 1 is a schematic diagram of the algorithm of the present invention;
FIG. 2 is a system operation diagram of the sub-modules of the present invention.
Detailed Description
In order to better understand the technical principles of the present invention, the present invention will be further described with reference to the accompanying drawings.
As shown in fig. 1, the aircraft conflict resolution method based on the DDPG algorithm architecture comprises four submodules: a policy network submodule, a value network submodule, a historical data experience pool submodule and a simulation environment module.
The policy network submodule comprises an online policy network and a copy (target) policy network. The online policy network is used for real-time interactive learning with the environment: the agent takes the current state as input, and the policy network outputs the corresponding action. The copy policy network is mainly used to stabilize the training process, i.e. the parameters of the policy network are updated steadily by keeping the copied network parameters fixed for a period of time. The policy network updates its parameters by computing the policy gradient, which combines the gradient of the value network output with respect to the action and the gradient of the policy network with respect to its own parameters;
the value network sub-module comprises an online value network and a duplicate value network, wherein the online value network is used for evaluating the advantages and disadvantages of the current strategy, and the duplicate value network is used for stabilizing the updating process of the parameters of the value network and is realized by fixing the parameters of the network periodically. The input of the value network is a binary group formed by the current state and the action output by the strategy network, the output is a corresponding state value function V value or action value function Q value, and the network calculates the Bellman equation to obtain a target value yiThe difference with the output value of the value network is used as a loss function, and the network parameters are optimized through the gradient value;
the historical data experience pool submodule is mainly used for storing and updating a sample library, wherein one sample is a quadruple and specifically comprises the state of an agent, the action of the agent, the reward value generated by the interaction of the agent and the environment and the next state of the agent. The capacity of the experience pool is relatively fixed, the upper limit value of the sample capacity is set, the number of samples is continuously increased along with the continuous interaction of the intelligent agent and the environment, and when the number of the samples exceeds the threshold, the samples which are the longest in distance from the current time are automatically removed, so that the updating of the sample library is realized.
The simulation environment module mainly refers to the constructed conflict scenario. The agent's learning environment is realized through the Gym interface, and the air traffic control environment is built on the open-source control platform OpenScope. First, the airspace of the airport approach area is modelled, and the longitude and latitude coordinates of the airport fixes are projected into plane coordinates through a coordinate transformation. Next, the Gym-internal structure of the agent is built, including the implementation of components such as the state set, the action space and the state update. Finally, the constructed environment is registered with Gym and the conflict scenario library is defined.
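A skeletal sketch of these steps using the classic Gym API; the class name, observation/action dimensions and environment id are assumptions made for illustration, and the coupling to OpenScope is only indicated by a comment:

```python
import numpy as np
import gym
from gym import spaces

class ConflictEnv(gym.Env):
    """Skeleton of the conflict-scenario environment (classic Gym reset/step API)."""
    def __init__(self):
        # State: e.g. position (x, y), flight speed, heading angle, altitude, normalised to [-1, 1]
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(5,), dtype=np.float32)
        # Action: heading, speed and altitude adjustments, normalised to [-1, 1]
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
        self.state = np.zeros(5, dtype=np.float32)

    def reset(self):
        self.state = np.zeros(5, dtype=np.float32)
        return self.state

    def step(self, action):
        # State update and reward calculation would query the OpenScope simulation here.
        reward, done, info = 0.0, False, {}
        return self.state, reward, done, info

# Register the constructed environment with Gym under a hypothetical id.
gym.envs.registration.register(id="ConflictResolution-v0", entry_point=ConflictEnv)
```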
As shown in fig. 2, the working diagram of the system of the present invention includes a conflict environment generation module, an agent communication module and a DDPG reinforcement learning algorithm module. The conflict environment generation module comprises an environment modeling submodule and a conflict scenario design submodule; the agent communication module comprises a Gym interface communication submodule and an OpenScope air traffic control submodule; and the DDPG reinforcement learning algorithm module comprises a policy network submodule (Actor), a value network submodule (Critic) and a historical data experience pool submodule. The conflict environment generation module communicates with the algorithm module through the agent communication module.
Claims (4)
1. An aircraft conflict resolution method based on deep reinforcement learning, characterized by comprising a conflict environment generation module, an agent communication module and a DDPG reinforcement learning module;
(1) the conflict environment generation module comprises an environment modeling submodule and a conflict scenario design submodule;
(2) the agent communication module comprises a Gym interface communication submodule and an OpenScope air traffic control submodule;
(3) the DDPG reinforcement learning module comprises a policy network submodule (Actor), a value network submodule (Critic) and a historical data experience pool submodule.
2. The method of claim 1, wherein each module further comprises:
(1) the environment modeling submodule is used for modeling the reinforcement learning environment, including the setting and management of parameters such as the airspace range, flight start point, target point, flight speed and traffic density;
(2) the conflict scenario design submodule can design different types of preset conflict scenarios for the aircraft agent, including head-on conflicts between oncoming aircraft and lateral crossing conflicts; the Gym interface communication submodule can complete the communication between the aircraft agent and other aircraft, including position information and heading information;
(3) the OpenScope air traffic control submodule provides a human-computer interaction simulation environment and a control interface, and implements flight control of the aircraft agent, such as control of heading, speed and altitude.
3. The method according to claim 2, wherein the simulation environment module is the constructed conflict scenario; the agent's learning environment is realized through the Gym interface, and the air traffic control environment is built on the open-source control platform OpenScope; the airspace of the airport approach area is modelled, and the longitude and latitude coordinates of the airport fixes are projected into plane coordinates through a coordinate transformation to build the agent's Gym-internal structure, including the implementation of components such as the state set, the action space and the state update.
4. The deep-reinforcement-learning aircraft conflict resolution method of claim 1, 2 or 3, comprising:
(1) the airspace of the simulation environment is complex and each sector has different altitude limits; if the agent avoids a conflict by adjusting its altitude but exceeds these limits, it receives a certain penalty;
(2) the action space of the agent comprises heading angle adjustment, altitude adjustment and flight speed adjustment, and is constrained by the performance parameters of the BADA aircraft model;
(3) the state space of the agent comprises multiple dimensions such as position information, flight speed and heading angle, and is normalized before the training process to accelerate the convergence of the network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110729530.XA CN113485103A (en) | 2021-06-29 | 2021-06-29 | Aircraft conflict resolution method based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110729530.XA CN113485103A (en) | 2021-06-29 | 2021-06-29 | Aircraft conflict resolution method based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113485103A (en) | 2021-10-08
Family
ID=77936359
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110729530.XA Pending CN113485103A (en) | 2021-06-29 | 2021-06-29 | Aircraft conflict resolution method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113485103A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114373337A (en) * | 2022-01-17 | 2022-04-19 | 北京航空航天大学 | Flight conflict autonomous releasing method under flight path uncertainty condition |
CN114415737A (en) * | 2022-04-01 | 2022-04-29 | 天津七一二通信广播股份有限公司 | Implementation method of unmanned aerial vehicle reinforcement learning training system |
CN115240475A (en) * | 2022-09-23 | 2022-10-25 | 四川大学 | Method and device for aircraft approach planning by fusing flight data and radar images |
CN116822618A (en) * | 2023-08-30 | 2023-09-29 | 北京汉勃科技有限公司 | Deep reinforcement learning exploration method and assembly based on dynamic noise network |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104216416A (en) * | 2014-08-26 | 2014-12-17 | 北京航空航天大学 | Aircraft conflict resolution method and equipment |
CN108803656A (en) * | 2018-06-12 | 2018-11-13 | 南京航空航天大学 | A kind of flight control method and system based on complicated low latitude |
CN111882047A (en) * | 2020-09-28 | 2020-11-03 | 四川大学 | A fast anti-collision method for air traffic control based on reinforcement learning and linear programming |
CN111882027A (en) * | 2020-06-02 | 2020-11-03 | 东南大学 | Robot Reinforcement Learning Training Environment System for RoboMaster AI Challenge |
US20200372809A1 (en) * | 2019-05-21 | 2020-11-26 | International Business Machines Corporation | Traffic control with reinforcement learning |
FR3103615A1 (en) * | 2019-11-25 | 2021-05-28 | Thales | DECISION AID DEVICE AND PROCEDURE FOR THE MANAGEMENT OF AIR CONFLICTS |
- 2021-06-29 CN CN202110729530.XA patent/CN113485103A/en active Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104216416A (en) * | 2014-08-26 | 2014-12-17 | 北京航空航天大学 | Aircraft conflict resolution method and equipment |
CN108803656A (en) * | 2018-06-12 | 2018-11-13 | 南京航空航天大学 | A kind of flight control method and system based on complicated low latitude |
US20200372809A1 (en) * | 2019-05-21 | 2020-11-26 | International Business Machines Corporation | Traffic control with reinforcement learning |
FR3103615A1 (en) * | 2019-11-25 | 2021-05-28 | Thales | DECISION AID DEVICE AND PROCEDURE FOR THE MANAGEMENT OF AIR CONFLICTS |
WO2021105055A1 (en) * | 2019-11-25 | 2021-06-03 | Thales | Decision assistance device and method for managing aerial conflicts |
CN111882027A (en) * | 2020-06-02 | 2020-11-03 | 东南大学 | Robot Reinforcement Learning Training Environment System for RoboMaster AI Challenge |
CN111882047A (en) * | 2020-09-28 | 2020-11-03 | 四川大学 | A fast anti-collision method for air traffic control based on reinforcement learning and linear programming |
Non-Patent Citations (2)
Title |
---|
ANTON BLABERG et al.: "Simulating ADS-B Attacks in Air Traffic Management", 2020 AIAA/IEEE 39th Digital Avionics Systems Conference *
JIANG Bo et al.: "Waypoint flight conflict resolution based on deep reinforcement learning", Aeronautical Computing Technique (航空计算技术) *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114373337A (en) * | 2022-01-17 | 2022-04-19 | 北京航空航天大学 | Flight conflict autonomous releasing method under flight path uncertainty condition |
CN114373337B (en) * | 2022-01-17 | 2022-11-22 | 北京航空航天大学 | Flight conflict autonomous releasing method under flight path uncertainty condition |
CN114415737A (en) * | 2022-04-01 | 2022-04-29 | 天津七一二通信广播股份有限公司 | Implementation method of unmanned aerial vehicle reinforcement learning training system |
CN115240475A (en) * | 2022-09-23 | 2022-10-25 | 四川大学 | Method and device for aircraft approach planning by fusing flight data and radar images |
CN116822618A (en) * | 2023-08-30 | 2023-09-29 | 北京汉勃科技有限公司 | Deep reinforcement learning exploration method and assembly based on dynamic noise network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113485103A (en) | Aircraft conflict resolution method based on deep reinforcement learning | |
CN111536979B (en) | A UAV inspection path planning method based on stochastic optimization | |
CN110502032A (en) | A Behavioral Control-Based Method for UAV Swarm Formation Flight | |
CN108459616B (en) | A route planning method for UAV swarm cooperative coverage based on artificial bee colony algorithm | |
Dong et al. | Study on the resolution of multi-aircraft flight conflicts based on an IDQN | |
CN115060263A (en) | Flight path planning method considering low-altitude wind and energy consumption of unmanned aerial vehicle | |
CN111045445A (en) | Aircraft intelligent collision avoidance method, equipment and medium based on reinforcement learning | |
CN113593308A (en) | Intelligent approach method for civil aircraft | |
CN114791743A (en) | A collaborative trajectory planning method for UAV swarms considering communication delay | |
CN115357044A (en) | Method, equipment and medium for planning inspection path of unmanned aerial vehicle cluster distribution network line | |
Zhu et al. | Multi-constrained intelligent gliding guidance via optimal control and DQN | |
Wu et al. | Multi-phase trajectory optimization for an aerial-aquatic vehicle considering the influence of navigation error | |
Li et al. | A warm-started trajectory planner for fixed-wing unmanned aerial vehicle formation | |
CN114967732A (en) | Method and device for formation and aggregation of unmanned aerial vehicles, computer equipment and storage medium | |
CN116772848A (en) | A green real-time planning method for four-dimensional flight trajectories in aircraft terminal areas | |
CN119088073A (en) | A UAV swarm mission planning algorithm based on hierarchical multi-agent deep reinforcement learning and its evaluation method | |
CN116880541A (en) | An adaptive conflict relief method for drones in urban scenes | |
CN116414149A (en) | An online avoidance system of no-fly zone for aircraft based on deep reinforcement learning | |
CN116795138A (en) | A multi-UAV intelligent trajectory planning method for data collection | |
Li et al. | Fast formation transformation and obstacle avoidance control for multi-agent system | |
CN115774455A (en) | Distributed unmanned cluster trajectory planning method for avoiding deadlock in complex obstacle environment | |
CN114879490A (en) | Iterative optimization and control method for unmanned aerial vehicle perching maneuver | |
CN115686076A (en) | Unmanned aerial vehicle path planning method based on incremental development depth reinforcement learning | |
Huang et al. | Optimization of Path Planning Algorithm in Intelligent Air Traffic Management System | |
Chen et al. | Rerouting planning of suborbital debris hazard zone based on reinforcement learning DDPG |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20211008 |