
CN113485103A - Aircraft conflict resolution method based on deep reinforcement learning - Google Patents

Aircraft conflict resolution method based on deep reinforcement learning Download PDF

Info

Publication number
CN113485103A
Authority
CN
China
Prior art keywords
module
conflict
aircraft
environment
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110729530.XA
Other languages
Chinese (zh)
Inventor
Han Yunxiang (韩云祥)
Zhang Jianwei (张建伟)
He Aiping (何爱平)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN202110729530.XA priority Critical patent/CN113485103A/en
Publication of CN113485103A publication Critical patent/CN113485103A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention provides an aircraft conflict resolution method based on deep reinforcement learning. Built on the deep deterministic policy gradient (DDPG) algorithm, it constructs the agent's components and the conflict scenarios through Gym, the OpenAI open-source reinforcement learning environment interface, and uses the DDPG algorithm to learn a resolution strategy. The resolution actions of the aircraft agent cover heading-angle, flight-speed and altitude adjustments, and its state mainly consists of multi-dimensional descriptions such as position information and speed. The proposed algorithm substantially helps resolve aircraft conflicts in air traffic control and can reduce controller workload.

Description

Aircraft conflict resolution method based on deep reinforcement learning
Technical Field
The invention relates to the technical field of civil aviation intelligent air traffic control, in particular to an aircraft conflict resolution method based on deep reinforcement learning.
Background
In 2019, the annual passenger throughput of Chinese airports exceeded 1.3 billion, reaching 1,351.629 million passengers, an increase of 6.9% over the previous year. According to forecasts by the International Air Transport Association (IATA), the number of global air passengers will reach 8.2 billion by 2037, of which 1.6 billion will be Chinese passengers. To relieve this enormous traffic pressure, various air traffic flow management aids and technologies have emerged, such as airport collaborative decision-making systems, AMAN/DMAN systems, remote towers, and conflict detection and resolution technologies. Efficient conflict detection and resolution is the primary task in safeguarding flight safety, and is especially important for complex, high-density airspace environments. It is of great significance for maintaining flight order, preventing aircraft collisions, relieving air traffic pressure, and ensuring air traffic safety.
Disclosure of Invention
To address the problems of over-simplified models, poor algorithmic adaptivity and low efficiency in the prior art, the invention provides an aircraft conflict resolution method based on deep reinforcement learning. A conflict scenario model is built on the open-source air traffic control platform OpenScope, communication with the aircraft agent is realised through the Gym interface, and the deep deterministic policy gradient (DDPG) algorithm is used to train the aircraft agent to complete the conflict resolution task. Compared with existing heuristic algorithms, the invention accounts for environmental uncertainty, such as errors in the manoeuvres taken during conflict resolution, and builds the simulation environment on the OpenAI reinforcement learning interface Gym, which makes the training process simpler and more efficient.
To achieve this purpose, the invention adopts the following technical scheme: an aircraft conflict resolution method based on deep reinforcement learning, comprising a conflict environment generation module, an agent communication module and a DDPG reinforcement learning algorithm module. The conflict environment generation module comprises an environment modeling sub-module and a conflict scenario design sub-module; the agent communication module comprises a Gym interface communication sub-module and an OpenScope air traffic control sub-module; the DDPG reinforcement learning algorithm module comprises a policy network sub-module (Actor), a value network sub-module (Critic) and a historical data experience pool sub-module.
The environment modeling sub-module is used to model the reinforcement learning environment, including the setting and management of parameters such as the airspace range, flight start point, target point, flight speed and traffic density.
The conflict scenario design sub-module can design different types of preset conflict scenarios for the aircraft agent, including head-on conflicts and lateral crossing conflicts. A head-on conflict is in fact a special case of a crossing conflict in which the heading-angle difference is a straight angle (180 degrees); the sub-module can design various crossing conflicts according to heading-angle differences of different magnitudes. The heading-angle difference is the angle between the heading angles of the two aircraft, where the heading angle is defined as the angle between the projection of the aircraft's longitudinal axis onto the horizontal plane and the geographic meridian, measured from the meridian, positive towards the east, with a value range of plus or minus 180 degrees.
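As a minimal illustration of the heading-angle geometry described above, the following Python sketch classifies an encounter as head-on or crossing from the difference between two heading angles; the function names, the 5-degree tolerance and the printed examples are illustrative assumptions rather than part of the disclosure.

```python
def heading_difference(heading_a, heading_b):
    """Smallest signed difference between two headings, in degrees, in (-180, 180]."""
    return (heading_b - heading_a + 180.0) % 360.0 - 180.0

def classify_conflict(heading_a, heading_b, tol=5.0):
    """Label an encounter by the heading-angle difference.

    A difference close to 180 degrees is treated as a head-on conflict
    (the straight-angle special case described above); anything else is
    a crossing conflict. The tolerance is an assumed value.
    """
    diff = abs(heading_difference(heading_a, heading_b))
    return "head-on" if abs(diff - 180.0) <= tol else "crossing"

# Example: an aircraft flying east (090) against one flying west (270) is head-on.
print(classify_conflict(90.0, 270.0))   # -> head-on
print(classify_conflict(90.0, 150.0))   # -> crossing
```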
The Gym interface communication sub-module handles communication between the aircraft agent and other aircraft, including position and heading information, and also communicates with the DDPG reinforcement learning algorithm module: from a global (God's-eye) view it sends the state information of all aircraft to the algorithm module, so that the agent can be better trained to learn conflict avoidance actions.
The OpenScope air traffic control sub-module provides a human-computer interaction interface and a control interface, and realises flight control of the aircraft agent, such as control of heading, speed and altitude. The air traffic control environment is mainly an approach control airspace, and each aircraft is constrained by the flight performance of its type. The agent continuously interacts with the environment to obtain feedback values from it, and learns the resolution strategy through the DDPG algorithm.
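The interaction loop described above can be sketched with the standard Gym API. The environment id `ConflictResolution-v0` and the classic four-tuple `step` return are assumptions for illustration; in practice the registered OpenScope-backed environment and the trained DDPG policy would take the place of the random action used here.

```python
import gym

# Hypothetical environment id; assumes the OpenScope-backed conflict
# environment has been registered with Gym under this name.
env = gym.make("ConflictResolution-v0")

obs = env.reset()
done = False
total_reward = 0.0
while not done:
    # A trained DDPG actor would map obs to heading, speed and altitude
    # adjustments; random sampling is used here only to show the loop.
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)  # feedback value from the environment
    total_reward += reward
print("episode return:", total_reward)
```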
The policy network sub-module is part of the DDPG algorithm. It learns the mapping from the agent's state to its action, which is commonly called the policy network; specifically, the output of the network is the optimal action for the state currently fed into the network. In addition, to make the learning process more stable and the network weights update more steadily, a Target policy network is introduced and the original policy network is called the Online policy network; after a fixed number of updates, the weights of the Online policy network are copied to the Target network.
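A minimal sketch of the periodic hard copy from the Online policy network to the Target network, assuming PyTorch modules; the copy interval is an illustrative value, since the text only specifies "a fixed number of updates".

```python
import torch

def hard_update(target_net: torch.nn.Module, online_net: torch.nn.Module) -> None:
    """Copy the Online network weights into the Target network."""
    target_net.load_state_dict(online_net.state_dict())

COPY_INTERVAL = 100  # assumed value; the text only says "fixed updating times"
# Inside the training loop one might write:
# if step % COPY_INTERVAL == 0:
#     hard_update(target_actor, online_actor)
#     hard_update(target_critic, online_critic)
```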
The value network sub-module mainly evaluates the actions produced by the policy network: it learns the mapping from state and action to the Q value, and the policy network then learns the resolution strategy according to this Q value. Consistent with the policy network sub-module, a Target value network and an Online value network are likewise introduced.
The historical data experience pool sub-module mainly performs two functions, storage and sampling. Storage refers to saving the agent's historical trajectory, i.e. the state, the action, the reward and the next state, to which a flag indicating whether the current task has been completed can be added as needed. Sampling refers to the agent drawing historical trajectories in batches of a given size during learning.
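A minimal experience-pool sketch covering the two functions named above, storage and batch sampling; the transition fields follow the description (state, action, reward, next state, optional done flag), while the capacity value is an assumption.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience pool: stores transitions, samples mini-batches."""

    def __init__(self, capacity=100_000):        # capacity is an assumed value
        self.buffer = deque(maxlen=capacity)     # oldest samples are dropped first

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```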
The agent learns the optimal resolution strategy by maximising the accumulated reward R_t at time t, formulated as follows:

R_t = Σ_{i=t}^{T} γ^(i−t) · r(s_i, a_i)
where s_i and a_i denote the state and the action respectively, r(s_i, a_i) is a single reward value, and γ is the discount factor that expresses how important future rewards are. In the policy network module the behaviour policy is deterministic, so the action a_t at time t is obtained directly from the function:

a_t = μ(s_t ∣ θ^μ)
where μ denotes the mapping function from state to action, s_t is the state at time t, and θ^μ is the weight parameter of the policy network. The model is solved by iterative optimisation of the Bellman equation to obtain the optimal strategy, expressed as:

Q^μ(s_t, a_t) = E[ r(s_t, a_t) + γ · Q^μ(s_{t+1}, μ(s_{t+1})) ]
where Q^μ(s_t, a_t) is the action-value function of the policy, i.e. an evaluation of how good it is to take action a_t in state s_t, and r(s_t, a_t) is the immediate reward at time t. The value network further evaluates the overall policy through the expectation of the Q value over the states s_t, formulated as:

J_β(μ) = E_{s∼ρ^β}[ Q^μ(s, μ(s)) ]
where ρ^β is the distribution function of the state s, representing the probability of the agent being in a given state at a given moment, and the evaluation of the policy is expressed by the function J_β. The parameters of the policy network are then optimised along the gradient of this objective:

∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a ∣ θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_{θ^μ} μ(s ∣ θ^μ)|_{s=s_i}
where θ^Q and θ^μ are the parameters of the value network and the policy network respectively, and N is the number of samples drawn from the agent's experience. The value network itself is updated by minimising the following loss against the Bellman target:

L = (1/N) Σ_i ( y_i − Q(s_i, a_i ∣ θ^Q) )²,   with   y_i = r(s_i, a_i) + γ · Q′(s_{i+1}, μ′(s_{i+1} ∣ θ^{μ′}) ∣ θ^{Q′})
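The formulas above can be condensed into one DDPG update step. The sketch below assumes PyTorch, an actor network mapping states to actions, a critic taking a (state, action) pair, and pre-built target copies and optimisers; the discount factor and all names are illustrative, not part of the disclosure.

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99):
    """One gradient step for the online actor and critic (illustrative values)."""
    states, actions, rewards, next_states, dones = batch  # tensors from the experience pool

    # Critic: minimise (y_i - Q(s_i, a_i))^2 with y_i = r + gamma * Q'(s_{i+1}, mu'(s_{i+1}))
    with torch.no_grad():
        next_actions = target_actor(next_states)
        y = rewards + gamma * (1.0 - dones) * target_critic(next_states, next_actions)
    critic_loss = F.mse_loss(critic(states, actions), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: follow the deterministic policy gradient, i.e. maximise Q(s, mu(s))
    actor_loss = -critic(states, actor(states)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    return critic_loss.item(), actor_loss.item()
```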
compared with the prior art, the invention has the following beneficial effects:
1. The deep reinforcement learning technique combines the feature-fitting capability of deep learning with the autonomous decision-making capability of reinforcement learning, and can effectively address the problems of the existing models, such as a single, rigid resolution mode and low efficiency.
2. The deep reinforcement learning technique considers conflicts in both the horizontal and vertical directions as well as heading and speed adjustments, designs a variety of practical reward values (an illustrative sketch follows this list), and thereby extends the resolution strategy to various conflict scenarios.
3. The deep reinforcement learning technique does not depend on an accurate aircraft dynamics model; its multi-dimensional state and action conflict resolution approach better matches the actual command habits of air traffic controllers and effectively copes with uncertainty in external conditions and in the aircraft's operating state.
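As an illustration of the reward design mentioned in point 2, a hedged sketch of a shaped reward is given below; the separation thresholds, weights and bonus values are assumptions chosen for readability, not values disclosed by the invention.

```python
def shaped_reward(separation_nm, vertical_sep_ft, altitude_ft,
                  sector_floor_ft, sector_ceiling_ft, reached_exit):
    """Illustrative reward shaping; all thresholds and weights are assumed."""
    reward = -0.1                                      # small step cost: resolve conflicts promptly
    if separation_nm < 5.0 and vertical_sep_ft < 1000.0:
        reward -= 10.0                                 # loss-of-separation (conflict) penalty
    if not sector_floor_ft <= altitude_ft <= sector_ceiling_ft:
        reward -= 2.0                                  # penalty for exceeding sector altitude limits
    if reached_exit:
        reward += 5.0                                  # bonus for completing the resolution task
    return reward
```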
Drawings
FIG. 1 is a schematic diagram of the algorithm of the present invention;
FIG. 2 is a system operation diagram of the sub-modules of the present invention.
Detailed Description
In order to better understand the technical principles of the present invention, the present invention will be further described with reference to the accompanying drawings.
As shown in fig. 1, the aircraft conflict resolution method based on the DDPG algorithm architecture comprises four sub-modules, specifically: the policy network sub-module, the value network sub-module, the historical data experience pool sub-module and the simulation environment module.
The policy network sub-module comprises an online policy network and a copy (target) policy network. The online policy network interacts with the environment and learns in real time: the agent takes the current state as input, and the policy network outputs the corresponding action. The copy policy network is mainly used to stabilise the training process, i.e. its parameters are fixed and refreshed periodically so that the policy network parameters are updated smoothly. The policy network updates its parameters by computing the policy gradient

∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a ∣ θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_{θ^μ} μ(s ∣ θ^μ)|_{s=s_i}

which combines the gradient of the value network with respect to the action and the gradient of the policy network with respect to its own parameters;
the value network sub-module comprises an online value network and a duplicate value network, wherein the online value network is used for evaluating the advantages and disadvantages of the current strategy, and the duplicate value network is used for stabilizing the updating process of the parameters of the value network and is realized by fixing the parameters of the network periodically. The input of the value network is a binary group formed by the current state and the action output by the strategy network, the output is a corresponding state value function V value or action value function Q value, and the network calculates the Bellman equation to obtain a target value yiThe difference with the output value of the value network is used as a loss function, and the network parameters are optimized through the gradient value;
the historical data experience pool submodule is mainly used for storing and updating a sample library, wherein one sample is a quadruple and specifically comprises the state of an agent, the action of the agent, the reward value generated by the interaction of the agent and the environment and the next state of the agent. The capacity of the experience pool is relatively fixed, the upper limit value of the sample capacity is set, the number of samples is continuously increased along with the continuous interaction of the intelligent agent and the environment, and when the number of the samples exceeds the threshold, the samples which are the longest in distance from the current time are automatically removed, so that the updating of the sample library is realized.
The simulation environment module mainly refers to the constructed conflict scenario. The agent's learning environment is realised through the Gym interface, and the air traffic control environment is built on the open-source platform OpenScope. First, the airspace of the airport approach area is mapped, and the longitude and latitude coordinates of the airport fixes are projected into plane coordinates through a coordinate transformation. Next, the internal structure of the agent is built in Gym, including the implementation of components such as the state set, the action space and the state update. Finally, the constructed environment is registered with Gym and a conflict scenario library is defined.
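A skeleton of how such an environment might be defined and registered with Gym; the class name, environment id and space dimensions are hypothetical, and the coupling to OpenScope is reduced to placeholder comments.

```python
import gym
import numpy as np
from gym import spaces
from gym.envs.registration import register

class ConflictEnv(gym.Env):
    """Skeleton conflict-resolution environment (dimensions are assumed)."""

    def __init__(self):
        # Observation: normalised position, speed and heading of own and intruder aircraft.
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(8,), dtype=np.float32)
        # Action: normalised heading-angle, speed and altitude adjustments.
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)

    def reset(self):
        # Here the OpenScope-backed conflict scenario would be (re)initialised.
        return np.zeros(8, dtype=np.float32)

    def step(self, action):
        # Here the action would be forwarded to OpenScope and the new state read back.
        obs = np.zeros(8, dtype=np.float32)
        reward, done, info = 0.0, False, {}
        return obs, reward, done, info

# Register the environment so that gym.make("ConflictResolution-v0") works;
# a "module:ClassName" entry-point string is the more common form.
register(id="ConflictResolution-v0", entry_point=ConflictEnv)
```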
As shown in fig. 2, the system working diagram of the present invention comprises a conflict environment generation module, an agent communication module and a DDPG reinforcement learning algorithm module. The conflict environment generation module comprises an environment modeling sub-module and a conflict scenario design sub-module; the agent communication module comprises a Gym interface communication sub-module and an OpenScope air traffic control sub-module; the DDPG reinforcement learning algorithm module comprises a policy network sub-module (Actor), a value network sub-module (Critic) and a historical data experience pool sub-module. The conflict environment generation module communicates with the algorithm module through the agent communication module.

Claims (4)

1. An aircraft conflict resolution method based on deep reinforcement learning, characterized by comprising a conflict environment generation module, an agent communication module and a DDPG reinforcement learning module;
(1) the conflict environment generation module comprises an environment modeling sub-module and a conflict scenario design sub-module;
(2) the agent communication module comprises a Gym interface communication sub-module and an OpenScope air traffic control sub-module;
(3) the DDPG reinforcement learning module comprises a policy network sub-module (Actor), a value network sub-module (Critic) and a historical data experience pool sub-module.
2. The method of claim 1, wherein each module further comprises:
(1) the environment modeling sub-module is used to model the reinforcement learning environment, including the setting and management of parameters such as the airspace range, flight start point, target point, flight speed and traffic density;
(2) the conflict scenario design sub-module can design different types of preset conflict scenarios for the aircraft agent, including head-on conflicts and lateral crossing conflicts; the Gym interface communication sub-module handles communication between the aircraft agent and other aircraft, including position and heading information;
(3) the OpenScope air traffic control sub-module provides a human-computer interaction simulation environment and a control interface, and realises flight control of the aircraft agent, such as control of heading, speed and altitude.
3. The method according to claim 2, wherein the simulation environment module is a constructed conflict scenario; the agent's learning environment is realised through the Gym interface and the air traffic control environment is built on the open-source platform OpenScope; the airspace of the airport approach area is mapped and the longitude and latitude coordinates of the airport fixes are projected into plane coordinates through a coordinate transformation; the internal structure of the agent is built in Gym, including the implementation of components such as the state set, the action space and the state update.
4. The deep-reinforcement-learning aircraft conflict resolution method of claim 1, 2 or 3, comprising:
(1) the simulated airspace is complex, with different altitude limits in each sector; an agent that avoids a conflict by adjusting its altitude but exceeds these limits is penalised to a certain extent;
(2) the action space of the agent comprises heading-angle adjustment, altitude adjustment and flight-speed adjustment, constrained by the performance parameters of the BADA aircraft model;
(3) the state space of the agent comprises several dimensions such as position information, flight speed and heading angle, which are normalised before training to accelerate the convergence of the network.
CN202110729530.XA 2021-06-29 2021-06-29 Aircraft conflict resolution method based on deep reinforcement learning Pending CN113485103A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110729530.XA CN113485103A (en) 2021-06-29 2021-06-29 Aircraft conflict resolution method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110729530.XA CN113485103A (en) 2021-06-29 2021-06-29 Aircraft conflict resolution method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN113485103A true CN113485103A (en) 2021-10-08

Family

ID=77936359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110729530.XA Pending CN113485103A (en) 2021-06-29 2021-06-29 Aircraft conflict resolution method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113485103A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104216416A (en) * 2014-08-26 2014-12-17 北京航空航天大学 Aircraft conflict resolution method and equipment
CN108803656A (en) * 2018-06-12 2018-11-13 南京航空航天大学 A kind of flight control method and system based on complicated low latitude
US20200372809A1 (en) * 2019-05-21 2020-11-26 International Business Machines Corporation Traffic control with reinforcement learning
FR3103615A1 (en) * 2019-11-25 2021-05-28 Thales DECISION AID DEVICE AND PROCEDURE FOR THE MANAGEMENT OF AIR CONFLICTS
WO2021105055A1 (en) * 2019-11-25 2021-06-03 Thales Decision assistance device and method for managing aerial conflicts
CN111882027A (en) * 2020-06-02 2020-11-03 东南大学 Robot Reinforcement Learning Training Environment System for RoboMaster AI Challenge
CN111882047A (en) * 2020-09-28 2020-11-03 四川大学 A fast anti-collision method for air traffic control based on reinforcement learning and linear programming

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANTON BLABERG et al.: "Simulating ADS-B Attacks in Air Traffic Management", 2020 AIAA/IEEE 39th Digital Avionics Systems Conference *
JIANG Bo et al.: "Waypoint flight conflict resolution based on deep reinforcement learning" (基于深度强化学习的航路点飞行冲突解脱), Aeronautical Computing Technique (航空计算技术) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114373337A (en) * 2022-01-17 2022-04-19 北京航空航天大学 Flight conflict autonomous releasing method under flight path uncertainty condition
CN114373337B (en) * 2022-01-17 2022-11-22 北京航空航天大学 Flight conflict autonomous releasing method under flight path uncertainty condition
CN114415737A (en) * 2022-04-01 2022-04-29 天津七一二通信广播股份有限公司 Implementation method of unmanned aerial vehicle reinforcement learning training system
CN115240475A (en) * 2022-09-23 2022-10-25 四川大学 Method and device for aircraft approach planning by fusing flight data and radar images
CN116822618A (en) * 2023-08-30 2023-09-29 北京汉勃科技有限公司 Deep reinforcement learning exploration method and assembly based on dynamic noise network

Similar Documents

Publication Publication Date Title
CN113485103A (en) Aircraft conflict resolution method based on deep reinforcement learning
CN111536979B (en) A UAV inspection path planning method based on stochastic optimization
CN110502032A (en) A Behavioral Control-Based Method for UAV Swarm Formation Flight
CN108459616B (en) A route planning method for UAV swarm cooperative coverage based on artificial bee colony algorithm
Dong et al. Study on the resolution of multi-aircraft flight conflicts based on an IDQN
CN115060263A (en) Flight path planning method considering low-altitude wind and energy consumption of unmanned aerial vehicle
CN111045445A (en) Aircraft intelligent collision avoidance method, equipment and medium based on reinforcement learning
CN113593308A (en) Intelligent approach method for civil aircraft
CN114791743A (en) A collaborative trajectory planning method for UAV swarms considering communication delay
CN115357044A (en) Method, equipment and medium for planning inspection path of unmanned aerial vehicle cluster distribution network line
Zhu et al. Multi-constrained intelligent gliding guidance via optimal control and DQN
Wu et al. Multi-phase trajectory optimization for an aerial-aquatic vehicle considering the influence of navigation error
Li et al. A warm-started trajectory planner for fixed-wing unmanned aerial vehicle formation
CN114967732A (en) Method and device for formation and aggregation of unmanned aerial vehicles, computer equipment and storage medium
CN116772848A (en) A green real-time planning method for four-dimensional flight trajectories in aircraft terminal areas
CN119088073A (en) A UAV swarm mission planning algorithm based on hierarchical multi-agent deep reinforcement learning and its evaluation method
CN116880541A (en) An adaptive conflict relief method for drones in urban scenes
CN116414149A (en) An online avoidance system of no-fly zone for aircraft based on deep reinforcement learning
CN116795138A (en) A multi-UAV intelligent trajectory planning method for data collection
Li et al. Fast formation transformation and obstacle avoidance control for multi-agent system
CN115774455A (en) Distributed unmanned cluster trajectory planning method for avoiding deadlock in complex obstacle environment
CN114879490A (en) Iterative optimization and control method for unmanned aerial vehicle perching maneuver
CN115686076A (en) Unmanned aerial vehicle path planning method based on incremental development depth reinforcement learning
Huang et al. Optimization of Path Planning Algorithm in Intelligent Air Traffic Management System
Chen et al. Rerouting planning of suborbital debris hazard zone based on reinforcement learning DDPG

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20211008