CN120595851A - UAV flight attitude adjustment method and system based on reinforcement learning - Google Patents
- Publication number
- CN120595851A (application CN202510835183.7A)
- Authority
- CN
- China
- Prior art keywords
- state
- information
- flight
- meta
- uav
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/40—Control within particular dimensions
- G05D1/49—Control of attitude, i.e. control of roll, pitch or yaw
- G05D1/495—Control of attitude, i.e. control of roll, pitch or yaw to ensure stability
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/40—Control within particular dimensions
- G05D1/46—Control of position or course in three dimensions
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D2101/00—Details of software or hardware architectures used for the control of position
- G05D2101/10—Details of software or hardware architectures used for the control of position using artificial intelligence [AI] techniques
- G05D2101/15—Details of software or hardware architectures used for the control of position using artificial intelligence [AI] techniques using machine learning, e.g. neural networks
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D2109/00—Types of controlled vehicles
- G05D2109/20—Aircraft, e.g. drones
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Automation & Control Theory (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Health & Medical Sciences (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
Abstract
The invention relates to the technical field of unmanned aerial vehicle (UAV) flight attitude control, and in particular to a reinforcement-learning-based UAV flight attitude adjustment method and system. The method comprises: constructing a dynamic environment grid map from a digital map and real-time semantic segmentation results, and generating an initial trajectory with a rapidly-exploring random tree (RRT) algorithm extended with spatio-temporal constraints; collecting UAV flight state information, environment perception information, and image sharpness indices, and feeding them into a spatio-temporal attention encoder to generate a semantics-fused state tensor; assigning importance to the state tensor via an information-entropy weighting mechanism to obtain a weighted state vector; inputting the weighted state vector into a Meta-SAC model with meta-learning capability, which outputs the target flight reference pose and dynamic gain coefficients for an LQR controller; having the control module drive the LQR controller to generate flight control commands from this information; and constructing a reward function based on image quality and energy efficiency that feeds back to the reinforcement learning policy in real time.
Description
Technical Field
The invention relates to the technical field of unmanned aerial vehicle flight attitude control, in particular to an unmanned aerial vehicle flight attitude adjustment method and system based on reinforcement learning.
Background
Unmanned aerial vehicles (UAVs) have found wide application in fields such as power-line inspection, geological survey, and security monitoring; in power-line inspection in particular, a UAV can replace manual labor in high-risk, complex high-altitude inspection tasks. However, conventional UAVs rely on manual remote control or preset routes to execute tasks; they suffer from poor autonomy, weak environmental adaptability, and insufficient flight-control precision, and struggle to meet the requirements of high-quality image acquisition and efficient operation in complex environments.
Most current flight control systems are based on PID (proportional-integral-derivative) control or fixed route planning. These methods depend on static rules, lack adaptive capability, and have difficulty coping with complex factors such as dynamic obstacles and wind-field disturbance. In multi-objective tasks, existing control methods cannot handle multiple indices cooperatively, leading to poor control performance and problems such as image blur, target loss, and route deviation. In addition, systems that partially integrate path planning and control optimization adopt a staged processing strategy and lack an end-to-end feedback mechanism; they cannot continuously optimize the control strategy during execution, which limits the global performance of the system. Especially in high-precision aerial photography or inspection of complex structures, where coordinated control of flight attitude and camera viewing angle is demanding, traditional controllers struggle to balance precision and real-time performance.
Therefore, a flight control method with adaptive learning capability and dynamic feedback optimization is needed to improve the intelligent level and operation performance of the unmanned aerial vehicle in the real inspection task.
Disclosure of Invention
The invention provides a reinforcement-learning-based UAV flight attitude adjustment method and system, aiming to solve the problems of poor adaptability to dynamic environments, inaccurate attitude adjustment, and unstable image acquisition quality in traditional UAV flight control systems.
In order to achieve the above purpose, the present invention provides the following technical solutions:
The reinforcement-learning-based UAV flight attitude adjustment method comprises the following steps:
Constructing a dynamic environment grid from the digital map and real-time semantic segmentation results, and generating an initial trajectory with a rapidly-exploring random tree (RRT) algorithm extended with spatio-temporal constraints;
Acquiring the UAV's flight state information on the initial trajectory, environment perception information from the onboard sensors, and image sharpness information, and inputting them into a spatio-temporal attention encoder to generate a state tensor fusing environmental semantics;
Applying an information-entropy weighting mechanism to the state tensor to assign state importance, obtaining a weighted state vector;
Inputting the weighted state vector into a Meta-SAC model with meta-learning capability, whose Actor network outputs target flight reference pose parameters and gain coefficients for dynamically adjusting the LQR controller;
Based on the target flight reference pose parameters and the LQR gain coefficients, driving the LQR controller to generate low-level flight control commands that make the UAV execute the corresponding attitude actions;
During execution of the flight control commands, constructing a fuzzy-logic reward function based on energy efficiency and inspection-image sharpness, and using the reward value as feedback to update the Meta-SAC model's policy.
Further, the step of constructing the dynamic environment grid is as follows:
Identifying and classifying obstacles in the environment by using a semantic segmentation model based on a digital map of a target inspection area and an image acquired by an airborne sensor;
and fusing the static geographic information and the dynamic obstacle identification information to construct a three-dimensional dynamic environment raster image containing time dimension, wherein the raster records the trafficability, the obstacle type and the dynamic behavior attribute of each area in the environment.
Further, the step of generating the initial track is as follows:
In the dynamic environment grid, taking the current position of the UAV as the root node and the target point as the goal, performing path expansion with a rapidly-exploring random tree algorithm;
And introducing Lyapunov stability constraint in path expansion, performing track fairing processing through a B spline curve, and outputting an initial flight track sequence containing time and attitude information.
Further, the step of generating a state tensor fusing the environment semantics comprises the following steps:
Carrying out normalization preprocessing on flight state information, environment perception information and image definition information of the unmanned aerial vehicle;
inputting the preprocessed information into a space-time attention encoder, and fusing multi-modal data by utilizing a multi-head attention mechanism;
and extracting the correlation characteristics of the spatial context and the time sequence by combining the semantic segmentation result, and constructing a state tensor comprising position information, obstacle threat degree, image definition predicted value and energy consumption estimation.
Further, the state importance allocation step comprises the following steps:
measuring the change degree of each sub-state component in the state tensor based on the information entropy theory;
Converting the information entropy value into a weight factor for representing the importance degree of each state dimension in policy decision;
And carrying out weighted fusion on the state tensors according to the weight factors to generate weighted state vectors with state importance perception capability.
Further, the step of generating the low-layer flight control instruction by the LQR controller is as follows:
Receiving target flight reference pose parameters output by a Meta-SAC model as expected input;
constructing a system state error vector by combining the current flight state of the unmanned aerial vehicle, and calculating a control instruction according to the dynamically adjusted feedback gain coefficient;
Minimizing a cost function of state error and control energy by solving a linear quadratic optimization problem, and outputting low-level control quantities to drive the aircraft to perform the corresponding attitude adjustment.
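The LQR step above can be sketched in a few lines. The following Python fragment is a minimal illustration, not the patent's implementation: the toy single-axis attitude model, the cost matrices `Q`/`R`, and the fixed-point Riccati iteration are all assumptions. It computes a feedback gain K and the control command u = −K(x − x_ref):

```python
import numpy as np

def dlqr(A, B, Q, R, iters=500, tol=1e-10):
    """Iterate the discrete-time Riccati recursion until the cost-to-go
    matrix P converges, then return the gain K of u = -K (x - x_ref)."""
    P = Q.copy()
    K = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    return K

# Toy single-axis attitude model: state = [angle, angular rate], input = torque.
dt = 0.02
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.diag([10.0, 1.0])   # state-error penalty (the policy could retune this)
R = np.array([[0.1]])      # control-effort penalty

K = dlqr(A, B, Q, R)
x = np.array([0.3, 0.0])      # current attitude state
x_ref = np.array([0.0, 0.0])  # reference pose from the policy network
u = -K @ (x - x_ref)          # low-level control command
```

In the patent's scheme the gain would additionally be modulated by the coefficients output by the Meta-SAC Actor network, e.g. by scaling `Q` and `R` before solving.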
Further, the step of constructing the fuzzy logic rewarding function is as follows:
Acquiring image definition indexes, energy consumption rate and relative distance between obstacles of the unmanned aerial vehicle in the flight process, and carrying out normalization processing on the continuous variables;
Converting the normalized input variable into fuzzy variables, and mapping the fuzzy variables into preset fuzzy membership functions to obtain corresponding fuzzy language values;
fuzzy reasoning is carried out based on a preset fuzzy rule base, and the reasoning rule comprises combination judgment of image definition, energy consumption and obstacle distance;
the reasoning result is converted into specific numerical rewards through a defuzzification method and is used as an instant feedback signal of the Meta-SAC model to update the strategy network.
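The fuzzification, rule inference, and defuzzification steps above can be sketched as follows. This is a hedged toy example: the triangular membership functions, the three-rule base, and the singleton-output defuzzification are illustrative assumptions, not the patent's actual rule base.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-9),
                                 (c - x) / (c - b + 1e-9)), 0.0)

def fuzzy_reward(sharpness, energy_rate, obstacle_dist):
    """All inputs normalized to [0, 1]. Returns a scalar reward via
    min-inference over three illustrative rules and centroid
    defuzzification of weighted singleton outputs."""
    # Fuzzification: map crisp inputs to linguistic values.
    sharp_hi = tri(sharpness, 0.4, 1.0, 1.6)
    sharp_lo = tri(sharpness, -0.6, 0.0, 0.6)
    energy_lo = tri(energy_rate, -0.6, 0.0, 0.6)
    dist_safe = tri(obstacle_dist, 0.3, 1.0, 1.7)
    dist_near = tri(obstacle_dist, -0.7, 0.0, 0.7)

    # Illustrative rule base: (firing strength, reward level).
    rules = [
        (min(sharp_hi, energy_lo, dist_safe), 1.0),  # ideal flight: big reward
        (min(sharp_lo, dist_safe), 0.3),             # blurry image: small reward
        (dist_near, -1.0),                           # too close to obstacle: penalty
    ]
    num = sum(w * r for w, r in rules)
    den = sum(w for w, _ in rules) + 1e-9
    return num / den  # defuzzified crisp reward
```

A sharp image at low energy use far from obstacles then scores near 1, while closing in on an obstacle drives the reward negative, which is the feedback shape the Meta-SAC update consumes.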
The invention also provides an unmanned aerial vehicle flight attitude adjusting system based on reinforcement learning, which comprises:
The route planning module is used for constructing a dynamic environment grid from the digital map and real-time semantic segmentation results, and generating an initial route with a rapidly-exploring random tree (RRT) algorithm extended with spatio-temporal constraints;
The information fusion module is used for acquiring flight state information of the unmanned aerial vehicle on an initial track, environment perception information of an onboard sensor and image definition information, inputting the information into the space-time attention encoder and generating a state tensor fusing environment semantics;
the tensor weighting module is used for introducing an information entropy weighting mechanism to the state tensor to distribute the state importance so as to obtain a weighted state vector;
The pose generation module is used for inputting the weighted state vector into a Meta-SAC model with Meta-learning capability, and generating output target flight reference pose parameters and gain coefficients for dynamically adjusting the LQR controller through an Actor network;
The pose control module is used for driving the LQR controller to generate a low-layer flight control instruction so as to control the unmanned aerial vehicle to execute corresponding pose actions based on the target flight reference pose parameters and the LQR gain coefficients;
And the feedback updating module is used for constructing a reward function based on the energy consumption efficiency and the definition of the patrol image in the process of executing the flight control instruction, and taking the reward function value as feedback of the Meta-SAC model to carry out strategy updating.
The beneficial effects of the invention are as follows:
1. A rapidly-exploring random tree algorithm with spatio-temporal constraints is introduced: by incorporating time windows and obstacle dynamic-behavior modeling into the RRT path-planning process, the responsiveness of the flight path to dynamic environment changes is markedly improved. Combined with the Lyapunov stability constraint and B-spline fairing, the generated trajectory not only satisfies dynamic reachability requirements but is also continuous and smooth, effectively improving flight safety and path controllability.
2. An information-entropy weighting mechanism is introduced for tensor weighting: the state tensor produced by multi-modal perception receives a dynamic importance distribution, strengthening attention on key environmental variables (such as blurred image regions or high-threat obstacles). This mechanism improves the representational power of the state input and the distinguishability of training samples, thereby raising the learning efficiency and attitude-decision accuracy of the Meta-SAC policy model.
3. A fuzzy-logic reward function is constructed: taking performance indices such as image sharpness, energy efficiency, and flight stability as inputs, a fuzzy-logic reward with multi-rule fusion provides multi-objective guidance for the flight policy. Compared with a traditional single-index feedback mechanism, it can adaptively adjust the reward intensity, strengthening the robustness and generalization of the policy model in complex inspection tasks.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a flow chart of an unmanned aerial vehicle flight attitude adjustment method based on reinforcement learning provided by the invention;
Fig. 2 is a schematic structural diagram of an unmanned aerial vehicle flight attitude adjustment system based on reinforcement learning.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
Example 1
The unmanned aerial vehicle flight attitude adjustment method based on reinforcement learning, as shown in fig. 1, comprises the following steps:
S100, constructing a dynamic environment grid from the digital map and real-time semantic segmentation results, and generating an initial trajectory with a rapidly-exploring random tree (RRT) algorithm extended with spatio-temporal constraints;
further, the step of constructing the dynamic environment grid is as follows:
Identifying and classifying obstacles in the environment by using a semantic segmentation model based on a digital map of a target inspection area and an image acquired by an airborne sensor;
and fusing the static geographic information and the dynamic obstacle identification information to construct a three-dimensional dynamic environment raster image containing time dimension, wherein the raster records the trafficability, the obstacle type and the dynamic behavior attribute of each area in the environment.
Specifically, based on a preset inspection task area, two-dimensional or three-dimensional digital map data containing information such as topography, height, buildings and the like is loaded. The map data may originate from satellite imagery, a geographic information system, or a high-precision map.
Then, environmental images acquired in real time by sensors such as the UAV's onboard camera or lidar are fed into a trained semantic segmentation network (e.g., DeepLabV3+ or HRNet), which performs pixel-level recognition and semantic classification of obstacles in the images, such as dynamic or static obstacles like trees, utility poles, people, and vehicles, and outputs the corresponding semantic label maps.
On this basis, the static geographic information already in the digital map (roads, buildings, bridges, etc.) is spatially fused with the dynamic obstacle information from the semantic segmentation result to generate a three-dimensional dynamic environment grid carrying spatial coordinates and time labels. The grid is built with a fixed-resolution rasterization method (e.g., an octree or voxel grid); each cell records fields for trafficability (passable / impassable / risk area), obstacle type (e.g., static building, moving object), and dynamic behavior (e.g., presence in the current frame, predicted direction of motion at the next moment).
The construction of the dynamic environment grid can realize unified modeling of static geographic information and dynamic barrier information in the inspection area, so that the space-time resolution of environment expression is improved, and a real-time and quantifiable environment constraint basis is provided for track planning and posture adjustment. By integrating semantic segmentation results and time dimension information, the system can accurately judge the trafficability of each area and the dynamic behavior of the obstacle, and effectively improve the obstacle avoidance capability and safety of path planning.
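The per-cell record described above can be sketched as a sparse voxel grid. This is a minimal illustration under assumptions of my own (the `Cell` fields, the dict-backed sparse storage, and the 0.5 m resolution are not specified by the patent):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Cell:
    passable: bool = True               # passable / blocked / risk area
    obstacle_type: str = "none"         # e.g. "building", "tree", "vehicle"
    dynamic: bool = False               # does the occupant move between frames?
    velocity: tuple = (0.0, 0.0, 0.0)   # predicted motion for dynamic obstacles

class DynamicGrid:
    """Fixed-resolution 3-D grid indexed by (x, y, z, t). Static map layers
    are written once; semantic-segmentation detections update the dynamic
    layer every frame."""
    def __init__(self, resolution=1.0):
        self.res = resolution
        self.cells = {}   # sparse storage: (ix, iy, iz, t) -> Cell

    def key(self, pos, t):
        return tuple(int(np.floor(p / self.res)) for p in pos) + (t,)

    def mark(self, pos, t, **attrs):
        self.cells[self.key(pos, t)] = Cell(**attrs)

    def query(self, pos, t):
        return self.cells.get(self.key(pos, t), Cell())  # default: free space

grid = DynamicGrid(resolution=0.5)
grid.mark((3.2, 1.1, 10.0), t=0, passable=False, obstacle_type="tree")
grid.mark((5.0, 2.0, 8.0), t=1, passable=False, obstacle_type="vehicle",
          dynamic=True, velocity=(1.0, 0.0, 0.0))
```

The time index `t` in the key is what lets the planner ask "is this cell free at the moment the UAV would arrive", which is the spatio-temporal constraint the RRT extension relies on.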
Further, the step of generating the initial track is as follows:
In the dynamic environment grid, taking the current position of the unmanned aerial vehicle as a root node, taking a target point as an end point, and adopting a fast search random tree algorithm to perform path expansion;
And introducing Lyapunov stability constraint in path expansion, performing track fairing processing through a B spline curve, and outputting an initial flight track sequence containing time and attitude information.
Specifically, the three-dimensional dynamic environment grid constructed in step S100 serves as the path search space, with the UAV's current three-dimensional position as the root node of the rapidly-exploring random tree and the target inspection point as the goal. During RRT expansion, the algorithm incrementally connects feasible nodes randomly sampled from the search space into a tree structure, while using the time-dimension information of the environment grid to keep the path temporally consistent and avoid collisions with dynamic obstacles.
To improve the dynamic stability of the path and the feasibility of subsequent control, a Lyapunov function is introduced during path expansion as a stability constraint: at each expansion step, the dynamic stability of the candidate path segment is evaluated so as to screen out path nodes that would cause attitude divergence or loss of control, ensuring that the path remains globally asymptotically stable within the attitude-tracking and control-input envelope. The Lyapunov function can be expressed as:
$V(x) = (x - x_r)^\top P \,(x - x_r)$;
where $x$ denotes the current state vector of the UAV (containing state variables such as position, velocity, and attitude angles), $x_r$ denotes the reference state vector, and $P$ is a symmetric positive-definite matrix satisfying the Lyapunov stability condition, so that $V(x)$ is a positive-definite function of the state deviation. On this basis, only path segments satisfying the following condition are retained during expansion:
$\dot V(x) = (x - x_r)^\top (A^\top P + P A)(x - x_r) < 0$;
where $A$ is the linear approximate state matrix of the UAV dynamics at the current state (obtained by linearization or empirical modeling).
Then, B-spline interpolation is applied to the discrete path-node sequence produced by the search to smooth the trajectory; node timestamps and attitude parameters (heading, pitch, and roll angles) are embedded during interpolation to construct a complete initial trajectory sequence with time and attitude constraints.
By applying a rapidly-exploring random tree search with Lyapunov stability constraints inside the dynamic environment grid, the method achieves efficient three-dimensional trajectory search while guaranteeing the controllability and stability of the path in a dynamic-obstacle environment, avoiding paths that cannot be tracked or that risk attitude divergence. The subsequent B-spline fairing further improves the continuity and flyability of the trajectory and facilitates accurate tracking and attitude adjustment by the downstream controller, while the retained node time and attitude information provides a structured, high-quality initial flight reference for the reinforcement learning and control modules.
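Two ingredients of the planning step above, the Lyapunov screening test and the trajectory fairing, can be sketched in isolation. Assumptions of mine: the screening check uses the quadratic Lyapunov condition on linearized dynamics, and Chaikin corner cutting (whose limit curve is a quadratic B-spline) stands in for the full B-spline fairing.

```python
import numpy as np

def lyapunov_ok(x, x_ref, A, P):
    """Keep a candidate node only if V = e^T P e is decreasing along the
    linearized dynamics, i.e. e^T (A^T P + P A) e < 0."""
    e = x - x_ref
    return float(e @ (A.T @ P + P @ A) @ e) < 0.0

def chaikin_smooth(path, iterations=3):
    """Corner-cutting smoothing whose limit curve is a quadratic B-spline;
    a lightweight stand-in for the B-spline fairing step. Endpoints are
    preserved so the start and goal of the RRT path stay fixed."""
    pts = np.asarray(path, dtype=float)
    for _ in range(iterations):
        q = 0.75 * pts[:-1] + 0.25 * pts[1:]
        r = 0.25 * pts[:-1] + 0.75 * pts[1:]
        inner = np.stack([q, r], axis=1).reshape(-1, pts.shape[1])
        pts = np.vstack([pts[:1], inner, pts[-1:]])
    return pts
```

In a full planner, `lyapunov_ok` would be called on each candidate RRT extension before it is added to the tree, and `chaikin_smooth` on the final node sequence before timestamps and attitudes are attached.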
S200, acquiring flight state information of an unmanned aerial vehicle on an initial track, environment perception information of an onboard sensor and image definition information, and inputting the information to a space-time attention encoder to generate a state tensor fusing environment semantics;
Specifically, the image sharpness information is obtained by averaging target edge sharpness, image contrast, and image detail retention.
The image sharpness index jointly considers the target edge sharpness, the image contrast, and the image detail retention, combined with an equal-weight averaging strategy:
$S = \frac{S_e + S_c + S_d}{3}$;
where $S$ denotes the image sharpness information, $S_e$ the target edge sharpness, $S_c$ the image contrast, and $S_d$ the image detail retention. The target edge sharpness measures how crisp object boundaries are in the image; the sharper the image, the larger it is. In this embodiment, edge detection uses the Sobel operator, and the mean gray-gradient magnitude over the image is taken as the sharpness index:
$S_e = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \sqrt{G_x(i,j)^2 + G_y(i,j)^2}$;
where $M$ and $N$ are the numbers of rows and columns of image $I$, and $G_x$ and $G_y$ are the gradient images in the $x$ and $y$ directions. The image contrast reflects the difference between bright and dark regions and is an important index of visual image quality; it is computed as the standard deviation of the gray values:
$S_c = \sqrt{\frac{1}{MN} \sum_{i,j} \bigl(I(i,j) - \mu\bigr)^2}$;
where $I(i,j)$ is the gray value of a pixel and $\mu$ is the mean gray value. The image detail retention measures how well high-frequency information such as texture and contours is preserved. In this embodiment, the Laplace operator is applied as a second-order differentiation of the image to extract detail regions, and the mean absolute value over them is taken as the detail index:
$S_d = \frac{1}{MN} \sum_{i,j} \bigl| L(i,j) \bigr|$;
where $L(i,j)$ is the Laplacian response of the image at position $(i,j)$.
Further, the step of generating a state tensor fusing the environment semantics comprises the following steps:
Carrying out normalization preprocessing on flight state information, environment perception information and image definition information of the unmanned aerial vehicle;
inputting the preprocessed information into a space-time attention encoder, and fusing multi-modal data by utilizing a multi-head attention mechanism;
and extracting the correlation characteristics of the spatial context and the time sequence by combining the semantic segmentation result, and constructing a state tensor comprising position information, obstacle threat degree, image definition predicted value and energy consumption estimation.
Specifically, to generate a state tensor fusing environment semantics, firstly, flight state information (including position, speed, acceleration, attitude angle and the like), environment perception information (including obstacle distribution, dynamic target information and the like identified by a laser radar, ultrasonic waves or a visual sensor) and image definition information (including indexes such as edge definition, image contrast, texture detail retention and the like) of an unmanned aerial vehicle on-board flight control system and a sensor system are obtained.
This multi-source information is normalized to the [0, 1] interval to improve the stability and computational efficiency of the model input. Specifically, linear (min-max) normalization can be used for quantities such as position and velocity, while a standard score (z-score) or min-max normalization can be used for the image sharpness indices.
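The normalization step above can be sketched as follows; the sensor ranges used here are assumptions for illustration, not values from the patent:

```python
import numpy as np

def min_max(x, lo, hi):
    """Linear (min-max) normalization into [0, 1]; readings outside the
    expected range are clipped rather than extrapolated."""
    return np.clip((np.asarray(x, dtype=float) - lo) / (hi - lo), 0.0, 1.0)

def z_score(x):
    """Standard-score alternative, e.g. for the sharpness indices."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-9)

# Assumed sensor ranges, for illustration only.
state = np.concatenate([
    min_max([12.0, -3.0, 40.0], lo=-50.0, hi=50.0),  # position (m)
    min_max([4.2], lo=0.0, hi=15.0),                 # speed (m/s)
])
```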
And then, the processed flight state vector, the environment perception vector and the image definition vector are input into a space-time attention encoder together, the encoder adopts a multi-head attention mechanism to extract interaction features among different modal information respectively, automatically learns the dependency relationship and weight among the information, and improves the robustness and discriminant of state representation.
The encoder further combines the pixel-level barrier semantic tags output from the semantic segmentation module in the fusion process to perform joint modeling on the spatial context information and the time sequence features. And finally constructing a structured state tensor, wherein the tensor comprises the following fields of current position and track deviation, threat degree estimated values of obstacles in all directions, definition predicted values of current visual angle images and energy consumption rate estimated values in unit time, and the estimated values are used as the state input of a subsequent reinforcement learning strategy network.
The state tensor integrating the environment semantics is generated by inputting the flight state, the environment perception and the image definition information to the space-time attention encoder, so that deep association among the multi-mode information can be effectively mined, and the perception capability of state representation on dynamic environment change is improved. The method combines semantic tags and time sequence characteristics, and enhances the expression integrity and discriminant of the input state.
S300, introducing an information entropy weighting mechanism to the state tensor to perform state importance distribution to obtain a weighted state vector;
Further, the state importance allocation step comprises the following steps:
measuring the change degree of each sub-state component in the state tensor based on the information entropy theory;
Converting the information entropy value into a weight factor for representing the importance degree of each state dimension in policy decision;
And carrying out weighted fusion on the state tensors according to the weight factors to generate weighted state vectors with state importance perception capability.
Specifically, first, the state tensor output by the spatio-temporal attention encoder is obtained, denoted as $S = (s_1, s_2, \dots, s_n)$, wherein each sub-state component $s_i$ corresponds to a particular state feature such as the position deviation, obstacle threat degree, image sharpness prediction value, or energy consumption rate. Then, based on information entropy theory, the degree of variation of each sub-state component within a given time window is measured, and the entropy value is calculated by the following formula:

$$H_i = -\sum_{j=1}^{m} p_{ij} \log p_{ij}$$

where $p_{ij}$ denotes the probability that state component $s_i$ falls into the $j$-th value interval. Then, the entropy value $H_i$ of each sub-state is mapped to a weight factor $w_i$ through a normalization operation:

$$w_i = \frac{H_i}{\sum_{k=1}^{n} H_k}$$
The above weights represent the importance of each state component to the current policy decision: the higher the entropy value, the greater the fluctuation of that state dimension and the more significant its influence on policy stability.
Finally, each component of the original state tensor is weighted and fused according to the obtained weight factors to generate a weighted state vector $\tilde{S} = (w_1 s_1, w_2 s_2, \dots, w_n s_n)$, which serves as the state input of the subsequent reinforcement learning strategy model.
By introducing the entropy-based state importance weighting mechanism, the criticality of different state features to the current task can be adaptively identified, effectively highlighting the state information that significantly influences policy decisions. This mechanism improves the discriminability and stability of the state representation, so that the reinforcement learning strategy has higher perceptual acuity and adaptability when handling complex environmental changes, enhancing the accuracy and robustness of UAV attitude control.
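The weighting steps above can be sketched as follows. The histogram bin count and the time-window layout are illustrative assumptions; the patent does not fix how the per-component probability distribution is estimated:

```python
import math

def entropy(samples, bins=10):
    """Shannon entropy of one state component over a time window,
    estimated from a histogram of its recent values."""
    lo, hi = min(samples), max(samples)
    if hi == lo:                      # constant component: zero uncertainty
        return 0.0
    counts = [0] * bins
    for x in samples:
        j = min(int((x - lo) / (hi - lo) * bins), bins - 1)
        counts[j] += 1
    n = len(samples)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

def entropy_weights(window):
    """window[i] is the list of recent values of state component i.
    Returns weights w_i = H_i / sum_k H_k."""
    H = [entropy(s) for s in window]
    total = sum(H)
    if total == 0:                    # all components constant: uniform weights
        return [1.0 / len(H)] * len(H)
    return [h / total for h in H]

def weighted_state(state, weights):
    """Element-wise fusion producing the weighted state vector."""
    return [w * s for w, s in zip(weights, state)]

# A fluctuating component receives a larger weight than a near-constant one.
window = [[0.1, 0.9, 0.2, 0.8, 0.5, 0.3],   # volatile track deviation
          [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]]   # steady energy rate
w = entropy_weights(window)
print(w[0] > w[1])  # True
```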
S400, inputting the weighted state vector into a Meta-SAC model with meta-learning capability, and generating, through its Actor network, target flight reference pose parameters and gain coefficients for dynamically adjusting an LQR controller;
Specifically, a weighted state vector generated by the spatiotemporal attention encoder and processed by the entropy weighting mechanism is taken as an environmental input of the current moment. The state vector contains multi-modal fusion features such as position errors, obstacle threat degrees, image sharpness predictors, and energy consumption rates.
The weighted state vector is fed into a trained Meta-SAC model with meta-learning capability. The Actor network in Meta-SAC uses a parameter migration mechanism to quickly adapt to the current environment state, giving it few-shot generalization capability.
The Meta-SAC's Actor network outputs a continuous motion vector comprising two parts:
(1) Target flight reference pose parameters include target position coordinates, attitude angle (pitch, roll, yaw) and desired viewing angle direction of the onboard camera.
(2) An LQR gain adjustment quantity, used to adaptively adjust the feedback gain coefficients of the underlying LQR controller in the current task state, so as to achieve more responsive or more stable attitude tracking control.
After the action vector is parsed, the target reference pose is fed to the LQR controller as the control setpoint, and the gain adjustment is used to update the feedback matrix of the LQR, realizing dynamic cooperation between the high-level strategy and the low-level controller. Meanwhile, the tuple of current state vector, action output, reward value (to be calculated), and next state is stored for subsequent online or offline policy updates of the Meta-SAC model.
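The parsing of the Actor's continuous action vector into its two parts might look like the following sketch. The action layout (seven pose values followed by per-axis multiplicative gain scales) and the helper names are assumptions for illustration; the patent does not fix a concrete encoding:

```python
def parse_action(action):
    """Split the Actor's continuous action vector into the two parts
    described above. The 7 + N layout is a hypothetical convention."""
    # (1) target reference pose: x, y, z, pitch, roll, yaw, camera angle
    ref_pose = action[:7]
    # (2) multiplicative adjustments applied to the LQR feedback gains
    gain_scale = action[7:]
    return ref_pose, gain_scale

def adjust_gains(K_nominal, gain_scale):
    """Scale each row of the nominal feedback gain matrix, updating the
    LQR feedback law for the current task state."""
    return [[g * k for k in row] for g, row in zip(gain_scale, K_nominal)]

action = [10.0, 5.0, 30.0, 0.0, 0.0, 1.57, 0.2,   # pose part
          1.1, 0.9, 1.0]                           # gain part
pose, scale = parse_action(action)
K = adjust_gains([[2.0, 0.5], [1.0, 0.2], [0.8, 0.1]], scale)
print(K[0][0])  # 2.2 = 1.1 * 2.0
```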
S500, driving the LQR controller to generate low-level flight control instructions to control the unmanned aerial vehicle to execute the corresponding attitude actions, based on the target flight reference pose parameters and the LQR gain coefficients;
Further, the steps by which the LQR controller generates the low-level flight control instructions are as follows:
receiving the target flight reference pose parameters output by the Meta-SAC model as the desired input;
constructing a system state error vector by combining the current flight state of the unmanned aerial vehicle, and calculating the control instruction according to the dynamically adjusted feedback gain coefficients;
minimizing a cost function of state error and control energy consumption by solving a linear quadratic optimization problem, and outputting the low-level control quantities that drive the aircraft to execute the corresponding attitude adjustment.
Specifically, the LQR controller receives the target flight reference pose parameters output by the Meta-SAC model, including the desired position coordinates, attitude angles (pitch, roll, yaw), and camera viewing-angle adjustment instructions, as the desired input of the system. The current flight state information, including the actual position, velocity, attitude angles, and angular velocities, is acquired in real time through the onboard sensors to form the current state vector. The error between the desired pose and the current state is then calculated to form the state error vector of the system; for example, the position error is the difference between the desired position and the current actual position, and the attitude error is the difference between the desired attitude and the actual attitude. The feedback gain matrix K of the LQR controller is adjusted according to the dynamic feedback gain coefficients provided by the Meta-SAC model; this matrix determines the mapping weights from the state error vector to the control input, enabling adaptive adjustment. Taking the state error vector as the variable, a linear quadratic cost function is defined:
$$J = \int_{0}^{\infty} \left( e(t)^{\top} Q\, e(t) + u(t)^{\top} R\, u(t) \right) dt$$

where $e(t)$ denotes the state error vector, $u(t)$ denotes the control input vector, $Q$ denotes the state-error penalty weight matrix, a symmetric positive definite matrix used to quantify the penalty when the state deviates from the desired value, and $R$ denotes the control-energy penalty matrix, also symmetric positive definite, used to quantify the penalty on control effort. $Q$ is typically given by the designer, while $R$ is set to the identity matrix. The optimal feedback gain matrix $K$ is obtained by solving the algebraic Riccati equation, yielding the feedback control law $u(t) = -K e(t)$; that is, the control input is determined by the state error vector and the feedback gain. Finally, the control input $u(t)$ is resolved into throttle, control-surface, and propeller speed commands on each axis of the aircraft, producing the low-level flight control instructions.
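For a single attitude-error channel, the Riccati solution and the resulting feedback gain can be illustrated in scalar form. The real controller is multivariable, so this is only a minimal sketch of the same optimization under integrator-dynamics assumptions:

```python
import math

def lqr_scalar(a, b, q, r):
    """Solve the scalar continuous-time algebraic Riccati equation
        2*a*p - (b*p)**2 / r + q = 0
    for the system  de/dt = a*e + b*u  with cost  J = ∫ q*e² + r*u² dt,
    and return the optimal feedback gain k (control law u = -k*e)."""
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r

# One attitude-error channel modelled as a pure integrator (a = 0, b = 1).
q, r = 4.0, 1.0            # q chosen by the designer, r the identity (here 1)
k = lqr_scalar(0.0, 1.0, q, r)
print(k)                   # 2.0 = sqrt(q/r) for a pure integrator

# Dynamic gain adjustment from the high-level model: a larger state-error
# penalty yields a more aggressive gain, recomputed on the fly.
k_fast = lqr_scalar(0.0, 1.0, 9.0, r)
print(k_fast)              # 3.0
```

Scaling `q` up makes the controller track errors more aggressively at the cost of control energy, which is exactly the trade-off the dynamically adjusted gain coefficients exploit.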
By introducing the feedback gain coefficients dynamically output by the high-level Meta-SAC model, the controller adapts itself to different flight environments and task states. Building the linear quadratic optimization model on the state errors effectively balances attitude precision against energy efficiency, and the resulting control instructions combine fast response with high precision, significantly improving the attitude stability and flight safety of the unmanned aerial vehicle in complex environments.
S600, constructing a fuzzy logic reward function based on the energy consumption efficiency and patrol image sharpness during execution of the flight control instructions, and performing a strategy update with the reward function value as feedback to the Meta-SAC model.
Further, the steps of constructing the fuzzy logic reward function are as follows:
acquiring the image sharpness index, energy consumption rate, and relative distance to obstacles of the unmanned aerial vehicle during flight, and normalizing these continuous variables;
converting the normalized input variables into fuzzy variables and mapping them to preset fuzzy membership functions to obtain the corresponding fuzzy linguistic values;
performing fuzzy reasoning based on a preset fuzzy rule base, the reasoning rules comprising combined judgments of image sharpness, energy consumption, and obstacle distance;
converting the reasoning result into a specific numerical reward through a defuzzification method, to be used as the instant feedback signal for updating the strategy network of the Meta-SAC model.
Specifically, key operating indicators of the unmanned aerial vehicle during execution of the flight control instructions are collected, mainly comprising the image sharpness index (such as image contrast, edge sharpness, and detail retention), the energy consumption rate per unit time (such as the power-speed relationship and propulsion system power output), and the relative distance between the unmanned aerial vehicle and environmental obstacles (the Euclidean distance calculated from a lidar or depth camera). These continuous variables are normalized so that their numerical ranges fall within [0, 1], facilitating the subsequent fuzzy computation.
The normalized variables are each mapped to preset fuzzy membership functions: for example, the image sharpness may be divided into three fuzzy sets of low, medium, and high sharpness; the energy consumption rate into low, medium, and high consumption; and the obstacle distance into near, medium, and far. The fuzzy sets are defined by triangular, trapezoidal, or Gaussian membership functions.
A fuzzy rule base is then constructed, and a number of fuzzy reasoning rules are formulated according to task priorities and strategy objectives (for example: IF image sharpness is high AND energy consumption is low AND distance is far, THEN the reward is high; IF the image is blurred AND energy consumption is high AND distance is near, THEN the reward is low). Fuzzy logic reasoning is performed on the fuzzy linguistic values to compute a comprehensive fuzzy output.
Finally, a common defuzzification technique such as the centre-of-gravity method is used to convert the fuzzy reasoning result into a specific reward function value (for example, a continuous value in [0, 1]), which is fed into the Meta-SAC model as an instant feedback signal to drive the update of the strategy network, improving the convergence efficiency of the strategy and the practical effect of flight decisions.
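A minimal sketch of this reward construction, assuming triangular membership functions and a weighted-average (zero-order Sugeno) defuzzification in place of the centre-of-gravity method for brevity. The rule base below is illustrative, not the patent's:

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a to peak b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Membership functions over normalized [0, 1] inputs (shapes are assumptions).
low  = lambda x: tri(x, -0.5, 0.0, 0.5)
mid  = lambda x: tri(x, 0.0, 0.5, 1.0)
high = lambda x: tri(x, 0.5, 1.0, 1.5)

# Rule base: (sharpness set, energy set, distance set) -> crisp reward level.
RULES = [
    (high, low,  high, 1.0),   # sharp image, low energy, far obstacle: high
    (high, mid,  mid,  0.7),
    (mid,  mid,  mid,  0.5),
    (low,  high, low,  0.0),   # blurred, energy-hungry, close obstacle: low
]

def fuzzy_reward(sharpness, energy, distance):
    """Mamdani min-inference with weighted-average defuzzification."""
    num = den = 0.0
    for m_s, m_e, m_d, level in RULES:
        w = min(m_s(sharpness), m_e(energy), m_d(distance))  # firing strength
        num += w * level
        den += w
    return num / den if den > 0 else 0.5   # neutral reward if no rule fires

print(fuzzy_reward(0.9, 0.1, 0.9) > fuzzy_reward(0.2, 0.9, 0.1))  # True
```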
By fusing key indicators such as image sharpness, energy efficiency, and safety distance, and flexibly handling complex state information through a fuzzy reasoning mechanism, this design avoids the threshold sensitivity and limited expressiveness of conventional single-metric reward functions, and provides finer and more robust optimization guidance for the flight strategy, thereby improving the autonomous decision-making performance and execution stability of the unmanned aerial vehicle in complex dynamic environments.
Example two
The invention also provides an unmanned aerial vehicle flight attitude adjusting system based on reinforcement learning, which has a structure shown in fig. 2 and comprises:
The route planning module is used for constructing a dynamic environment grid according to the digital map and the real-time semantic segmentation result, and generating an initial route by a rapidly-exploring random tree (RRT) algorithm with spatio-temporal constraints;
The information fusion module is used for acquiring the flight state information of the unmanned aerial vehicle on the initial track, the environment perception information of the onboard sensors, and the image sharpness information, and inputting them into the spatio-temporal attention encoder to generate a state tensor fusing environment semantics;
The tensor weighting module is used for applying the information entropy weighting mechanism to the state tensor to assign state importance, obtaining a weighted state vector;
The pose generation module is used for inputting the weighted state vector into the Meta-SAC model with meta-learning capability, and generating, through the Actor network, the target flight reference pose parameters and the gain coefficients for dynamically adjusting the LQR controller;
The pose control module is used for driving the LQR controller to generate low-level flight control instructions to control the unmanned aerial vehicle to execute the corresponding attitude actions, based on the target flight reference pose parameters and the LQR gain coefficients;
And the feedback updating module is used for constructing a reward function based on the energy consumption efficiency and the patrol image sharpness during execution of the flight control instructions, and performing the strategy update with the reward function value as feedback to the Meta-SAC model.
In this embodiment, the unmanned aerial vehicle flight attitude adjustment system based on reinforcement learning is applied to a daily inspection task of a high-voltage transmission line, and the system is deployed on a multi-rotor unmanned aerial vehicle platform with an onboard high-definition camera, a laser radar and an inertial measurement unit. The specific implementation flow is as follows:
The route planning module calls a preloaded digital map of the transmission line, performs semantic segmentation on images acquired in real time by the onboard camera, and identifies scene elements such as towers, wires, and vegetation. The static geographic data are fused with dynamic obstacle information (such as birds and construction equipment) to construct a three-dimensional dynamic environment grid with second-level time resolution. Within this grid, taking the current UAV position as the start point and the patrol target point as the end point, an initial track is planned by an RRT algorithm incorporating spatio-temporal constraints and a Lyapunov stability criterion, and smoothed with a B-spline curve to generate a timed track sequence with attitude references.
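The basic expand-toward-sample loop of the planner can be sketched as a plain 2-D RRT on an occupancy grid. The spatio-temporal constraints, Lyapunov stability check, and B-spline smoothing described above are omitted, so this is only an illustrative skeleton:

```python
import math
import random

def edge_free(grid, p, q, n=5):
    """Sample n points along segment p->q and check them against the grid,
    so a step cannot jump through a thin obstacle."""
    for i in range(1, n + 1):
        t = i / n
        x = p[0] + (q[0] - p[0]) * t
        y = p[1] + (q[1] - p[1]) * t
        if grid[int(y)][int(x)] == 1:
            return False
    return True

def rrt(grid, start, goal, step=1.5, iters=2000, goal_tol=1.5, seed=0):
    """Minimal 2-D RRT on an occupancy grid (1 = obstacle)."""
    rng = random.Random(seed)
    h, w = len(grid), len(grid[0])
    nodes, parent = [start], {0: None}
    for _ in range(iters):
        sample = (rng.uniform(0, w - 1), rng.uniform(0, h - 1))
        # nearest existing tree node to the random sample
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))
        near = nodes[i]
        d = math.dist(near, sample)
        if d == 0:
            continue
        # steer one fixed step from the nearest node toward the sample
        new = (near[0] + (sample[0] - near[0]) / d * step,
               near[1] + (sample[1] - near[1]) / d * step)
        if not (0 <= new[0] < w and 0 <= new[1] < h):
            continue
        if not edge_free(grid, near, new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) <= goal_tol:   # goal region reached
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None

grid = [[0] * 20 for _ in range(20)]
for r in range(5, 15):
    grid[r][10] = 1   # vertical wall with free passages above and below
path = rrt(grid, (2.0, 2.0), (17.0, 17.0))
print(path is not None)
```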
Then, during the flight of the unmanned aerial vehicle along the initial track, the information fusion module acquires in real time data such as the flight state (position, velocity, and attitude), the lidar environment point cloud, and the image sharpness index, and inputs them into the spatio-temporal attention encoder. The encoder fuses temporal dependencies with spatial distribution information and outputs a state tensor containing the position deviation, obstacle threat degree, predicted image sharpness, energy consumption per unit time, and the like.
Then, the tensor weighting module applies the information entropy weighting mechanism to each dimension of the state tensor, dynamically adjusts the state variable weights according to the current environment complexity and task priority, and outputs a more representative weighted state vector. This vector is fed into the Meta-SAC model of the pose generation module, whose Actor network outputs the target flight reference pose (position and attitude) and the corresponding LQR feedback gain adjustments.
Then, the pose control module feeds the reference pose parameters and the dynamic gain coefficients to the LQR controller. The controller computes the linear quadratic optimal solution based on the current state error, generates low-level flight control instructions such as accelerations and attitude angular rates, and drives the multi-rotor unmanned aerial vehicle to complete high-precision attitude tracking and track keeping in real time.
Finally, the feedback updating module constructs the fuzzy logic reward function from the image quality scores, energy consumption indicators, and flight stability data collected during the flight, and uses the computed result as instant feedback to the reinforcement learning strategy model. After the task ends, the system further optimizes the Meta-SAC strategy weight parameters through offline federated adversarial training, so as to improve the generalization performance and adaptability of the model in new environments.
The above formulas are dimensionless formulas for numerical calculation, fitted from a large amount of data collected in software simulation so as to reflect the latest real conditions; the preset parameters in the formulas are set by those skilled in the art according to the actual situation.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Finally, the foregoing description of the preferred embodiment of the invention is provided for the purpose of illustration only, and is not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.
Claims (8)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510835183.7A CN120595851A (en) | 2025-06-20 | 2025-06-20 | UAV flight attitude adjustment method and system based on reinforcement learning |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN120595851A true CN120595851A (en) | 2025-09-05 |
Family
ID=96883463
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202510835183.7A Pending CN120595851A (en) | 2025-06-20 | 2025-06-20 | UAV flight attitude adjustment method and system based on reinforcement learning |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN120595851A (en) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120848590A (en) * | 2025-09-23 | 2025-10-28 | 浙江孚临科技有限公司 | A cross-modal attention feature fusion method for aerial robots |
| CN120874957A (en) * | 2025-09-26 | 2025-10-31 | 浙江孚临科技有限公司 | Layered reinforcement learning control method for aerial robot |
| CN121209566A (en) * | 2025-11-25 | 2025-12-26 | 天津天境飞航科技有限公司 | Unmanned plane path planning method and system for self-adaptive dynamic planning |
| CN120874957B (en) * | 2025-09-26 | 2026-02-06 | 浙江孚临科技有限公司 | Layered reinforcement learning control method for aerial robot |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||