
CN120595851A - UAV flight attitude adjustment method and system based on reinforcement learning - Google Patents

UAV flight attitude adjustment method and system based on reinforcement learning

Info

Publication number
CN120595851A
Authority
CN
China
Prior art keywords
state
information
flight
meta
uav
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510835183.7A
Other languages
Chinese (zh)
Inventor
刘莉莉
王辉
吴朝晖
艾纪平
陈诚
邵帅
胡润明
何斯维
邹建晔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yingtan Power Supply Co of State Grid Jiangxi Electric Power Co Ltd
Original Assignee
Yingtan Power Supply Co of State Grid Jiangxi Electric Power Co Ltd
Application filed by Yingtan Power Supply Co of State Grid Jiangxi Electric Power Co Ltd
Priority to CN202510835183.7A
Publication of CN120595851A
Legal status: Pending

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/40 Control within particular dimensions
    • G05D1/49 Control of attitude, i.e. control of roll, pitch or yaw
    • G05D1/495 Control of attitude, i.e. control of roll, pitch or yaw to ensure stability
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G05B13/027 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/40 Control within particular dimensions
    • G05D1/46 Control of position or course in three dimensions
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D2101/00 Details of software or hardware architectures used for the control of position
    • G05D2101/10 Details of software or hardware architectures used for the control of position using artificial intelligence [AI] techniques
    • G05D2101/15 Details of software or hardware architectures used for the control of position using artificial intelligence [AI] techniques using machine learning, e.g. neural networks
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D2109/00 Types of controlled vehicles
    • G05D2109/20 Aircraft, e.g. drones

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Automation & Control Theory (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to the technical field of unmanned aerial vehicle (UAV) flight attitude control, and in particular to a UAV flight attitude adjustment method and system based on reinforcement learning. The method comprises: constructing a dynamic environment grid map from a digital map and real-time semantic segmentation results, and generating an initial trajectory with a rapidly-exploring random tree (RRT) algorithm that incorporates spatio-temporal constraints; collecting UAV flight state information, environment perception information, and image sharpness indices, and inputting them into a spatio-temporal attention encoder to generate a state tensor that fuses environment semantics; assigning importance to the state tensor through an information entropy weighting mechanism to obtain a weighted state vector; inputting the weighted state vector into a Meta-SAC model with meta-learning capability, which outputs a target flight reference pose and dynamic gain coefficients for an LQR controller; driving the LQR controller, through a control module, to generate flight control instructions from these outputs; and constructing a reward function based on image quality and energy efficiency, which is fed back to the reinforcement learning policy in real time.

Description

Unmanned aerial vehicle flight attitude adjustment method and system based on reinforcement learning
Technical Field
The invention relates to the technical field of unmanned aerial vehicle flight attitude control, in particular to an unmanned aerial vehicle flight attitude adjustment method and system based on reinforcement learning.
Background
Unmanned aerial vehicles (UAVs) are widely used in fields such as power line inspection, geological survey, and security monitoring. In power line inspection in particular, UAVs can replace manual labor for high-risk, complex, high-altitude inspection tasks. However, conventional UAVs rely on manual remote control or preset routes to execute tasks. They suffer from poor autonomy, weak environmental adaptability, and insufficient flight control precision, and struggle to meet the requirements of high-quality image acquisition and efficient operation in complex environments.
Most current flight control systems are based on PID (proportional-integral-derivative) control or fixed route planning. These methods depend on static rules, lack adaptive capability, and cope poorly with complicating factors such as dynamic obstacles and wind disturbances. In multi-objective tasks, existing control methods cannot handle several indices jointly, so control performance is poor and problems such as image blur, target loss, and route deviation occur easily. In addition, some systems that integrate path planning and control optimization adopt a staged processing strategy and lack an end-to-end feedback mechanism; they cannot continuously optimize the control policy during execution, which limits overall system performance. In high-precision aerial photography or inspection of complex structures in particular, coordinated control of flight attitude and camera viewing angle is demanding, and traditional controllers have difficulty balancing precision against real-time performance.
Therefore, a flight control method with adaptive learning capability and dynamic feedback optimization is needed to improve the intelligence and operational performance of UAVs in real inspection tasks.
Disclosure of Invention
The invention provides a UAV flight attitude adjustment method and system based on reinforcement learning. It aims to solve the problems of traditional UAV flight control systems, such as poor adaptability to dynamic environments, inaccurate attitude adjustment, and unstable image acquisition quality.
In order to achieve the above purpose, the present invention provides the following technical solutions:
The UAV flight attitude adjustment method based on reinforcement learning comprises the following steps:
Constructing a dynamic environment grid from the digital map and the real-time semantic segmentation results, and generating an initial trajectory with a rapidly-exploring random tree (RRT) algorithm that incorporates spatio-temporal constraints;
Acquiring the flight state information of the UAV on the initial trajectory, the environment perception information of the onboard sensors, and the image sharpness information, and inputting them into a spatio-temporal attention encoder to generate a state tensor that fuses environment semantics;
Applying an information entropy weighting mechanism to the state tensor to assign state importance, obtaining a weighted state vector;
Inputting the weighted state vector into a Meta-SAC model with meta-learning capability, and generating, through an Actor network, target flight reference pose parameters and gain coefficients for dynamically adjusting the LQR controller;
Driving the LQR controller, based on the target flight reference pose parameters and the LQR gain coefficients, to generate low-level flight control instructions that control the UAV to execute the corresponding attitude actions;
During execution of the flight control instructions, constructing a fuzzy logic reward function based on energy efficiency and inspection image sharpness, and using the reward value as feedback to the Meta-SAC model for policy updating.
Further, the dynamic environment grid is constructed as follows:
Identifying and classifying obstacles in the environment with a semantic segmentation model, based on a digital map of the target inspection area and images acquired by the onboard sensors;
Fusing the static geographic information with the dynamic obstacle identification information to construct a three-dimensional dynamic environment grid map with a time dimension, wherein the grid records the traversability, obstacle type, and dynamic behavior attributes of each region in the environment.
Further, the initial trajectory is generated as follows:
In the dynamic environment grid, taking the current position of the UAV as the root node and the target point as the end point, and expanding the path with the rapidly-exploring random tree (RRT) algorithm;
Introducing a Lyapunov stability constraint during path expansion, smoothing the trajectory with a B-spline curve, and outputting an initial flight trajectory sequence containing time and attitude information.
Further, the state tensor fusing environment semantics is generated as follows:
Normalizing the flight state information, environment perception information, and image sharpness information of the UAV;
Inputting the preprocessed information into the spatio-temporal attention encoder, and fusing the multi-modal data with a multi-head attention mechanism;
Extracting spatial-context and temporal correlation features in combination with the semantic segmentation results, and constructing a state tensor comprising position information, obstacle threat level, predicted image sharpness, and estimated energy consumption.
Further, state importance is assigned as follows:
Measuring the degree of variation of each sub-state component of the state tensor based on information entropy theory;
Converting the information entropy values into weight factors that represent the importance of each state dimension in policy decisions;
Weighting and fusing the state tensor according to the weight factors to generate a weighted state vector that is aware of state importance.
Further, the LQR controller generates the low-level flight control instructions as follows:
Receiving the target flight reference pose parameters output by the Meta-SAC model as the desired input;
Constructing a system state error vector from the current flight state of the UAV, and calculating the control instruction with the dynamically adjusted feedback gain coefficients;
Solving a linear quadratic optimization problem that minimizes a cost function of state error and control energy, and outputting low-level control quantities that drive the aircraft to perform the corresponding attitude adjustment.
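The LQR steps above can be illustrated with a minimal Python sketch. It forms the state error vector, applies a state-feedback law u = -K(x - x_ref) with a gain matrix supplied from outside (standing in for the gain coefficients output by the Meta-SAC model), and evaluates the quadratic stage cost that the linear quadratic formulation trades off. The function names, the two-state single-input model, and the values of K, Q, and R are illustrative assumptions and are not taken from the invention.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def quad_form(v, m):
    """Return v^T m v for a square matrix m."""
    return sum(a * b for a, b in zip(v, mat_vec(m, v)))


def lqr_step(x, x_ref, gain, q, r):
    """One feedback step: state error, control command, and stage cost.

    `gain` is assumed to be provided by the learned policy (hypothetical values below).
    """
    error = [xi - ri for xi, ri in zip(x, x_ref)]
    u = [-ui for ui in mat_vec(gain, error)]        # u = -K e
    cost = quad_form(error, q) + quad_form(u, r)    # e^T Q e + u^T R u
    return error, u, cost


# Hypothetical example: state = [pitch angle, pitch rate], one control input.
x = [0.20, 0.00]
x_ref = [0.00, 0.00]
K = [[2.0, 0.5]]
Q = [[10.0, 0.0], [0.0, 1.0]]
R = [[0.1]]
error, u, cost = lqr_step(x, x_ref, K, Q, R)
```

In this sketch a larger gain reduces the state error faster but raises the control-energy term of the cost, which is the trade-off that dynamically adjusted gain coefficients would be expected to manage.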
Further, the fuzzy logic reward function is constructed as follows:
Acquiring the image sharpness index, the energy consumption rate, and the relative distance to obstacles during flight, and normalizing these continuous variables;
Converting the normalized input variables into fuzzy variables and mapping them onto preset fuzzy membership functions to obtain the corresponding fuzzy linguistic values;
Performing fuzzy inference based on a preset fuzzy rule base, wherein the inference rules judge combinations of image sharpness, energy consumption, and obstacle distance;
Converting the inference result into a specific numerical reward by defuzzification, and using it as the immediate feedback signal for the Meta-SAC model to update the policy network.
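The four steps above (normalize, fuzzify, infer, defuzzify) can be sketched in Python as follows. The text does not specify the membership functions, the rule base, or the defuzzification method, so the triangular memberships, the six rules, and the weighted-average defuzzification used here are all assumptions chosen only to make the flow concrete.

```python
def tri(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def fuzzify(x):
    """Map a normalized value in [0, 1] to low/medium/high memberships."""
    return {
        "low": tri(x, -0.5, 0.0, 0.5),
        "medium": tri(x, 0.0, 0.5, 1.0),
        "high": tri(x, 0.5, 1.0, 1.5),
    }


# Hypothetical rule base: (sharpness, energy, distance) labels -> crisp reward.
# None means the rule ignores that input.
RULES = [
    (("high", "low", "high"), 1.0),        # sharp image, low energy, far from obstacles
    (("high", "medium", "high"), 0.7),
    (("medium", "medium", "medium"), 0.3),
    (("low", None, None), -0.5),           # blurred image
    ((None, "high", None), -0.3),          # high energy consumption
    ((None, None, "low"), -1.0),           # too close to an obstacle
]


def fuzzy_reward(sharpness, energy, distance):
    """Min for rule firing strength, weighted average for defuzzification."""
    memberships = [fuzzify(sharpness), fuzzify(energy), fuzzify(distance)]
    num, den = 0.0, 0.0
    for labels, reward in RULES:
        strength = min(m[label] for m, label in zip(memberships, labels)
                       if label is not None)
        num += strength * reward
        den += strength
    return num / den if den > 0 else 0.0


good = fuzzy_reward(sharpness=1.0, energy=0.0, distance=1.0)
risky = fuzzy_reward(sharpness=1.0, energy=0.0, distance=0.0)
```

Here `good` evaluates to the reward of the first rule and `risky` to the penalty of the obstacle-proximity rule, since in each case only one rule fires.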
The invention also provides a UAV flight attitude adjustment system based on reinforcement learning, which comprises:
A route planning module, configured to construct a dynamic environment grid from the digital map and the real-time semantic segmentation results, and to generate an initial trajectory with a rapidly-exploring random tree (RRT) algorithm that incorporates spatio-temporal constraints;
An information fusion module, configured to acquire the flight state information of the UAV on the initial trajectory, the environment perception information of the onboard sensors, and the image sharpness information, and to input them into the spatio-temporal attention encoder to generate a state tensor that fuses environment semantics;
A tensor weighting module, configured to apply an information entropy weighting mechanism to the state tensor to assign state importance, obtaining a weighted state vector;
A pose generation module, configured to input the weighted state vector into a Meta-SAC model with meta-learning capability, and to generate, through an Actor network, target flight reference pose parameters and gain coefficients for dynamically adjusting the LQR controller;
A pose control module, configured to drive the LQR controller, based on the target flight reference pose parameters and the LQR gain coefficients, to generate low-level flight control instructions that control the UAV to execute the corresponding attitude actions;
A feedback updating module, configured to construct a reward function based on energy efficiency and inspection image sharpness during execution of the flight control instructions, and to use the reward value as feedback to the Meta-SAC model for policy updating.
The beneficial effects of the invention are as follows:
1. A rapidly-exploring random tree (RRT) algorithm with spatio-temporal constraints is introduced. By incorporating time windows and models of dynamic obstacle behavior into the RRT path planning process, the responsiveness of the flight path to changes in the dynamic environment is improved. Combined with the Lyapunov stability constraint and B-spline smoothing, the generated trajectory both satisfies dynamic reachability requirements and is continuous and smooth, which improves flight safety and path controllability.
2. An information entropy weighting mechanism is introduced to weight the state tensor. The mechanism dynamically assigns importance to the state tensor generated by multi-modal perception, strengthening attention to key environment variables (such as blurred image regions or high-threat obstacles). It improves the representational power of the state input and the discriminability of training samples, thereby improving the learning efficiency of the Meta-SAC policy model and the accuracy of its attitude decisions.
3. A fuzzy logic reward function is constructed. Taking performance indices such as image sharpness, energy efficiency, and flight stability as inputs, a fuzzy logic reward function that fuses multiple rules is built to provide multi-objective guidance for the flight policy. Compared with a traditional single-index feedback mechanism, it can adaptively adjust the reward strength and strengthens the robustness and generalization of the policy model in complex inspection tasks.
Drawings
The accompanying drawings provide a further understanding of the invention and constitute a part of this specification. Together with the embodiments of the invention, they serve to illustrate and explain the invention. In the drawings:
FIG. 1 is a flow chart of the UAV flight attitude adjustment method based on reinforcement learning provided by the invention;
FIG. 2 is a schematic structural diagram of the UAV flight attitude adjustment system based on reinforcement learning.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
Example 1
The UAV flight attitude adjustment method based on reinforcement learning, as shown in FIG. 1, comprises the following steps:
S100, constructing a dynamic environment grid from the digital map and the real-time semantic segmentation results, and generating an initial trajectory with a rapidly-exploring random tree (RRT) algorithm that incorporates spatio-temporal constraints;
Further, the dynamic environment grid is constructed as follows:
Identifying and classifying obstacles in the environment with a semantic segmentation model, based on a digital map of the target inspection area and images acquired by the onboard sensors;
Fusing the static geographic information with the dynamic obstacle identification information to construct a three-dimensional dynamic environment grid map with a time dimension, wherein the grid records the traversability, obstacle type, and dynamic behavior attributes of each region in the environment.
Specifically, two-dimensional or three-dimensional digital map data containing information such as terrain, elevation, and buildings is loaded for the preset inspection task area. The map data may originate from satellite imagery, a geographic information system, or a high-precision map.
Then, environment images acquired in real time by sensors such as the UAV's onboard camera or lidar are input into a trained semantic segmentation neural network (such as DeepLabV3+ or HRNet), which performs pixel-level recognition and semantic classification of obstacles in the images, including dynamic and static obstacles such as trees, utility poles, people, and vehicles, and outputs the corresponding semantic label map.
On this basis, the static geographic information in the digital map (such as roads, buildings, and bridges) is spatially fused with the dynamic obstacle information from the semantic segmentation results to generate a three-dimensional dynamic environment grid containing spatial coordinates and time labels. The grid is constructed with a fixed-resolution rasterization method (e.g., an octree or voxel grid), and each grid cell records three information fields: traversability (e.g., passable, impassable, or risk area), obstacle type (e.g., static building or moving object), and dynamic behavior (e.g., presence in the current frame and the predicted direction of motion at the next moment).
Constructing the dynamic environment grid provides a unified model of the static geographic information and dynamic obstacle information in the inspection area, improves the spatio-temporal resolution of the environment representation, and gives trajectory planning and attitude adjustment a real-time, quantifiable basis for environmental constraints. By integrating the semantic segmentation results with time-dimension information, the system can judge the traversability of each region and the dynamic behavior of obstacles, improving obstacle avoidance and the safety of path planning.
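As a rough illustration of the grid described above, the following Python sketch stores static obstacles per voxel and dynamic obstacles per voxel and time slot, with each cell recording traversability, obstacle type, and a motion attribute. The class and method names, the constant-velocity prediction, and the dictionary-based storage are assumptions made for this sketch; the text itself only names the recorded fields.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    passable: bool = True
    obstacle_type: str = "none"          # e.g. "static_building", "moving_object"
    velocity: tuple = (0.0, 0.0, 0.0)    # predicted motion, zero for static cells


class DynamicGrid:
    """Voxel grid with a time index; all names here are illustrative."""

    def __init__(self, resolution, time_step):
        self.resolution = resolution
        self.time_step = time_step
        self.static = {}    # (ix, iy, iz) -> Cell
        self.dynamic = {}   # (ix, iy, iz, it) -> Cell

    def _voxel(self, position):
        return tuple(int(c // self.resolution) for c in position)

    def _slot(self, t):
        return int(t // self.time_step)

    def add_static(self, position, obstacle_type):
        self.static[self._voxel(position)] = Cell(False, obstacle_type)

    def add_dynamic(self, position, t, obstacle_type, velocity, horizon):
        """Mark the observed cell plus a constant-velocity prediction for `horizon` slots."""
        for k in range(horizon + 1):
            dt = k * self.time_step
            future = tuple(p + v * dt for p, v in zip(position, velocity))
            key = self._voxel(future) + (self._slot(t) + k,)
            self.dynamic[key] = Cell(False, obstacle_type, velocity)

    def query(self, position, t):
        """Return the cell state at a position and time (free space by default)."""
        voxel = self._voxel(position)
        if voxel in self.static:
            return self.static[voxel]
        return self.dynamic.get(voxel + (self._slot(t),), Cell())


grid = DynamicGrid(resolution=1.0, time_step=1.0)
grid.add_static((5.2, 5.7, 3.0), "static_building")
grid.add_dynamic((0.5, 0.5, 2.5), 0.0, "moving_object", (1.0, 0.0, 0.0), horizon=2)
```

A planner could then call `grid.query(position, t)` for each candidate waypoint and arrival time, so that a cell blocked at one moment can still be used at another.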
Further, the initial trajectory is generated as follows:
In the dynamic environment grid, taking the current position of the UAV as the root node and the target point as the end point, and expanding the path with the rapidly-exploring random tree (RRT) algorithm;
Introducing a Lyapunov stability constraint during path expansion, smoothing the trajectory with a B-spline curve, and outputting an initial flight trajectory sequence containing time and attitude information.
Specifically, the three-dimensional dynamic environment grid map constructed in step S100 is used as the path search space, the current three-dimensional position of the UAV is used as the root node of the rapidly-exploring random tree (RRT), and the target inspection point is used as the goal. During RRT path expansion, the algorithm incrementally connects feasible nodes sampled at random from the sampling space to build a tree, and uses the time-dimension information of the environment grid to keep the path temporally consistent, thereby avoiding collisions with dynamic obstacles.
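The time-aware expansion described above can be sketched as a single RRT extension step in Python. Each tree node carries a timestamp, the new node's arrival time is derived from an assumed constant cruise speed, and the node is rejected if its voxel is occupied in the matching time slot. The function name, the constant-speed assumption, the set of occupied (voxel, time slot) keys, and the endpoint-only collision check are simplifications for illustration; a fuller planner would also check intermediate points along the segment.

```python
import math


def extend(node, sample, step, speed, occupied, resolution=1.0, time_step=1.0):
    """Grow the tree from `node` toward `sample` by at most `step`.

    node is (x, y, z, t); occupied is a set of (ix, iy, iz, it) keys.
    Returns the new time-stamped node, or None if it would collide.
    """
    x, y, z, t = node
    dx, dy, dz = sample[0] - x, sample[1] - y, sample[2] - z
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    if dist == 0.0:
        return None
    move = min(step, dist)
    scale = move / dist
    new_pos = (x + dx * scale, y + dy * scale, z + dz * scale)
    new_t = t + move / speed                      # arrival time at the new node
    key = tuple(math.floor(c / resolution) for c in new_pos)
    key += (math.floor(new_t / time_step),)
    if key in occupied:                           # blocked at that place and time
        return None
    return new_pos + (new_t,)


root = (0.0, 0.0, 0.0, 0.0)
blocked_now = extend(root, (5.0, 0.0, 0.0), 1.0, 1.0, {(1, 0, 0, 1)})
blocked_later = extend(root, (5.0, 0.0, 0.0), 1.0, 1.0, {(1, 0, 0, 3)})
```

The two calls differ only in when the obstacle occupies the voxel: the first extension is rejected, while the second is accepted because the UAV would pass through before the obstacle arrives.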
To improve the dynamic stability of the path and the feasibility of subsequent control, a Lyapunov function is introduced as a stability constraint during path expansion. In each expansion step, the dynamic stability of candidate path segments is evaluated, and path nodes that could cause attitude divergence or loss of control are screened out, ensuring that the path has global asymptotic stability within the range of attitude tracking and control inputs. The Lyapunov function can be expressed as:
$$V(x) = (x - x_{ref})^{T} P (x - x_{ref});$$
Wherein $x$ represents the current state vector of the UAV, comprising state variables such as position, velocity, and attitude angles; $x_{ref}$ represents the reference state vector; $P$ represents a symmetric positive definite matrix satisfying the Lyapunov stability condition; and $V(x)$ represents a positive definite function of the state deviation. On this basis, only path segments satisfying the following condition are retained during path expansion:
$$\dot{V}(x) = (x - x_{ref})^{T} \left(A^{T} P + P A\right) (x - x_{ref}) < 0;$$
Wherein $A$ is the linearized state matrix of the UAV dynamic model at the current state (obtainable by linearization or empirical modeling).
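A minimal Python sketch of this screening test follows. It assumes a quadratic Lyapunov function V = e^T P e of the state deviation e = x - x_ref and linearized error dynamics de/dt = A e, so that the derivative is e^T (A^T P + P A) e, and a candidate segment is kept only when that derivative is negative. The function names and the example matrices are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lyapunov_check(x, x_ref, a, p):
    """Return (V, V_dot, keep) for a candidate state on a path segment.

    Assumes P is symmetric, so e^T (A^T P + P A) e = 2 (P e)^T (A e).
    """
    e = [xi - ri for xi, ri in zip(x, x_ref)]
    v = dot(e, mat_vec(p, e))
    v_dot = 2.0 * dot(mat_vec(p, e), mat_vec(a, e))
    return v, v_dot, v_dot < 0.0


P = [[1.0, 0.0], [0.0, 1.0]]                 # hypothetical positive definite matrix
A_stable = [[-1.0, 0.0], [0.0, -2.0]]        # hypothetical stable linearization
A_unstable = [[1.0, 0.0], [0.0, 1.0]]        # hypothetical diverging linearization

kept = lyapunov_check([1.0, 1.0], [0.0, 0.0], A_stable, P)
dropped = lyapunov_check([1.0, 1.0], [0.0, 0.0], A_unstable, P)
```

With these example values the stable linearization gives a negative derivative and the segment is kept, while the diverging one gives a positive derivative and the segment is discarded.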
Then, the discrete sequence of path nodes produced by the search is processed with B-spline curve interpolation to obtain a continuous, smooth flight path. Node timestamps and attitude parameters (such as heading, pitch, and roll angles) are embedded during interpolation, yielding a complete initial flight trajectory sequence with time and attitude constraints.
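One possible form of this smoothing step is sketched below in Python using a uniform cubic B-spline with the end nodes repeated so that the curve starts and ends at the first and last node. Note that this variant approximates the interior nodes rather than passing through them, which is an assumption of the sketch and may differ from the interpolation intended in the text. Each node is treated as a tuple (x, y, z, t), so timestamps are smoothed together with positions; attitude angles could be appended as further components, provided angle wrap-around is handled separately.

```python
def bspline_smooth(nodes, samples_per_segment=5):
    """Smooth a list of equal-length tuples with a clamped uniform cubic B-spline."""
    if len(nodes) < 2:
        return [tuple(float(c) for c in n) for n in nodes]
    # Repeating each end node three times makes the curve start and end on it.
    ctrl = [nodes[0]] * 2 + list(nodes) + [nodes[-1]] * 2
    out = []
    for i in range(len(ctrl) - 3):
        p0, p1, p2, p3 = ctrl[i:i + 4]
        for s in range(samples_per_segment):
            u = s / samples_per_segment
            # Uniform cubic B-spline basis functions (they sum to 1).
            b0 = (1 - u) ** 3 / 6
            b1 = (3 * u ** 3 - 6 * u ** 2 + 4) / 6
            b2 = (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6
            b3 = u ** 3 / 6
            out.append(tuple(b0 * a + b1 * b + b2 * c + b3 * d
                             for a, b, c, d in zip(p0, p1, p2, p3)))
    out.append(tuple(float(c) for c in nodes[-1]))
    return out


# Hypothetical RRT output: (x, y, z, t) waypoints with a sharp corner.
waypoints = [(0.0, 0.0, 2.0, 0.0), (2.0, 0.0, 2.0, 2.0),
             (2.0, 2.0, 2.0, 4.0), (4.0, 2.0, 2.0, 6.0)]
trajectory = bspline_smooth(waypoints)
```

Because the basis functions are non-negative and the waypoint timestamps increase, the smoothed timestamps remain non-decreasing along the trajectory.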
By applying a rapidly-exploring random tree (RRT) algorithm with a Lyapunov stability constraint in the dynamic environment grid, the method achieves efficient three-dimensional trajectory search while ensuring the controllability and stability of the path in environments with dynamic obstacles, and avoids generating paths that cannot be tracked or that risk attitude divergence. Smoothing the trajectory with a B-spline curve further improves its continuity and flyability and facilitates accurate tracking and attitude adjustment by the subsequent controller. Because node time and attitude information is retained, the result provides a structured, high-quality initial flight reference for the downstream reinforcement learning and control modules.
S200, acquiring the flight state information of the UAV on the initial trajectory, the environment perception information of the onboard sensors, and the image sharpness information, and inputting them into the spatio-temporal attention encoder to generate a state tensor that fuses environment semantics;
Specifically, the image sharpness information is obtained by averaging the target edge sharpness, the image contrast, and the image detail retention.
The image sharpness information jointly considers the target edge sharpness, the image contrast, and the image detail retention, combined with equal weights:
$$Q = \frac{1}{3}\left(E + C + D\right);$$
Wherein $Q$ represents the image sharpness information, $E$ represents the target edge sharpness, $C$ represents the image contrast, and $D$ represents the image detail retention. The target edge sharpness measures how sharp object boundaries are in the image: the sharper the boundaries, the clearer the image. In this embodiment, the Sobel operator is used for edge detection, and the mean gradient magnitude of the gray-scale image is taken as the sharpness index:
$$E = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\sqrt{G_x(i,j)^2 + G_y(i,j)^2};$$
Wherein $M$ and $N$ represent the number of rows and columns of the image $I$, and $G_x$ and $G_y$ represent the gradient images in the $x$ and $y$ directions. The image contrast reflects the difference between bright and dark regions and is an important index of visual image quality. It is calculated as the standard deviation of the gray values:
$$C = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(I(i,j) - \mu\right)^2};$$
Wherein $I(i,j)$ represents the gray value of the image pixel at $(i,j)$ and $\mu$ represents the mean gray value of the image. The image detail retention measures how well high-frequency information such as texture and contours is preserved. In this embodiment, the Laplacian operator is applied to take the second-order derivative of the image and extract its detail regions, and the mean absolute response is taken as the detail index:
$$D = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left|L(i,j)\right|;$$
Wherein $L(i,j)$ represents the Laplacian response of the image at position $(i,j)$.
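The three indices and their equal-weight average can be sketched in plain Python as follows: mean Sobel gradient magnitude for edge sharpness, standard deviation of gray values for contrast, and mean absolute Laplacian response for detail retention. The function names and the 3x3 kernels are assumptions, and for simplicity the Sobel and Laplacian terms are averaged over interior pixels only rather than the full image. The three terms have different numeric scales, so a practical system would likely normalize each before averaging.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]


def apply_kernel(img, kernel, i, j):
    """Weighted sum of the 3x3 neighbourhood centred on pixel (i, j)."""
    return sum(kernel[a][b] * img[i + a - 1][j + b - 1]
               for a in range(3) for b in range(3))


def image_sharpness(img):
    """Return (edge, contrast, detail, overall) for a gray image given as rows."""
    rows, cols = len(img), len(img[0])
    interior = [(i, j) for i in range(1, rows - 1) for j in range(1, cols - 1)]
    edge = sum(math.hypot(apply_kernel(img, SOBEL_X, i, j),
                          apply_kernel(img, SOBEL_Y, i, j))
               for i, j in interior) / len(interior)
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    contrast = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    detail = sum(abs(apply_kernel(img, LAPLACIAN, i, j))
                 for i, j in interior) / len(interior)
    return edge, contrast, detail, (edge + contrast + detail) / 3


flat = [[100] * 5 for _ in range(5)]              # uniform image, no structure
step = [[0, 0, 255, 255, 255] for _ in range(5)]  # vertical step edge
flat_scores = image_sharpness(flat)
step_scores = image_sharpness(step)
```

A uniform image scores zero on all three indices, while an image with a clear step edge scores above zero on each, which matches the intent that sharper images receive higher values.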
Further, the state tensor fusing environment semantics is generated as follows:
Normalizing the flight state information, environment perception information, and image sharpness information of the UAV;
Inputting the preprocessed information into the spatio-temporal attention encoder, and fusing the multi-modal data with a multi-head attention mechanism;
Extracting spatial-context and temporal correlation features in combination with the semantic segmentation results, and constructing a state tensor comprising position information, obstacle threat level, predicted image sharpness, and estimated energy consumption.
Specifically, to generate the state tensor fusing environment semantics, the following are first obtained from the UAV's onboard flight control system and sensor system: flight state information (including position, velocity, acceleration, and attitude angles), environment perception information (including obstacle distribution and dynamic target information identified by lidar, ultrasonic, or vision sensors), and image sharpness information (including indices such as edge sharpness, image contrast, and texture detail retention).
This multi-source information is then normalized so that its numerical range is unified to the interval [0,1], improving the stability and computational efficiency of the model input. For example, linear normalization can be used for position and velocity, and standard-score (z-score) or min-max normalization can be used for the image sharpness indices.
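As a small illustration of the normalization step, the Python sketch below applies min-max scaling with optional fixed bounds and clips the result to [0, 1]. The function name and the clipping behaviour are assumptions. Standard-score normalization, also mentioned above, does not by itself bound values to [0, 1], so it would need an additional rescaling or squashing step to meet that range.

```python
def min_max(values, lo=None, hi=None):
    """Scale values to [0, 1]; bounds default to the observed min and max."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        return [0.0 for _ in values]
    return [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in values]


speeds = min_max([2.0, 4.0, 6.0])                  # bounds taken from the data
altitude = min_max([15.0], lo=0.0, hi=10.0)        # hypothetical fixed sensor range
```

Fixed bounds taken from sensor limits keep the scaling stable across flights, whereas data-derived bounds change whenever a new extreme value is observed.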
The processed flight state vector, environment perception vector, and image sharpness vector are then input together into the spatio-temporal attention encoder. The encoder uses a multi-head attention mechanism to extract interaction features between the modalities, automatically learns the dependencies and weights among them, and improves the robustness and discriminability of the state representation.
During fusion, the encoder also incorporates the pixel-level obstacle semantic labels output by the semantic segmentation module to jointly model spatial context and temporal features. The result is a structured state tensor with the following fields: current position and trajectory deviation, estimated threat level of obstacles in each direction, predicted sharpness of the image at the current viewing angle, and estimated energy consumption rate per unit time. This tensor serves as the state input of the subsequent reinforcement learning policy network.
Generating the state tensor by feeding the flight state, environment perception, and image sharpness information into the spatio-temporal attention encoder makes it possible to mine the deep associations among the multi-modal information and improves the sensitivity of the state representation to changes in the dynamic environment. Combining semantic labels with temporal features improves the completeness and discriminability of the input state.
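The multi-head fusion of modality vectors can be illustrated with the following plain-Python sketch, which treats each modality as one token and applies scaled dot-product self-attention per head. To stay short it uses identity query, key, and value projections, whereas a trained encoder would use learned projection matrices; the function names and the example vectors are likewise assumptions.

```python
import math


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def attention_head(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[t] * tokens[t][c] for t in range(len(tokens)))
                        for c in range(d)])
    return outputs, weights


def multi_head(tokens, heads):
    """Split each token into `heads` slices, attend per slice, and concatenate."""
    size = len(tokens[0]) // heads
    fused = [[] for _ in tokens]
    for h in range(heads):
        part = [tok[h * size:(h + 1) * size] for tok in tokens]
        out, _ = attention_head(part)
        for row, o in zip(fused, out):
            row.extend(o)
    return fused


# Hypothetical normalized modality vectors: flight state, perception, sharpness.
tokens = [[0.2, 0.8, 0.1, 0.5], [0.9, 0.1, 0.4, 0.3], [0.6, 0.6, 0.7, 0.2]]
fused = multi_head(tokens, heads=2)
_, attn = attention_head(tokens)
```

Each row of `attn` sums to one and indicates how strongly one modality attends to the others, which is the quantity a learned encoder would adapt during training.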
S300, applying an information entropy weighting mechanism to the state tensor to allocate state importance, obtaining a weighted state vector;
Further, the state importance allocation comprises the following steps:
Measuring the degree of variation of each sub-state component in the state tensor based on information entropy theory;
Converting the information entropy values into weight factors that represent the importance of each state dimension in policy decisions;
Performing weighted fusion of the state tensor according to the weight factors to generate a weighted state vector that is aware of state importance.
Specifically, the state tensor output by the spatio-temporal attention encoder is first obtained and denoted $S = \{s_1, s_2, \dots, s_n\}$, where each sub-state component $s_i$ corresponds to a particular state feature such as position deviation, obstacle threat degree, predicted image sharpness, or energy consumption rate. Then, based on information entropy theory, the degree of variation of each sub-state component within a given time window is measured, and its entropy is calculated by the following formula:
$$H_i = -\sum_{j=1}^{m} p_{ij} \log p_{ij};$$
where $p_{ij}$ denotes the probability that state component $s_i$ falls in the $j$-th value interval, and $m$ is the number of value intervals. The entropy $H_i$ of each sub-state is then mapped to a weight factor $w_i$ by a normalization operation:
$$w_i = \frac{H_i}{\sum_{k=1}^{n} H_k};$$
These weights represent the importance of each state component to the current policy decision: the higher the entropy, the greater the fluctuation of that state dimension and the more significant its influence on policy stability.
Finally, each component of the original state tensor is weighted by its weight factor and fused to generate the weighted state vector $\tilde{S} = [w_1 s_1, w_2 s_2, \dots, w_n s_n]$, which serves as the state input of the subsequent reinforcement learning policy model.
By introducing a state importance weighting mechanism based on information entropy, the criticality of different state features in the current task can be identified adaptively, and state information with a significant influence on policy decisions is highlighted. The mechanism improves the distinguishability and stability of the state representation, so that the reinforcement learning policy is more sensitive and adaptable when handling complex environmental changes, which enhances the accuracy and robustness of UAV attitude control.
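The entropy weighting described above can be sketched as follows. This is an illustrative reading under assumptions, not the patent's exact procedure: the entropy is estimated from a fixed-width histogram over a sliding window of values already normalized to [0,1], the weights are the entropies divided by their sum, and the function names and the bin count are hypothetical.

```python
import math


def shannon_entropy(samples, bins=10):
    """Histogram estimate of H = -sum_j p_j log p_j for samples in [0, 1]."""
    counts = [0] * bins
    for x in samples:
        idx = max(0, min(int(x * bins), bins - 1))
        counts[idx] += 1
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def entropy_weights(windows, bins=10):
    """One weight per state component; a higher entropy gives a larger weight."""
    entropies = [shannon_entropy(w, bins) for w in windows]
    total = sum(entropies)
    if total == 0:
        # No component varies at all: fall back to uniform weights (assumption).
        return [1.0 / len(windows)] * len(windows)
    return [h / total for h in entropies]


def weight_state(state, weights):
    """Element-wise weighting of the current state by the weight factors."""
    return [w * s for w, s in zip(weights, state)]


# Hypothetical windows: a constant component and a fluctuating one.
windows = [[0.5, 0.5, 0.5, 0.5], [0.1, 0.3, 0.6, 0.9]]
weights = entropy_weights(windows)
weighted = weight_state([0.5, 0.9], weights)
```

In this toy input the constant component has zero entropy, so all of the weight goes to the fluctuating component.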
S400, inputting the weighted state vector into a Meta-SAC model with meta-learning capability, and generating, through the Actor network, target flight reference pose parameters and gain coefficients for dynamically adjusting the LQR controller;
Specifically, the weighted state vector generated by the spatio-temporal attention encoder and processed by the entropy weighting mechanism is taken as the environment input at the current moment. The state vector contains multi-modal fused features such as position error, obstacle threat degree, predicted image sharpness, and energy consumption rate.
The weighted state vector is fed into a trained Meta-SAC model with meta-learning capability. The Actor network in Meta-SAC uses a parameter transfer mechanism to adapt quickly to the current environment state and can generalize from few samples.
The Actor network of Meta-SAC outputs a continuous action vector comprising two parts:
(1) Target flight reference pose parameters, including target position coordinates, attitude angles (pitch, roll, yaw) and the desired viewing direction of the onboard camera.
(2) LQR gain adjustment, used to adaptively adjust the feedback gain coefficients of the low-level LQR controller under the current task state, so as to achieve more responsive or more stable attitude tracking control.
After the action vector is parsed, the target reference pose is input to the LQR controller as the desired control value, and the gain adjustment is used to update the LQR feedback matrix, so that the high-level policy and the low-level controller cooperate dynamically. At the same time, the tuple of current state vector, action output, reward value (to be calculated) and next state is stored for subsequent online or offline policy updates of the Meta-SAC model.
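The parsing and storage steps above might look like the following sketch. The layout of the action vector (seven pose entries followed by gain adjustments), the multiplicative gain update, and all names are assumptions made for illustration; the text does not specify them.

```python
from collections import deque, namedtuple

Transition = namedtuple("Transition", "state action reward next_state")


def split_action(action, pose_dim=7):
    """Split the actor output into (reference pose, LQR gain adjustment).

    Assumed layout: x, y, z, pitch, roll, yaw, camera angle, then gain deltas.
    """
    if len(action) <= pose_dim:
        raise ValueError("action vector has no gain-adjustment part")
    return list(action[:pose_dim]), list(action[pose_dim:])


def adjust_gain(base_gain, deltas):
    """Scale each nominal gain by (1 + delta); one possible update rule."""
    return [k * (1.0 + d) for k, d in zip(base_gain, deltas)]


def store(buffer, state, action, next_state, reward=None):
    """Keep the transition; the reward may be filled in once it is computed."""
    buffer.append(Transition(state, action, reward, next_state))


replay = deque(maxlen=10000)
pose, deltas = split_action([0.1] * 7 + [0.2, -0.1])
gains = adjust_gain([1.0, 2.0], deltas)
store(replay, state=[0.5, 0.9], action=pose + deltas, next_state=[0.4, 0.8])
# Later, once the reward has been evaluated:
replay[-1] = replay[-1]._replace(reward=0.7)
```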
S500, driving the LQR controller, based on the target flight reference pose parameters and the LQR gain coefficients, to generate low-level flight control instructions that control the UAV to execute the corresponding attitude actions;
Further, the LQR controller generates the low-level flight control instructions as follows:
Receiving the target flight reference pose parameters output by the Meta-SAC model as the desired input;
Constructing a system state error vector from the current flight state of the UAV, and calculating the control instruction according to the dynamically adjusted feedback gain coefficients;
Minimizing a cost function of state error and control energy consumption by solving a linear quadratic optimization problem, and outputting low-level control quantities that drive the aircraft to perform the corresponding attitude adjustment.
Specifically, the LQR controller receives the target flight reference pose parameters output by the Meta-SAC model, including desired position coordinates, attitude angles (pitch, roll, yaw) and camera view adjustment instructions, as the desired input of the system. The current flight state, including actual position, speed, attitude angles and angular velocity, is acquired in real time from the UAV's onboard sensors to form the current state vector. The error between the desired pose and the current state is then calculated to form the state error vector of the system. For example, the position error is the difference between the desired position and the current actual position, and the attitude error is the difference between the desired attitude and the actual attitude. The feedback gain matrix K of the LQR controller is adjusted according to the dynamic feedback gain coefficients provided by the Meta-SAC model; this matrix determines how the state error vector is mapped to the control input, which realizes adaptive adjustment. With the state error vector as the variable, a linear quadratic cost function is defined:
$$J = \int_{0}^{\infty} \left( e^{T} Q e + u^{T} R u \right) dt;$$
where $e$ denotes the state error vector, $u$ denotes the control input vector, $Q$ denotes the state error penalty weight matrix, a symmetric positive definite matrix that quantifies the penalty when the state deviates from the desired value, and $R$ denotes the control energy penalty matrix, also a symmetric positive definite matrix, which quantifies the penalty on control effort. $Q$ is typically given by the designer, and $R$ is set to the identity matrix. The optimal feedback gain matrix $K$ is obtained by solving the algebraic Riccati equation, which gives the feedback control law $u = -Ke$; that is, the control input $u$ is determined by the state error vector $e$ and the feedback gain $K$. Finally, the control input $u$ is resolved into throttle, control-surface and propeller speed adjustment commands on each axis, which form the low-level flight control instructions.
By using the feedback gain coefficients dynamically output by the high-level Meta-SAC model, the controller adapts to different flight environments and task states. The linear quadratic optimization model built on the state errors balances attitude accuracy against energy efficiency, and the resulting control instructions respond quickly and accurately, which can significantly improve the attitude stability and flight safety of the UAV in complex environments.
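The following sketch shows one way to compute an LQR gain and apply the feedback law on a single axis. It rests on several assumptions that are not in the text: a discrete-time double-integrator model (position and velocity with an acceleration input), the discrete Riccati recursion iterated to a fixed point rather than a continuous-time solver, the error convention e = state - reference, and a hypothetical `gain_scale` standing in for the gain adjustment supplied by the high-level policy.

```python
def mat_mul(a, b):
    """Plain-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dlqr_gain(A, B, Q, r, iters=5000, tol=1e-10):
    """Single-input discrete LQR gain via fixed-point Riccati iteration."""
    n = len(A)
    P = [row[:] for row in Q]
    At, Bt = transpose(A), transpose(B)
    K = [[0.0] * n]
    for _ in range(iters):
        BtP = mat_mul(Bt, P)                       # 1 x n
        s = r + mat_mul(BtP, B)[0][0]              # scalar R + B'PB
        K = [[v / s for v in mat_mul(BtP, A)[0]]]  # K = (R + B'PB)^-1 B'PA
        AtP = mat_mul(At, P)
        AtPA, AtPB = mat_mul(AtP, A), mat_mul(AtP, B)
        P_next = [[Q[i][j] + AtPA[i][j] - AtPB[i][0] * K[0][j]
                   for j in range(n)] for i in range(n)]
        diff = max(abs(P_next[i][j] - P[i][j]) for i in range(n) for j in range(n))
        P = P_next
        if diff < tol:
            break
    return K[0]


def track(A, B, K, x, ref, steps, gain_scale=1.0):
    """Simulate u = -gain_scale * K e with e = x - ref (assumed convention)."""
    n = len(x)
    for _ in range(steps):
        e = [xi - ri for xi, ri in zip(x, ref)]
        u = -gain_scale * sum(k * ei for k, ei in zip(K, e))
        x = [sum(A[i][j] * x[j] for j in range(n)) + B[i][0] * u for i in range(n)]
    return x


DT = 0.05                                  # assumed control period in seconds
A = [[1.0, DT], [0.0, 1.0]]                # position/velocity double integrator
B = [[0.5 * DT * DT], [DT]]
K = dlqr_gain(A, B, Q=[[1.0, 0.0], [0.0, 1.0]], r=1.0)
final_state = track(A, B, K, x=[1.0, 0.0], ref=[0.0, 0.0], steps=200)
```

A multi-input vehicle model would need a general matrix inverse in place of the scalar division, for which a numerical library's Riccati solver is the usual choice.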
S600, during execution of the flight control instructions, constructing a fuzzy logic reward function based on energy consumption efficiency and inspection image sharpness, and using the reward function value as feedback to the Meta-SAC model for policy updating.
Further, the fuzzy logic reward function is constructed as follows:
Acquiring the image sharpness index, energy consumption rate and relative distance to obstacles of the UAV during flight, and normalizing these continuous variables;
Converting the normalized input variables into fuzzy variables and mapping them to preset fuzzy membership functions to obtain the corresponding fuzzy linguistic values;
Performing fuzzy inference based on a preset fuzzy rule base, whose rules make combined judgments on image sharpness, energy consumption and obstacle distance;
Converting the inference result into a specific numerical reward by a defuzzification method, which serves as the immediate feedback signal of the Meta-SAC model for updating the policy network.
Specifically, key operating indicators of the UAV are collected while the flight control instructions are executed. These mainly comprise image sharpness indicators (such as image contrast, edge sharpness and detail retention), the energy consumption rate per unit time (such as the power-speed relationship and the power output of the propulsion system), and the relative distance between the UAV and environmental obstacles (the Euclidean distance computed from the lidar or depth camera). These continuous variables are normalized to the range [0,1] to facilitate the subsequent fuzzy calculation.
The normalized variables are each mapped to preset fuzzy membership functions. For example, image sharpness can be divided into three fuzzy sets (low, medium and high), the energy consumption rate into low, medium and high consumption, and the obstacle distance into near, medium and far. The fuzzy sets are defined by triangular, trapezoidal or Gaussian membership functions.
A fuzzy rule base is constructed, and a number of fuzzy inference rules are formulated according to task priority and policy objectives (for example: if image sharpness is high, energy consumption is low and the distance is far, then the reward is high; if the image is blurred, energy consumption is high and the distance is near, then the reward is low). Fuzzy logic inference is performed on the fuzzy linguistic values, and a combined fuzzy output is calculated.
Finally, a common defuzzification technique such as the centroid method is used to convert the fuzzy inference result into a specific reward value (for example, a continuous value in [0,1]). This value is input to the Meta-SAC model as the immediate feedback signal to drive the update of the policy network, which improves policy convergence efficiency and the practical effect of flight decisions.
By fusing key indicators such as image sharpness, energy efficiency and safety distance, and by using a fuzzy inference mechanism to handle complex state information flexibly, the threshold sensitivity and limited expressiveness of conventional reward functions are avoided. This gives finer and more robust optimization guidance for the flight policy, which improves the autonomous decision performance and execution stability of the UAV in complex dynamic environments.
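The fuzzy reward pipeline described above can be illustrated with the sketch below. Everything specific in it is an assumption: the triangular and shoulder membership shapes, the 27-rule base derived from per-variable scores (chosen to agree with the two example rules in the text), the use of min for rule conjunction, and weighted-average defuzzification over singleton outputs as a simplification of the centroid method.

```python
from itertools import product


def memberships(x):
    """Membership degrees of x in [0, 1] for the sets low / medium / high."""
    return {
        "low": max(0.0, min(1.0, (0.5 - x) / 0.5)),
        "medium": max(0.0, 1.0 - abs(x - 0.5) / 0.5),
        "high": max(0.0, min(1.0, (x - 0.5) / 0.5)),
    }


# Assumed contribution of each linguistic value to the reward level.
SHARPNESS_SCORE = {"low": 0.0, "medium": 0.5, "high": 1.0}
ENERGY_SCORE = {"low": 1.0, "medium": 0.5, "high": 0.0}    # less energy is better
DISTANCE_SCORE = {"low": 0.0, "medium": 0.5, "high": 1.0}  # low = near, high = far


def fuzzy_reward(sharpness, energy_rate, distance):
    """Reward in [0, 1] from three inputs already normalized to [0, 1]."""
    mu_s = memberships(sharpness)
    mu_e = memberships(energy_rate)
    mu_d = memberships(distance)
    num = den = 0.0
    for s, e, d in product(mu_s, mu_e, mu_d):
        strength = min(mu_s[s], mu_e[e], mu_d[d])  # rule firing strength (AND)
        level = (SHARPNESS_SCORE[s] + ENERGY_SCORE[e] + DISTANCE_SCORE[d]) / 3.0
        num += strength * level
        den += strength
    # Weighted average of the fired rule levels; neutral value if nothing fires.
    return num / den if den > 0 else 0.5


reward = fuzzy_reward(sharpness=0.9, energy_rate=0.2, distance=0.8)
```

A full Mamdani system would aggregate clipped output sets and integrate to obtain the centroid; the singleton form here keeps the example short while preserving the rule logic.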
Example two
The invention also provides a UAV flight attitude adjustment system based on reinforcement learning, whose structure is shown in fig. 2 and which comprises:
The route planning module, which constructs a dynamic environment grid from the digital map and the real-time semantic segmentation results, and generates an initial track by a rapidly-exploring random tree (RRT) algorithm with spatio-temporal constraints;
The information fusion module, which acquires the flight state information of the UAV on the initial track, the environment perception information of the onboard sensors and the image sharpness information, and inputs this information into the spatio-temporal attention encoder to generate a state tensor that fuses environment semantics;
The tensor weighting module, which applies an information entropy weighting mechanism to the state tensor to allocate state importance, obtaining a weighted state vector;
The pose generation module, which inputs the weighted state vector into a Meta-SAC model with meta-learning capability and generates, through the Actor network, target flight reference pose parameters and gain coefficients for dynamically adjusting the LQR controller;
The pose control module, which drives the LQR controller, based on the target flight reference pose parameters and the LQR gain coefficients, to generate low-level flight control instructions that control the UAV to execute the corresponding attitude actions;
The feedback updating module, which constructs a reward function based on energy consumption efficiency and inspection image sharpness during execution of the flight control instructions, and uses the reward function value as feedback to the Meta-SAC model for policy updating.
In this embodiment, the UAV flight attitude adjustment system based on reinforcement learning is applied to the routine inspection of a high-voltage transmission line, and the system is deployed on a multi-rotor UAV platform carrying a high-definition camera, a lidar and an inertial measurement unit. The specific implementation flow is as follows:
The route planning module loads a preloaded digital map of the transmission line, performs semantic segmentation on the images acquired in real time by the onboard camera, and identifies scene elements such as towers, conductors and vegetation. The static geographic data are fused with information on dynamic obstacles (such as birds and construction equipment) to construct a three-dimensional dynamic environment grid map with second-level time resolution. In this grid, with the current UAV position as the start point and the inspection target as the end point, an RRT algorithm with spatio-temporal constraints and a Lyapunov stability check is used to plan the initial track, which is then smoothed with a B-spline curve to generate a time-stamped track sequence with attitude references.
Next, while the UAV flies along the initial track, the information fusion module acquires in real time the flight state (position, speed and attitude), the lidar point cloud of the environment, the image sharpness index and other data, and inputs them to the spatio-temporal attention encoder. The encoder fuses temporal dependencies and spatial distribution information and outputs a state tensor containing position deviation, obstacle threat degree, predicted image sharpness, energy consumption per unit time and so on.
The tensor weighting module then applies the information entropy weighting mechanism to each dimension of the state tensor, dynamically adjusts the state variable weights according to the current environment complexity and task priority, and outputs a more representative weighted state vector. This vector is input to the Meta-SAC model in the pose generation module, and the Actor network outputs the target flight reference pose (position and attitude) and the corresponding LQR feedback gain adjustment.
The pose control module then inputs the reference pose parameters and the dynamic gain coefficients to the LQR controller. The controller computes the linear quadratic optimal solution from the current state error, generates low-level flight control instructions such as acceleration and attitude angular velocity, and drives the multi-rotor UAV to perform high-precision attitude tracking and track keeping in real time.
Finally, the feedback updating module constructs the fuzzy logic reward function from the image quality score, energy consumption index and flight stability data collected during flight, and uses the result as immediate feedback for the reinforcement learning policy model. After the task ends, the system further optimizes the Meta-SAC policy weight parameters through offline federated adversarial training to improve the generalization and adaptability of the model in new environments.
The above formulas are all dimensionless formulas used for numerical calculation. They were obtained by software simulation on a large amount of collected data so as to approximate the real situation, and the preset parameters in the formulas are set by those skilled in the art according to the actual situation.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any other combination. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Finally, the foregoing description of the preferred embodiment of the invention is provided for the purpose of illustration only, and is not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims (8)

1. A UAV flight attitude adjustment method based on reinforcement learning, characterized by comprising:
constructing a dynamic environment grid from a digital map and real-time semantic segmentation results, and generating an initial track by a rapidly-exploring random tree algorithm with spatio-temporal constraints;
acquiring flight state information of the UAV on the initial track, environment perception information of the onboard sensors and image sharpness information, and inputting this information into a spatio-temporal attention encoder to generate a state tensor that fuses environment semantics;
applying an information entropy weighting mechanism to the state tensor to allocate state importance, obtaining a weighted state vector;
inputting the weighted state vector into a Meta-SAC model with meta-learning capability, and generating, through the Actor network, target flight reference pose parameters and gain coefficients for dynamically adjusting the LQR controller;
driving the LQR controller, based on the target flight reference pose parameters and the LQR gain coefficients, to generate low-level flight control instructions that control the UAV to execute the corresponding attitude actions;
during execution of the flight control instructions, constructing a fuzzy logic reward function based on energy consumption efficiency and inspection image sharpness, and using the reward function value as feedback to the Meta-SAC model for policy updating.

2. The UAV flight attitude adjustment method based on reinforcement learning according to claim 1, characterized in that the dynamic environment grid is constructed by:
identifying and classifying obstacles in the environment with a semantic segmentation model, based on the digital map of the target inspection area and the images acquired by the onboard sensors;
fusing static geographic information with dynamic obstacle recognition information to construct a three-dimensional dynamic environment grid map that includes a time dimension, the grid recording the passability of each region in the environment, the obstacle types and their dynamic behavior attributes.

3. The UAV flight attitude adjustment method based on reinforcement learning according to claim 1, characterized in that the initial track is generated by:
in the dynamic environment grid, taking the current position of the UAV as the root node and the target point as the end point, and extending the path with a rapidly-exploring random tree algorithm;
introducing a Lyapunov stability constraint into the path extension, smoothing the track with a B-spline curve, and outputting an initial flight track sequence containing time and attitude information.

4. The UAV flight attitude adjustment method based on reinforcement learning according to claim 1, characterized in that the state tensor fusing environment semantics is generated by:
normalizing the UAV flight state information, the environment perception information and the image sharpness information in a preprocessing step;
inputting the preprocessed information into the spatio-temporal attention encoder and fusing the multi-modal data with a multi-head attention mechanism;
combining the semantic segmentation results, extracting spatial context and time-series correlation features, and constructing a state tensor containing position information, obstacle threat degree, predicted image sharpness and estimated energy consumption.

5. The UAV flight attitude adjustment method based on reinforcement learning according to claim 1, characterized in that state importance is allocated by:
measuring the degree of variation of each sub-state component in the state tensor based on information entropy theory;
converting the information entropy values into weight factors that represent the importance of each state dimension in policy decisions;
performing weighted fusion of the state tensor according to the weight factors to generate a weighted state vector that is aware of state importance.

6. The UAV flight attitude adjustment method based on reinforcement learning according to claim 1, characterized in that the LQR controller generates the low-level flight control instructions by:
receiving the target flight reference pose parameters output by the Meta-SAC model as the desired input;
constructing a system state error vector from the current flight state of the UAV, and calculating the control instruction according to the dynamically adjusted feedback gain coefficients;
minimizing a cost function of state error and control energy consumption by solving a linear quadratic optimization problem, and outputting low-level control quantities that drive the aircraft to perform the corresponding attitude adjustment.

7. The UAV flight attitude adjustment method based on reinforcement learning according to claim 1, characterized in that the fuzzy logic reward function is constructed by:
acquiring the image sharpness index, energy consumption rate and relative distance to obstacles of the UAV during flight, and normalizing these continuous variables;
converting the normalized input variables into fuzzy variables and mapping them to preset fuzzy membership functions to obtain the corresponding fuzzy linguistic values;
performing fuzzy inference based on a preset fuzzy rule base, whose rules make combined judgments on image sharpness, energy consumption and obstacle distance;
converting the inference result into a specific numerical reward by a defuzzification method, which serves as the immediate feedback signal of the Meta-SAC model for updating the policy network.

8. A UAV flight attitude adjustment system based on reinforcement learning, characterized by comprising:
a route planning module, configured to construct a dynamic environment grid from a digital map and real-time semantic segmentation results, and to generate an initial track by a rapidly-exploring random tree algorithm with spatio-temporal constraints;
an information fusion module, configured to acquire flight state information of the UAV on the initial track, environment perception information of the onboard sensors and image sharpness information, and to input this information into a spatio-temporal attention encoder to generate a state tensor that fuses environment semantics;
a tensor weighting module, configured to apply an information entropy weighting mechanism to the state tensor to allocate state importance, obtaining a weighted state vector;
a pose generation module, configured to input the weighted state vector into a Meta-SAC model with meta-learning capability and to generate, through the Actor network, target flight reference pose parameters and gain coefficients for dynamically adjusting the LQR controller;
a pose control module, configured to drive the LQR controller, based on the target flight reference pose parameters and the LQR gain coefficients, to generate low-level flight control instructions that control the UAV to execute the corresponding attitude actions;
a feedback updating module, configured to construct a reward function based on energy consumption efficiency and inspection image sharpness during execution of the flight control instructions, and to use the reward function value as feedback to the Meta-SAC model for policy updating.

CN202510835183.7A 2025-06-20 2025-06-20 UAV flight attitude adjustment method and system based on reinforcement learning Pending CN120595851A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510835183.7A CN120595851A (en) 2025-06-20 2025-06-20 UAV flight attitude adjustment method and system based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510835183.7A CN120595851A (en) 2025-06-20 2025-06-20 UAV flight attitude adjustment method and system based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN120595851A true CN120595851A (en) 2025-09-05

Family

ID=96883463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510835183.7A Pending CN120595851A (en) 2025-06-20 2025-06-20 UAV flight attitude adjustment method and system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN120595851A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120848590A (en) * 2025-09-23 2025-10-28 浙江孚临科技有限公司 A cross-modal attention feature fusion method for aerial robots
CN120848590B (en) * 2025-09-23 2025-12-02 浙江孚临科技有限公司 A method for cross-modal attention feature fusion in aerial robots
CN120874957A (en) * 2025-09-26 2025-10-31 浙江孚临科技有限公司 Layered reinforcement learning control method for aerial robot
CN120874957B (en) * 2025-09-26 2026-02-06 浙江孚临科技有限公司 Layered reinforcement learning control method for aerial robot
CN121209566A (en) * 2025-11-25 2025-12-26 天津天境飞航科技有限公司 Unmanned plane path planning method and system for self-adaptive dynamic planning
CN121209566B (en) * 2025-11-25 2026-01-23 天津天境飞航科技有限公司 Adaptive dynamic programming method and system for UAV path planning

Similar Documents

Publication Publication Date Title
CN118504925B (en) Unmanned aerial vehicle low-altitude monitoring method
CN118707988B (en) A UAV-assisted inspection method and system for power transmission lines
CN119937623A (en) Automatic selection of obstacle avoidance points and obstacle avoidance method for photovoltaic stations inspected by drones
CN110866887A (en) Target situation fusion sensing method and system based on multiple sensors
CN120595851A (en) UAV flight attitude adjustment method and system based on reinforcement learning
CN118565476A (en) UAV navigation and path planning method based on deep learning
Xiao et al. Vision-based learning for drones: A survey
CN119085695B (en) Obstacle map marking method and system combined with unmanned vehicle
CN120235065B (en) Unmanned aerial vehicle parking apron dynamic landing point adjustment method and system based on environment awareness
CN118107822A (en) A complex environment search and rescue method based on drone
CN119645083A (en) A UAV inspection control system and method based on data analysis
CN114326821A (en) Unmanned aerial vehicle autonomous obstacle avoidance system and method based on deep reinforcement learning
CN117589167A (en) Unmanned aerial vehicle routing inspection route planning method based on three-dimensional point cloud model
CN117369479B (en) Unmanned aerial vehicle obstacle early warning method and system based on oblique photogrammetry technology
CN119356391A (en) A three-dimensional AI perception direction method and system based on drone
CN120803005B (en) Deep Learning-Based Autonomous Obstacle Avoidance and Path Planning Methods and Systems for Unmanned Aerial Vehicles
CN120356365B (en) A low-altitude UAV traffic operation risk warning method and system
CN119904766B (en) Environment complexity-based end-to-end unmanned aerial vehicle autonomous control method
CN120610568A (en) Obstacle avoidance method and system for electric UAV based on multimodal perception and reinforcement learning
Yubo et al. Survey of UAV autonomous landing based on vision processing
CN118795460A (en) Unmanned aerial vehicle track tracking method based on perception radar
Parlange et al. Leveraging single-shot detection and random sample consensus for wind turbine blade inspection
Bi Multimodal sensor collaborative information sensing technology
CN115576327B (en) Autonomous learning method based on edge computing and reasoning of autonomous driving smart car
CN120740587B (en) Unmanned aerial vehicle target tracking and intelligent route planning method and system based on YOLO and DSM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination