1. Introduction
In recent years, advances in multi-agent systems, efficient computing, network convergence, and new communication technologies have moved many types of equipment away from operating in isolation. Instead, they operate in clusters, networking and collaborating to execute tasks, which improves the system’s ability to handle complexity, dynamics, and adverse conditions. However, managing system resources effectively, balancing fairness and efficiency, scheduling tasks, and countering external interference have become new challenges in meeting application requirements, and they are active research topics in both academia and industry.
As an important application of multi-agent systems, clustered drones have a broad application space [1]. Through resource integration, self-organized collaboration, and intelligent scheduling, they form an unmanned network system that is low cost, highly dynamic, and stable. This provides strong support for the future construction of global air–space–ground–sea networks, the realization of communication–perception–computing integration, and the development of unmanned platforms. Reference [2] introduces the use of multiple drones to form an integrated air–ground IoT, redeploying the drones to different locations, changing the resource allocation, and offloading server computations by alternating heuristic greedy and successive convex approximation methods to minimize the total computational overhead of ground-based IoT devices. Reference [3] proposes a generalized intelligent collaborative task scheduling framework that switches between heuristic and deep reinforcement learning-based scheduling solutions to address the ever-changing task requests when dynamically served by multiple drones. Reference [4] applies multiple drones as airborne relays to serve clustered users on the ground and proposes a channel statistical model based on the probability of occurrence. It also establishes analytic expressions for the average rate and the average outage probability of this channel model, comparing its performance with those of the Rayleigh and Rician channel models. Reference [5] proposes a traffic load balancing scheme for a multi-drone-assisted fog network to minimize wireless delays for IoT users. Reference [6] proposes a framework that uses a dynamic drone network for the heuristic optimization of coverage areas. This framework employs Bézier curves to plan the flight paths of drones, enabling intelligent adjustments to their positions and trajectories, which enhances the network access quality for ground users. Reference [7] conducts an in-depth investigation into routing protocols for clustered drone networks, comparing the characteristics and performances of various protocols to provide researchers with a basis for selecting appropriate ones.
Although the studies referenced above report promising results, they give little consideration to real-world environments and complex scenarios, which limits their applicability. Specifically, in the presence of malicious dynamic interference and a complex electromagnetic environment, the communication processes of clustered drones are inevitably impacted. Under these conditions, the signal-to-noise ratios of the received signals are affected by more than just the drones’ trajectories and transmission powers. Therefore, it is essential to analyze the physical impact of the interference on the entire communication process and to develop tailored algorithms that mitigate such external interference. Reference [8] proposes a convergent rate-partitioning-based interference management resource allocation and clustering algorithm for iteratively optimizing drone transmission power, drone pairing, wireless resource scheduling, and wireless resource pricing to achieve interference management in cellular drone networks. Reference [9] proposes a drone-path-tracking scheme that uses a radial basis neural network to approximate the adaptive approximation law of the gyroscopic effect function to balance the effects of system uncertainties and nonlinearities, improving the convergence speed and error accuracy of the controller. Reference [10] designs a joint optimization model that provides optimal truck and drone routing policies to address mission abortion when truck and clustered drone routes are subjected to random attacks and to minimize the total cost. Reference [11] designs a dynamic-based communication antijamming decision-making method to solve the problem of intelligent antijamming decision making for battlefield communication, enabling the intelligent system to avoid jamming while keeping communication uninterrupted as far as possible.
Current research on antijamming primarily concentrates on methods to eliminate and mitigate the impacts of interference on communication processes. However, this focus often overlooks the system performance of clustered drones after the interference has been suppressed, which is essential for their mission execution capabilities. Consequently, the joint optimization of the “residual performance” of clustered drones following antijamming efforts holds significant research value and can improve their operational effectiveness. Reference [12] jointly optimizes the trajectory and scheduling plan of each drone, combining convex optimization and ant colony optimization algorithms to maximize the total amount of relay data. Reference [13] proposes a drone relay vehicle networking architecture based on relay protocols with nonorthogonal multiple access and maximum ratio combining techniques, as well as improved particle swarm optimization algorithms, to achieve increased data rates for cell-edge vehicles in rural road scenarios. Reference [14] investigates the problem of clustered drone deployment for better coverage and proposes a genetic algorithm that encodes the solution of the problem as a chromosome and simulates the process of biological evolution to find favorable solutions, balancing the energy consumption of all drones against the maximization of the lifetime of the full coverage network. Reference [15] proposes a 3D layout analysis framework for coordinating a fleet of drone relays and uses it to construct a complex mixed-integer nonconvex programming problem for network performance optimization in terms of user throughput fairness through a parallel alpha-fair drone deployment method. Reference [16] considers a scenario in which flying drones provide wireless services to multiple ground nodes simultaneously and proposes a joint transmission power and trajectory optimization algorithm that maximizes the minimum average throughput for a given length of time.
Based on the considerations outlined above, this paper delves deeper into the subject, and the contributions are as follows:
- Aiming to maximize the communication rate among clustered drones, this study constructs a multivariable coupled 0–1 mixed-integer nonlinear programming model. This model incorporates various constraints, including the physical limitations of the drones, channel selection, and communication performance;
- The Discrete Soft Actor-Critic (DSAC) algorithm is deployed on relay drones, training them to select channels that avoid dynamic interference;
- The Bayesian optimization algorithm is used to tune the hyperparameters of the DSAC algorithm, such as the learning rate, discount factor, and target entropy, enhancing the algorithm’s capability to resist interference;
- The joint optimization problem is decomposed and transformed into convex subproblems, covering the modulation order of relay drones, their transmission power, trajectories, and the power allocation factors for clustered drones, which are solved iteratively to maximize the total communication rate.
The experimental results demonstrate that the proposed algorithm effectively addresses the communication challenges faced by clustered drones in environments with interference. Furthermore, the algorithm is stable, converges well, and performs well on the task.
The structure of this article is as follows:
Section 1 introduces related work and provides an overview of the current research status.
Section 2 describes the system model, detailing the tasks, channels, and interference.
Section 3 presents the problem description, integrating the model to express specific research questions.
Section 4 proposes a DSAC algorithm using Bayesian optimization.
Section 5 presents a joint optimization algorithm for communication rates.
Section 6 covers experimental simulations.
Section 7 concludes the paper.
Table 1 presents a description of the key parameters used throughout the paper.
2. System Model
2.1. Task Model
When performing tasks, there is generally a considerable spatial distance between the swarm of drones and the ground station, requiring the involvement of relay drones for real-time information transmission. Suppose that relay drones assisting in communications suffer from malicious interference by an enemy, causing communication disruptions. Relay drones need to sense enemy interference, intelligently select channels, and optimize their transmission power, trajectory, and modulation methods to maximize communication rates, thereby forming real-time, efficient, and stable relay communications.
Figure 1 shows a schematic diagram of relay communication by a swarm of drones under interference conditions.
Assuming a relay drone assists in communication for a duration of T, it is divided into time intervals, and , where drones form a cluster, with the coordinates . The coordinates of the ground station are , the jammer’s coordinates are , and the relay drone’s coordinates are . Changes in positions during each time slot are negligible. Additionally, the relay drone updates the position coordinates of the cluster drones every time interval, during which changes in the cluster drones’ coordinates can also be neglected. The relay drone uses NOMA multiplexing to communicate, enhancing the real-time transmission efficiency of the cluster drones, and employs amplify-and-forward in a half-duplex communication mode for data transmission.
2.2. Channel Model
According to Reference [17], the operational trajectory of the relay drone in the air exhibits strong Line-of-Sight (LOS) characteristics with both the cluster of drones and the ground station. Therefore, the channel model between the relay drone and both the cluster of drones and the ground station can be approximated by the free-space propagation model of electromagnetic waves, which is described as follows:
In this model,
represents the product of the antenna gains of the transmitter and receiver,
denotes the wavelength of the electromagnetic waves at different frequencies, and
signifies the distance between the relay drone and either the cluster of drones or the ground station. For example, the expression for calculating the free-space path loss between the relay drone and the ground station node can be illustrated as follows:
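As a rough numerical illustration of the free-space model, the sketch below assumes the standard Friis form, in which the channel power gain equals the antenna gain product multiplied by the squared ratio of the wavelength to 4π times the distance. The function names, the 2.4 GHz carrier, and the coordinates are hypothetical values chosen only for the example.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def free_space_gain(distance_m, freq_hz, antenna_gain_product=1.0):
    """Linear channel power gain under the (assumed) Friis free-space form."""
    if distance_m <= 0 or freq_hz <= 0:
        raise ValueError("distance and frequency must be positive")
    wavelength = SPEED_OF_LIGHT / freq_hz
    return antenna_gain_product * (wavelength / (4.0 * math.pi * distance_m)) ** 2

# Hypothetical relay drone and ground station coordinates in metres.
relay = (0.0, 0.0, 500.0)
ground_station = (3000.0, 4000.0, 0.0)

d = math.dist(relay, ground_station)   # 3D Euclidean distance
gain = free_space_gain(d, 2.4e9)       # assumed 2.4 GHz carrier
loss_db = -10.0 * math.log10(gain)     # path loss expressed in dB
print(f"distance = {d:.1f} m, path loss = {loss_db:.1f} dB")
```

Doubling the distance reduces the gain by a factor of four (about 6 dB), which is why the relay trajectory appears as an optimization variable later in the paper.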
Jamming devices are typically hidden in relatively concealed areas to prevent detection and subsequent damage or destruction. Therefore, according to the 3GPP [18], the channel model between the jammer and the relay drone is usually a probabilistic loss model that mixes LOS and Non-Line-of-Sight (NLOS) components. The calculation expression for this model is as follows:
In this model,
represents the spatial propagation loss of the electromagnetic waves. The calculation formula for this loss is as follows:
In this model, represents the distance between the relay drone and the jammer, denotes the communication frequency of the relay drone, refers to shadow fading, which is typically modeled as a log-normal distribution, and indicates the Rician distribution. The Rician distribution is used to model small-scale fading, as jammers typically have high transmission power, and there is always a line-of-sight component present in the signal, making this distribution more suitable for real-world conditions.
2.3. Interference Model
During the transmission of information by the relay drone, the jammer exhibits energy detection and power suppression capabilities, thereby inducing communication interference in the relay drone. The interference detection model based on energy is formulated as follows:
In this context,
denotes the currently captured signal,
represents the communication signal, and
signifies the interference signal. After the subband signals pass through the band-pass filter, their signal power,
, is computed. Subsequently, the energy levels for each frequency band are calculated. The corresponding calculation expression is given as follows:
In this context,
denotes the number of subbands. By comparing
with the respective thresholds,
, within the model, the channel state is determined. The relay drone also exhibits spectrum-sensing capabilities, enabling it to intelligently select the operating frequency,
, from the available frequency set
through frequency hopping, thereby avoiding the impact of interference on relay transmission. The subfrequency ranges within each frequency point do not overlap, and the available frequency set contains
frequency points, with subchannel bandwidths of
. The limits for the working frequency range of the relay drone, shared with the jammers, are defined by
and
. Assuming the maximum transmission power of cluster drone
is
, the transmission power of the relay drone is
, and the maximum transmission power of the jammer is
, with the product of the antenna gains between them being
. In a relay scenario in which the cluster drones need to transmit detection information back to the ground station, the Signal-to-Interference-plus-Noise Ratio (SINR) expression for a relay drone working at frequency
receiving the signal transmitted by cluster drone
is as follows:
In this model, represents the noise power spectral density, denotes the channel gain between the cluster drone, , and the relay drone. The magnitude of the SINR directly reflects the interference condition of the relay drone; the smaller its value, the greater the degree of interference, and conversely, the larger its value, the lesser the degree of interference. Assuming the normal communication threshold is ,
indicates the normal transmission of relay information, while signifies a failure in the information relay.
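The energy detection step and the SINR threshold test described above can be sketched as follows. This is a minimal illustration under stated assumptions: the detector statistic is taken to be the average sample power per subband, the SINR is taken in the usual form of received signal power over noise plus jamming power, and a link is counted as normal when the SINR is at or above the threshold. All function names and numbers are hypothetical.

```python
def subband_energy(samples):
    """Average power of the samples captured in one subband."""
    return sum(abs(s) ** 2 for s in samples) / len(samples)

def detect_busy(subbands, thresholds):
    """Flag each subband whose energy exceeds its threshold."""
    return [subband_energy(x) > th for x, th in zip(subbands, thresholds)]

def sinr(tx_power_w, channel_gain, noise_psd_w_per_hz, bandwidth_hz, jam_power_w=0.0):
    """Received signal power over noise power plus received jamming power."""
    noise_w = noise_psd_w_per_hz * bandwidth_hz
    return tx_power_w * channel_gain / (noise_w + jam_power_w)

def link_ok(sinr_value, threshold):
    """Assumed rule: transmission is normal when SINR >= threshold."""
    return sinr_value >= threshold

quiet = [0.01, -0.01, 0.02, -0.02]   # noise-only subband
jammed = [1.0, -1.0, 1.0, -1.0]      # subband hit by a strong jammer
print(detect_busy([quiet, jammed], [0.1, 0.1]))

clean = sinr(0.1, 1e-9, 4e-21, 1e6)
hit = sinr(0.1, 1e-9, 4e-21, 1e6, jam_power_w=1e-10)
print(link_ok(clean, 10.0), link_ok(hit, 10.0))
```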
The interference model proposed in this paper encompasses constant frequency interference, sweep frequency interference, and hybrid frequency interference. The corresponding expressions are as follows:
In this context,
represent the interference intensity of constant frequency interference, sweep frequency interference, and hybrid frequency interference, respectively, which are functions of time and frequency.
denote the amplitude of constant frequency interference and sweep frequency interference signals,
represents the set of constant frequencies,
signifies the set of frequencies at time
for sweep frequency interference, and
denotes the sampling function. The constant frequency interference mode is characterized as a bandwidth-limited random process with a central frequency of
, while the sweep frequency interference mode is a random process whose frequency varies over time. The hybrid frequency interference mode combines constant frequency interference and sweep frequency interference, making it a more complex random process. External malicious interference is therefore highly random and unstable, and because prior knowledge of the interference signals is extremely limited, accurate prediction and estimation are difficult. The relay drone thus faces a complex and hostile electromagnetic environment, which poses significant challenges to communication performance. The dynamic interference process is illustrated in Figure 2.
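To make the three interference modes concrete, the sketch below generates the set of jammed channel indices in each time slot. It is a deliberately deterministic simplification of the random processes described above: the constant mode occupies a fixed set of channels, the sweep mode moves a window across the band by a fixed step per slot, and the hybrid mode is the union of the two. The function name and all parameters are hypothetical.

```python
def jammed_channels(mode, slot, num_channels, fixed=(0,), sweep_width=1, sweep_step=1):
    """Return the set of channel indices that are jammed in the given slot."""
    fixed_set = {c % num_channels for c in fixed}
    start = (slot * sweep_step) % num_channels
    sweep_set = {(start + i) % num_channels for i in range(sweep_width)}
    if mode == "constant":
        return fixed_set
    if mode == "sweep":
        return sweep_set
    if mode == "hybrid":
        return fixed_set | sweep_set
    raise ValueError(f"unknown interference mode: {mode}")

# Channel occupancy over the first four slots of a hybrid jammer on 8 channels.
for k in range(4):
    print(k, sorted(jammed_channels("hybrid", k, 8, fixed=(2,))))
```

A channel selection agent only observes the resulting channel energies, not the mode itself, which is what makes the hybrid case harder to track than either pure mode.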
Under the context of the above interference modes, the expression for the SINR of the relay drone working on the
channel, receiving signals from cluster drone
, can be reformulated as follows:
In this context, represents the number of channels affected by interference, and denotes the channel interference indicator function, which takes a value of 0 or 1. When , this signifies that the channel is experiencing interference, . Conversely, when , this indicates that the frequency channel is not subject to interference, .
2.4. Power Transmission Model and Energy Model
Transmission power is closely related to the communication rate, ensuring reliable data transmission across varying distances and environments, while also being significant for optimizing energy consumption. Hence, a power transmission model is introduced to achieve precise measurement. The corresponding calculation expression is given by the following:
In this context,
denotes the received signal power,
represents the transmitted signal power,
indicates the path loss,
signifies the receiver antenna gain, and
signifies the transmitter antenna gain. In a drone cluster relay communication system, the system energy consumption primarily comprises the flight energy consumption and communication energy consumption of the drones. The calculation expression for communication energy consumption is given by the following:
In this context,
represents the transmission power of the drone, while the calculation expression for flight energy consumption is given by the following:
In this context,
represents the drone’s flight speed at time
k. In summary, the total system energy consumption,
, can be expressed as follows:
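The power transmission and energy bookkeeping in this subsection can be illustrated with a small sketch. It assumes the usual decibel link budget (received power equals transmitted power plus both antenna gains minus path loss) and treats the total energy as the per-slot sum of flight power and transmission power multiplied by the slot duration. The per-slot flight power is passed in as data because the flight power model itself is not reconstructed here; all names and numbers are hypothetical.

```python
def received_power_dbm(tx_power_dbm, tx_gain_dbi, rx_gain_dbi, path_loss_db):
    """Link budget in decibel units."""
    return tx_power_dbm + tx_gain_dbi + rx_gain_dbi - path_loss_db

def total_energy_j(flight_power_w, tx_power_w, slot_s):
    """Sum of flight and communication energy over all time slots."""
    if len(flight_power_w) != len(tx_power_w):
        raise ValueError("both power profiles need one entry per time slot")
    return sum((pf + pt) * slot_s for pf, pt in zip(flight_power_w, tx_power_w))

print(received_power_dbm(30.0, 2.0, 2.0, 114.0))       # dBm at the receiver
print(total_energy_j([100.0, 120.0], [1.0, 0.0], 0.5))  # joules over two slots
```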
3. Problem Description
This section focuses on a scenario in which swarm drones transmit collected information back to the ground station in real time, exploring the modeling issues of real-time communication between the swarm drones and the ground station with the assistance of a relay drone. Assume that the working frequency of the relay drone is
, and this channel comprises
subcarriers for the paired transmission of information from the swarm drones. During this process, the relay drone often faces a certain degree of malicious interference. Considering the high-speed movement of drones and the complex electromagnetic environment, their channel state information (CSI) may become imperfect or outdated. In this context, the traditional assumption of perfect CSI is unrealistic. By referring to current mainstream methods for CSI estimation and prediction, such as deep learning algorithms [19], adaptive algorithms [20], and coding techniques [21], it is possible to dynamically predict and compensate for the quality of the CSI, thereby addressing the issues related to outdated and imperfect CSI. According to the principles of Non-Orthogonal Multiple Access (NOMA), the sender transmits data by allocating different levels of transmission power to different users, and the receiver decodes the signals sequentially by detecting the gain differences of the received signals, thus canceling the links with smaller channel gains [22]. In the designated task scenario, the communication link gains from swarm drone
and swarm drone
to the relay drone are
and
, respectively. When
, the relay drone decodes the link from swarm drone
first; otherwise, it decodes the link from swarm drone
first. The power allocation factor set for the swarm drones is
, and since the swarm drones select only one channel for data transmission in different time slots, in time slot
, the SINR for the relay drone working on the frequency
channel and receiving the transmission signal from swarm drone
is as follows:
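The decoding order described above can be sketched for one pair of swarm drones sharing a subcarrier. The sketch assumes the common successive interference cancellation convention: the link with the larger channel gain is decoded first while the other link acts as interference, and the second link is then decoded with only noise and jamming remaining. The arguments `p_i` and `p_j` stand for the transmit powers after the power allocation factors have been applied; all names and numbers are hypothetical.

```python
def noma_pair_sinr(p_i, g_i, p_j, g_j, noise_w, jam_w=0.0):
    """Return (sinr_i, sinr_j) at the relay for a two-drone NOMA pair."""
    floor = noise_w + jam_w            # noise plus received jamming power
    s_i, s_j = p_i * g_i, p_j * g_j    # received signal powers
    if g_i > g_j:
        # Drone i is decoded first and sees drone j as interference.
        return s_i / (s_j + floor), s_j / floor
    # Otherwise drone j is decoded first and sees drone i as interference.
    return s_i / floor, s_j / (s_i + floor)

sinr_i, sinr_j = noma_pair_sinr(1.0, 4e-9, 1.0, 1e-9, noise_w=1e-9)
print(sinr_i, sinr_j)
```

Jamming power raises the floor for both decoding stages, so a jammed channel degrades both links even when cancellation is perfect.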
Similarly, in time slot
, the Signal-to-Noise Ratio (SNR) for the ground station receiving the transmission signal from the relay drone is as follows:
The relay drone employs an amplify-and-forward method to transmit information. As stated in reference [23], in time slot
k, the SNR for the ground station receiving the two-hop link from cluster drone
is as follows:
From this, it is evident that in time slot
, the communication rate between the cluster drone
and the ground station is as follows:
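For intuition, the two-hop rate can be sketched with the textbook amplify-and-forward result, in which the end-to-end SNR is the product of the two hop SNRs divided by their sum plus one. The factor of one half in the rate reflects the half-duplex relay using two slots per delivered symbol; both choices are assumptions made for this sketch and may differ in detail from the expression in [23]. Function names and numbers are hypothetical.

```python
import math

def af_end_to_end_snr(snr_hop1, snr_hop2):
    """Textbook end-to-end SNR of a variable-gain amplify-and-forward relay."""
    return snr_hop1 * snr_hop2 / (snr_hop1 + snr_hop2 + 1.0)

def two_hop_rate_bps(bandwidth_hz, snr_hop1, snr_hop2, half_duplex=True):
    """Shannon rate of the relayed link, halved for half-duplex operation."""
    factor = 0.5 if half_duplex else 1.0
    return factor * bandwidth_hz * math.log2(1.0 + af_end_to_end_snr(snr_hop1, snr_hop2))

snr_e2e = af_end_to_end_snr(10.0, 10.0)
rate = two_hop_rate_bps(1e6, 10.0, 10.0)
print(f"end-to-end SNR = {snr_e2e:.3f}, rate = {rate / 1e6:.3f} Mbit/s")
```

The end-to-end SNR is always below the weaker hop, so improving only one hop (for example by moving the relay toward the cluster) has diminishing returns.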
Because the relay drone is mobile, the channel conditions for its assisted communication change in real time, so dynamically adjusting the modulation scheme is an important way to respond to channel variations. This paper therefore introduces Quadrature Amplitude Modulation (QAM) with a selectable modulation order into relay drone-assisted communication to keep the relayed rate high as the channel varies. Consequently, in time slot
, the communication rate between cluster drone
and the ground station becomes the following:
In this case,
represents the modulation order at time
, and the set of selectable modulation orders is given by
. Therefore, the expression for the total communication rate between the cluster drone and the ground station over the entire mission period is as follows:
From the above formula, it is evident that optimizing the total communication rate achievable by the ground station is dependent on the power allocation factors of the cluster drones, the modulation method, transmission power, and three-dimensional trajectory of the relay drone. However, it is also necessary to consider the physical constraints of the communication equipment and the performance constraints of the communication process.
The transmission power of the relay drone must not exceed its maximum transmission power,
, and the energy consumption for auxiliary communication must not exceed the maximum energy storage capacity of the relay drone,
, expressed as follows:
Analogous constraints apply to all drones in the cluster, namely, the following:
Additionally, in each time slot, the horizontal movement trajectory of the relay drone cannot exceed the trajectory length that would be covered at its maximum horizontal speed,
, expressed as follows:
Similarly, the vertical movement trajectory of the relay drone cannot exceed the trajectory length that would be covered at its maximum vertical speed,
, and the altitude of the relay drone must be within an appropriate range, expressed as follows:
In this case,
represent the flight altitude limit for the relay drone, with the starting point coordinates given by
and the ending point coordinates by
. The movement constraints also need to satisfy the following:
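The mobility constraints listed above can be checked for a candidate trajectory with a short routine. The sketch assumes one waypoint per time slot, a fixed slot duration, and fixed start and end points; the function name and the numerical limits are hypothetical.

```python
import math

def trajectory_feasible(waypoints, slot_s, v_h_max, v_v_max, z_min, z_max, start, end):
    """Check speed, altitude, and endpoint constraints on a list of (x, y, z) waypoints."""
    eps = 1e-9  # tolerance for floating-point comparisons
    if tuple(waypoints[0]) != tuple(start) or tuple(waypoints[-1]) != tuple(end):
        return False
    if any(not (z_min <= z <= z_max) for _, _, z in waypoints):
        return False
    for (x0, y0, z0), (x1, y1, z1) in zip(waypoints, waypoints[1:]):
        if math.hypot(x1 - x0, y1 - y0) > v_h_max * slot_s + eps:
            return False  # horizontal step too long for one slot
        if abs(z1 - z0) > v_v_max * slot_s + eps:
            return False  # vertical step too long for one slot
    return True

path = [(0.0, 0.0, 100.0), (30.0, 40.0, 105.0), (60.0, 80.0, 110.0)]
print(trajectory_feasible(path, 1.0, 50.0, 5.0, 50.0, 200.0, path[0], path[-1]))
```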
Since the auxiliary communication operates on a two-hop model, there is a single time slot delay in relay transmission [24]. Therefore, the transmission powers of the cluster drones at the final time slot and of the relay drone at the initial time slot are set to zero, expressed as follows:
Here,
represents the last task slot. To ensure that the data volume returned by the cluster drones is sufficient for analysis and decision making at the ground station, it is required that the total communication rate of the cluster drones exceed a specified threshold
, expressed as follows:
Because the relay drone switches between modulation orders according to the channel conditions, the formula for calculating the bit error rate (BER) depends on the modulation order in use [25], expressed as follows:
Therefore, for each communication link formed between the cluster drones and the ground station, the BER must meet the threshold requirement
, as follows:
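The interaction between the modulation order and the BER threshold can be sketched with a widely used textbook approximation for square M-QAM over an additive white Gaussian noise channel, BER ≈ 0.2·exp(−1.5·SNR/(M − 1)). This approximation is swapped in for illustration only and is not necessarily the expression taken from [25]. The candidate set of orders is also hypothetical.

```python
import math

def qam_ber_approx(snr_linear, order):
    """Textbook BER approximation for square M-QAM (assumed, not the paper's formula)."""
    return 0.2 * math.exp(-1.5 * snr_linear / (order - 1))

def pick_order(snr_linear, ber_max, orders=(4, 16, 64, 256)):
    """Largest modulation order whose approximate BER meets the threshold, else None."""
    feasible = [m for m in orders if qam_ber_approx(snr_linear, m) <= ber_max]
    return max(feasible) if feasible else None

order = pick_order(100.0, 1e-3)   # SNR of 20 dB, BER threshold 1e-3
print(order, math.log2(order), "bits per symbol")
```

A higher order carries more bits per symbol but needs a higher SNR to stay under the BER threshold, which is why the modulation order is optimized jointly with power and trajectory.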
Thus, the optimization model for the communication rate of cluster drones under interference conditions is constructed as follows:
The optimization model for the communication rate of cluster drones under interference conditions integrates several constraints, as follows: channel selection constraints (37) and (38), physical constraints of relay-assisted communication (20)–(30), and communication service performance constraints (31)–(33), (35), (39), and (40). The problem is formulated as a highly coupled nonlinear mixed-integer optimization model involving both discrete and continuous variables, which presents significant computational challenges due to its complexity. To effectively tackle this, the approach is divided into the following two strategic steps: firstly, employing an intelligent channel selection scheme to choose interference-free channels, and secondly, conducting the joint optimization of the relay’s modulation method, transmission power, three-dimensional trajectory, and the cluster drones’ power allocation factors. This method aims to maximize the data transmission rate of the relay drone under interference conditions while simplifying the optimization process and enhancing solution efficiency.
4. Intelligent Channel Selection Scheme
Because of the dynamic nature of jammer interference patterns, conventional anti-interference algorithms often struggle to cope effectively. Deep reinforcement learning, as a potent decision-making tool within artificial intelligence, is capable of adapting to dynamic environments and achieving optimal outcomes. However, deep reinforcement learning algorithms strongly depend on hyperparameters, which can be a limiting factor in their application. Therefore, this section proposes an intelligent channel selection scheme based on Bayesian optimization within a Discrete Soft Actor-Critic (BO_DSAC) framework. This approach is designed to facilitate channel selection for relay drones, thereby mitigating the impact of dynamic interference on the communication of cluster drones.
4.1. Discrete Soft Actor-Critic Algorithm
The DSAC algorithm is described in terms of the agent and the interference environment. The agent mainly consists of the following three components: an action selection network, a value evaluation network, and an entropy structure. The Soft Actor-Critic algorithm adds an entropy term to the conventional actor-critic (AC) architecture to enhance the algorithm’s ability to explore global solutions and to prevent convergence to local optima. Entropy is a measure of uncertainty, with higher randomness corresponding to greater entropy values. In the DSAC algorithm, the presence of entropy shifts the focus of the evaluation network from determining the cumulative rewards of actions for each state to exploring actions that could yield the maximum cumulative rewards in a given state. This adjustment addresses the shortcomings of traditional AC architectures in deep reinforcement learning. The working framework of this architecture is illustrated in Figure 3. For ease of expression, this paper denotes the soft predictive value network with
as
.
From the diagram, it can be observed that the Discrete Soft Actor-Critic algorithm architecture consists of one policy network and four evaluation networks. The evaluation networks are divided into two sets, as follows: predictive value networks and their corresponding target value networks. The configuration within each set aims to prevent overestimation and ensure the stability of the learning process, while the arrangement among the sets is designed to enhance learning efficiency and further improve stability. In terms of the evaluation network, unlike algorithms for continuous action spaces, in discrete action spaces, the state is solely input into the evaluation network, which then outputs values for all possible actions corresponding to that state, denoted as
. The expression for the loss function is
.
In this setup,
represents the expected reward value when action
is executed in state
.
denotes the entropy coefficient, which signifies the weight of the entropy,
represents the entropy itself, and
indicates the experience replay buffer used to store interaction data between the agent and the environment. Regarding the policy network, its input remains the state, but its output is the probability distribution of discrete actions, unlike the output of mean and variance in continuous action spaces, denoted as
. The expression for the loss function of the policy network is
.
Regarding entropy, the entropy coefficient,
α, determines the importance of the entropy term. To enable the adaptive variation in the entropy coefficient at different stages of learning, its loss function is designed as
, as follows:
In this context, represents the target entropy. The application of gradient descent to these loss functions facilitates the updating of network parameters. Under the framework of the Discrete Soft Actor-Critic algorithm, and in consideration of the actual conditions of the interference environment, the required basic elements are defined as follows:
State Space: The state space
should include the physical quantities that the relay drone needs to compute, primarily characterized by the energy state of the channel. Its definition is as follows:
In this context, represents the energy state of channel . When the channel is idle, only environmental noise is present. During normal communication, the energy is characterized by . Based on prior information, when the cluster drones communicate at full power, the maximum reception power of the relay drone is . When there is interference, the channel energy is represented by . The current energy state of the channel can be used to determine the communication status of the channel.
Action Space: The action space
represents the possible choices the relay drone can make regarding channel selection during time slot
. Therefore, the action space is designed as follows:
Therefore, the size of the action space is , where indicates that the channel is not selected, and indicates that the channel is selected. During each interaction, the agent selects only one channel for communication at a time.
Reward Function: The real-time reward function
measures the approval of action
and is designed with the following expression:
The relay drone receives the maximum reward if it successfully avoids interference while maintaining the same working channel. A slightly lower reward is given if the drone successfully avoids interference by switching channels. If the drone fails to avoid interference and does not change channels, it receives zero reward. However, if it changes channels and still fails to avoid interference, a penalty is applied. In this setup, is set to 0.4 to measure the reward for channel switching decisions.
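The four reward cases described above can be written as a small function. The numerical levels are assumptions made for this sketch: the maximum reward is taken as 1, the switching parameter of 0.4 is subtracted when the drone avoids interference by switching, and the same value is used as the penalty when it switches and is still jammed. Only the value 0.4 and the ordering of the four cases come from the text.

```python
def channel_reward(avoided_interference, switched_channel, switch_cost=0.4):
    """Reward for one channel selection step (reward levels are assumed)."""
    if avoided_interference and not switched_channel:
        return 1.0                  # stayed on a clean channel: maximum reward
    if avoided_interference and switched_channel:
        return 1.0 - switch_cost    # escaped interference, but paid for the hop
    if not switched_channel:
        return 0.0                  # stayed and was jammed: no reward
    return -switch_cost             # hopped and was still jammed: penalty

for avoided in (True, False):
    for switched in (False, True):
        print(avoided, switched, channel_reward(avoided, switched))
```

Penalizing unnecessary hops discourages the agent from switching channels in every slot, which would otherwise be a trivially safe but costly policy.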
4.2. Bayesian Optimization Algorithm
Bayesian optimization is a sample-efficient global optimization method that uses a probabilistic surrogate model to find good hyperparameters with few evaluations. Its fundamental principle is based on Bayes’ theorem [26], and its expression is as follows:
In this context,
represents the distribution model to be specified,
is the set of observed data,
denotes the likelihood distribution, and
is the prior probability. In this paper, the decision variables
include the discount factor
, learning rate
, and target entropy
. The cumulative reward function is the target value,
. The prior and posterior are modeled with Gaussian distributions. The acquisition function, based on the Expected Improvement (EI) criterion, is used to select the next evaluation point,
, from the posterior distribution, and its expression is as follows:
In this context, represents the current known maximum value simulated by the Gaussian process. This establishes a probabilistic optimization model concerning the discount factor, learning rate, and target entropy. The acquisition function repeatedly selects the next point to evaluate, and the process iterates until the preset number of Bayesian optimization iterations is reached, at which point the best hyperparameter combination found is returned.
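The selection of the next evaluation point can be sketched with the closed-form Expected Improvement for a Gaussian posterior, EI = (μ − f*)·Φ(z) + σ·φ(z) with z = (μ − f*)/σ, where f* is the best cumulative reward observed so far. The sketch takes the posterior mean and standard deviation of each candidate as given rather than fitting a Gaussian process; the candidate hyperparameter values are hypothetical.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximization under a Gaussian posterior."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)

def next_point(candidates, best):
    """Candidate with the largest expected improvement."""
    return max(candidates, key=lambda c: expected_improvement(c["mu"], c["sigma"], best))

# Hypothetical posterior estimates of the cumulative reward for two settings.
candidates = [
    {"name": "A", "gamma": 0.95, "lr": 1e-3, "target_entropy": 0.5, "mu": 9.8, "sigma": 0.1},
    {"name": "B", "gamma": 0.90, "lr": 3e-4, "target_entropy": 1.0, "mu": 9.0, "sigma": 2.0},
]
print(next_point(candidates, best=10.0)["name"])
```

Candidate B has the lower predicted mean but is chosen because its larger uncertainty leaves more room for improvement, which is how EI balances exploration against exploitation.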
In summary, the implementation process of the BO_DSAC intelligent channel selection scheme is detailed in Algorithm 1.
Algorithm 1 BO_DSAC intelligent channel selection scheme |
Inputs: discount factor , learning rate , and target entropy , each with their respective optimization ranges. Additionally, inputs include the entropy coefficient , maximum number of training iterations num_episodes, Markov chain length , soft update parameter , capacity of the replay buffer psize, batch size for data processing , and the number of iterations for Bayesian Optimization . These parameters collectively set the framework for optimizing the intelligent channel selection strategy. |
Output: policy function and value function . |
1: Inputs discount factor , learning rate , and target entropy , each with their respective optimization ranges. Additionally, inputs include the entropy coefficient , maximum number of training iterations num_episodes, Markov chain length , soft update parameter , capacity of the replay buffer , batch size for data processing , and the number of iterations for Bayesian optimization, . These parameters collectively set the framework for optimizing the intelligent channel selection strategy. |
2: Initialize the replay buffer and the network parameters of the policy and evaluation networks . |
3: |
4: Initial state: . |
5: |
6: Choose action: . |
7: Perform the action, receive the reward, and form the data tuple:. |
8: Update State: . |
9: Store the data tuple in the replay buffer: . |
10: |
11: Delete the earliest data tuple from the replay buffer. |
12:
|
13: |
14: Randomly sample data tuples . |
15: Calculate the loss function of the soft predictive value network . |
16: Calculate the loss function of the soft policy network . |
17: Calculate the entropy loss function . |
18: Minimize the loss using the data tuples to train the network parameters. |
19: Maximize the using the data tuples to train the network parameters. |
20: Minimize the loss using the data tuples to adaptively adjust the entropy coefficient. |
21: Soft update: . |
22:
|
23:
|
24:
|
25: Output the final policy function and value function . |
26: Obtain the initial observation set based on prior probabilities, and construct the prior Gaussian distribution and the posterior Gaussian distribution. |
27:
|
28: Determine using the function. |
29: Calculate the target function value corresponding to . |
30: Update the Gaussian model. |
31:
|
32: Output the optimal discount factor , learning rate , and target entropy . |
33: Incorporate the optimal hyperparameters , and re-execute Steps 2–25. |
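Two mechanical pieces of Algorithm 1, the bounded replay buffer (Steps 9 to 11 and 14) and the soft update of the target parameters (Step 21), can be sketched as follows. This is a minimal illustration with hypothetical names; network parameters are represented as plain lists of floats, and the soft update uses the usual convex combination of online and target parameters, which is assumed to match the update in the algorithm.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity buffer that discards the earliest tuple when full."""

    def __init__(self, capacity: int):
        # A deque with maxlen drops the oldest entry automatically on overflow.
        self.data = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.data.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        # Uniform random mini-batch without replacement.
        return random.sample(list(self.data), batch_size)

    def __len__(self):
        return len(self.data)


def soft_update(target, online, tau: float):
    """Return tau * online + (1 - tau) * target, element by element."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]
```

A small soft update parameter keeps the target network close to its previous values, which is the usual motivation for updating it gradually instead of copying the online parameters outright.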
The computational overhead of the Bayesian-optimized Discrete Soft Actor-Critic algorithm comprises the following two parts: the Discrete Soft Actor-Critic algorithm and the Bayesian optimization process. The Discrete Soft Actor-Critic algorithm includes the complexity of network forward and backward propagation , where denotes the number of network layers; the complexity of experience replay ; and the complexity of policy and value updates , with being the number of network updates. Hence, its computational overhead is . The second part involves the Bayesian optimization process, which includes the complexity of Gaussian process training , prediction complexity , and acquisition function optimization complexity , with representing the number of evaluated candidate points. Therefore, the overall computational complexity of the proposed algorithm in this section is , primarily dominated by .
5. Joint Optimization Scheme for Communication Rate
After intelligent channel selection, the relay drone can avoid the impact of jammers and achieve information transmission on interference-free channels. To further efficiently decouple the multivariable mixed-integer nonlinear optimization problem, this section divides the joint optimization problem of communication rate into the following four subproblems: relay drone modulation method optimization, relay drone transmission power optimization, three-dimensional trajectory optimization, and cluster drone power allocation factor optimization. These subproblems are iteratively solved alternately until convergence, thereby maximizing the communication rate of the cluster drones.
5.1. Relay Drone Modulation Method Optimization
To study the optimization problem of the relay drone modulation method
, it is necessary to fix variables such as the power allocation factor,
, of the cluster drones, the transmission power,
, of the relay drone, and trajectory,
, during the
th iteration in time slot
. Thus, the original optimization problem
can be simplified as follows:
where
, and
is given by
. Because of the inconvenient computation of discrete variable values involved in the above optimization problem, the auxiliary variables
are introduced, thus transforming the optimization problem
into the following:
The above problem is a 0–1 integer programming problem, which can be solved directly with the Mosek optimization toolkit to obtain the value.
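Because exactly one modulation order is active at a time, a 0–1 program of this kind can also be illustrated by plain enumeration over the one-hot choices. The sketch below is not the Mosek formulation used in the paper. It assumes M-QAM with the common approximation BER ≈ 0.2·exp(−1.5·SNR/(M−1)), a rate proportional to log2(M), and a hypothetical set of candidate orders.

```python
import math


def ber_mqam(snr: float, m: int) -> float:
    """Assumed BER approximation for M-QAM at linear SNR (not taken from the paper)."""
    return 0.2 * math.exp(-1.5 * snr / (m - 1))


def select_modulation(snr: float, ber_max: float, orders=(4, 16, 64, 256)):
    """Pick the highest feasible order; returns None if no order meets the BER limit."""
    best = None
    for m in orders:  # each m corresponds to one one-hot selection vector
        feasible = ber_mqam(snr, m) <= ber_max
        # The rate log2(m) grows with m, so the largest feasible m maximizes the rate.
        if feasible and (best is None or m > best):
            best = m
    return best
```

For example, at a linear SNR of 100 with a BER limit of 1e-3, the sketch selects order 16, because order 64 violates the limit under the assumed approximation.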
5.2. Optimization of Relay Drone Transmission Power
To study the optimization problem of relay drone transmission power
, it is necessary to fix the power allocation factor
of the cluster drones in the
th round of iterations within time slot
, the 3D trajectory
, and the modulation method
of the relay drones that was just updated in the
th round. Consequently, the original optimization problem
can be simplified as follows:
From Equation (34), the computational relationship between
and the transmission power
is as follows:
Then, constraint (35) can be transformed into:
Then the above optimization problem
becomes the following:
The above optimization problem is convex and can be solved directly with the CVX toolkit; the result is optimal for this subproblem and serves as a suboptimal solution of the original joint problem.
Proof: Let the expression of the objective function,
,
,
,
be:
Taking the second order derivative of
results in the following:
From the constraint restrictions and physical significance of each variable, , the second order derivative , and the objective function is concave. Also, the restriction constraints satisfy the convex set property and hence the optimization problem is convex, as evidenced.
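The step that turns the BER constraint into a lower bound on transmission power can be illustrated by inverting a BER expression. The sketch below again assumes the M-QAM approximation BER ≈ 0.2·exp(−1.5·SNR/(M−1)) and a single-link model SNR = P·g/σ², which may differ from the paper's Equation (34); all names are hypothetical.

```python
import math


def min_snr_for_ber(ber_max: float, m: int) -> float:
    """Smallest linear SNR meeting the BER limit under the assumed approximation.

    Requires ber_max < 0.2 so that the logarithm is negative.
    """
    return -(m - 1) / 1.5 * math.log(5.0 * ber_max)


def min_tx_power(ber_max: float, m: int, channel_gain: float, noise_power: float) -> float:
    """Minimum transmit power (W) for the assumed link model SNR = P * gain / noise."""
    return min_snr_for_ber(ber_max, m) * noise_power / channel_gain
```

Under these assumptions the BER constraint becomes a simple lower bound on power, which is linear in the optimization variable and therefore preserves convexity of the feasible set.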
5.3. Optimization of Three-Dimensional Trajectory for Relay Drones
To study the optimization problem of the three-dimensional trajectory,
, of the relay drones, it is necessary to fix the power allocation factor,
, of the cluster drones during the
th round of iterations within time slot
and the modulation method
of the relay drones that was just updated in the
th round, as well as the transmission power
. Consequently, the original optimization problem
can be simplified as follows:
In optimization problem
, the objective function expression is nonconvex, has a nonlinear relationship with
, and its components are interdependent. Additionally, the limiting constraints do not meet the requirements for a convex problem, thus making problem
a nonlinear nonconvex optimization problem. Because
and
, the objective function can be transformed as follows:
The transformed objective function is still difficult to solve. To handle it, this paper applies the quadratic transform to the above objective function expression, so that
is the new objective function value, then the objective function becomes the following:
where the
expressions are, respectively, as follows:
where
are the distances between the two hops obtained in the previous iteration, respectively, and the optimization problem becomes the following:
Unfortunately, because
exist, this leads to the problem that
is still nonconvex and remains difficult to solve. This paper therefore applies successive convex approximation to solve the problem [27]. A first-order Taylor expansion is carried out for
. Since
, the first-order derivative value of the above expression for
should be a trajectory along three dimensions (
X,
Y,
Z). The first-order partial derivative values of
for the Taylor expansion expression of
are as follows:
The expanded objective function expression becomes the following:
The optimization problem
is obtained by bringing Equations (60), (61), and (64) into problem
for collation, as follows:
The
is a monotonically increasing concave function, so the quadratic transform of the fractional term in problem
must be a concave function. According to Reference [28], the optimization problem
after the quadratic transformation of problem
is a convex problem, which can be solved by applying the CVX toolkit in order to obtain the global suboptimal solution.
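The quadratic transform used in this subsection can be checked numerically on a single ratio. In its standard form, a ratio A/B with A ≥ 0 and B > 0 is replaced by the surrogate 2y·√A − y²·B, which is concave in the auxiliary variable y and attains the value A/B at y = √A/B. The sketch below uses scalar placeholders for A and B; in the paper these would be the signal and interference-plus-noise terms, and the auxiliary variable is computed from the previous iterate.

```python
import math


def quadratic_surrogate(a: float, b: float, y: float) -> float:
    """Quadratic-transform surrogate of the ratio a / b for auxiliary variable y."""
    return 2.0 * y * math.sqrt(a) - y * y * b


def optimal_y(a: float, b: float) -> float:
    """Closed-form maximizer of the surrogate over y, for a >= 0 and b > 0."""
    return math.sqrt(a) / b
```

With a = 4 and b = 2, the optimal auxiliary variable is 1 and the surrogate equals 2, the value of the original ratio; any other y gives a smaller surrogate value, so alternating between updating y and maximizing the surrogate never decreases the objective.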
5.4. Optimization of Power Allocation Factor for Cluster Drones
To study the optimization problem of the power allocation factor
for cluster drones, it is necessary to fix variables such as the recently updated modulation method
of the relay drone, the transmission power
, and the three-dimensional trajectory
during the
th round within time slot
. Given
, and with the other variables held constant, the optimization of the transmission power for cluster drones across different time slots is independent [
29]. Let
be the new objective function value, with its expression as follows:
However, the objective function is still not a convex function and the problem is still difficult to solve; fortunately, Equation (66) is satisfied, as follows:
Thus, the original optimization problem
can be rewritten as follows:
In the above optimization problem , the objective function is a concave function of the optimization variables .
Proof: Take the second order derivative of
, as follows:
From the constraints on the variables and the physical significance of the constraints, it can be seen for the second-order derivative that ; then, the objective function is a concave function. Also, the restriction constraints satisfy the convex set property, so the optimization problem is convex, as proven. Therefore problem can be solved with the CVX toolkit to obtain a global suboptimal solution for .
In summary, because the solver employs mixed-integer computation and the interior-point method to solve each subproblem, the computational complexity of the joint optimization of the relay drone communication rate is
, where
is the total number of iterations of the algorithm,
is the maximum number of iterations for the subproblems, and
is the maximum dimension of the decision variables. The algorithm execution process is shown in Algorithm 2.
Algorithm 2 Flow of joint communication rate optimization algorithm |
Input: Use the coordinates of the cluster drones, the relay drone, the jammer, the ground station, and constraints such as the BER threshold to calculate the initial feasible solutions , , , and . Initialize the optimal value storage space , the current iteration count c = 1, the maximum number of iterations , and the precision . |
Output: . |
1: Input the coordinates of the cluster drones, the relay drone, the jammer, the ground station, and constraints such as the BER threshold to calculate the initial feasible solution , , , and . Initialize the optimal value storage space , the current iteration count , the maximum number of iterations , and the precision . |
2:
|
3: Optimize the modulation method of the relay drone: |
4: Initial value of input , , . |
5: Apply the Mosek toolkit to solve problem and obtain a suboptimal solution for the relay drone modulation method . |
6: Update the suboptimal solution for the relay drone modulation method obtained in the th round of iterations . |
7: Optimize the transmission power of the relay drone: |
8: Initial value of input , , . |
9: From Equation (52), the transmission power constraint corresponding to the BER threshold is calculated. |
10: Use the CVX toolkit to solve problem and obtain the suboptimal solution for the transmission power of the relay drone. |
11: Update the suboptimal solution for the relay drone’s transmission power obtained in the th round of iteration. |
12: Optimize the three-dimensional trajectory of the relay drone’s transmission: |
13: Initial value of input , , and . |
14: Input the two-hop channel gain value obtained by solving in the previous iteration and calculate the value of expressions and according to Equation (60) and Equation (61). |
15: Rewriting the objective function from Equation (59) yields . |
16: Calculate the values of the derivatives of with respect to in three different dimensions to obtain , and calculate the first-order Taylor expansion expression to obtain . |
17: Substituting the first-order Taylor expansion expression into problem collates problem . |
18: Use the CVX toolbox to solve problem and obtain the suboptimal solution for the three-dimensional trajectory of the relay drone. |
19: Update the suboptimal solution for the three-dimensional trajectory of the relay drone obtained in the th round of iteration. |
20: Optimize the power allocation factor for cluster drones: |
21: Initial value of input , , and . |
22: Group the cluster drones to obtain user sets for different subcarriers . |
23: Calculate and rank the signal-to-noise ratios of the cluster drones within each group reaching the relay drone to determine the order of channel gains . |
24: Referring to the channel gain order, rewriting the objective function according to Equation (67) yields . |
25: Use the CVX toolbox to solve problem and obtain the suboptimal solution for the power allocation factor of cluster drones. |
26: Update the suboptimal solution for the power allocation factor of cluster drones obtained in the th round of iteration. |
27: . |
28: Calculate . |
29: . |
30:
|
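The outer loop of Algorithm 2 is an alternating (block coordinate) scheme: each block of variables is updated in turn with the others fixed, and the loop stops when the objective changes by less than the precision or the iteration limit is reached. The sketch below shows this control flow on a toy concave objective with closed-form block updates. The objective, the update rules, and all names are hypothetical stand-ins for the four subproblems, which the paper solves with Mosek and CVX.

```python
def alternate_maximize(objective, updates, x0, tol=1e-9, max_iter=100):
    """Update each block in turn until the objective gain falls below tol."""
    x = list(x0)
    prev = cur = objective(x)
    for _ in range(max_iter):
        for i, update in enumerate(updates):
            x[i] = update(x)  # solve block i with the other blocks held fixed
        cur = objective(x)
        if abs(cur - prev) < tol:
            break
        prev = cur
    return x, cur


def toy_objective(x):
    """Toy jointly concave objective in two scalar blocks (illustration only)."""
    a, b = x
    return -(a - 1.0) ** 2 - (b - 2.0) ** 2 - a * b


# Closed-form maximizers of the toy objective in each block.
toy_updates = [lambda x: 1.0 - x[1] / 2.0, lambda x: 2.0 - x[0] / 2.0]

solution, value = alternate_maximize(toy_objective, toy_updates, [5.0, 5.0])
```

Because every block update can only increase the objective, the sequence of objective values is nondecreasing, which is why a threshold on the change between rounds is a natural stopping rule.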
7. Conclusions
To address the optimization problem of cluster drone relay communication rates under interference conditions, this paper introduced a dual-step algorithmic architecture. By integrating a Bayesian optimization-based Discrete Soft Actor-Critic (BO_DSAC) algorithm, the relay drones were trained for efficient and precise channel selection, effectively mitigating the impact of malicious interference on the communication link. The performance of the algorithm was further enhanced through the optimization of the hyperparameters. Subsequently, joint optimization of the relay drone’s modulation order, transmission power, trajectory, and the cluster drone’s power allocation factor was performed. Complex problems were decomposed into subproblems and transformed into convex problems for solving, thus reducing the complexity of the algorithm and effectively increasing the overall communication rate of the cluster drones. The experimental results demonstrate that the proposed algorithm excels in terms of convergence speed, task effectiveness, and solution stability. Compared to other solutions, the proposed algorithm improved the total reward value by 3.3947 times in anti-interference and increased the total communication rate by in the joint optimization scheme. Therefore, the proposed algorithm can flexibly handle constant frequency, sweep frequency, and hybrid frequency interference while effectively enhancing the information backhaul efficiency of cluster drones, increasing task throughput, reducing information backhaul latency, and improving task performance. Future research will explore the impact of highly dynamic channel state information on channel selection algorithms to enhance the stability and efficiency of algorithms in broader applications. Considering the potential impact of efficient energy use for drones, further studies will also focus on the organic integration of wireless energy harvesting technologies with drone communication to address the energy limitations of drones. Additionally, this study primarily concentrated on optimization strategies for relay communication in drone clusters under interference. Future research can be extended to investigate information backhaul methods for ultra-large-scale drone clusters in complex electromagnetic environments. This will reveal more details about drone cluster communication and yield more valuable conclusions.