Visual-Based Moving Target Tracking With Solar-Powered Fixed-Wing UAV

Abstract— The use of legitimate unmanned aerial vehicles (UAVs) to surveil and track misbehaved UAVs can serve a crucial role in public safety and security. This paper proposes a new deep reinforcement learning (DRL)-based online control scheme for visual-based UAV-on-UAV tracking and monitoring, where a solar-powered, fixed-wing UAV tracks a suspicious UAV target by having the target inside its effective visual range. The key idea is a new deep deterministic policy gradient (DDPG)-based model, which can cope with the continuous state and action spaces of the monitor and learn the optimal acceleration control policy adapting to the solar power availability and the target's movement. The state space is designed to be the relative position of the monitor to the target, thereby preventing model infeasibility. Experiments show that the new algorithm can maintain a desired distance from the target, and outperform control- and optimization-based alternatives in terms of energy efficiency and tracking accuracy. An interesting finding is that our algorithm learns faster and better with a constraint of a minimum allowed battery energy reserve. The reason is that, without the constraint, the monitor is more likely to deplete its battery before the end of a surveillance mission.

Index Terms— UAV-on-UAV visual-based tracking, fixed-wing UAV, solar power harvesting, online trajectory design, deep deterministic policy gradient.

I. INTRODUCTION

efficiency and quietness make a fixed-wing UAV an excellent choice for surveillance applications. The suspicious targets can be aerial, e.g., misbehaved or malicious UAVs, due to the proliferation and easy accessibility of UAVs. A UAV monitor can record the targets' misbehaviors and provide evidence for forensics purposes, facilitating visual-based surveillance for many internet-of-things (IoT) and security applications [3], [4]. While most existing drones are powered by batteries and restricted in mission time by their battery capacity, renewable energy sources, such as solar power, are increasingly considered for UAVs. Solar panels can be readily installed on the top of fixed-wing UAVs, to power visual-based tracking missions.

The trajectory of a UAV monitor needs to be carefully planned by taking into account not only the power usage models of the propulsion, thrust, and hanging of the UAV, but the energy harvesting process as well, as suggested in [5], [6], and [7]. To the best of our knowledge, there has been no investigation on the flight path planning of a solar-powered fixed-wing drone running a visual-based tailing and monitoring task. This is because the flight path planning of a fixed-wing drone is non-trivial due to its relatively poor maneuverability, as compared to a rotary-wing drone.
On the other hand, a target tracking system by UAVs equipped with gimbaled cameras was studied in [13]. The proposed system distributively incorporated a clustering algorithm, a sensor manager, and an optimal path planner to track multiple mobile targets. An integrated guidance and gimbal control coverage path planning approach was proposed in [14], in which the mobility and gimbal inputs of an autonomous UAV agent were jointly controlled and optimized to achieve full coverage of a given object of interest, according to a specified set of optimality criteria. None of these works considered the energy consumption of the UAV, and all of them relied on traditional control methods for UAV path planning.

Machine learning techniques, such as deep learning (DL) and deep reinforcement learning (DRL) [15], [16], [17], [18], [19], [20], have been found very useful in visual-based target tracking [21], to improve the performance of traditional control methods. For instance, the various features of deep convolutional neural networks (CNNs) were exploited in [22] to enhance the precision of video-based target tailing. The features of convolutional layers were interpreted as non-linear counterparts of an image pyramid and directly exploited to represent target objects. It was shown in [23] that basic two-layer CNNs could be sufficient to learn effective representations for video-based tailing, without training beforehand on a large dataset. An underwater target tailing task of an underwater vehicle was investigated in [24], where DRL was employed to resolve the constructed Markov decision task under uncertain hydrodynamics. The robustness of target detection and tracking was enhanced through radar and video camera data fusion in [25], especially under bad weather conditions. The radar played the primary role, and the camera served as an assistant for real-time environment sensing. A dynamic occlusion-aware video-based tailing approach was developed in [26] to address the deformation and long-term occlusion of target appearance during tracking. When the target is severely occluded or has been occluded for a long time, it is redetected by a well-designed classifier chosen from a classifier group, based on an entropy minimization metric.

DL and DRL have been leveraged for UAV-enabled target tracking, to distinguish a target from its background, handle the object deformation, address the aspect ratio change (ARC) of captured images, and make coarse-to-fine tracking policies. Autonomous flight control was developed in [27] for path tailing missions in adversarial situations. A two-player zero-sum game was formulated and the best UAV path was obtained by DL in real-time. Q-learning was utilized to optimize the UAV flight route for tailing a radio frequency (RF) target in a Rayleigh fading channel [28]. A UAV swarm was deployed to locate an RF mobile target in [29], where a constrained Markov decision process (MDP) was constructed to locate the target in the presence of channel uncertainties. Multi-agent reinforcement learning (RL) was utilized to coordinate the UAVs, avoid excessive UAV flight paths, and perform real-time target tracking. In [30], DRL was employed by a UAV for vision-based target recognition and tracking to address the ARC problem of captured images and refine the boundaries of the bounding box, without addressing the UAV trajectory design during tracking. Yet, it is not straightforward to generalize these works [17], [18], [19], [20], [21], [22], [23], [24], [25] to capture the power usage of the UAV monitors because the system model, UAV dynamics, and application scenario would be different when a specific UAV power consumption model is considered.

With the consideration of power usage (and especially the solar power), the UAV control variables, i.e., acceleration, velocity, and trajectory, are coupled over time. The trajectory planning becomes a sequential decision process which is challenging, attributed to its considerable state and action spaces and the requirement of real-time operation. UAVs were deployed to monitor or track mobile ground targets in [31], [32], and [33]. A decentralized 3D navigation rule was developed in [31] to determine the best position of each UAV, and balance the UAV's power usage and the number of covered targets. Dynamic programming was utilized in [32] to optimize the UAV path, which minimized the power usage and maximized the disguise performance of the surveillance. Receding horizon control was employed in [33] to generate an energy-efficient path of the solar-powered UAV. The power consumption of the UAV monitor was simplified to be linear with respect to its speed in [31] and [32], and the power requirement of hovering was overlooked in [33]. In [34] and [35], we investigated disguised visual-based target tracking by rotary- and fixed-wing UAVs, respectively. Difference-of-convex programming was employed to solve the non-convex trajectory design for rotary-wing UAVs [34]. Convex optimization was integrated with MPC to produce trajectories for fixed-wing UAVs [35]. None of the existing works [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30] fully investigated the energy-efficient 3D target tracking by a fixed-wing UAV with solar power harvesting capability, and compared the performances of ML methods with traditional control- or optimization-based algorithms.

B. Contribution

This paper presents a new scheme for the online control of video-based UAV-on-UAV target tracking. Specifically, a solar-harvesting fixed-wing UAV installed with video cameras tails and visually surveils a suspicious UAV target. The monitoring UAV keeps the target inside its view, while staying away from the target to avoid raising the attention of the target. DRL is employed to train the trajectory of the monitor online.

The novelty and contributions of the paper are highlighted as follows.

• A new problem is formulated for the online control of UAV-on-UAV target tracking performed by a solar-harvesting, fixed-wing UAV monitor. The problem is a non-convex sequential decision process and can be formulated as a Markov decision process (MDP) [36]. It is difficult to obtain the optimal solution online for a continuous-state sequential decision problem, given the large size of the problem and state space [37]. The problem cannot be optimally resolved by traditional control or optimization methods that would rely on accurate predictions of the target's trajectory.
task is performed on the midday of a sunny or cloudy day. At slot t, the solar power collected is [39]

$$P_t^s(z_t) = \eta S P_i \exp\left(-\frac{\alpha}{\cos\vartheta_t}\left(1 - 2.2556\times 10^{-5}\, z_t\right)^{5.2561}\right). \tag{3}$$

Here, z_t is the monitor's altitude; η ∈ (0, 1) and S (in m²) are the conversion efficiency and the solar panel size, respectively; P_i (in Watts) is the fixed power intensity of the solar beams before entering the clouds; α > 0 is the total gaseous absorption; ϑ_t ∈ [0, π/2] stands for the solar zenith angle at slot t. At sunrise or sunset, ϑ_t = π/2. At midday, ϑ_t = 0. The change of ϑ_t is negligible within 15 minutes. We assume that ϑ_t = 0 in this paper w.l.o.g.; i.e., the mission is at midday.
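As a concrete illustration, the following Python sketch evaluates the harvested-power model (3); the default parameter values (panel efficiency, panel size, solar intensity, and gaseous absorption) are illustrative assumptions rather than the settings used in our experiments.

```python
import math

def harvested_solar_power(z_t, eta=0.2, S=0.5, P_i=1367.0, alpha=0.3, theta_t=0.0):
    """Harvested solar power P^s_t(z_t) in Watts, per the model in (3).

    eta: conversion efficiency; S: panel size (m^2); P_i: solar intensity
    above the clouds (W/m^2); alpha: total gaseous absorption; theta_t:
    solar zenith angle (0 at midday, as assumed in this paper).
    All defaults are illustrative placeholders.
    """
    # Altitude-dependent attenuation factor (1 - 2.2556e-5 * z_t)^5.2561.
    attenuation = (1.0 - 2.2556e-5 * z_t) ** 5.2561
    # Absorption scales with the air mass, i.e., with 1/cos(theta_t).
    return eta * S * P_i * math.exp(-alpha / math.cos(theta_t) * attenuation)

# Example: power harvested at a 100 m altitude around midday.
print(f"{harvested_solar_power(100.0):.1f} W")
```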
While some modern solar devices can capture energy in very low lighting conditions, such as solar cells based on Perovskite or amorphous silicon [40], [41], they cannot provide enough energy to power and sustain a typical UAV's flight. This is because these dim-light solar cells operate at low voltage and radiation conditions, yielding an output power in the magnitude of milliwatts (mW). In practice, dim-light solar cells are typically employed to support low-power devices, such as Internet-of-Things sensors.

B. Problem Formulation

The UAV monitor is installed with a video camera to spot and identify the target of interest, which requires the representations of various visual variables, such as resolution and viewpoint [42]. To successfully tail the target, the UAV monitor needs to follow the target within a specific resolution range specified by [p_min, p_max] and minimize the energy consumption during the mission time. Here, p_min and p_max are the smallest and largest perceived width/height of the target in pixels, respectively. Based on the resolution requirement, we can use the triangle similarity to determine the maximum and minimum allowed monitor-target distances, d_max and d_min (in meters), as given by [43]

$$d_{\max} = \frac{W_T \times F}{p_{\min}}, \tag{4a}$$

$$d_{\min} = \frac{W_T \times F}{p_{\max}}, \tag{4b}$$

where W_T is the actual width/height of the target, and F is the focal length of the onboard video cameras. The monitor optimizes its trajectory online by controlling its acceleration, to keep a certain distance d_t = ‖q_t − b_t‖, ∀t from the target:

$$d_{\min} \le d_t \le d_{\max}, \quad \forall t. \tag{5}$$
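The triangle-similarity mapping in (4a)-(4b) is straightforward to compute; the numbers in the sketch below (target width, focal length in pixels, and pixel bounds) are hypothetical and only illustrate how the distance bounds in (5) scale.

```python
def distance_bounds(W_T, F, p_min, p_max):
    """Monitor-target distance bounds from (4a)-(4b) via triangle similarity.

    W_T: actual width/height of the target (m); F: focal length expressed
    in pixels; p_min, p_max: smallest/largest perceived size in pixels.
    """
    d_max = W_T * F / p_min  # farthest range where the target still spans p_min pixels
    d_min = W_T * F / p_max  # closest range where it does not exceed p_max pixels
    return d_min, d_max

# Hypothetical example: a 0.5 m target, an 800-pixel focal length, and a
# required perceived width between 20 and 80 pixels.
print(distance_bounds(W_T=0.5, F=800.0, p_min=20, p_max=80))  # (5.0, 20.0)
```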
Let E_0 (in Joules) denote the energy level of the monitor's battery at the beginning of the mission. Suppose the monitor needs to maintain at least (1 − η_0)E_0 Joules at every moment in the battery for other functions and contingency plans, i.e.,

$$\sum_{n=1}^{t} P_n^v \delta \le \sum_{n=1}^{t} P_n^s \delta + \eta_0 E_0, \quad \forall t. \tag{6}$$

The battery constraint (6) is important, without which the UAV monitor could coarsely design its trajectory, consume all the battery energy at the beginning, and fail for the rest of the mission (as shown by simulations in Section IV).
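For clarity, a minimal sketch of checking the running energy-reserve constraint (6) along a candidate trajectory is given below; the power traces and parameter values are hypothetical.

```python
def battery_feasible(P_v, P_s, delta, E_0, eta_0):
    """Verify the per-slot battery constraint (6) along a trajectory.

    P_v, P_s: per-slot propulsion and harvested powers (W); delta: slot
    duration (s); E_0: initial battery energy (J); eta_0: spendable
    fraction, so (1 - eta_0) * E_0 Joules must remain at every moment.
    """
    consumed = harvested = 0.0
    for p_v, p_s in zip(P_v, P_s):
        consumed += p_v * delta
        harvested += p_s * delta
        # Cumulative consumption may never exceed cumulative harvesting
        # plus the spendable share eta_0 * E_0 of the initial energy.
        if consumed > harvested + eta_0 * E_0:
            return False
    return True

# Hypothetical 3-slot check: 120 W propulsion against 100 W harvesting.
print(battery_feasible([120.0] * 3, [100.0] * 3, delta=1.0, E_0=1000.0, eta_0=0.1))
```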
The studied problem is cast as

$$\min_{\{(q_t - b_t),\, v_t,\, a_t,\, \forall t\}} \; \sum_{t=1}^{N} P_t^v, \quad \text{s.t.} \ (1), (5), (6). \tag{7}$$

Here, we propose to optimize the relative position of the UAV monitor with reference to the target, i.e., (q_t − b_t), as opposed to designing the monitor trajectory, i.e., q_t. This helps avoid the infeasibility incurred by optimizing the monitor trajectory directly. Specifically, the range and distribution of the relative distance and velocity between the drone monitor and the target are expected to be reasonably consistent across different time slots for an effective tracking mission. In contrast, the range and distribution of the absolute positions of the UAV monitor would change dramatically over time. Moreover, the optimization of the relative distance and speed contributes to the reduction of the state space; otherwise, the absolute position and velocity of the target would be part of the state as the monitor's observations. The reduced state space is conducive to the training and convergence of the proposed DRL model.

Based on (7), a novel DRL-based method is developed to compute the instantaneous acceleration of the monitor, which is the action of the designed control system. The proposed tracking problem has continuous action and state spaces, and can be solved by designing a DDPG-based method, as described in the next section. While we primarily focus on video input from onboard cameras in this paper, it is worth noting that the proposed DRL-based control mechanism for UAV tracking can be adapted to utilize alternative distance measurement techniques, such as lidar and radar, to determine the direction and distance of a target.

III. DEEP DETERMINISTIC POLICY GRADIENT (DDPG)-BASED SOLUTION

This section proposes a DDPG-based method under the actor-critic framework to solve problem (7). DDPG is known to effectively tackle problems with continuous action spaces [44], [45]. In contrast, traditional DRL methods, such as deep Q-learning, perform poorly and incur divergence under a continuous action space.

A. Markov Decision Process (MDP)

The UAV propulsion power consumption, i.e., the objective function in (7), is a non-convex function of the optimization variables, i.e., the UAV acceleration, velocity, and waypoint. In addition, these variables are tightly coupled in (7) and in the constraints (1) and (6) over time. As a result, the problem is non-convex and challenging to tackle.
On the other hand, the task is a sequential decision process and can be interpreted as an MDP, since the environment (or state) is observable and the future state only depends on the present state [36]. Specifically, the UAV's position at the (t + 1)-th time slot depends on its position, velocity, and acceleration at the t-th slot; see (1a). The UAV's velocity at the (t + 1)-th time slot depends on its velocity and acceleration at the t-th slot; see (1b). However, it is difficult to procure the optimal policy online for this MDP problem, due to the continuous state space of the problem [37]. The problem cannot be optimally solved using traditional control or optimization methods that would rely on accurate predictions of the target's trajectory, which may not be possible in practice.

We start by constructing an MDP for the considered target tracking problem and defining the state, action, and reward of the drone monitor per time slot [46].

• State Space S: The current system state s_t ∈ S consists of the relative position, q_t − b_t, and the relative velocity, V_t − v_t, of the UAV monitor with regard to the target.
• Action Space A: Define A := {a_t, ∀t = 1, · · · , N} to gather all potential actions. The current action a_t is the acceleration value of the UAV monitor, A_t, constrained by (1c)–(1d). The values of the action are constrained to the range of [−1, 1] (m/s²). Given the initial location and velocity of the monitor, its future waypoints q_t and velocities V_t are decided by the accelerations, i.e., by (1a) and (1b).
• Policy: A policy, denoted by π : S → A, is a mapping from the state space, S, to the action space, A. In other words, given a state s ∈ S, the policy determines a distribution π(a|s) = Pr(a_t = a | s_t = s) over the actions a ∈ A.
• Experience: Define e_t = (s_t, a_t, r_t, s_{t+1}) as an experience, which is stored in a replay memory R.
• Reward r_t: The reward function offers non-negative rewards at each time slot (or step) if the monitor-target distance is within [d_min, d_max] and the onboard battery level is no lower than (1 − η_0)E_0 Joules; or incurs penalties otherwise. The reward function is defined as

$$r_t = 1 + \underbrace{C_{d1}\tanh(d_t - d_{\min}) + C_{d2}\tanh(d_{\max} - d_t)}_{\text{for distance-keeping}} + \underbrace{C_e \tanh\!\Big(\eta_0 E_0 + \sum_{n=1}^{t} P_n^s \delta - \sum_{n=1}^{t} P_n^v \delta\Big)}_{\text{for battery constraint}}, \tag{8}$$

where C_d1, C_d2 and C_e (C_d1, C_d2, C_e ≤ 1) are configurable coefficients that can be tuned during the learning process; and "tanh(·)" is used to scale the rewards and ensure that all the rewards are in similar magnitudes.
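A minimal sketch of evaluating the reward (8) per slot is given below; the coefficient values are placeholders to be tuned, as noted above.

```python
import math

def reward(d_t, d_min, d_max, consumed, harvested, E_0,
           eta_0=0.1, C_d1=0.5, C_d2=0.5, C_e=0.5):
    """Per-slot reward r_t following (8); coefficients are placeholders.

    d_t: monitor-target distance (m); consumed/harvested: cumulative
    propulsion and solar energies up to slot t (J).
    """
    # Distance-keeping terms: positive inside [d_min, d_max], negative outside.
    r = 1.0 + C_d1 * math.tanh(d_t - d_min) + C_d2 * math.tanh(d_max - d_t)
    # Battery term: positive while the reserve (1 - eta_0) * E_0 is respected.
    r += C_e * math.tanh(eta_0 * E_0 + harvested - consumed)
    return r

# An in-range distance with a healthy battery scores close to 2.5.
print(reward(d_t=12.0, d_min=5.0, d_max=20.0, consumed=300.0,
             harvested=280.0, E_0=1000.0))
```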
The state space and reward function are particularly designed for problem (7). In particular, we optimize the relative position of the monitor to the target, as opposed to directly designing the monitor's trajectory. This is because the range and distribution of the relative distance and velocity are expected to be reasonably consistent across different time slots for an effective tracking mission. In contrast, a direct design of the monitor's trajectory would suffer from the substantially changing value range and distribution of the absolute positions of the UAV monitor. On the other hand, the state s_t consists of the relative distance and velocity, the two of which can have substantially different ranges and distributions. A state vector can also be sparse. We adopt the max-abs normalization [47] to pre-process the state. The max-abs normalization does not destroy the original data distribution, and can re-scale the value range of the data. Moreover, we perform the max-abs normalization separately for the relative distance and the relative velocity, to avoid the loss of the state information due to substantial differences in values.
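A sketch of this block-wise max-abs normalization is shown below; the scale bounds are assumed known from the mission geometry.

```python
import numpy as np

def normalize_state(rel_pos, rel_vel, pos_scale, vel_scale):
    """Max-abs normalization [47], applied separately to the relative
    position and relative velocity blocks of the state.

    pos_scale and vel_scale are the largest absolute values each block can
    take (assumed known bounds); dividing by them maps both blocks into
    [-1, 1] without distorting their distributions.
    """
    s_pos = np.asarray(rel_pos, dtype=float) / pos_scale
    s_vel = np.asarray(rel_vel, dtype=float) / vel_scale
    # Concatenate the two normalized blocks into the state vector s_t.
    return np.concatenate([s_pos, s_vel])

# Example: a 15 m offset and a 3 m/s relative speed, with assumed bounds
# of 50 m and 10 m/s, yield comparably scaled state entries.
print(normalize_state([15.0, -4.0], [3.0, -1.0], pos_scale=50.0, vel_scale=10.0))
```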
In practice, the target motion estimation using a camera can be influenced by the time-varying monitor-target distance, illumination condition, background characteristics, and target appearance. Yet, some commercially available UAVs can achieve a high-resolution view and an accurate target motion estimation with a camera array, such as the Intel RealSense D435i [48]. However, this type of camera may suffer from temperature changes and noisy data when used outdoors. The accuracy of its distance estimation can be as good as within 0.28 meter [48]. We assume that the motion estimation of the target is reasonably accurate using such cameras and visual-aided methods.

In the reward, the "tanh(·)" function is selected due to its smooth and bounded nature. It can normalize each of the penalty terms, bound the reward function, stabilize training, and facilitate the convergence of the algorithm. C_d1, C_d2 and C_e can be fine-tuned to account for the relative importance or priority of each of the penalty terms. Moreover, the reward function can be applied under cloudy weather conditions. For a particular time slot n, P_n^s follows (3) when the UAV monitor is exposed under the sun and harvests the solar power. P_n^s takes zero when the UAV monitor flies under a cloud and its harvested solar power diminishes.

Moreover, the shape and size of the target in an image depend on the image-capturing direction, and can change during the tracking process. The proposed algorithm has the potential to maintain a consistent view of the target object's shape and size by keeping the image-capturing direction/angle within a predefined range relative to the target's movement direction while maintaining a safe distance between the drone and the target. This can be potentially achieved by introducing a new term in the reward function defined in (8) that incorporates the image-capturing angle. The new term can provide a reward when the angle is within the specified range, or a penalty when it falls outside of it.

The agent senses the present state s_t, executes a legitimate action a_t, receives a reward r_t, and evolves to state s_{t+1}. A policy a_t = π(s_t) projects s_t to a_t. The agent chooses the strategy achieving the largest accumulated reward R_t = Σ_{n=t}^{N} γ^{n−t} r_n, with γ ∈ (0, 1) denoting the discount coefficient. With the determined state s_t, action a_t and the strategy π(·), R_t is evaluated by an action-value function, i.e., the Q-function, as given by

$$Q^\pi(s_t, a_t) = \mathbb{E}_\pi\big[R_t \mid s_t, a_t\big]. \tag{9}$$

The action-value function, Q^π(s_t, a_t), satisfies the following Bellman expectation equation:

$$Q^\pi(s_t, a_t) = \mathbb{E}_{r_t, s_{t+1} \sim E}\Big[r_t + \gamma\, \mathbb{E}_{a_{t+1} \sim \pi}\big[Q^\pi(s_{t+1}, a_{t+1})\big]\Big]. \tag{10}$$
Fig. 2. The architecture of the proposed DDPG-based UAV tracking system, where the training network and the target network each comprise an actor
network and a critic network. The experience replay buffer provides batches of samples of state transitions to train and update the networks.
Here, E represents the interacting environment. When the given policy, denoted by µ_π : S → A, is deterministic, (10) can be rewritten as

$$Q_{\mu_\pi}(s_t, a_t) = \mathbb{E}_{r_t, s_{t+1} \sim E}\big[r_t + \gamma\, Q_{\mu_\pi}(s_{t+1}, \mu_\pi(s_{t+1}))\big]. \tag{11}$$

In general, it is difficult to directly apply an RL algorithm to handle the MDP or obtain the Q-value, Q(s_t, a_t), because of the continuous state and action spaces, and the uncertain target trajectory. We design a DDPG-based algorithm to control the UAV monitor's trajectory, as described in the following.

B. Actor-Critic Framework-Based DDPG

In the DDPG-based network, four DNN approximators are used, including training-actor and training-critic networks, and target-actor and target-critic networks, as illustrated by Fig. 2. The training-actor network, represented by µ(s_t; θ_a), approximates the strategy of the UAV monitor and generates actions. θ_a stands for the parameters of the training-actor network. The training-critic network, denoted by Q_µ(s_t, a_t; θ_c), estimates the action-value function of the actions [44]. θ_c stands for the model parameters of the training-critic network. The target-actor network, represented by µ′(s_t; θ_a′), and the target-critic network, represented by Q′_µ′(s_t, a_t; θ_c′), generate the target Q-value for learning the training-actor and training-critic networks. θ_a′ and θ_c′ stand for the model parameters of the target-actor and target-critic networks, respectively.

Relying on the actor-critic setting, the DDPG network abides by the deterministic policy gradient (DPG) theorem [44] to refresh θ_a, θ_c, θ_a′ and θ_c′, and improve the actions. The adoption of the target network (containing the target-actor and target-critic networks) helps address an issue of oscillating operation stemming from employing only a training network [49]. We consider fully connected neural networks (FCNNs) with two hidden layers for each training-actor, training-critic, target-actor and target-critic network.
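A minimal PyTorch sketch of such two-hidden-layer FCNNs is given below; the hidden width, state dimension, and action dimension are assumed values, not the hyperparameters of our experiments.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Actor FCNN with two hidden layers: maps the normalized state to an
    acceleration bounded in [-1, 1] per axis via the final tanh."""
    def __init__(self, state_dim=4, action_dim=2, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Critic FCNN with two hidden layers: estimates Q(s, a) from the
    concatenated state-action pair."""
    def __init__(self, state_dim=4, action_dim=2, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar action value
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```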
The UAV monitor (i.e., the agent) passes its current state s_t to the training-actor network. Per the DPG theorem [44], the training-actor network explicitly yields the present policy by deterministically mapping a state into an action. The training-actor network approximates the strategy function of the agent and chooses an action a_t. Exploration noises are attached to the action to balance new action exploration and known action exploitation. The output action is given by

$$a_t = \mu(s_t; \theta_a) + \mathcal{N}_t, \tag{12}$$

where N_t is an Ornstein-Uhlenbeck (OU) noise process that generates temporally correlated exploration in the physics domain to improve exploration efficiency [50]. As the result of the action a_t, the monitor is rewarded with r_t and evolves to state s_{t+1}. It also stores (s_t, a_t, r_t, s_{t+1}) in R.
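The OU process in (12) can be simulated as below; theta, sigma, and the step size are assumed values chosen for illustration.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck exploration noise [50]: mean-reverting, temporally
    correlated perturbations added to the actor output as in (12)."""
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=0):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = np.full(dim, mu)
        self.rng = np.random.default_rng(seed)

    def sample(self):
        # Discretized OU update: dx = theta * (mu - x) * dt + sigma * sqrt(dt) * dW.
        dx = (self.theta * (self.mu - self.x) * self.dt
              + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.x.shape))
        self.x = self.x + dx
        return self.x

noise = OUNoise(dim=2)
print(noise.sample())  # successive samples are correlated, unlike white noise
```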
The training-critic network evaluates the action-value function of the executed action a_t, that is, Q_µ(s_t, µ(s_t; θ_a); θ_c). By taking a recorded transition (s_i, a_i, r_i, s_{i+1}) at random from R, the action-value function generated at the training-critic network is approximately evaluated as Q_µ(s_i, µ(s_i; θ_a); θ_c).

We use J(θ_a) to denote the performance objective of the policy parameterized by θ_a. To improve the strategy fastest, the training-actor network is refreshed based on the gradient of J(θ_a). The gradient is given by [44]

$$\nabla_{\theta_a} J(\theta_a) = \mathbb{E}_{s \sim \rho^\mu}\Big[\nabla_{\theta_a} \mu(s_t; \theta_a)\, \nabla_a Q_\mu(s_t, a; \theta_c)\big|_{a=\mu(s_t;\theta_a)}\Big], \tag{13}$$

where ρ^µ stands for the discounted state distribution of strategy µ(s_t; θ_a) [45]; ∇_{θ_a}µ(s) provides the gradient of the training-actor network µ(s) regarding the parameter θ_a; and ∇_a Q_µ(s_t, a; θ_c) stands for the gradient of Q_µ(s_t, a; θ_c) regarding action a.

By randomly drawing N_batch sampled historical transitions from R, ∇_{θ_a} J(θ_a) is approximated by

$$\nabla_{\theta_a} J(\theta_a) \approx \frac{1}{N_{\text{batch}}} \sum_{i=1}^{N_{\text{batch}}} \nabla_{\theta_a} \mu(s_i)\, \nabla_a Q_\mu(s_i, a; \theta_c)\big|_{a=\mu(s_i)}. \tag{14}$$

The model parameter of the training-actor network, θ_a, is refreshed based on gradient ascent [51]:

$$\theta_a \leftarrow \theta_a + \frac{\eta_a}{N_{\text{batch}}} \sum_{i=1}^{N_{\text{batch}}} \nabla_{\theta_a} \mu(s_i)\, \nabla_a Q_\mu(s_i, a; \theta_c)\big|_{a=\mu(s_i)}. \tag{15}$$
Here, η_a stands for the learning rate of the training-actor network.

The training-critic network is refreshed by minimizing the loss

$$L(\theta_c) = \mathbb{E}_{s_t \sim \rho^\mu,\, a_t \sim \mu(s_t;\theta_a)}\Big[\big(Q_\mu(s_t, a_t; \theta_c) - y_t\big)^2\Big]. \tag{16}$$

Here, y_t = r_t + γ Q′_µ′(s_{t+1}, µ′(s_{t+1}; θ_a′); θ_c′) is the target Q-value yielded by the target network according to (s_t, a_t, r_t, s_{t+1}). The parameters of the target-actor and target-critic networks, θ_a′ and θ_c′, provide delayed copies of θ_a and θ_c, respectively.

With the N_batch stochastically sampled transitions, the loss function, L(θ_c), is approximately evaluated by

$$L(\theta_c) \approx \frac{1}{N_{\text{batch}}} \sum_{i=1}^{N_{\text{batch}}} \Big[\big(Q_\mu(s_i, \mu(s_i; \theta_a); \theta_c) - y_i\big)^2\Big]. \tag{17}$$

Here, y_i = r_i + γ Q′_µ′(s_{i+1}, µ′(s_{i+1}; θ_a′); θ_c′) approximates the target Q-value from the target network according to the N_batch samples drawn randomly from the replay memory. Differentiating L(θ_c) with respect to θ_c, we obtain the gradient as

$$\nabla_{\theta_c} L(\theta_c) \approx \frac{2}{N_{\text{batch}}} \sum_{i=1}^{N_{\text{batch}}} \big(Q_\mu(s_i, \mu(s_i; \theta_a); \theta_c) - y_i\big)\, \nabla_{\theta_c} Q_\mu(s_i, \mu(s_i; \theta_a); \theta_c). \tag{18}$$

The parameter of the training-critic network, θ_c, is refreshed utilizing the stochastic gradient descent method [51].

The target-actor and target-critic networks evolve from the training-actor and training-critic networks based on the following rule:

$$\theta_a' \leftarrow \tau_a \theta_a + (1 - \tau_a)\theta_a', \qquad \theta_c' \leftarrow \tau_c \theta_c + (1 - \tau_c)\theta_c', \tag{19}$$

where τ_a and τ_c are decaying rates for the training-actor and training-critic networks, respectively.
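The following PyTorch sketch combines the critic update (16)-(18), the actor update (14)-(15), and the soft target update (19) into one training step; gamma and tau are assumed values, and the networks and optimizers are standard PyTorch objects such as those sketched earlier.

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, gamma=0.99, tau=0.005):
    """One DDPG training step from a sampled mini-batch, mirroring (14)-(19)."""
    s, a, r, s_next = batch  # tensors of shape (N_batch, ...)

    # Target Q-value y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})), held fixed.
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next))

    # Critic: minimize the mini-batch mean-squared loss (17).
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: ascend the sampled policy gradient (14)-(15); maximizing the
    # critic's value is implemented as minimizing its negative mean.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft target updates (19); a single tau plays the role of tau_a = tau_c.
    for net, target in ((actor, target_actor), (critic, target_critic)):
        for p, p_t in zip(net.parameters(), target.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)
```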
The proposed DDPG-based UAV-target tailing approach is summarized in Algorithm 1. The algorithm contains model initialization, model training, and model updates. We first randomly initialize the four networks, including the training-actor and training-critic networks, and the target-actor and target-critic networks, and the experience replay buffer. The relative locations and velocities of the UAV monitor and the target are the input state of the algorithm. Given the present state of the UAV monitor, the action is produced through model training. The model parameters are updated with historical transitions from the replay memory until the maximum cumulative reward is obtained. Finally, the monitor executes the action, changes its state, and decides whether to stop the training.

Algorithm 1 DDPG-Based UAV Tracking
1: Initialization: Randomly initialize the training-actor network µ and the training-critic network Q_µ with weights θ_a and θ_c, the target-actor network µ′ and the target-critic network Q′_µ′ with weights θ_a′ ← θ_a and θ_c′ ← θ_c, and the replay memory R.
2: for episode = 1, · · · , T_ep do
3:   Initialize the relative locations and velocities between the UAV monitor and the target as the initial state s_0.
4:   for timestep = 1, · · · , T_s do
5:     Select an action a_t = µ(s_t; θ_a) + N_t.
6:     Operate action a_t, procure reward r_t, and transit to the next state s_{t+1}.
7:     Store the current transition (s_t, a_t, r_t, s_{t+1}) in R.
8:     Stochastically draw N_batch sampled historical transitions (s_i, a_i, r_i, s_{i+1}) from R.
9:     Refresh the target Q-value: y_i = r_i + γ Q′_µ′(s_{i+1}, µ′(s_{i+1}; θ_a′); θ_c′).
10:    Calculate the loss function: L(θ_c) ≈ (1/N_batch) Σ_{i=1}^{N_batch} [Q_µ(s_i, µ(s_i; θ_a); θ_c) − y_i]².
11:    Update the critic network by minimizing the loss function:
12:      min_{θ_c} (1/N_batch) Σ_{i=1}^{N_batch} [Q_µ(s_i, µ(s_i; θ_a); θ_c) − y_i]².
13:    Update the actor network by the sampled policy gradient:
14:      ∇_{θ_a} J(θ_a) ≈ (1/N_batch) Σ_{i=1}^{N_batch} ∇_{θ_a} µ(s_i) ∇_a Q_µ(s_i, a; θ_c)|_{a=µ(s_i)};
15:      θ_a ← θ_a + η_a ∇_{θ_a} J(θ_a).
16:    Update the target-actor and target-critic networks:
17:      θ_a′ ← τ_a θ_a + (1 − τ_a)θ_a′,
18:      θ_c′ ← τ_c θ_c + (1 − τ_c)θ_c′.

C. Complexity and Optimality

To evaluate the time-complexity of Algorithm 1, we consider the time-complexity of an FCNN with two hidden layers, and the mini-batch gradient ascent (14). The time-complexity of the FCNN with two hidden layers is T_fc = O(Σ_{l=1}^{2} F_in^l · F_out^l), where F_in^l and F_out^l are the input and output sizes of fully connected layer l [52]. The mini-batch gradient ascent has the time complexity of T_bgd = O(N_batch/ε_0), where ε_0 is the accuracy requirement to terminate the iterations [53]. Therefore, the time-complexity of Algorithm 1 is T_fc + T_bgd = O(Σ_{l=1}^{2} F_in^l · F_out^l + N_batch/ε_0).

The optimality of the proposed DDPG-based algorithm can be established by proving that the DDPG algorithm satisfies the conditions for the optimality of policy gradient (PG) methods. It was established in [54] that the optimality gap of PG is bounded by

$$L(\pi_\theta) - \min_{\pi \in \Pi} L(\pi) \le -\frac{\kappa_\rho}{1-\gamma} \min_{v \in \Theta} \Big[ c\,\langle \nabla_\theta L(\theta), v - \theta \rangle + \frac{u}{2}\|v - \theta\|^2 \Big], \tag{20}$$

in which L(·) is the loss function. κ_ρ is the concentrability coefficient defined for the class of cost-to-go functions J_θ = {J_{π_θ} : θ ∈ Θ}, which is the smallest scalar to satisfy

$$\|J - J^*\|_{1,\rho} \le \frac{\kappa_\rho}{1-\gamma}\,\|J - \mathcal{T}J\|_{1,\rho}, \quad \forall J \in J_\theta, \tag{21}$$

where T : J → J is the Bellman optimality operator:

$$(\mathcal{T}J)(s) \triangleq \min_{a \in \mathcal{A}} \Big[ g(s, a) + \int J(s')\,P(ds' \mid s, a) \Big]. \tag{22}$$

The optimality gap holds when the following conditions are met.
TABLE I
Summary of Hyperparameters
TABLE II
The Parameters for Fixed-Wing UAV Propulsion Power and Harvested Solar Power
Fig. 4. Trajectory, velocity and acceleration of the best control policy for
the UAV monitor (Episode 1,067).
Fig. 9. Monitor battery level under three types of target trajectories by the DDPG and MPC algorithms when η = 0.2 and S = 0.5 m².
Fig. 10. Monitor battery level under three types of target trajectories when η = 0.1 and S = 0.4 m2 .
Fig. 11. The distance-keeping performance of the UAV monitor under three types of target trajectories when η = 0.1 and S = 0.4 m2 .
level of 750 Joules under the "random" target trajectories with B_t ∈ U(−1, 1). This indicates that the harvested solar energy supports the UAV monitor effectively. In the rest of the scenarios studied in Fig. 10, the battery level drops below 500 Joules, indicating the insufficiency of the harvested solar energy for UAV propulsion. Fig. 11 shows that the average monitor-target distance increases quickly to about 20 m first, and then stabilizes around 20 m during tracking.

We show that the proposed scheme can be readily extended from 2D UAV target tracking to 3D. In this case, with a slight abuse of notation, the constraints of 3D mobility in (1) concern 3D variables, including the UAV waypoint q_t := (x_t, y_t, z_t), velocity V_t := (V_xt, V_yt, V_zt), and acceleration A_t := (A_xt, A_yt, A_zt). In addition, the maximum pitch angle constraint has to be satisfied when the drone is ascending or descending, i.e., V_zt/‖V_t‖ ≤ ϑ, where ϑ is the sine value of the largest allowable pitch angle of the drone. The propulsion power usage for the UAV 3D flight is [62]

$$P_t^v = c_1 \|V_t\|^3 + \frac{c_2 \big(V_{xt}^2 + V_{yt}^2\big)}{\|V_t\|^3} + \frac{c_2 \|A_t\|^2}{g^2 \|V_t\|} + \frac{2 c_2 A_{zt}}{g \|V_t\|} + m\big(g V_{zt} + A_t^T V_t\big). \tag{24}$$
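The 3D propulsion-power model (24) can be evaluated directly, as in the sketch below; the airframe coefficients c1, c2, and mass m are hypothetical and must be replaced with the values of the specific UAV.

```python
import numpy as np

def propulsion_power_3d(V, A, c1, c2, m, g=9.8):
    """Propulsion power of 3D fixed-wing flight, following (24).

    V, A: 3D velocity (m/s) and acceleration (m/s^2) with vertical
    components V[2] = V_zt and A[2] = A_zt; c1, c2, m: airframe constants.
    """
    V, A = np.asarray(V, float), np.asarray(A, float)
    v = np.linalg.norm(V)
    return (c1 * v**3                            # term 1: grows with speed cubed
            + c2 * (V[0]**2 + V[1]**2) / v**3    # term 2: horizontal-velocity term
            + c2 * np.dot(A, A) / (g**2 * v)     # term 3: acceleration penalty
            + 2.0 * c2 * A[2] / (g * v)          # term 4: vertical acceleration
            + m * (g * V[2] + A @ V))            # term 5: climb and kinetic-energy rates

# Hypothetical coefficients, level flight at 15 m/s with mild acceleration.
print(propulsion_power_3d([15.0, 0.0, 0.0], [0.5, 0.0, 0.0],
                          c1=9.26e-4, c2=2250.0, m=2.0))
```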
We plot the 3D UAV trajectory and the corresponding battery level under the proposed scheme in a 3D scenario in Figs. 12 and 13, where the solar panel efficiency and size are set to (η, S) = (0.2, 0.5). The UAV acceleration is 3D and can change continuously along each of the x-, y-, and z-axes under the proposed method. By default, we set the acceleration within [−1, 1] m/s² along each of the x-, y-, and z-axes. It is seen that the proposed DDPG-based algorithm outperforms the MPC-based algorithm under the sinusoidal target trajectory, providing more effective tracking and better energy efficiency.

We note that controlling a fixed-wing UAV through its 3D acceleration is feasible due to the inherent connection between acceleration and the key aerodynamic forces involved in flight. By influencing the UAV's acceleration, one indirectly manages the interplay of thrust, lift, drag, and weight, which are fundamental to flight dynamics. The throttle control, linked to the propulsion system, governs the UAV's longitudinal acceleration by adjusting thrust. Elevator control influences the pitch, affecting the balance between lift and weight for changes in altitude. Ailerons control roll, adjusting the distribution of lift between the wings and facilitating turns. Rudder control manages yaw, enabling coordinated turns by manipulating the aircraft's heading. These control inputs collectively shape the forces and moments acting on the UAV. The ability to control a fixed-wing UAV through its 3D acceleration can leverage these aerodynamic principles to achieve responsive flight control.
TABLE III
Comparison of Tracking Capability Between the DDPG-Based Algorithms With and w/o Battery Constraint Under the "Random-Linear" Target Trajectory
Fig. 13. Monitor-target distance and battery level under 3D random and
sinusoidal trajectories of the UAV monitor.
TABLE V
Comparison of Tracking Capability Between the MPC-Based Control Algorithms With and w/o Battery Constraint Under the "Random" Target Trajectory
[5] S. Hu, Q. Wu, and X. Wang, "Energy management and trajectory optimization for UAV-enabled legitimate monitoring systems," IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 142–155, Jan. 2021.
[6] C. Sun, W. Ni, and X. Wang, "Joint computation offloading and trajectory planning for UAV-assisted edge computing," IEEE Trans. Wireless Commun., vol. 20, no. 8, pp. 5343–5358, Aug. 2021.
[7] Y. Zeng and R. Zhang, "Energy-efficient UAV communication with trajectory optimization," IEEE Trans. Wireless Commun., vol. 16, no. 6, pp. 3747–3760, Jun. 2017.
[8] L. Zhang et al., "Vision-based target three-dimensional geolocation using unmanned aerial vehicles," IEEE Trans. Ind. Electron., vol. 65, no. 10, pp. 8052–8061, Oct. 2018.
[9] X. Zhang, Y. Fang, X. Zhang, J. Jiang, and X. Chen, "A novel geometric hierarchical approach for dynamic visual servoing of quadrotors," IEEE Trans. Ind. Electron., vol. 67, no. 5, pp. 3840–3849, May 2020.
[10] V. Shaferman and T. Shima, "Unmanned aerial vehicles cooperative tracking of moving ground target in urban environments," J. Guid., Control, Dyn., vol. 31, no. 5, pp. 1360–1371, Sep. 2008.
[11] H. Yu, K. Meier, M. Argyle, and R. W. Beard, "Cooperative path planning for target tracking in urban environments using unmanned air and ground vehicles," IEEE/ASME Trans. Mechatronics, vol. 20, no. 2, pp. 541–552, Apr. 2015.
[12] S. Wang, F. Jiang, B. Zhang, R. Ma, and Q. Hao, "Development of UAV-based target tracking and recognition systems," IEEE Trans. Intell. Transp. Syst., vol. 21, no. 8, pp. 3409–3422, Aug. 2020.
[13] N. Farmani, L. Sun, and D. J. Pack, "A scalable multitarget tracking system for cooperative unmanned aerial vehicles," IEEE Trans. Aerosp. Electron. Syst., vol. 53, no. 4, pp. 1947–1961, Aug. 2017.
[14] S. Papaioannou, P. Kolios, T. Theocharides, C. G. Panayiotou, and M. M. Polycarpou, "Integrated guidance and gimbal control for coverage planning with visibility constraints," IEEE Trans. Aerosp. Electron. Syst., vol. 59, no. 2, pp. 1276–1291, Apr. 2023.
[15] S. Hu, X. Chen, W. Ni, E. Hossain, and X. Wang, "Distributed machine learning for wireless communication networks: Techniques, architectures, and applications," IEEE Commun. Surveys Tuts., vol. 23, no. 3, pp. 1458–1493, 3rd Quart., 2021.
[16] X. Yuan, S. Hu, W. Ni, R. P. Liu, and X. Wang, "Joint user, channel, modulation-coding selection, and RIS configuration for jamming resistance in multiuser OFDMA systems," IEEE Trans. Commun., vol. 71, no. 3, pp. 1631–1645, Mar. 2023.
[17] X. Yuan, W. Ni, M. Ding, K. Wei, J. Li, and H. V. Poor, "Amplitude-varying perturbation for balancing privacy and utility in federated learning," IEEE Trans. Inf. Forensics Security, vol. 18, pp. 1884–1897, 2023.
[18] Y. He, M. Yang, Z. He, and M. Guizani, "Computation offloading and resource allocation based on DT-MEC-assisted federated learning framework," IEEE Trans. Cognit. Commun. Netw., vol. 9, no. 6, pp. 1707–1720, Dec. 2023.
[19] Y. He, X. Zhong, Y. Gan, H. Cui, and M. Guizani, "A DDPG hybrid of graph attention network and action branching for multi-scale end-edge-cloud vehicular orchestrated task offloading," IEEE Wireless Commun., vol. 30, no. 4, pp. 147–153, Aug. 2023.
[20] Y. He, M. Yang, Z. He, and M. Guizani, "Resource allocation based on digital twin-enabled federated learning framework in heterogeneous cellular network," IEEE Trans. Veh. Technol., vol. 72, no. 1, pp. 1149–1158, Jan. 2023.
[21] S. M. Marvasti-Zadeh, L. Cheng, H. Ghanei-Yakhdan, and S. Kasaei, "Deep learning for visual tracking: A comprehensive survey," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 5, pp. 3943–3968, May 2022.
[22] C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang, "Robust visual tracking via hierarchical convolutional features," IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 11, pp. 2709–2723, Nov. 2019.
[23] K. Zhang, Q. Liu, Y. Wu, and M.-H. Yang, "Robust visual tracking via convolutional networks without training," IEEE Trans. Image Process., vol. 25, no. 4, pp. 1779–1792, Apr. 2016.
[24] Y. Wang et al., "Target tracking control of a biomimetic underwater vehicle through deep reinforcement learning," IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 8, pp. 3741–3752, Aug. 2022.
[25] Z. Liu et al., "Robust target recognition and tracking of self-driving cars with radar and camera information fusion under severe weather conditions," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 7, pp. 6640–6653, Jul. 2022.
[26] X. Dong, J. Shen, D. Yu, W. Wang, J. Liu, and H. Huang, "Occlusion-aware real-time object tracking," IEEE Trans. Multimedia, vol. 19, no. 4, pp. 763–771, Apr. 2017.
[27] L. R. G. Carrillo and K. G. Vamvoudakis, "Deep-learning tracking for autonomous flying systems under adversarial inputs," IEEE Trans. Aerosp. Electron. Syst., vol. 56, no. 2, pp. 1444–1459, Apr. 2020.
[28] M. M. U. Chowdhury, F. Erden, and I. Guvenc, "RSS-based Q-learning for indoor UAV navigation," in Proc. IEEE Mil. Commun. Conf. (MILCOM), Nov. 2019, pp. 121–126.
[29] Y. Chen, D. Chang, and C. Zhang, "Autonomous tracking using a swarm of UAVs: A constrained multi-agent reinforcement learning approach," IEEE Trans. Veh. Technol., vol. 69, no. 11, pp. 13702–13717, Nov. 2020.
[30] W. Zhang, K. Song, X. Rong, and Y. Li, "Coarse-to-fine UAV target tracking with deep reinforcement learning," IEEE Trans. Autom. Sci. Eng., vol. 16, no. 4, pp. 1522–1530, Oct. 2019.
[31] H. Huang and A. V. Savkin, "Reactive 3D deployment of a flying robotic network for surveillance of mobile targets," Comput. Netw., vol. 161, pp. 172–182, Oct. 2019.
[32] H. Huang, A. Savkin, and W. Ni, "A method for covert video surveillance of a car or a pedestrian by an autonomous aerial drone via trajectory planning," in Proc. IEEE ICCAR, Singapore, Apr. 2020, pp. 1–3.
[33] Y. Huang, H. Wang, and P. Yao, "Energy-optimal path planning for solar-powered UAV with tracking moving ground target," Aerosp. Sci. Technol., vol. 53, pp. 241–251, Jun. 2016.
[34] S. Hu, W. Ni, X. Wang, A. Jamalipour, and D. Ta, "Joint optimization of trajectory, propulsion, and thrust powers for covert UAV-on-UAV video tracking and surveillance," IEEE Trans. Inf. Forensics Security, vol. 16, pp. 1959–1972, 2021.
[35] S. Hu, W. Ni, X. Wang, and A. Jamalipour, "Disguised tailing and video surveillance with solar-powered fixed-wing unmanned aerial vehicle," IEEE Trans. Veh. Technol., vol. 71, no. 5, pp. 5507–5518, May 2022.
[36] S. S. Baek, H. Kwon, J. A. Yoder, and D. Pack, "Optimal path planning of a target-following fixed-wing UAV using sequential decision processes," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Nov. 2013, pp. 2955–2962.
[37] P. Morere, R. Marchant, and F. Ramos, "Sequential Bayesian optimization as a POMDP for environment monitoring with UAVs," in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2017, pp. 6381–6388.
[38] F. Phillip, "Newton's second law of motion," Phys. Today, vol. 60, no. 6, p. 28, 2007.
[39] G. S. Aglietti, S. Redi, A. R. Tatnall, and T. Markvart, "Harnessing high-altitude solar power," IEEE Trans. Energy Convers., vol. 24, no. 2, pp. 442–451, Jun. 2009.
[40] C. Chen, J. Chang, K. Chiang, H. Lin, S. Hsiao, and H. Lin, "Perovskite photovoltaics for dim-light applications," Adv. Funct. Mater., vol. 25, no. 45, pp. 7064–7070, Dec. 2015.
[41] K. Yoshikawa et al., "Silicon heterojunction solar cell with interdigitated back contacts for a photoconversion efficiency over 26%," Nature Energy, vol. 2, no. 5, p. 17032, Mar. 2017.
[42] Z. Zheng, T. Ruan, Y. Wei, Y. Yang, and T. Mei, "VehicleNet: Learning robust visual representation for vehicle re-identification," IEEE Trans. Multimedia, vol. 23, pp. 2683–2693, 2021.
[43] C. J. Hsu, M.-C. Lu, and Y.-Y. Lu, "Distance and angle measurement of objects on an oblique plane based on pixel number variation of CCD images," IEEE Trans. Instrum. Meas., vol. 60, no. 5, pp. 1779–1794, May 2011.
[44] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, "Deterministic policy gradient algorithms," in Proc. 31st Int. Conf. Mach. Learn., vol. 32, 2014, pp. I-387–I-395.
[45] T. P. Lillicrap et al., "Continuous control with deep reinforcement learning," in Proc. ICLR, 2016, pp. 1–14.
[46] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York, NY, USA: Wiley, 2014.
[47] J. Han and M. Kamber, Data Mining: Concepts and Techniques. San Mateo, CA, USA: Morgan Kaufmann, 2006.
[48] (Sep. 2023). Intel RealSense Depth Camera D435i. [Online]. Available: https://www.intelrealsense.com/depth-camera-d435i/?magento_session_id=c5a2edff5b296e750d607cdd314bbc5f
[49] Y. Hou, L. Liu, Q. Wei, X. Xu, and C. Chen, "A novel DDPG method with prioritized experience replay," in Proc. IEEE Int. Conf. Syst., Man, Cybern. (SMC), Oct. 2017, pp. 316–321.
[50] G. E. Uhlenbeck and L. S. Ornstein, "On the theory of the Brownian motion," Phys. Rev., vol. 36, pp. 823–841, Sep. 1930, doi: 10.1103/PhysRev.36.823.
[51] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Proc. NIPS, 1999, pp. 1057–1063.
[52] G. Serpen and Z. Gao, "Complexity analysis of multilayer perceptron neural network embedded into a wireless sensor network," Proc. Comput. Sci., vol. 36, pp. 192–197, Jan. 2014.
[53] R. Johnson and T. Zhang, "Accelerating stochastic gradient descent using predictive variance reduction," in Proc. 26th Int. Conf. Adv. Neural Inf. Process. Syst., vol. 26, 2013, pp. 315–323.
[54] J. Bhandari and D. Russo, "Global optimality guarantees for policy gradient methods," 2019, arXiv:1906.01786.
[55] G. R. Chandra Mouli, P. Bauer, and M. Zeman, "System design for a solar powered electric vehicle charging station for workplaces," Appl. Energy, vol. 168, pp. 434–443, Apr. 2016.
[56] P. J. Zarco-Tejada, R. Diaz-Varela, V. Angileri, and P. Loudjani, "Tree height quantification using very high resolution imagery acquired from an unmanned aerial vehicle (UAV) and automatic 3D photo-reconstruction methods," Eur. J. Agronomy, vol. 55, pp. 89–99, Apr. 2014.
[57] M. Zinaddinov, S. Mil'shtein, and D. Kazmer, "Design of light-weight solar panels," in Proc. IEEE 46th Photovolt. Spec. Conf. (PVSC), Jun. 2019, pp. 0582–0587.
[58] J.-K. Shiau, D.-M. Ma, P.-Y. Yang, G.-F. Wang, and J. H. Gong, "Design of a solar power management system for an experimental UAV," IEEE Trans. Aerosp. Electron. Syst., vol. 45, no. 4, pp. 1350–1360, Oct. 2009.
[59] A. Kokhanovsky, "Optical properties of terrestrial clouds," Earth-Sci. Rev., vol. 64, nos. 3–4, pp. 189–241, Feb. 2004.
[60] J. A. Shaffer, E. Carrillo, and H. Xu, "Hierarchal application of receding horizon synthesis and dynamic allocation for UAVs fighting fires," IEEE Access, vol. 6, pp. 78868–78880, 2018.
[61] C. Antoniou, M. Ben-Akiva, and H. N. Koutsopoulos, "Nonlinear Kalman filtering algorithms for on-line calibration of dynamic traffic assignment models," IEEE Trans. Intell. Transp. Syst., vol. 8, no. 4, pp. 661–670, Dec. 2007.
[62] X. Xiong, C. Sun, W. Ni, and X. Wang, "Three-dimensional trajectory design for unmanned aerial vehicle-based secure and energy-efficient data collection," IEEE Trans. Veh. Technol., vol. 72, no. 1, pp. 664–678, Jan. 2023.
[63] A. M. C. Rezende, V. M. Goncalves, and L. C. A. Pimenta, "Constructive time-varying vector fields for robot navigation," IEEE Trans. Robot., vol. 38, no. 2, pp. 852–867, Apr. 2022.

Shuyan Hu (Member, IEEE) received the B.Eng. degree in electrical engineering from Tongji University, China, in 2014, and the Ph.D. degree in electronic science and technology from Fudan University, China, in 2019. She is currently a Post-Doctoral Research Fellow with the School of Information Science and Technology, Fudan University. She was selected by the Shanghai Post-Doctoral Excellence Program in 2019. Her research interests include machine learning and convex optimizations and their applications to unmanned aerial vehicle (UAV) networks and intelligent systems.

Xin Yuan (Senior Member, IEEE) received the B.E. degree from Taiyuan University of Technology, Shanxi, China, in 2013, the first Ph.D. degree from Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 2019, and the second Ph.D. degree from the University of Technology Sydney (UTS), Sydney, Australia, in 2020. She is currently a Senior Research Scientist at CSIRO, Sydney, NSW, Australia. Her research interests include machine learning and optimization, and their applications to UAV networks and intelligent systems.

Wei Ni (Fellow, IEEE) received the B.E. and Ph.D. degrees in electronic engineering from Fudan University, Shanghai, China, in 2000 and 2005, respectively. He was a Post-Doctoral Research Fellow with Shanghai Jiao Tong University from 2005 to 2008; the Deputy Project Manager with Bell Laboratories, Alcatel/Alcatel-Lucent, from 2005 to 2008; and a Senior Researcher with Devices Research and Development, Nokia, from 2008 to 2009. He is currently a Principal Research Scientist with Commonwealth Scientific and Industrial Research Organization (CSIRO), Sydney, Australia. He is also a Conjoint Professor with The University of New South Wales, an Adjunct Professor with the University of Technology Sydney, and an Honorary Professor with Macquarie University. He has coauthored one book, ten book chapters, more than 300 journal articles, more than 100 conference papers, 26 patents, and ten standard proposals accepted by IEEE, and three technical contributions accepted by ISO. His research interests include 6G security and privacy, machine learning, stochastic optimization, and their applications to system efficiency and integrity. He served as the Secretary and then the Vice-Chair and Chair of the IEEE VTS NSW Chapter from 2015 to 2022, the Track Chair for VTC-Spring 2017, the Track Co-Chair for IEEE VTC-Spring 2016, the Publication Chair for BodyNet 2015, and the Student Travel Grant Chair for WPMC 2014. He also serves as a Technical Expert at Standards Australia in support of the ISO Standardization of AI and Big Data. He has been an Editor of IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, since 2018; IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, since 2022; and IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY and IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, since 2024.

Xin Wang (Fellow, IEEE) received the B.Sc. and M.Sc. degrees in electrical engineering from Fudan University, Shanghai, China, in 1997 and 2000, respectively, and the Ph.D. degree in electrical engineering from Auburn University, Auburn, AL, USA, in 2004. From September 2004 to August 2006, he was a Post-Doctoral Research Associate with the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA. In August 2006, he joined the Department of Electrical Engineering, Florida Atlantic University, Boca Raton, FL, USA, as an Assistant Professor, then was promoted to a tenured Associate Professor in 2010. He is currently a Distinguished Professor and the Chair of the Department of Communication Science and Engineering, Fudan University. His research interests include stochastic network optimization, energy-efficient communications, cross-layer design, and signal processing for communications. He is a member of the Signal Processing for Communications and Networking Technical Committee of the IEEE Signal Processing Society. He is a Senior Area Editor of IEEE TRANSACTIONS ON SIGNAL PROCESSING and an Editor of IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS. In the past, he served as an Associate Editor for IEEE TRANSACTIONS ON SIGNAL PROCESSING and IEEE SIGNAL PROCESSING LETTERS, and an Editor for IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY. He is a Distinguished Speaker of the IEEE Vehicular Technology Society.

Abbas Jamalipour (Fellow, IEEE) received the Ph.D. degree in electrical engineering from Nagoya University, Nagoya, Japan, in 1996. He is currently a Professor of ubiquitous mobile networking with The University of Sydney. He has authored nine technical books, 11 book chapters, over 550 technical papers, and five patents, all in the area of wireless communications and networking. He is a fellow of the Institute of Electrical, Information, and Communication Engineers (IEICE) and the Institution of Engineers Australia, an ACM Professional Member, and an IEEE Distinguished Speaker. Since 2014, he has been an elected member of the Board of Governors of the IEEE Vehicular Technology Society. He was a recipient of a number of prestigious awards, such as the 2019 IEEE ComSoc Distinguished Technical Achievement Award in Green Communications, the 2016 IEEE ComSoc Distinguished Technical Achievement Award in Communications Switching and Routing, the 2010 IEEE ComSoc Harold Sobol Award, the 2006 IEEE ComSoc Best Tutorial Paper Award, and over 15 best paper awards. He has been the General Chair and the Technical Program Chair of several prestigious conferences, including IEEE ICC, GLOBECOM, WCNC, and PIMRC. He was the President of the IEEE Vehicular Technology Society, from 2020 to 2021. Previously, he held the positions of the Executive Vice-President and the Editor-in-Chief of VTS Mobile World. He was the Vice President-Conferences and a member of the Board of Governors of the IEEE Communications Society. He sits on the editorial board of IEEE ACCESS and several other journals. He is a member of the Advisory Board of IEEE INTERNET OF THINGS JOURNAL. Since January 2022, he has been the Editor-in-Chief of IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY. He was also the Editor-in-Chief of IEEE WIRELESS COMMUNICATIONS.