[go: up one dir, main page]

0% found this document useful (0 votes)
18 views15 pages

Visual Based Moving Target Tracking

This paper presents a novel deep reinforcement learning (DRL)-based approach for visual-based tracking of suspicious unmanned aerial vehicles (UAVs) using a solar-powered fixed-wing UAV. The proposed method optimizes the UAV's trajectory while maintaining energy efficiency and ensuring the target remains within visual range, addressing the challenges of continuous state and action spaces. Experimental results demonstrate that the DRL algorithm significantly outperforms traditional control methods in tracking accuracy and energy management.

Uploaded by

Nithya
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
18 views15 pages

Visual Based Moving Target Tracking

This paper presents a novel deep reinforcement learning (DRL)-based approach for visual-based tracking of suspicious unmanned aerial vehicles (UAVs) using a solar-powered fixed-wing UAV. The proposed method optimizes the UAV's trajectory while maintaining energy efficiency and ensuring the target remains within visual range, addressing the challenges of continuous state and action spaces. Experimental results demonstrate that the DRL algorithm significantly outperforms traditional control methods in tracking accuracy and energy management.

Uploaded by

Nithya
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 15

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 25, NO.

8, AUGUST 2024 9115

Visual-Based Moving Target Tracking With


Solar-Powered Fixed-Wing UAV: A New
Learning-Based Approach
Shuyan Hu , Member, IEEE, Xin Yuan , Senior Member, IEEE, Wei Ni , Fellow, IEEE,
Xin Wang , Fellow, IEEE, and Abbas Jamalipour , Fellow, IEEE

Abstract— The use of legitimate unmanned aerial vehicles efficiency and quietness make a fixed-wing UAV an excellent
(UAVs) to surveil and track misbehaved UAVs can serve a choice for surveillance applications. The suspicious targets can
crucial role in public safety and security. This paper proposes be aerial, e.g., misbehaved or malicious UAVs, due to the
a new deep reinforcement learning (DRL)-based online control
scheme for visual-based UAV-on-UAV tracking and monitoring, proliferation and easy accessibility of UAVs. A UAV monitor
where a solar-powered, fixed-wing UAV tracks a suspicious UAV can record the targets’ misbehaviors and provide evidence for
target by having the target inside its effective visual range. The forensics purposes, facilitating the visual-based surveillance
key idea is a new deep deterministic policy gradient (DDPG)- for many internet-of-things (IoT) and security applications [3],
based model, which can cope with the continuous state and [4]. While most existing drones are powered by batteries and
action spaces of the monitor and learn the optimal acceleration
control policy adapting to the solar power availability and restricted in terms of mission time and batteries, renewable
the target’s movement. The state space is designed to be the energy sources, such as solar power, are increasingly consid-
relative position of the monitor to the target, thereby preventing ered for UAVs. Solar panels can be readily installed on the top
model infeasibility. Experiments show that the new algorithm of fixed-wing UAVs, to power visual-based tracking missions.
can maintain a desired distance from the target, and outperform The trajectory of a UAV monitor needs to be carefully
control- and optimization-based alternatives in terms of energy
efficiency and tracking accuracy. An interesting finding is that planned by taking into account not only the power usage mod-
our algorithm learns faster and better with a constraint of a els of the propulsion, thrust, and hanging of the UAV, but the
minimum allowed battery energy reserve. The reason is that, energy harvesting process as well, as suggested in [5], [6], and
without the constraint, the monitor is more likely to deplete its [7]. As far as we are concerned, there has been no investigation
battery before the end of a surveillance mission. on the flight path planning of a solar-powered fixed-wing drone
Index Terms— UAV-on-UAV visual-based tracking, fixed-wing running a visual-based tailing and monitoring task. This is
UAV, solar power harvesting, online trajectory design, deep because the flight path plan of a fixed-wing drone deems non-
deterministic policy gradient.
trivial due to its relatively poor maneuverability, as compared
I. I NTRODUCTION to a rotary-wing drone.

W ITH eminent maneuverability, quick deployment, and


expansive coverage, unmanned aerial vehicles (UAVs)
have been increasingly employed for nature resilience, disaster
A. Related Work
Many existing studies on visual-based UAV tracking have
rescue, bushfire monitoring, mobile relaying, and eaves- been focused on keeping its target within the UAV monitor’s
dropping [1]. Equipped with optical cameras, UAVs are visual range, without considering the energy consumption and
increasingly utilized to visually tail and monitor moving longevity of the surveillance missions. For instance, visual
targets [2]. Compared to a rotary-wing UAV, the energy information-enhanced control mechanisms were developed
for a UAV monitor to procure the three-dimensional (3D)
Manuscript received 16 August 2022; revised 19 April 2023, 11 September
2023, and 20 March 2024; accepted 3 April 2024. Date of publication Cartesian coordinates of a moving object [8], or tail a mobile
19 April 2024; date of current version 1 August 2024. This work was object [9]. In [9], geometric control was operated along with
supported in part by the National Natural Science Foundation of China nonlinear multi-rank control, suppressing the need for a math-
under Grant 62231010, Grant 62071126, and Grant 62101135; and in part
by the Innovation Program of Shanghai Municipal Science and Technology ematical model of the thrust force, to simplify the executing
Commission under Grant 21XD1400300. The Associate Editor for this article steps. UAV-enabled object tailing was pursued in [10] and [11]
was M. Gao. (Corresponding author: Xin Wang.) to deal with camera view occlusion in a city with tall buildings.
Shuyan Hu and Xin Wang are with the Key Laboratory of EMW Infor-
mation (MoE), Department of Communication Science and Engineering, The flight path of the drone monitor was devised to maximize
Fudan University, Shanghai 200433, China (e-mail: syhu14@fudan.edu.cn; the identification rate that depends on the probability of
xwang11@fudan.edu.cn). having the object inside the sight. In [12], UAV-based target
Xin Yuan and Wei Ni are with the Data61, Commonwealth Scientific and
Industrial Research Organization, Marsfield, NSW 2122, Australia (e-mail: recognition and tracking was proposed by utilizing a smart
xin.yuan@data61.csiro.au; wei.ni@data61.csiro.au). gimbal with precise positioning and quick image analyzing,
Abbas Jamalipour is with the School of Electrical and Computer Engineer- where consensus-based target tailing, dynamic environment
ing, The University of Sydney, Camperdown, NSW 2006, Australia (e-mail:
a.jamalipour@ieee.org). analyzing, and neural network-enabled target sensing were
Digital Object Identifier 10.1109/TITS.2024.3386105 jointly optimized.
1558-0016 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: DELHI TECHNICAL UNIV. Downloaded on March 12,2025 at 18:43:45 UTC from IEEE Xplore. Restrictions apply.
9116 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 25, NO. 8, AUGUST 2024

On the other hand, a target tracking system by UAVs to generalize these works [17], [18], [19], [20], [21], [22], [23],
equipped with gimbaled cameras was studied in [13]. [24], [25] to capture the power usage of the UAV monitors
The proposed system distributively incorporated a clustering because the system model, UAV dynamics, and application
algorithm, a sensor manager, and an optimal path planner scenario would be different when a specific UAV power
to track multiple mobile targets. An integrated guidance and consumption model is considered.
gimbal control coverage path planning approach was proposed With the consideration of power usage (and especially the
in [14], in which the mobility and gimbal inputs of an solar power), the UAV control variables, i.e., acceleration,
autonomous UAV agent were jointly controlled and optimized velocity, and trajectory, are coupled over time. The trajectory
to achieve full coverage of a given object of interest, according planning becomes a sequential decision process which is chal-
to a specified set of optimality criteria. None of these works lenging, attributed to its considerable state and action spaces
considered energy consumption of the UAV, and all of them and requirement of real-time operation. UAVs were deployed
relied on traditional control methods for UAV path planning. to monitor or track mobile ground targets in [31], [32], and
Machine learning techniques, such as deep learning (DL) [33]. A decentralized 3D navigation rule was developed in [31]
and deep reinforcement learning (DRL) [15], [16], [17], [18], to determine the best position of each UAV, and balance
[19], [20], have been found very useful in visual-based tar- the UAV’s power usage and the number of covered targets.
get tracking [21], to improve the performance of traditional Dynamic programming was utilized in [32] to optimize the
control methods. For instance, the various features of deep UAV path, which minimized the power usage and maximized
convolutional neural networks (CNNs) were exploited in [22] the disguise performance of the surveillance. Receding horizon
to enhance the precision of video-based target tailing. The control was employed in [33] to generate an energy-efficient
features of convolutional layers were interpreted to be a non- path of the solar-powered UAV. The power consumption of
linear corresponding term of a picture pyramid and straightly the UAV monitor was reduced to be linear with respect to its
exploited to represent target objects. It was shown in [23] speed in [31] and [32], and the power requirement of hovering
that basic two-layer CNNs could be sufficient to learn effec- was overlooked in [33]. In [34] and [35], we investigated
tive representations for video-based tailing, without training disguised visual-based target tracking by rotary- and fixed-
beforehand on a large dataset. An underwater target tailing task wing UAVs, respectively. Difference-of-convex programming
of an underwater vehicle was investigated in [24], where DRL was employed to solve the non-convex trajectory design for
was employed to resolve the constructed Markov decision rotary-wing UAVs [34]. Convex optimization was integrated
task under uncertain hydrodynamics. The robustness of target with MPC to produce trajectories for fixed-wing UAVs [35].
detection and tracking was enhanced through radar and video None of the existing works [8], [9], [10], [11], [12], [13],
camera data fusion in [25], especially under bad weather [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24],
conditions. Radar acted as the main role and camera served [25], [26], [27], [28], [29], [30] fully investigated the energy-
as an assistant for real-time environment sensing. A dynamic efficient 3D target tracking by a fixed-wing UAV with solar
occlusion-aware video-based tailing approach was developed power harvesting capability, and compared the performances
in [26] to address the deformation and long-term occlusion of of ML methods with traditional control- or optimization-based
target appearance during tracking. When the target is severely algorithms.
occluded or had been occluded for a long time, it is redetected
by a well-designed classifier chosen from a classifier group, B. Contribution
based on an entropy minimization metric.
DL and DRL have been leveraged to UAV-enabled target This paper presents a new scheme for the online control of
tracking, to distinguish a target from its background, handle video-based UAV-on-UAV target tracking. Specifically, a solar-
the object deformation, address the aspect ratio change (ARC) harvesting fixed-wing UAV installed with video cameras tails
of captured images, and make coarse-to-fine tracking policies. and visually surveils a suspicious UAV target. The monitoring
Autonomous flight control was developed in [27] for path UAV keeps the target inside its view, while staying away from
tailing missions in adversarial situations. A two-player zero- the target to avoid raising the attention of the target. DRL is
sum game was formulated and the best UAV path was obtained employed to train the trajectory of the monitor online.
by DL in real-time. Q-learning was utilized to optimize the The novelty and contributions of the paper are highlighted
UAV flight route for tailing a radio frequency (RF) target in as follows.
a Rayleigh fading channel [28]. A UAV swarm was deployed • A new problem is formulated for online control UAV-
to locate an RF mobile target in [29], where a constrained on-UAV target tracking performed by a solar-harvesting,
Markov decision process (MDP) was constructed to locate the fixed-wing UAV monitor. The problem is a non-convex
target in the presence of channel uncertainties. Multi-agent sequential decision process and can be formulated as a
reinforcement learning (RL) was utilized to coordinate the Markov decision process (MDP) [36]. It is difficult to
several UAVs, avoid excessive UAV flight paths, and perform obtain the optimal solution online for a continuous state
real-time target tracking. In [30], DRL was employed by sequential decision problem, given the large size of the
a UAV for vision-based target recognition and tracking to problem and state space [37]. The problem cannot be
address the ARC problem of captured images and refine the optimally resolved by traditional control or optimization
boundaries of the bounding box, without addressing the UAV methods that would rely on accurate predictions of the
trajectory design during tracking. Yet, it is not straightforward target’s trajectory.

Authorized licensed use limited to: DELHI TECHNICAL UNIV. Downloaded on March 12,2025 at 18:43:45 UTC from IEEE Xplore. Restrictions apply.
HU et al.: VISUAL-BASED MOVING TARGET TRACKING WITH SOLAR-POWERED FIXED-WING UAV 9117

• A DRL framework is developed to learn the trajectory of


the UAV monitor online, and balance between distance
keeping and energy efficiency. The UAV monitor acts as
an agent that decides its per-slot accelerations to track
the target relying on its instantaneous observation of the
target’s flight path, the target-monitor distance, and the
monitor’s battery level. The decisions also cater to its
solar energy harvesting process to avoid battery depletion
during the mission.
• Deep deterministic policy gradient (DDPG) is tailored to
cope with the continuous state and action spaces of the
UAV monitor, i.e., trajectory, velocity, and acceleration,
and learn the optimal control policy of the acceleration.
We design the state space to be the relative position
of the monitor regarding the target, thereby effectively
preventing model infeasibility.
Fig. 1. A solar-harvesting fixed-wing drone monitor operates video surveil-
Extensive simulations demonstrate that the proposed DDPG- lance and tracking on a suspicious rotary-wing drone. The drone monitor
based algorithm possesses an exceptional and reliable tracking devises its flight path according to the route of the drone target, and also
stays at a reasonable distance.
capability of keeping a required distance from the target, while
satisfying the battery constraint of the UAV monitor. It is
shown that the algorithm learns faster and achieves much as will be shown in Section IV. Let bt := (bxt , b yt , h 0 ) and
higher tracking success rates when a requirement of energy q t := (xt , yt , h 1 ) denote the 3D coordinates of the target
reserve is in place. Moreover, the algorithm considerably and monitor per slot t (t = 1, . . . , N ), respectively. V t :=
outperforms traditional control or optimization methods. This (Vxt , Vyt ) and At := (A xt , A yt ) denote the monitor’s speed
is because the DRL-based solution can deal with continuous and acceleration on the (x, y)-plane, respectively. Further let
action spaces, and can be easily scaled to handle large, com- v t be the speed vector of the target at slot t. With a fixed-wing
plex systems with high-dimensional state and action spaces. structure, the monitor obeys the following mobility constraints:
By contrast, traditional control or optimization methods, such
as model predictive control (MPC) and successive convex 1
q t+1 = q t + V t δ + At δ 2 , ∀t, (1a)
approximation (SCA) [34], typically work for problems with 2
discrete action spaces, and become computationally expensive V t+1 = V t + At δ, ∀t, (1b)
as the state and action spaces grow. The traditional methods ∥ At ∥ ≤ Amax , ∀t, (1c)
would also require accurate predictions of the target’s trajec-
Vmin ≤ ∥V t ∥ ≤ Vmax , ∀t, (1d)
tory and may converge to inferior local optima.
This paper is arranged as follows. Sec. II provides the where Amax is the monitor’s largest acceleration; and Vmax and
system model of the solar-harvesting fixed-wing UAV-enabled Vmin are the maximum and minimum speeds of the monitor,
visual-based tracking, and formulates the problem of energy- respectively.
efficient online target tracking. Sec. III articulates our method Per time slot t, the UAV propulsion power consumption is
to tackle the studied problem. The method is numerically modeled as [7]
validated in Sec. IV. Conclusions are drawn in Sec. V. c2  ∥ At ∥2 
Pvt = c1 ∥V t ∥3 + 1+ + m AtT V t , (2)
∥V t ∥ g2
II. S YSTEM D ESIGN AND P ROBLEM S TATEMENT
where (·)T stands for transpose; c1 and c2 are two known
Consider a solar-powered fixed-wing UAV monitor for a
coefficients relying on the wing surface region and wing span
UAV-on-UAV visual-based tracking task, which is shown in
ratio; m is the mass of the UAV monitor; and g ≈ 9.8 m/s2 is
Fig. 1. For illustration convenience, the target and the monitor
the gravitational acceleration.
fly at fixed altitudes of h 0 and h 1 (in meters), respectively. The
We note that the mobility model in (1) is generic and
mission duration is T (seconds). It is discretized evenly into
describes the motions of an object with a continuous and
N time slots with δ (seconds) per slot. T = δ N . We assume
differentiable trajectory, such as a fixed-wing UAV, according
that each time slot is short enough that the trajectories of
the monitor and the target can be viewed as sequences of to Newton’s second law [38]. The model can capture and
waypoints, and the accelerations of the monitor and the target regulate the movement of UAVs with fixed wings, including
can be seen as unchanged during a slot. their velocities, accelerations, and curvature radii. On the other
hand, (2) provides the propulsion power of the fixed-wing
UAV, which is nonlinear in the velocity and acceleration of
A. Fixed-Wing UAV Mobility and Power Consumption the UAV (due to the aerodynamics) and makes it challenging
For ease of exposition, we describe the proposed method to control and optimize the trajectory of the UAV.
under a 2D setting; i.e., the UAV flies horizontally. Neverthe- Let Pst (in Watts) stand for the solar power harvesting
less, the method can be readily applied to the 3D scenario, capability of the UAV monitor. Suppose that the monitoring

Authorized licensed use limited to: DELHI TECHNICAL UNIV. Downloaded on March 12,2025 at 18:43:45 UTC from IEEE Xplore. Restrictions apply.
9118 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 25, NO. 8, AUGUST 2024

task is performed on the midday of a sunny or cloudy day. The battery constraint (6) is important, without which the UAV
At slot t, the solar power collected is [39]. monitor could coarsely design its trajectory, consume all the
battery energy at the beginning, and fail for the rest of the
α 
 5.2561 
Pst (z t ) = ηS Pi exp − 1−2.2556 × 10−5 z t . mission (as shown by simulations in Section IV). The studied
cos ϑt problem is cast as
(3)
N
X
Here, z t is the monitor’s altitude; η ∈ (0, 1) and S (in min Pvt ,
{(q t −bt ),v t ,at ,∀t}
m2 ) are the conversion efficiency and the solar panel size, t=1
respectively; Pi (in Watts) is the fixed power intensity of the s.t. (1), (5), (6). (7)
solar beams before entering the clouds; α > 0 is the total
Here, we propose to optimize the relative position of the
gaseous absorption; ϑt ∈ [0, π/2] stands for the solar zenith
UAV monitor with reference to the target, i.e., (q t − bt ),
angle at slot t. At sunrise or sunset, ϑt = π/2. At midday,
as opposed to designing the monitor trajectory, i.e., q t . This
ϑt = 0. The change of ϑt is negligible within 15 minutes.
helps avoid the infeasibility incurred by optimizing the monitor
We assume that ϑt = 0 in this paper w.l.o.g.; i.e., the mission
trajectory directly. Specifically, the range and distribution of
is at midday.
the relative distance and velocity between the drone monitor
While some modern solar devices can capture energy in very
and the target are expected to be reasonably consistent across
low lighting conditions, such as solar cells based on Perovskite
different time slots for an effective tracking mission. In con-
or amorphous silicon [40], [41], they cannot provide enough
trast, the range and distribution of the absolute positions of the
energy to power and sustain a typical UAV’s flight. This is
UAV monitor would change dramatically over time. Moreover,
because these dim-light solar cells operate at low voltage
the optimization of the relative distance and speed contributes
and radiation conditions, yielding an output power in the
to the reduction of the state space; or the absolute position and
magnitude of milliwatt (mW). In practice, dim-light solar cells
velocity of the target would otherwise be part of the state as the
are typically employed to support low-power devices, such as
monitor’s observations. The reduced state space is conducive
Internet-of-things sensors.
to the training and convergence of the proposed DRL model.
Based on (7), a novel DRL-based method is developed to
B. Problem Formulation compute the instantaneous acceleration of the monitor, which
The UAV monitor is installed with a video camera to is the action of the designed control system. The proposed
spot and identify the target of interest, which requires the tracking problem has continuous action and state spaces,
representations of various visual variables, such as resolution and can be solved by designing a DDPG-based method,
and viewpoint [42]. To successfully tail the target, the UAV as described in the next section. While we primarily focus
monitor needs to follow the target within a specific resolution on video input from onboard cameras in this paper, it is worth
range specified by [ pmin , pmax ] and minimize the energy noting that the proposed DRL-based control mechanism for
consumption during the mission time. Here, pmin and pmax are UAV tracking can be adapted to utilize alternative distance
the smallest and largest perceived width/height of the target measurement techniques, such as lidar and radar, to determine
in pixels, respectively. Based on the resolution requirement, the direction and distance of a target.
we can use the triangle similarity to determine the maximum
and minimum allowed monitor-target distances, dmax and dmin III. D EEP D ETERMINISTIC P OLICY G RADIENT
(in meters), as given by [43] (DDPG)-BASED S OLUTION
WT × F This section proposes a DDPG-based method under the
dmax = , (4a) actor-critic framework to solve problem (7). DDPG is
pmin
WT × F known to effectively tackle problems with continuous action
dmin = , (4b) spaces [44], [45]. In contrast, traditional DRL methods, such
pmax
as deep Q-learning, perform poorly and incur divergence under
where WT is the actual width/height of the target, and F is a continuous action space.
the focal length of the onboard video cameras. The monitor
optimizes its trajectory online by controlling its acceleration,
A. Markov Decision Process (MDP)
to keep a certain distance dt = ∥q t − bt ∥, ∀t from the target:
The UAV propulsion power consumption, i.e., the objective
dmin ≤ dt ≤ dmax , ∀t. (5) function in (7), is a non-convex function of the optimization
variables, i.e., the UAV acceleration, velocity, and waypoint.
Let E 0 (in Joules) denote the energy level of the monitor’s
In addition, these variables are tightly coupled in (7) and in
battery at the beginning of the mission. Suppose the monitor
the constraints (1) and (6) over time. As a result, the problem
needs to maintain at least (1 − η0 )E 0 Joules at every moment
is non-convex and challenging to tackle.
in the battery for other functions and contingency plans, i.e.,
On the other hand, the task is a sequential decision process
t
X t
X and can be interpreted as an MDP, since the environment
Pvn δ ≤ Psn δ + η0 E 0 , ∀t. (6) (or state) is observable and the future state only depends on
n=1 n=1 the present state [36]. Specifically, the UAV’s position at the

Authorized licensed use limited to: DELHI TECHNICAL UNIV. Downloaded on March 12,2025 at 18:43:45 UTC from IEEE Xplore. Restrictions apply.
HU et al.: VISUAL-BASED MOVING TARGET TRACKING WITH SOLAR-POWERED FIXED-WING UAV 9119

(t + 1)-th time slot depends on its position, velocity, and can also be sparse. We adopt the max-abs normalization [47]
acceleration at the t-th slot; see (1a). The UAV’s velocity at to pre-process the state. The max-abs normalization does not
the (t +1)-th time slot depends on its velocity and acceleration destroy the original data distribution, and can re-scale the
at the t-th slot; see (1b). However, it is difficult to procure the value range of the data. Moreover, we perform the max-
optimal policy online for this MDP problem, due to the con- abs normalization separately for the relative distance and the
tinuous state space of the problem [37]. The problem cannot relative velocity, to avoid the loss of the state information due
be optimally solved using traditional control or optimization to substantial differences in values.
methods that would rely on accurate predictions of the target’s In practice, the target motion estimation using a camera
trajectory, which may not be possible in practice. can be influenced by the time-varying monitor-target dis-
We start by constructing an MDP for the considered target tance, illumination condition, background characteristics, and
tracking problem and defining the state, action, and reward of target appearance. Yet, some commercially available UAVs
the drone monitor per time slot [46]. can achieve a high-resolution view and an accurate target
• State Space S: The current system state st ∈ S consists motion estimation with a camera array, such as Intel RealSense
of the relative position, q t − bt , and the relative velocity, D435i [48]. Yet, this type of camera may suffer from tem-
V t − v t , of the UAV monitor with regards to the target. perature change and noisy data when used outdoor. The
• Action Space A: Define A := {at , ∀t = 1, · · · , N } to accuracy of their distance estimation can be as good as within
gather all potential actions. The current action at is the 0.28 meter [48]. We assume that the motion estimation of the
acceleration value of the UAV monitor, At , constrained target is reasonably accurate using such cameras and visual-
by (1c)–(1d). The values of the action are constrained aided methods.
to the range of [−1, 1] (m/s2 ). Given the initial location In the reward, the “tanh(·)” function is selected due to its
and velocity of the monitor, its future waypoints q t and smooth and bounded nature. It can normalize each of the
velocities V t are decided by the accelerations, i.e., by (1a) penalty terms, barrier the reward function, stabilize training,
and (1b). and facilitate the convergence of the algorithm. Cd1 , Cd2 and
• Policy: A policy, denoted by π : S → A, is a mapping Ce can be fine-tuned to account for the relative importance
from the state space, S, to the action space, A. In other or priority of each of the penalty terms. Moreover, the reward
words, given a state s ∈ S, the policy determines a dis- function can be applied under cloudy weather conditions. For a
tribution π(a|s) = Pr (at = a|st = s) over state s ∈ S. particular time slot n, Psn follows (3) when the UAV monitor
• Experience: Define et = (st , at , rt , st+1 ) as an experi- is exposed under the sun and harvests the solar power. Psn
ence, which is stored in a replay memory R. takes zero when the UAV monitor flies under a cloud and its
• Reward rt : The reward function offers non-negative harvested solar power diminishes.
rewards at each time slot (or step) if the monitor-target Moreover, the shape and size of the target in an image
distance is within [dmin , dmax ] and the onboard battery depend on the image-capturing direction, and can change
level is no lower than (1 − η0 )E 0 Joules; or incurs during the tracking process. The proposed algorithm has the
penalties otherwise. The reward function is defined as potential to maintain a consistent view of the target object’s
shape and size by keeping the image capturing direction/angle
rt = 1 + Cd1 tanh (dt − dmin ) + Cd2 tanh (dmax − dt ) within a predefined range relative to the target’s movement
| {z }
for distance-keeping direction while maintaining a safe distance between the drone
X t X t
! and the target. This can be potentially achieved by introducing
+ Ce tanh η0 E 0 + Psn δ − Pvn δ , (8) a new term in the reward function defined in (8) that incor-
n=1 n=1 porates the image capturing angle. The new term can provide
| {z } a reward when the angle is within the specified range, or a
for battery constraint
penalty when it falls outside of it.
where Cd1 , Cd2 and Ce (Cd1 , Cd2 , Ce ≤ 1) are config- The agent senses the present state st , executes a legitimate
urable coefficients that can be tuned during the learning action at , receives a reward rt , and evolves to state st+1 .
process; and “tanh(·)” is used to scale the rewards and A policy at = π(st ) projects st to at . The agent chooses
ensure that all the rewards are in similar magnitudes. the
P N strategy achieving the largest accumulated reward Rt =
n=t γ
n−t r with γ ∈ (0, 1) denoting the discount coefficient.
The state space and reward function are particularly t
designed for problem (7). In particular, we optimize the With the determined state st , action at and the strategy π(·), Rt
relative position of the monitor to the target, as opposed to is evaluated by an action-value function, i.e., the Q-function,
directly designing the monitor’s trajectory. This is because the as given by
range and distribution of the relative distance and velocity are
expected to be reasonably consistent across different time slots Q π (st , at ) = Eπ [Rt |st , at ], (9)
for an effective tracking mission. In contrast, a direct design
of the monitor’s trajectory would suffer from substantially The action-value function, Q π (st , at ), meets the following
changing value range and distribution of the absolute positions Bellman Expectation Equation:
of the UAV monitor. On the other hand, the state st consists of
Q π (st ,at ) = Ert ,st+1 ∼E rt +γ Eat+1 ∼π Q π (st+1 ,at+1 ) .
  
the relative distance and velocity, the two of which can have
substantially different ranges and distributions. A state vector (10)

Authorized licensed use limited to: DELHI TECHNICAL UNIV. Downloaded on March 12,2025 at 18:43:45 UTC from IEEE Xplore. Restrictions apply.
9120 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 25, NO. 8, AUGUST 2024

Fig. 2. The architecture of the proposed DDPG-based UAV tracking system, where the training network and the target network each comprise an actor
network and a critic network. The experience replay buffer provides batches of samples of state transitions to train and update the networks.

Here, E represents the interacting environment. When this and chooses an action at . Exploration noises are attached to
given policy, denoted by µπ : S → A, is deterministic, (10) the action to balance new action exploration and known action
can be rewritten as exploitation. The output action is given by
Q µπ (st , at ) = Ert ,st+1 ∼E [rt at = µ (st ; θa ) + Nt , (12)
+γ Q µπ (st+1 , µπ (st+1 )) .

(11)
where Nt is an Ornstein-Uhlenbeck (OU) noise process that
In general, it is difficult to apply directly an RL algorithm to generates temporally correlated exploration in the physics
handle the MDP or obtain the Q-value, Q(st , at ), because of domain to improve exploration efficiency [50]. As the result
the continuous state and action spaces, and the uncertain target of the action at , the monitor is rewarded with rt and evolves
trajectory. We design a DDPG-based algorithm to control the to state st+1 . It also stores (st , at , rt , st+1 ) in R.
UAV monitor’s trajectory, as described in the following. The training-critic network evaluates the action-value func-
tion of the executed action at , that is, Q µ (st , µ(st ; θa ); θc ).
B. Actor-Critic Framework-Based DDPG By taking a recorded transition (si , ai , ri , si+1 ) at random from
In the DDPG-based network, four DNN approximators are R, the action-value function generated at the training-critic
used, including training-actor and training-critic networks, network is approximately evaluated as Q µ (si , µ(si ; θa ); θc ).
and target-actor and target-critic networks, as illustrated by We assume a probability distribution of θa , denoted by
Fig. 2. The training-actor network, represented by µ (st ; θa ), J (θa ), for policy estimation. To improves the strategy fastest,
approaches the strategy of the UAV monitor and generates the training-actor network is refreshed based on the gradient
actions. θa stands for the parameters of the training-actor net- of J (θa ). The gradient is given by [44].
work. The training-critic network, denoted by Q µ (st , at ; θc ),
∇θa J (θa ) = Es∼ρ µ ∇θa µ(st ; θa )∇a Q µ (st , a; θc )|a=µ(st ;θa ) ,
 
estimates the action-value function of the actions [44]. θc
stands for the model parameters of the training-critic network. (13)
The target-actor network, represented by µ′ st ; θa′ , and the
where ρ µ stands for a discounted state distribution of
target-critic network, represented by Q ′µ′ (st , at ; θc′ ), generate
strategy µ(st ; θa ) [45]; ∇θa µ(s) provides the gradient of
the target Q-value for learning the training-actor and training-
the training-actor network µ(s) regarding the parameter θa ;
critic networks. θa′ and θc′ stand for the model parameters of
∇a Q µ (st , a; θa ) stands for the gradient of Q µ (st , a; θa )
the target-actor and target-critic networks, respectively.
regarding action a.
Relying on the actor-critic setting, the DDPG network
By randomly drawing Nbatch sampled historical transitions
abides by the deterministic policy gradient (DPG) theo-
from R, ∇θa J (θa ) is approximated by
rem [44] to refresh θa , θc , θa′ and θc′ , and improve the
actions. The adoption of the target network (containing the NX
batch
1
∇θa J (θa ) ≈ ∇θa µ(si )∇a Q µ (si , a; θc )|a=µ(si ) .
 
target-actor and target-critic networks) helps address an issue Nbatch
of oscillating operation stemming from employing only a i=1
training network [49]. We consider fully connected neural (14)
networks (FCNNs) with two hidden layers for each training-
The model parameter of the training-actor network, θa ,
actor, training-critic, target-actor and target-critic network.
is refreshed based on the gradient ascent [51]:
The UAV monitor (i.e., the agent) passes its current state
st to the training-actor network. In the DPG theorem [44], NX
ηa batch
θa ← θa + ∇θa µ(si )∇a Q µ (si , a; θc )|a=µ(si ) .
 
the training-actor network yields explicitly the present tactics
Nbatch
by definitely mapping a state into an action. The training- i=1
actor network approaches the strategy function of the agent (15)

Authorized licensed use limited to: DELHI TECHNICAL UNIV. Downloaded on March 12,2025 at 18:43:45 UTC from IEEE Xplore. Restrictions apply.
HU et al.: VISUAL-BASED MOVING TARGET TRACKING WITH SOLAR-POWERED FIXED-WING UAV 9121

Here, ηa stands for the learning rate of the training-actor Algorithm 1 DDPG-based UAV-based tracking
network. 1 Initialization: Randomly initialize training-actor network µ
The training-critic network is refreshed by achieving the and training-critic network Q µ with weights θa and θc ,
minimum loss, as given by target-actor network µ′ and target-critic network Q ′µ′ with
h 2 i weights θa′ ← θa and θc′ ← θc , and R.
L(θc ) = Est ∼ρ µ ,at ∼µ(st ;θa ) Q µ (st , at ; θc ) − yt . (16) 2 for episode = 1, · · · , Tep do
3 Initialize the relative locations and velocities between the
Here, yt = rt + γ Q ′µ′ st+1 , µ′ st+1 ; θa′ ; θc′ is the tar-
 
UAV monitor and the target as the initial state s0 .
get Q-value yielded by the target network according to 4 for timestep = 1, · · · , Ts do
Select an action at1 = µ (st ; θa ) + Nt .
(st , at , rt , st+1 ). Here, the parameters of the target-actor and 5
6 Operate action at1 , procure reward rt and transits to
target-critic networks, θa′ and θc′ , provide declined copies of next state st+1 .
θa and θc , respectively. 7 Store current evolution (st , at , rt , st+1 ) in R.
With the Nbatch stochastically sampled evolutions, the loss 8 Stochastically draw Nbatch sampled historical
function, L(θc ), is approximately evaluated by evolutions (si , ai , ri , si+1 ) from R.
9 Refresh the target Q-value:
NX
batch yi = ri + γ Q ′µ′ si+1 , µ′ si+1 ; θa′ ; θc′ .
 
1 h 2 i
L(θc ) ≈ Q µ (si , µ(si ; θa ); θc ) − yi . (17) 10 Calculate the lossh function: L(θc ) ≈
Nbatch i
i=1 1 P Nbatch Q (s , µ(s ; θ ); θ ) − y 2 .
Nbatch i=1 µ i i a c i
Here, yi = ri + γ Q ′µ′ si+1 , µ′ si+1 ; θa′ ; θc′ approximates the
 
11 Update the critic network by minimizing the loss
target Q-value from the target network according to Nbatch function: P h 2 i
Nbatch
samples drawn randomly from the replay memory. Differenti- 12 minθc N 1 i=1 Q µ (si , µ(si ; θa ); θc ) − yi .
batch
ating L(θc ) with respect to θc , we obtain the gradient as 13 Update the actor network by the sampled policy
gradient:
NX
batch 14 ∇θa J (θa ) ≈
2
∇θc L(θc ) ≈ Q µ (si , µ(si ; θa ); θc ) − yi 1 P Nbatch ∇ µ s )∇ Q (s , a; θ )|
  
Nbatch i=1 θa ( i a µ i c a=µ(si ) ;
Nbatch
i=1 15 θa ← θa + ηa ∇θa J (θa )
·∇θc Q µ (si , µ(si ; θa ); θc ) . Update the target-actor and target-critic networks:

(18) 16
17 θa′ ← τa θa + (1 − τa )θa′ ,
The parameter of the training-critic network, θc , is refreshed 18 θc′ ← τc θc + (1 − τc )θc′ .
utilizing the stochastic gradient descent method [51].
The target-actor and target-critic networks evolve from
the training-actor and training-critic networks based on the
following rule: input and output sizes of fully connected layer l [52].
The mini-batch gradient ascent has the time complex-
θa′ ← τa θa + (1 − τa )θa′ , ity of Tbgd = O (Nbatch /ϵ0 ), where ϵ0 is the accuracy
θc′ ← τc θc + (1 − τc )θc′ , (19) requirement to terminate the iterations [53]. Therefore,
the time-complexity of Algorithm 1 is T f c + Tbgd =
where τa and τc are decaying rates for the training-actor and P2 
training-critic networks, respectively. O F l · Fl + N
l=1 in out /ϵ
batch 0 .
The proposed DDPG-based UAV-target tailing approach is The optimality of the proposed DDPG-based algorithm can
summarized in Algorithm 1. The algorithm contains model be established by proving that the DDPG algorithm satisfies
initialization, model training, and model updates. We first ran- the conditions under the optimality of the policy gradient (PG)
domly initialize the four networks, including the training-actor methods. It was established in [54] that the optimality gap of
and training-critic networks, and target-actor and target-critic PG is bounded by
networks, and the experience replay buffer. The relative loca-
tions and velocities of the UAV monitor and target are the L(πθ ) − min L(π )
π∈5
input state of the algorithm. Given the present state of the κρ h u i
≤− min c⟨∇θ L(θ ), v − θ⟩ + ∥v − θ∥2 , (20)
UAV monitor, the action is produced through model training. 1 − γ v∈2 2
The model parameters are updated with historical transitions
in which L(·) is the loss function. κρ is the concentrability
from the replay memory until the maximum cumulative reward
coefficient defined for the class of cost-to-go functions Jθ =
is obtained. Finally, the monitor executes the action, changes
{Jπθ : θ ∈ 2}, which is the smallest scalar to satisfy
its state, and decides whether to stop the training.
κρ
∥J −J ∗ ∥1,ρ ≤ ∥J − T J ∥1,ρ , ∀J ∈ Jθ , (21)
1−γ
C. Complexity and Optimality
where T : J → J is the Bellman optimality operator:
To evaluate the time-complexity of Algorithm 1, we con-  Z 
sider the time-complexity of a FCNN with two hidden (T J )(s) ≜ min g(s, a) + J (s ′ )P(ds ′ |s, a) . (22)
layers, and the mini-batch gradient ascent (14). The time- a∈A
complexity  of the FCNN with two hidden layers is The optimality gap holds when the following conditions
P2 l l l l
Tfc = O l=1 Fin · Fout , where Fin and Fout are the are met.

Authorized licensed use limited to: DELHI TECHNICAL UNIV. Downloaded on March 12,2025 at 18:43:45 UTC from IEEE Xplore. Restrictions apply.
9122 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 25, NO. 8, AUGUST 2024

Condition 1: For any θ ∈ 2, the functions θ̄ 7→ IV. E XPERIMENT AND R ESULTS


B(θ̄|ηπθ , Jπθ ) and θ̄ 7 → B(θ|ηπθ̄ , Jπθ ) are continuously dif- This section provides Python experimental results to cor-
ferentiable on an open set containing θ. roborate the merits of the proposed method. We include
Condition 2: For each π ∈ 52 , there exists π + ∈ 52 , the reward value during learning to show the convergence
such that B(π + |ηπ , Jπ ) = minπ ′ ∈5 B(π ′ |ηπ , Jπ ). of the DRL algorithm. We test the learned policy under
Condition 3: For any π ∈ 52 , the function B(θ|ηπ , Jπ ) different target trajectories and compare it with an MPC-based
has no suboptimal stationary points. approach to further justify the advantage of DRL methods
Condition 4: For each π ∈ 52 , there exist constants c > over traditional control- and optimization-based approaches.
0 and u ≥ 0, such that the function B(θ|ηπ , Jπ ) satisfies the We also consider the scenarios with and without the battery
following property: constraint on the UAV monitor to show the effect of energy-
aware target tracking. The details of the schemes, scenarios,
min B(θ ′ |ηπ , Jπ ) and simulation results are provided in the ensuing sections.
θ ′ ∈2
≥ B(θ|ηπ , Jπ )
h u ′ i A. Experiment Settings
+ min c⟨∇B(θ|ηπ , Jπ ), θ ′
− θ⟩ + ∥θ − θ∥2
. (23)
θ ′ ∈2 2 The actor networks are implemented by FCNNs with two
hidden layers and their learning rates are 10−4 . The first
Conditions 1 and 3 imply that the loss function L is contin- layer has 32 neurons. The second layer has 64 neurons. The
uously differentiable with ∇ L(θ) = ∇θ̄ B(θ̄|ηπθ , Jπθ )|θ̄ =θ , θ ∈ output layer of the actor networks takes the tanh(·) activation
2 presents a stationary point of L(·) if L(πθ ) = L(π ∗ ) [54, function that bounds the output actions within [−1, 1]. The
Lemma 6], and B(θ|ηπθ , Jπθ ) = minθ̄ ∈2 B(θ̄|ηπθ , Jπθ ). Con- critic networks use the FCNNs with two hidden layers and
dition 4 is the gradient-dominance property of the loss their learning rates are 10−3 . Both hidden layers use the ReLU
function, which guarantees quick global convergence of first- activation functions. The first layer has 32 neurons. The second
order approaches, even with non-convex objectives. layer has 64 neurons. The OU noise is added to the actor
We confirm that the loss function in (17) satisfies the four policy, sampled from the OU process with parameters µ = 0,
conditions for the following reasons: θ = 0.15, and σ = 0.25. The training of the policy network
• The UAV propulsion power in (2), the harvested solar is done on a server with an Nvidia Tesla P100 SXM2 16GB
power in (3), and the monitor-target distance in (5) are all GPU. Table I collates the hyperparameters.
continuously differentiable functions defined in a closed We include the performance of the proposed scheme under
set, hence satisfying Conditions 1 to 3. a practical parameter setting to verify its effectiveness for real-
• The state and action spaces of problem (7) are compact world applications. In particular, we set the power conversion
and lie in a closed set, as the states and actions are con- efficiency and size of the solar panel as η = 0.2 and
strained by the UAV’s highest acceleration and velocity, S = 0.5 m2 , respectively [55]. We consider a UAV of
thus meeting Condition 2. 10 kg inclusive of a battery [56]. Typically, a solar panel of
• The distance keeping constraint and battery constraint 1 m2 weighs about 2.8 kg [57]. Therefore, when the UAV is
in (9) are lower- and upper-bounded linear, quadratic, equipped with a solar panel of 0.5 m2 , its weight is 11.4 kg.
or smooth functions that are continuously differentiable Interested readers can refer to [58] for an in-depth study on
and satisfy the gradient-dominance property. Hence, Con- the solar power management system of a UAV. The other
dition 4 is satisfied. parameters regarding the monitor’s power and the solar power
• Moreover, the training network consists of feedforward are listed in Table II.
FCNNs. Their activation functions are twice continu- The maximum monitor-target distance for effective target
ously differentiable, for example, Rectified Linear Units detection depends on the focal length of the UAV camera,
(ReLUs) and Sigmoid, hence satisfying Conditions 1 where a longer focal length results in a larger distance. It was
and 2. experimentally demonstrated in [12] that a UAV-based visual
tracking system can detect and recognize a moving target
Therefore, the solution obtained by the proposed method is
within a distance of 40 meters. In light of [12], we set the
bounded by a finite optimality gap.
maximum and minimum monitor-target distances to 40 and
The proposed DRL-based method offers the advantage of
2 meters, respectively.
both offline training and online testing. Offline training boasts
computational efficiency, quicker convergence, and optimal
utilization of available data. While the initial parameter setting B. Results of Policy Learning
during offline training is important, online testing enables the In the training process, the average reward for the i-th
agent to adapt and refine its policy in real-time. Specifically, training episode, denoted by r̄i , is calculated by taking the
online testing facilitates continuous improvement of the policy average of the step reward P from the first episode up to the
by evaluating its performance and making adjustments on-the- i-th episode, i.e., r̄i = 1i ij=1 r j , i = 1, · · · , Tep , where r j
fly based on real-time feedback. To this end, the combination is the step reward for the j-th training episode; see (8).
of offline training and online testing empowers the DRL- The proposed model is trained using the “random-
based method to effectively address the challenges of dynamic linear” target trajectory (along the x-axis) for each episode.
environments. We assume that the target travels along the x-axis at an initial

Authorized licensed use limited to: DELHI TECHNICAL UNIV. Downloaded on March 12,2025 at 18:43:45 UTC from IEEE Xplore. Restrictions apply.
HU et al.: VISUAL-BASED MOVING TARGET TRACKING WITH SOLAR-POWERED FIXED-WING UAV 9123

TABLE I
S UMMARY OF H YPERPARAMETERS

TABLE II
T HE PARAMETERS FOR F IXED -W ING UAV P ROPULSION P OWER AND
H ARVESTED S OLAR P OWER
Fig. 4. Trajectory, velocity and acceleration of the best control policy for
the UAV monitor (Episode 1,067).

Fig. 5. Monitor-target distance, energy consumption and battery level of the


best control policy for the UAV monitor (Episode 1,067).

battery constraint, and reaches the maximum reward at the


1,067th episode. The average reward gradually increases and
reaches its peak around the 1,400th episode, as the UAV con-
trol policy changes with a randomly generated target trajectory
in each episode. Furthermore, the reward function (8) takes an
important role in the learned policy. The parameters Cd1 , Cd2
and Ce have to be carefully tuned to allow the UAV monitor
Fig. 3. Episode and average rewards with and w/o the battery constraint. to learn the desired behavior.
Fig. 4 plots the trajectory, velocity and acceleration of the
velocity of v 0 = [15, 0] m/s. A time-varying acceleration control policy learned at the 1,067th episode for the UAV
B t := (Bxt , B yt ), ∀t, is randomly chosen from [−1, 1] for monitor over time under the “random-linear” target trajectory.
each time step, i.e., B t ∈ U (−1, 1), where U (·) denotes the Fig. 5 shows the corresponding monitor-target distance, energy
uniform distribution. The UAV monitor has an initial velocity consumption of the monitor, and the residual battery level
of V 0 = [15, 15] m/s. The total scheduling period is T = 120 s over time. As shown in both figures, the UAV monitor can
with 0.05 s per slot. The performance of the drone monitor quickly adapt its control policy (i.e., the acceleration) to the
without (w/o) the battery constraint is examined as a baseline. random target trajectory, effectively track the target and keep
Fig. 3(a) plots both the per-episode and the average rewards a reasonable distance from it. The UAV monitor can maintain
of our algorithm. For comparison, Fig. 3(b) plots the per- the required amount of energy in the battery for future use.
episode reward and the average reward w/o the battery
constraint. Fig. 3 demonstrates that both the performances of
our proposed algorithm with and w/o the battery constraint C. Test Results of Learned Policy
improve over time. With the battery constraint, the proposed The network parameters obtained at the 1,067th training
algorithm performs better and learns much faster than w/o episode are selected to test the learned policy. With the

Authorized licensed use limited to: DELHI TECHNICAL UNIV. Downloaded on March 12,2025 at 18:43:45 UTC from IEEE Xplore. Restrictions apply.
9124 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 25, NO. 8, AUGUST 2024

slots (steps). No exploration noise is added during the testing


process.
As the state-of-the-art online trajectory plan technique,
the MPC-based online optimization combines rigorous and
efficient convex optimization-based trajectory generation with
control-based bias/error mitigation (cf. [33], [60] using numer-
ical heuristics to generate trajectories). The MPC-based online
optimization approach is implemented every time slot τ by
first predicting the target trajectory {bτ +t , ∀t = 1, · · · , N }.
As typically considered, the Kalman filtering technique is
adopted to forecast the target trajectory relying on the past
observations [61]. Then, the MPC plans the monitor trajectory
for N steps ahead. Nevertheless, only the first step of the
devised trajectory is executed. The forecast horizon keeps
Fig. 6. Linear target trajectory (along the x-axis) with accelerations randomly being shifted forward at each (new) time slot τ + 1.
chosen from [−1, 1] and/or [−3, 3] m/s2 . In Figs. 6–8, the tested trajectories of the UAV monitor are
indicated by blue solid lines under DDPG and orange dash-
dot lines under MPC, and the target trajectories are indicated
by green dash lines. It is seen that the new algorithm can
successfully follow the target under all three target trajectories,
even when the time-varying acceleration is randomly chosen
from [−3, 3]; see Figs. 6 and 7. The trajectories of the UAV
and target do not match perfectly (or overlap); see Figs. 6
and 8. This is because the UAV keeps the target in its sight
for effective tracking and maintains a reasonable distance from
the target, rather than heading directly towards the target.
In Figs. 6–8, we consider η = 0.1 and S = 0.4 or 0.5 m2 to
show the effectiveness of the DDPG-based algorithm under
different energy harvesting capabilities, and understand the
impact of the energy conversion rate and solar panel size on
the tracking. Clearly, the UAV monitor with a stronger energy
Fig. 7. Random target trajectory with accelerations randomly chosen from
[−1, 1] m/s2 and/or [−3, 3] m/s2 .
conversion capability η and a larger size of solar panel S can
have a higher probability of successful tracking.
We consider η = 0.2 and S = 0.5 m2 when plotting the
MPC-based algorithm. We do not include η = 0.1 (and S =
0.4 or 0.5 m2 ) because the MPC-based algorithm is prone
to infeasibility under this setting. Specifically, the MPC plans
the UAV monitor’s trajectory based on the estimation of the
target’s trajectory over longer time horizons. The estimation
can be poor at the beginning of tracking, due to the lack of
past observations. As a consequence, the trajectory planning
can be heavily penalized, leading to a need for excessively
high energy for drastic adjustments of the trajectory in the
first several decision rounds. When η is small, i.e., η = 0.1,
the energy available is likely to fail to meet the need, and the
MPC-based approach can become infeasible.
Fig. 9 plots the monitor battery level under three types of
Fig. 8. Sinusoidal target trajectory with accelerations randomly chosen from target trajectories under the DDPG and MPC algorithms, when
[−1, 1] m/s2 . (η, S) = (0.2, 0.5). We see from Figs. 6–8 and 9, that the
UAV takes different flight paths under the MPC scheme, and
learning results from Section IV-B, we test the proposed consumes more energy than under the DDPG-based algorithm.
energy-efficient DDPG-based tracking algorithm under three This is due to inaccurate predictions of the target trajectory,
different types of target trajectories over 100 episodes, namely, and the suboptimal control policy resulting from the SCA.
the “random-linear”, “random”, and “sinusoidal” target trajec- Figs. 10 and 11 plot the changes in the monitor’s bat-
tories, as shown in Figs. 6–8. The model parameters of the tery level and the monitor-target distance during tracking,
“random-linear” target trajectory are set up consistently with respectively, where the efficiency and size of the solar panel
those in Section IV-B. The UAV monitor has an initial velocity are (η, S) = (0.1, 0.4). Fig. 10 shows that, when (η, S) =
of V 0 = [15, 15] m/s. Each episode consists of 6,000 time (0.1, 0.4), the monitor’s battery maintains the initially charged

Authorized licensed use limited to: DELHI TECHNICAL UNIV. Downloaded on March 12,2025 at 18:43:45 UTC from IEEE Xplore. Restrictions apply.
HU et al.: VISUAL-BASED MOVING TARGET TRACKING WITH SOLAR-POWERED FIXED-WING UAV 9125

Fig. 9. Monitor battery level under three types of target trajectories by the DDPG and MPC algorithms when η = 0.2 and S = 0.5.

Fig. 10. Monitor battery level under three types of target trajectories when η = 0.1 and S = 0.4 m2 .

Fig. 11. The distance-keeping performance of the UAV monitor under three types of target trajectories when η = 0.1 and S = 0.4 m2 .

Figs. 10 and 11 plot the changes in the monitor's battery level and the monitor-target distance during tracking, respectively, where the efficiency and size of the solar panel are (η, S) = (0.1, 0.4). Fig. 10 shows that, when (η, S) = (0.1, 0.4), the monitor's battery maintains the initially charged level of 750 Joules under the “random” target trajectories with B_t ∈ U(−1, 1). This indicates that the harvested solar energy supports the UAV monitor effectively. In the rest of the scenarios studied in Fig. 10, the battery level drops below 500 Joules, indicating the insufficiency of the harvested solar energy for UAV propulsion. Fig. 11 shows that the average monitor-target distance increases quickly to about 20 m first, and then stabilizes around 20 m during tracking.

We show that the proposed scheme can be readily extended from the 2D UAV target tracking to the 3D. In this case, with slight abuse of notation, the constraints of 3D mobility in (1) concern 3D variables, including the UAV waypoint q_t := (x_t, y_t, z_t), velocity V_t := (V_xt, V_yt, V_zt), and acceleration A_t := (A_xt, A_yt, A_zt). In addition, the maximum pitch angle constraint has to be satisfied when the drone is ascending or descending, i.e., V_zt/∥V_t∥ ≤ ϑ, where ϑ is the sine value of the largest allowable pitch angle of the drone. The propulsion power usage for the UAV 3D flight is [62]

P_vt = c_1∥V_t∥³ + c_2(V_xt² + V_yt²)/∥V_t∥³ + c_2∥A_t∥²/(g²∥V_t∥) + 2c_2·A_zt/(g∥V_t∥) + m(gV_zt + A_t^T·V_t).    (24)

We plot the 3D UAV trajectory and the corresponding battery level under the proposed scheme in a 3D scenario in Figs. 12 and 13, where the solar panel efficiency and size are set to (η, S) = (0.2, 0.5). The UAV acceleration is 3D and can change continuously along each of the x-, y-, and z-axes under the proposed method. By default, we set the acceleration within [−1, 1] m/s² along each of the x-, y-, and z-axes. It is seen that the proposed DDPG-based algorithm outperforms the MPC-based algorithm under the sinusoidal target trajectory, providing more effective tracking and better energy efficiency.

We note that controlling a fixed-wing UAV through its 3D acceleration is feasible due to the inherent connection between acceleration and the key aerodynamic forces involved in flight. By influencing the UAV's acceleration, one indirectly manages the interplay of thrust, lift, drag, and weight, which are fundamental to flight dynamics. The throttle control, linked to the propulsion system, governs the UAV's longitudinal acceleration by adjusting thrust. Elevator control influences the pitch, affecting the balance between lift and weight for changes in altitude. Ailerons control roll, adjusting the distribution of lift between the wings and facilitating turns. Rudder control manages yaw, enabling coordinated turns by manipulating the aircraft's heading. These control inputs collectively shape the forces and moments acting on the UAV; the ability to control a fixed-wing UAV through its 3D acceleration can leverage these aerodynamic principles to achieve responsive flight control.
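To make the 3D energy model concrete, the sketch below evaluates the propulsion power in (24) and the pitch-angle constraint, and advances the monitor's battery by one step. The coefficients c1 and c2, the mass m, the pitch bound, the solar irradiance, and the battery capacity are assumed placeholder values rather than the paper's settings.

```python
import numpy as np

G = 9.8  # gravitational acceleration (m/s^2)

def propulsion_power_3d(V, A, c1=1e-3, c2=50.0, m=5.0):
    """Propulsion power of the fixed-wing monitor in 3D flight per (24);
    c1, c2, and m are airframe-dependent and assumed here. Assumes the
    fixed-wing speed ||V|| stays positive, as it must in level flight."""
    v = np.linalg.norm(V)
    return (c1 * v**3
            + c2 * (V[0]**2 + V[1]**2) / v**3
            + c2 * (A @ A) / (G**2 * v)
            + 2.0 * c2 * A[2] / (G * v)
            + m * (G * V[2] + A @ V))

def pitch_ok(V, sin_pitch_max=0.3):
    """Pitch constraint V_z/||V|| <= sin(max pitch); the symmetric bound
    and the value 0.3 are assumptions."""
    return abs(V[2]) / np.linalg.norm(V) <= sin_pitch_max

def battery_step(B, V, A, dt=1.0, eta=0.2, S=0.5, G_sun=1000.0, B_max=750.0):
    """One battery update: harvested solar power eta*S*G_sun (clear-sky
    irradiance assumed ~1000 W/m^2) minus the propulsion power in (24)."""
    net = eta * S * G_sun - propulsion_power_3d(V, A)
    return float(np.clip(B + net * dt, 0.0, B_max))
```

In a rollout, a candidate acceleration would be damped or rejected whenever `pitch_ok` fails or the projected battery falls below the reserve.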

TABLE III
COMPARISON OF TRACKING CAPABILITY BETWEEN THE DDPG-BASED ALGORITHMS WITH AND W/O BATTERY CONSTRAINT UNDER THE “RANDOM-LINEAR” TARGET TRAJECTORY

Fig. 12. A demonstration of the 3D trajectories generated, when (η, S) = (0.2, 0.5).

Fig. 13. Monitor-target distance and battery level under 3D random and
sinusoidal trajectories of the UAV monitor.

D. Comparison With Learned Policy w/o Battery Constraint


As discussed in Section IV-B, our DDPG-based energy-
efficient algorithm with the battery constraint helps improve
the learning efficiency of the UAV monitor. Now, we further
test the tracking capability of the proposed algorithm either
with or w/o the battery constraint, using the learned policy in
Section IV-B. Different sizes of the solar panel are considered
to gauge the impact of the inflow energy volume on the pro-
posed DDPG-based algorithm and its resulting UAV trajectory. A typical flexible, lightweight solar panel is about 3 kg/m².

TABLE IV
COMPARISON OF TRACKING CAPABILITY BETWEEN THE DDPG-BASED ALGORITHMS WITH AND W/O BATTERY CONSTRAINT UNDER THE “RANDOM” TARGET TRAJECTORY

Tables III and IV list the successful tracking probability (STP) of the proposed DDPG-based algorithm with and
w/o the battery constraint (6) under the “random-linear” and
“random” target trajectories, respectively, over 100 testing
episodes. We see that the proposed algorithm offers a higher
STP with the battery constraint (6) than it does w/o the battery
constraint, under both target trajectories. This is due to the fact
that, with the battery constraint (6), the UAV monitor carefully
designs its trajectory to maintain the minimum energy reserve.
In contrast, without (6), the UAV monitor consumes all battery
energy at an early stage and fails the mission.
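The role of the reserve can be seen in a short evaluation loop. The environment interface, the reserve threshold, and the action damping below are illustrative assumptions standing in for the exact mechanism of (6).

```python
def evaluate_stp(policy, make_env, episodes=100, b_reserve=100.0):
    """Estimate the successful tracking probability (STP): the fraction
    of test episodes in which the target stays within visual range and
    the battery never empties. `make_env` and its step interface are
    hypothetical stand-ins for the simulation environment."""
    successes = 0
    for ep in range(episodes):
        env = make_env(seed=ep)
        state, battery = env.reset()
        ok = True
        for _ in range(6000):
            action = policy(state)
            if battery < b_reserve:      # hold energy back near the reserve
                action = 0.5 * action    # e.g., damp the commanded acceleration
            state, battery, in_range = env.step(action)
            if not in_range or battery <= 0.0:
                ok = False
                break
        successes += int(ok)
    return successes / episodes
```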
The UAV monitor has higher STPs when the solar panel has
a higher efficiency η or a larger size S, since the monitor can
harvest more solar energy and improve flexibility in trajectory
planning. The STP also increases with a decrease of the target
randomness and/or an increase of the monitor’s initial velocity,
i.e., B_t ∈ U(−1, 1) and/or V_0 = [25, 0] m/s, as compared to B_t ∈ U(−3, 3) and/or V_0 = [15, 0] m/s. This is because the monitor enjoys higher flexibility in the trajectory design when B_t ∈ U(−1, 1) and/or V_0 = [25, 0] m/s. In addition, Table IV shows that when (η, S) = (0.1, 0.3), the STP of the monitor is zero. This failed target tracking is the result of an energy deficiency, as the monitor consumes all energy and falls before catching up with the target.


TABLE V
COMPARISON OF TRACKING CAPABILITY BETWEEN THE MPC-BASED CONTROL ALGORITHMS WITH AND W/O BATTERY CONSTRAINT UNDER THE “RANDOM” TARGET TRAJECTORY

Table V summarizes the STP of the UAV monitor by the MPC-based online control with and w/o the battery constraint under the “random” target trajectory with B_t ∈ U(−3, 3), over 100 realizations of the random target trajectories. By comparing Tables IV and V, we see that the new DDPG-based algorithm outperforms the MPC-based control in terms of successful target tracking. This is because the feasibility of the SCA-based convexification in the MPC-based control can be impacted or restrained by the initial trajectory of the monitor, resulting in less effective UAV control and a higher probability of mission failure.

Fig. 14. Snapshots from the CoppeliaSim simulator. The recordings of the CoppeliaSim simulations are at https://drive.google.com/drive/folders/1Qoiq_uWEzG0ATP77diZcI1R0DHOCkKFb?usp=drive_link.
By comparing the simulation results in [34, Figs. 9, 10, and 13], it is observed that the rotary-wing UAV monitor maintains the maximum monitor-target distance for most of the time, as opposed to a moderate monitor-target distance kept by a fixed-wing counterpart, as shown in Fig. 11. In addition, the UAV monitor with rotary wings consumes much more energy than its counterpart with fixed wings. This again corroborates the merits of the proposed DDPG-based algorithm over the existing MPC-based control.
We also implement the proposed algorithm in a popular, computer-based robotic simulator, i.e., the CoppeliaSim robot simulator, to demonstrate the effectiveness of real-world UAV tracking. CoppeliaSim can be regarded as a software-in-the-loop (SITL) verification tool and has been broadly used to demonstrate novel designs and concepts of UAV applications [63]. We connect Python with the CoppeliaSim simulator and implement our proposed method by training the DDPG model embedded in CoppeliaSim using the reward function (8), the relative monitor-target distance-based state space, and the acceleration-based action space. The CoppeliaSim simulator provides the instantaneous locations of the target and monitor, based on which the relative target-monitor distance is obtained to evaluate (8).
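One way to wire this up is through CoppeliaSim's ZeroMQ remote API client, sketched below; the scene-object aliases '/monitor' and '/target', the desired distance, and the reward form are placeholders for the setup actually used with (8).

```python
import numpy as np
# pip install coppeliasim-zmqremoteapi-client (ships with CoppeliaSim 4.3+)
from coppeliasim_zmqremoteapi_client import RemoteAPIClient

client = RemoteAPIClient()               # connects to a running CoppeliaSim
sim = client.require('sim')              # 'require' on recent client versions

monitor = sim.getObject('/monitor')      # assumed scene aliases; rename to
target = sim.getObject('/target')        # match the actual scene hierarchy

def relative_distance():
    """Instantaneous monitor-target distance read from the simulator."""
    p_m = np.array(sim.getObjectPosition(monitor, sim.handle_world))
    p_t = np.array(sim.getObjectPosition(target, sim.handle_world))
    return float(np.linalg.norm(p_m - p_t))

def reward(d, d_ref=20.0):
    """Distance-keeping reward standing in for (8); d_ref is assumed."""
    return -abs(d - d_ref)

sim.setStepping(True)                    # step the simulation from Python
sim.startSimulation()
for _ in range(100):
    r = reward(relative_distance())      # would feed the DDPG replay buffer
    sim.step()
sim.stopSimulation()
```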
The flight trajectories of the UAV and the target in the CoppeliaSim simulator are available at https://drive.google.com/drive/folders/1Qoiq_uWEzG0ATP77diZcI1R0DHOCkKFb?usp=drive_link, with a screenshot provided in Fig. 14. A small simulation area is displayed due to a small default acceleration range of [−0.1, 0.1] m/s² in the simulator. Nevertheless, the simulation test can be proportionally projected onto large areas if larger acceleration ranges, e.g., [−1, 1] m/s², are considered. We will extend and test the proposed scheme for various application scenarios and hardware-in-the-loop (HITL) and/or SITL simulators, such as Gazebo and AirSim, in our future works.

V. CONCLUSION

This paper proposed a novel resolution framework for the online control of UAV-on-UAV visual-based tracking performed by a solar-powered UAV monitor with fixed wings. A DRL framework was developed to learn the tracking trajectory on-the-fly, balancing tracking performance and energy efficiency. DDPG was tailored to learn the optimal control strategy for the monitor based on its continuous state and action spaces. Extensive experiments demonstrated that our DDPG-based algorithm can keep a required distance from the target for effective visual tracking, while satisfying the battery constraint of the UAV monitor. The new algorithm outperformed traditional control and optimization methods in power efficiency and tracking accuracy.

The proposed DRL-based approach could result in a limited performance gain when the target has a drastically changing trajectory or takes countermeasures against the monitor. Interesting research directions that are worth future investigation include an integrated optimization framework with control and machine learning techniques, and transformer-based learning models for UAV target tracking.

REFERENCES

[1] K. Li, W. Ni, X. Wang, R. P. Liu, S. S. Kanhere, and S. Jha, “Energy-efficient cooperative relaying for unmanned aerial vehicles,” IEEE Trans. Mobile Comput., vol. 15, no. 6, pp. 1377–1386, Jun. 2016.
[2] H. Huang and A. V. Savkin, “An algorithm of reactive collision free 3-D deployment of networked unmanned aerial vehicles for surveillance and monitoring,” IEEE Trans. Ind. Informat., vol. 16, no. 1, pp. 132–140, Jan. 2020.
[3] X. Yuan, Z. Feng, W. Ni, R. P. Liu, J. A. Zhang, and W. Xu, “Secrecy performance of terrestrial radio links under collaborative aerial eavesdropping,” IEEE Trans. Inf. Forensics Security, vol. 15, pp. 604–619, 2020.
[4] S. Hu, X. Yuan, W. Ni, and X. Wang, “Trajectory planning of cellular-connected UAV for communication-assisted radar sensing,” IEEE Trans. Commun., vol. 70, no. 9, pp. 6385–6396, Sep. 2022.


[5] S. Hu, Q. Wu, and X. Wang, “Energy management and trajectory optimization for UAV-enabled legitimate monitoring systems,” IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 142–155, Jan. 2021.
[6] C. Sun, W. Ni, and X. Wang, “Joint computation offloading and trajectory planning for UAV-assisted edge computing,” IEEE Trans. Wireless Commun., vol. 20, no. 8, pp. 5343–5358, Aug. 2021.
[7] Y. Zeng and R. Zhang, “Energy-efficient UAV communication with trajectory optimization,” IEEE Trans. Wireless Commun., vol. 16, no. 6, pp. 3747–3760, Jun. 2017.
[8] L. Zhang et al., “Vision-based target three-dimensional geolocation using unmanned aerial vehicles,” IEEE Trans. Ind. Electron., vol. 65, no. 10, pp. 8052–8061, Oct. 2018.
[9] X. Zhang, Y. Fang, X. Zhang, J. Jiang, and X. Chen, “A novel geometric hierarchical approach for dynamic visual servoing of quadrotors,” IEEE Trans. Ind. Electron., vol. 67, no. 5, pp. 3840–3849, May 2020.
[10] V. Shaferman and T. Shima, “Unmanned aerial vehicles cooperative tracking of moving ground target in urban environments,” J. Guid., Control, Dyn., vol. 31, no. 5, pp. 1360–1371, Sep. 2008.
[11] H. Yu, K. Meier, M. Argyle, and R. W. Beard, “Cooperative path planning for target tracking in urban environments using unmanned air and ground vehicles,” IEEE/ASME Trans. Mechatronics, vol. 20, no. 2, pp. 541–552, Apr. 2015.
[12] S. Wang, F. Jiang, B. Zhang, R. Ma, and Q. Hao, “Development of UAV-based target tracking and recognition systems,” IEEE Trans. Intell. Transp. Syst., vol. 21, no. 8, pp. 3409–3422, Aug. 2020.
[13] N. Farmani, L. Sun, and D. J. Pack, “A scalable multitarget tracking system for cooperative unmanned aerial vehicles,” IEEE Trans. Aerosp. Electron. Syst., vol. 53, no. 4, pp. 1947–1961, Aug. 2017.
[14] S. Papaioannou, P. Kolios, T. Theocharides, C. G. Panayiotou, and M. M. Polycarpou, “Integrated guidance and gimbal control for coverage planning with visibility constraints,” IEEE Trans. Aerosp. Electron. Syst., vol. 59, no. 2, pp. 1276–1291, Apr. 2023.
[15] S. Hu, X. Chen, W. Ni, E. Hossain, and X. Wang, “Distributed machine learning for wireless communication networks: Techniques, architectures, and applications,” IEEE Commun. Surveys Tuts., vol. 23, no. 3, pp. 1458–1493, 3rd Quart., 2021.
[16] X. Yuan, S. Hu, W. Ni, R. P. Liu, and X. Wang, “Joint user, channel, modulation-coding selection, and RIS configuration for jamming resistance in multiuser OFDMA systems,” IEEE Trans. Commun., vol. 71, no. 3, pp. 1631–1645, Mar. 2023.
[17] X. Yuan, W. Ni, M. Ding, K. Wei, J. Li, and H. V. Poor, “Amplitude-varying perturbation for balancing privacy and utility in federated learning,” IEEE Trans. Inf. Forensics Security, vol. 18, pp. 1884–1897, 2023.
[18] Y. He, M. Yang, Z. He, and M. Guizani, “Computation offloading and resource allocation based on DT-MEC-assisted federated learning framework,” IEEE Trans. Cognit. Commun. Netw., vol. 9, no. 6, pp. 1707–1720, Dec. 2023.
[19] Y. He, X. Zhong, Y. Gan, H. Cui, and M. Guizani, “A DDPG hybrid of graph attention network and action branching for multi-scale end-edge-cloud vehicular orchestrated task offloading,” IEEE Wireless Commun., vol. 30, no. 4, pp. 147–153, Aug. 2023.
[20] Y. He, M. Yang, Z. He, and M. Guizani, “Resource allocation based on digital twin-enabled federated learning framework in heterogeneous cellular network,” IEEE Trans. Veh. Technol., vol. 72, no. 1, pp. 1149–1158, Jan. 2023.
[21] S. M. Marvasti-Zadeh, L. Cheng, H. Ghanei-Yakhdan, and S. Kasaei, “Deep learning for visual tracking: A comprehensive survey,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 5, pp. 3943–3968, May 2022.
[22] C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang, “Robust visual tracking via hierarchical convolutional features,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 11, pp. 2709–2723, Nov. 2019.
[23] K. Zhang, Q. Liu, Y. Wu, and M.-H. Yang, “Robust visual tracking via convolutional networks without training,” IEEE Trans. Image Process., vol. 25, no. 4, pp. 1779–1792, Apr. 2016.
[24] Y. Wang et al., “Target tracking control of a biomimetic underwater vehicle through deep reinforcement learning,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 8, pp. 3741–3752, Aug. 2022.
[25] Z. Liu et al., “Robust target recognition and tracking of self-driving cars with radar and camera information fusion under severe weather conditions,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 7, pp. 6640–6653, Jul. 2022.
[26] X. Dong, J. Shen, D. Yu, W. Wang, J. Liu, and H. Huang, “Occlusion-aware real-time object tracking,” IEEE Trans. Multimedia, vol. 19, no. 4, pp. 763–771, Apr. 2017.
[27] L. R. G. Carrillo and K. G. Vamvoudakis, “Deep-learning tracking for autonomous flying systems under adversarial inputs,” IEEE Trans. Aerosp. Electron. Syst., vol. 56, no. 2, pp. 1444–1459, Apr. 2020.
[28] M. M. U. Chowdhury, F. Erden, and I. Guvenc, “RSS-based Q-learning for indoor UAV navigation,” in Proc. IEEE Mil. Commun. Conf. (MILCOM), Nov. 2019, pp. 121–126.
[29] Y. Chen, D. Chang, and C. Zhang, “Autonomous tracking using a swarm of UAVs: A constrained multi-agent reinforcement learning approach,” IEEE Trans. Veh. Technol., vol. 69, no. 11, pp. 13702–13717, Nov. 2020.
[30] W. Zhang, K. Song, X. Rong, and Y. Li, “Coarse-to-fine UAV target tracking with deep reinforcement learning,” IEEE Trans. Autom. Sci. Eng., vol. 16, no. 4, pp. 1522–1530, Oct. 2019.
[31] H. Huang and A. V. Savkin, “Reactive 3D deployment of a flying robotic network for surveillance of mobile targets,” Comput. Netw., vol. 161, pp. 172–182, Oct. 2019.
[32] H. Huang, A. Savkin, and W. Ni, “A method for covert video surveillance of a car or a pedestrian by an autonomous aerial drone via trajectory planning,” in Proc. IEEE ICCAR, Singapore, Apr. 2020, pp. 1–3.
[33] Y. Huang, H. Wang, and P. Yao, “Energy-optimal path planning for solar-powered UAV with tracking moving ground target,” Aerosp. Sci. Technol., vol. 53, pp. 241–251, Jun. 2016.
[34] S. Hu, W. Ni, X. Wang, A. Jamalipour, and D. Ta, “Joint optimization of trajectory, propulsion, and thrust powers for covert UAV-on-UAV video tracking and surveillance,” IEEE Trans. Inf. Forensics Security, vol. 16, pp. 1959–1972, 2021.
[35] S. Hu, W. Ni, X. Wang, and A. Jamalipour, “Disguised tailing and video surveillance with solar-powered fixed-wing unmanned aerial vehicle,” IEEE Trans. Veh. Technol., vol. 71, no. 5, pp. 5507–5518, May 2022.
[36] S. S. Baek, H. Kwon, J. A. Yoder, and D. Pack, “Optimal path planning of a target-following fixed-wing UAV using sequential decision processes,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Nov. 2013, pp. 2955–2962.
[37] P. Morere, R. Marchant, and F. Ramos, “Sequential Bayesian optimization as a POMDP for environment monitoring with UAVs,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2017, pp. 6381–6388.
[38] F. Phillip, “Newton's second law of motion,” Phys. Today, vol. 60, no. 6, p. 28, 2007.
[39] G. S. Aglietti, S. Redi, A. R. Tatnall, and T. Markvart, “Harnessing high-altitude solar power,” IEEE Trans. Energy Convers., vol. 24, no. 2, pp. 442–451, Jun. 2009.
[40] C. Chen, J. Chang, K. Chiang, H. Lin, S. Hsiao, and H. Lin, “Perovskite photovoltaics for dim-light applications,” Adv. Funct. Mater., vol. 25, no. 45, pp. 7064–7070, Dec. 2015.
[41] K. Yoshikawa et al., “Silicon heterojunction solar cell with interdigitated back contacts for a photoconversion efficiency over 26%,” Nature Energy, vol. 2, no. 5, p. 17032, Mar. 2017.
[42] Z. Zheng, T. Ruan, Y. Wei, Y. Yang, and T. Mei, “VehicleNet: Learning robust visual representation for vehicle re-identification,” IEEE Trans. Multimedia, vol. 23, pp. 2683–2693, 2021.
[43] C. J. Hsu, M.-C. Lu, and Y.-Y. Lu, “Distance and angle measurement of objects on an oblique plane based on pixel number variation of CCD images,” IEEE Trans. Instrum. Meas., vol. 60, no. 5, pp. 1779–1794, May 2011.
[44] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, “Deterministic policy gradient algorithms,” in Proc. 31st Int. Conf. Mach. Learn., vol. 32, 2014, pp. I-387–I-395.
[45] T. P. Lillicrap et al., “Continuous control with deep reinforcement learning,” in Proc. ICLR, 2016, pp. 1–14.
[46] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York, NY, USA: Wiley, 2014.
[47] J. Han and M. Kamber, Data Mining: Concepts and Techniques. San Mateo, CA, USA: Morgan Kaufmann, 2006.
[48] (Sep. 2023). Intel RealSense Depth Camera D435i. [Online]. Available: https://www.intelrealsense.com/depth-camera-d435i/?magento_session_id=c5a2edff5b296e750d607cdd314bbc5f
[49] Y. Hou, L. Liu, Q. Wei, X. Xu, and C. Chen, “A novel DDPG method with prioritized experience replay,” in Proc. IEEE Int. Conf. Syst., Man, Cybern. (SMC), Oct. 2017, pp. 316–321.
[50] G. E. Uhlenbeck and L. S. Ornstein, “On the theory of the Brownian motion,” Phys. Rev., vol. 36, pp. 823–841, Sep. 1930, doi: 10.1103/PhysRev.36.823.
[51] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, “Policy gradient methods for reinforcement learning with function approximation,” in Proc. NIPS, 1999, pp. 1057–1063.
[52] G. Serpen and Z. Gao, “Complexity analysis of multilayer perceptron neural network embedded into a wireless sensor network,” Proc. Comput. Sci., vol. 36, pp. 192–197, Jan. 2014.


[53] R. Johnson and T. Zhang, “Accelerating stochastic gradient descent using predictive variance reduction,” in Proc. 26th Int. Conf. Adv. Neural Inf. Process. Syst., vol. 26, 2013, pp. 315–323.
[54] J. Bhandari and D. Russo, “Global optimality guarantees for policy gradient methods,” 2019, arXiv:1906.01786.
[55] G. R. Chandra Mouli, P. Bauer, and M. Zeman, “System design for a solar powered electric vehicle charging station for workplaces,” Appl. Energy, vol. 168, pp. 434–443, Apr. 2016.
[56] P. J. Zarco-Tejada, R. Diaz-Varela, V. Angileri, and P. Loudjani, “Tree height quantification using very high resolution imagery acquired from an unmanned aerial vehicle (UAV) and automatic 3D photo-reconstruction methods,” Eur. J. Agronomy, vol. 55, pp. 89–99, Apr. 2014.
[57] M. Zinaddinov, S. Mil'shtein, and D. Kazmer, “Design of light-weight solar panels,” in Proc. IEEE 46th Photovolt. Spec. Conf. (PVSC), Jun. 2019, pp. 0582–0587.
[58] J.-K. Shiau, D.-M. Ma, P.-Y. Yang, G.-F. Wang, and J. H. Gong, “Design of a solar power management system for an experimental UAV,” IEEE Trans. Aerosp. Electron. Syst., vol. 45, no. 4, pp. 1350–1360, Oct. 2009.
[59] A. Kokhanovsky, “Optical properties of terrestrial clouds,” Earth-Sci. Rev., vol. 64, nos. 3–4, pp. 189–241, Feb. 2004.
[60] J. A. Shaffer, E. Carrillo, and H. Xu, “Hierarchal application of receding horizon synthesis and dynamic allocation for UAVs fighting fires,” IEEE Access, vol. 6, pp. 78868–78880, 2018.
[61] C. Antoniou, M. Ben-Akiva, and H. N. Koutsopoulos, “Nonlinear Kalman filtering algorithms for on-line calibration of dynamic traffic assignment models,” IEEE Trans. Intell. Transp. Syst., vol. 8, no. 4, pp. 661–670, Dec. 2007.
[62] X. Xiong, C. Sun, W. Ni, and X. Wang, “Three-dimensional trajectory design for unmanned aerial vehicle-based secure and energy-efficient data collection,” IEEE Trans. Veh. Technol., vol. 72, no. 1, pp. 664–678, Jan. 2023.
[63] A. M. C. Rezende, V. M. Goncalves, and L. C. A. Pimenta, “Constructive time-varying vector fields for robot navigation,” IEEE Trans. Robot., vol. 38, no. 2, pp. 852–867, Apr. 2022.

Shuyan Hu (Member, IEEE) received the B.Eng. degree in electrical engineering from Tongji University, China, in 2014, and the Ph.D. degree in electronic science and technology from Fudan University, China, in 2019. She is currently a Post-Doctoral Research Fellow with the School of Information Science and Technology, Fudan University. She was selected by the Shanghai Post-Doctoral Excellence Program in 2019. Her research interests include machine learning and convex optimizations and their applications to unmanned aerial vehicle (UAV) networks and intelligent systems.

Xin Yuan (Senior Member, IEEE) received the B.E. degree from Taiyuan University of Technology, Shanxi, China, in 2013, the first Ph.D. degree from Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 2019, and the second Ph.D. degree from the University of Technology Sydney (UTS), Sydney, Australia, in 2020. She is currently a Senior Research Scientist at CSIRO, Sydney, NSW, Australia. Her research interests include machine learning and optimization, and their applications to UAV networks and intelligent systems.

Wei Ni (Fellow, IEEE) received the B.E. and Ph.D. degrees in electronic engineering from Fudan University, Shanghai, China, in 2000 and 2005, respectively. He was a Post-Doctoral Research Fellow with Shanghai Jiao Tong University from 2005 to 2008; the Deputy Project Manager with Bell Laboratories, Alcatel/Alcatel-Lucent, from 2005 to 2008; and a Senior Researcher with Devices Research and Development, Nokia, from 2008 to 2009. He is currently a Principal Research Scientist with Commonwealth Scientific and Industrial Research Organization (CSIRO), Sydney, Australia. He is also a Conjoint Professor with The University of New South Wales, an Adjunct Professor with the University of Technology Sydney, and an Honorary Professor with Macquarie University. He has coauthored one book, ten book chapters, more than 300 journal articles, more than 100 conference papers, 26 patents, and ten standard proposals accepted by IEEE, and three technical contributions accepted by ISO. His research interests include 6G security and privacy, machine learning, stochastic optimization, and their applications to system efficiency and integrity. He served as the Secretary and then the Vice-Chair and Chair of the IEEE VTS NSW Chapter from 2015 to 2022, the Track Chair for VTC-Spring 2017, the Track Co-Chair for IEEE VTC-Spring 2016, the Publication Chair for BodyNet 2015, and the Student Travel Grant Chair for WPMC 2014. He also serves as a Technical Expert at Standards Australia in support of the ISO Standardization of AI and Big Data. He has been an Editor of IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, since 2018; IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, since 2022; and IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY and IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, since 2024.

Xin Wang (Fellow, IEEE) received the B.Sc. and M.Sc. degrees in electrical engineering from Fudan University, Shanghai, China, in 1997 and 2000, respectively, and the Ph.D. degree in electrical engineering from Auburn University, Auburn, AL, USA, in 2004. From September 2004 to August 2006, he was a Post-Doctoral Research Associate with the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA. In August 2006, he joined the Department of Electrical Engineering, Florida Atlantic University, Boca Raton, FL, USA, as an Assistant Professor, and was promoted to a tenured Associate Professor in 2010. He is currently a Distinguished Professor and the Chair of the Department of Communication Science and Engineering, Fudan University. His research interests include stochastic network optimization, energy-efficient communications, cross-layer design, and signal processing for communications. He is a member of the Signal Processing for Communications and Networking Technical Committee of the IEEE Signal Processing Society. He is a Senior Area Editor of IEEE TRANSACTIONS ON SIGNAL PROCESSING and an Editor of IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS. In the past, he served as an Associate Editor for IEEE TRANSACTIONS ON SIGNAL PROCESSING and IEEE SIGNAL PROCESSING LETTERS, and an Editor for IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY. He is a Distinguished Speaker of the IEEE Vehicular Technology Society.

Abbas Jamalipour (Fellow, IEEE) received the Ph.D. degree in electrical engineering from Nagoya University, Nagoya, Japan, in 1996. He is currently a Professor of ubiquitous mobile networking with The University of Sydney. He has authored nine technical books, 11 book chapters, over 550 technical papers, and five patents, all in the area of wireless communications and networking. He is a fellow of the Institute of Electrical, Information, and Communication Engineers (IEICE) and the Institution of Engineers Australia, an ACM Professional Member, and an IEEE Distinguished Speaker. Since 2014, he has been an elected member of the Board of Governors of the IEEE Vehicular Technology Society. He was a recipient of a number of prestigious awards, such as the 2019 IEEE ComSoc Distinguished Technical Achievement Award in Green Communications, the 2016 IEEE ComSoc Distinguished Technical Achievement Award in Communications Switching and Routing, the 2010 IEEE ComSoc Harold Sobol Award, the 2006 IEEE ComSoc Best Tutorial Paper Award, and over 15 best paper awards. He has been the General Chair and the Technical Program Chair of several prestigious conferences, including IEEE ICC, GLOBECOM, WCNC, and PIMRC. He was the President of the IEEE Vehicular Technology Society from 2020 to 2021. Previously, he held the positions of the Executive Vice-President and the Editor-in-Chief of VTS Mobile World. He was the Vice President-Conferences and a member of the Board of Governors of the IEEE Communications Society. He sits on the editorial board of IEEE ACCESS and several other journals. He is a member of the Advisory Board of IEEE INTERNET OF THINGS JOURNAL. Since January 2022, he has been the Editor-in-Chief of IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY. He was also the Editor-in-Chief of IEEE WIRELESS COMMUNICATIONS.
