
Benchmarking Deep Learning Models for Object

Detection on Edge Computing Devices

Daghash K. Alqahtani1 [0009-0001-5309-6996], Aamir Cheema2 [0000-0003-2139-9121], and Adel N. Toosi1,2 [0000-0001-5655-5337]

1 The University of Melbourne, Melbourne, Victoria, Australia
{daghash.alqahtani@student, adel.toosi}@unimelb.edu.au
2 Monash University, Melbourne, Victoria, Australia
Aamir.Cheema@monash.edu

arXiv:2409.16808v1 [cs.CV] 25 Sep 2024

Abstract. Modern applications, such as autonomous vehicles, require deploying deep learning algorithms on resource-constrained edge devices
for real-time image and video processing. However, there is limited un-
derstanding of the efficiency and performance of various object detection
models on these devices. In this paper, we evaluate state-of-the-art object
detection models, including YOLOv8 (Nano, Small, Medium), Efficient-
Det Lite (Lite0, Lite1, Lite2), and SSD (SSD MobileNet V1, SSDLite
MobileDet). We deployed these models on popular edge devices like the
Raspberry Pi 3, 4, and 5 with/without TPU accelerators, and Jetson
Orin Nano, collecting key performance metrics such as energy consump-
tion, inference time, and Mean Average Precision (mAP). Our findings
highlight that lower mAP models such as SSD MobileNet V1 are more
energy-efficient and faster in inference, whereas higher mAP models like
YOLOv8 Medium generally consume more energy and have slower in-
ference, though with exceptions when accelerators like TPUs are used.
Among the edge devices, Jetson Orin Nano stands out as the fastest
and most energy-efficient option for request handling, despite having the
highest idle energy consumption. These results emphasize the need to
balance accuracy, speed, and energy efficiency when deploying deep learn-
ing models on edge devices, offering valuable guidance for practitioners
and researchers selecting models and devices for their applications.

Keywords: Deep Learning · Object Detection Models · Performance Evaluation · Inference Time · Energy Efficiency · Accuracy · Edge.

1 Introduction
Object detection is crucial in computer vision for identifying and locating ob-
jects in images or videos. It helps organizations optimize and automate processes
and has diverse applications in fields like autonomous vehicles, surveillance,
retail, healthcare, agriculture, manufacturing, sports analytics, environmental
monitoring, and smart cities. Object detection is fundamental to autonomous
transportation, enabling precise recognition of pedestrians, obstacles, and other
vehicles to ensure safe operation. In autonomous vehicles, for example, object
2 D. Alqahtani et al.

detectors identify the vehicle’s position, track other objects, and facilitate route
planning. Onboard computing units and Internet of Things (IoT) sensors enable
local, real-time data processing, allowing for rapid responses to avoid collisions.
These capabilities highlight the significant value of object detection in enhancing
the performance and safety of automated systems [1].
The field of object detection is marked by continuous technological progress,
with ongoing refinement of detection algorithms to improve accuracy and speed
in complex environments. The emergence of edge computing enables real-time
object detection on devices like smartphones, drones, and IoT devices, reducing
latency and cloud dependence. However, significant challenges remain in devel-
oping robust object detection systems for edge computing due to constrained
resources and varying energy requirements. Researchers and industries must se-
lect appropriate models and edge devices to balance accuracy, processing speed,
and energy consumption.
In this paper, we aim to evaluate the performance of the most popular deep
learning models for object detection across prominent edge devices, collecting key
metrics such as energy consumption, inference time, and accuracy. Additionally,
we provide insights for deploying these models on the investigated edge devices.
Our key contributions can be summarized as follows:

– We developed object detection applications for processing images as a web service using Flask-API.
– We utilized different frameworks, including PyTorch, TensorFlow Lite, and
TensorRT, to deploy our deep learning web service models on the edge de-
vices including Raspberry Pi series, Edge TPU, and Jetson Orin Nano.
– We employed the FiftyOne tool to evaluate the accuracy of the object detec-
tion models and collected the mean Average Precision using the COCO dataset
for each model on each device.
– We conducted automated comprehensive performance measurement tests us-
ing the Locust tool and reported the energy consumption and inference time
of various models including YOLOv8, SSD, and EfficientDet Lite models
with different versions on the edge devices.

The remainder of this paper is organized as follows. Section 2 provides an overview of edge devices, frameworks, and deep learning architectures for object detection.
Section 3 outlines the performance evaluation, including the evaluation metrics,
experimental setup, and the results of our experiments. Section 4 presents the
related work. Finally, Section 5 concludes the paper and suggests future research
directions.

2 Edge Devices, Frameworks and Model Formats

2.1 Edge Devices

Raspberry Pi: The Raspberry Pi is a line of single-board computers produced by the Raspberry Pi Foundation [8]. When connected to accessories like

a keyboard, mouse, and monitor, the Raspberry Pi becomes a low-cost personal computer. It is widely used for robotics, Internet of Things applications, and
real-time image and video processing. The latest Raspberry Pi models include
the Raspberry Pi 3 Model B+, Raspberry Pi 4, and Raspberry Pi 5. The Rasp-
berry Pi 4 maintains compatibility with the previous Raspberry Pi 3 Model B+
while offering improvements in processor speed, multimedia capabilities, memory
capacity, and connectivity. The Raspberry Pi 5 is the newest model, featuring
significant enhancements in CPU and GPU performance as well as increased
memory capacity and I/O bandwidth compared to the Raspberry Pi 4.

TPU Accelerator: The Coral USB Accelerator [6] is a USB device that func-
tions as an Edge TPU co-processor for computational devices. It features a
USB-C port, allowing it to connect to a host computer and accelerate machine
learning inference tasks. The Edge TPU, an Application-Specific Integrated Cir-
cuit developed by Google, is designed to execute machine learning models on
edge computing devices like the Raspberry Pi.

NVIDIA Jetson: The NVIDIA Jetson Orin Series represents the latest in de-
veloper boards for energy-efficient autonomous machinery. With up to 275 tril-
lion operations per second (TOPS) and an 8X performance improvement over
the previous generation, the Jetson Orin modules enable numerous concurrent AI
inference pipelines. Their support for high-speed interfaces and multiple sensors
makes them ideal for advanced robotics. The Orin Nano, the entry-level module
in this series, focuses on low power consumption and cost efficiency while main-
taining AI processing capabilities. It is suitable for edge AI devices, IoT devices,
and embedded systems where space, power, and cost efficiency are critical [16].

2.2 Deep Learning Architectures for Object Detection

Deep learning-based object detection uses convolutional neural networks (CNNs), which are loosely inspired by the neural structure of the human brain, comprising an input layer, multiple hidden layers, and an output layer. These networks, trained using supervised,
semi-supervised, or unsupervised methods, achieve high speed and accuracy in
detecting single or multiple objects due to their automated learning capabilities
with minimal manual feature engineering. Early deep learning-based object de-
tection models are categorized into one-stage and two-stage detectors. One-stage
detectors predict both the bounding box and the object’s category in a single
network pass, enabling faster and more efficient real-time detection. Examples
of one-stage object detection models include YOLO, SSD, and EfficientDet.

You Only Look Once (YOLO): Introduced in 2015 by Redmon et al. [17],
the YOLO algorithm represents a significant advancement in real-time object de-
tection. Unlike conventional methods that apply a classifier to multiple regions
within an image, YOLO models treat detection as a single regression problem,

predicting bounding boxes and class probabilities from entire images in a sin-
gle evaluation. This approach significantly improves detection speed, making
YOLO ideal for real-time applications. The algorithm has evolved through sev-
eral iterations, including YOLOv2, YOLOv3, YOLOv4, and the latest YOLOv8
developed by Ultralytics [19], each enhancing accuracy, speed, and the ability to
detect a wider range of object sizes.

Single Shot MultiBox Detector (SSD): The SSD algorithm predicts multiple bounding boxes and their corresponding class scores in a single pass. Its key innovation is using feature maps from different network layers to predict detections at multiple scales, addressing the challenge of detecting objects of different sizes: smaller feature maps detect larger objects, while larger feature maps detect smaller objects [14].
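The multi-scale design can be made concrete with the default-box scale rule from the SSD paper: each of the m feature maps is assigned a scale between s_min and s_max. A minimal sketch, using the 0.2 and 0.9 values from the original paper; the map count of six below is illustrative:

```python
def ssd_default_box_scales(num_maps: int, s_min: float = 0.2, s_max: float = 0.9):
    """Scale assigned to each feature map, per the SSD paper:
    s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1)."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [round(s_min + step * k, 3) for k in range(num_maps)]

# Six feature maps: early (large) maps get small scales for small objects,
# late (small) maps get large scales for large objects.
print(ssd_default_box_scales(6))  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```

The even spacing of scales is what lets a fixed set of feature maps jointly cover the whole range of object sizes.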

EfficientDet: Developed by Google’s Brain Team, EfficientDet is renowned for its efficiency and scalability. Built on the EfficientNet architecture, it scales the
network’s depth, width, and resolution through a compound scaling method,
enhancing performance while reducing computational cost. The model features
a novel weighted bi-directional feature pyramid network for improved feature
fusion and cross-scale connections. EfficientDet achieves superior results with
fewer parameters and FLOPs. It includes seven variants, EfficientDet0 through
EfficientDet6, and EfficientDet Lite models for embedded devices as detailed
in [18].

2.3 Deep Learning Frameworks

Deep learning frameworks play a crucial role in developing and deploying object
detection models by providing comprehensive tools and functionalities. These
frameworks facilitate building, training, and implementing object detection sys-
tems with pre-built models, data augmentation techniques, and utilities for var-
ious development stages. Examples include TensorFlow Lite, PyTorch, and Ten-
sorRT. They enable researchers and developers to efficiently manage the com-
plexities of object detection tasks, from data preprocessing and model training
to evaluation and deployment.

3 Performance Evaluation

In this study, we evaluate the performance of three prominent object detection model families, YOLOv8, EfficientDet Lite, and SSD, on multiple edge devices, including
Raspberry Pi 3, 4, 5, Pi 3 with TPU, Pi 4 with TPU, Pi 5 with TPU, and Jetson
Orin Nano. The evaluation focuses on three key performance metrics: inference
time, energy consumption, and mean Average Precision (mAP). Next, we will
introduce these metrics.

3.1 Metrics
Inference Time: This metric measures the time taken by each model from re-
ceiving the input image to producing detection results, excluding pre-processing
or post-processing steps. The inference time is crucial for real-time object detec-
tion. We report the inference time in milliseconds for each model on each device,
averaging over a series of test images for consistent and reliable measurements.
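The timing procedure described above can be sketched as follows; `run_inference` is a hypothetical stand-in for any model call, and only the model invocation, not pre- or post-processing, sits inside the timed region:

```python
import time

def run_inference(image):
    """Hypothetical stand-in for a model's forward pass."""
    return [("person", 0.92, (10, 20, 110, 220))]  # (label, score, box)

def mean_inference_time_ms(images):
    """Average wall-clock inference time in milliseconds over a set of images."""
    total = 0.0
    for image in images:
        start = time.perf_counter()   # timer starts after any pre-processing
        run_inference(image)
        total += time.perf_counter() - start
    return (total / len(images)) * 1000.0

avg_ms = mean_inference_time_ms(["img%d" % i for i in range(10)])
print(f"average inference time: {avg_ms:.3f} ms")
```

Averaging over many images smooths out scheduler jitter, which matters on the small CPUs of edge devices.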

Energy Consumption: This metric evaluates the energy efficiency of each model on different edge devices. First, we measure the base energy consumption
(BE) of the devices in an idle state for five minutes without running any compu-
tations. Next, we measure the total energy consumption (TE) for five minutes
while running an object detection model. Reported in milliwatt-hours (mWh),
the energy consumption per request excluding the base energy usage (EexcR)
is determined by subtracting the base energy consumption from the total en-
ergy consumption and dividing this difference by the number of requests (NR)
processed, as follows: EexcR = (TE − BE) / NR. This metric is calculated per request, as
the energy consumption depends on the number of processed requests. Evalu-
ating on a per-request basis is crucial for a fair comparison across the different
platforms. This is particularly important for battery-powered devices.

Model Evaluation Using COCO Dataset: To determine the capabilities and accuracy of the YOLOv8, EfficientDet Lite, and SSD models, we use the COCO
validation dataset of 5,000 images. The open-source FiftyOne [20] tool facilitates
visualization, access to COCO data resources [13], and model evaluation. It
calculates model accuracy by comparing detected objects to ground reference
data. Accuracy is evaluated using four metrics: Precision, Recall, F1 score, and
COCO mean Average Precision (mAP).
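The matching step underlying these metrics pairs detections with ground-truth boxes by Intersection over Union (IoU); COCO mAP then averages precision over IoU thresholds from 0.5 to 0.95. A minimal IoU and precision/recall sketch at a single threshold, with hypothetical boxes in (x1, y1, x2, y2) form:

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(dets, gts, thr=0.5):
    """Greedy one-to-one matching of detections to ground truth at IoU >= thr."""
    matched, tp = set(), 0
    for d in dets:
        for i, g in enumerate(gts):
            if i not in matched and iou(d, g) >= thr:
                matched.add(i)
                tp += 1
                break
    return tp / len(dets), tp / len(gts)  # precision, recall

gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(1, 1, 10, 10), (50, 50, 60, 60)]
p, r = precision_recall(dets, gts)
print(p, r)  # one of two detections matches: precision 0.5, recall 0.5
```

FiftyOne performs this matching (per class, with score-ordered detections) internally when computing COCO-style mAP.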

3.2 Experimental Setup


Hardware and Device Setup: Our experimental setup includes various edge
devices to evaluate the performance of object detection models as Table 1 shows.
The devices tested are Raspberry Pi 3 Model B+ (1 GB LPDDR2 RAM), Rasp-
berry Pi 4 Model B (4 GB LPDDR4-3200 SDRAM) and Raspberry Pi 5 (4
GB LPDDR4x RAM). These devices are selected for their popularity and af-
fordability. To enhance computational power for deep learning tasks, we equip
these Raspberry Pi models with Google Coral USB Accelerators (TPUs). For
high-performance comparison, we include the NVIDIA Jetson Orin Nano (4 GB
RAM, integrated GPU). This allows us to compare devices with CPUs, TPUs, and GPUs. Additionally, we use a UM25C USB power meter with Bluetooth
connectivity to measure the energy consumption of the edge devices.

Software and Frameworks: In our experimental setup, we utilize various software frameworks and tools to deploy and run object detection models on different

edge devices, as Table 1 displays. The choice of software and frameworks
is influenced by the need to optimize performance for each specific device, in-
cluding those with CPUs, TPUs, and GPUs. We use PyTorch to deploy and run
YOLOv8 models on the Raspberry Pi 3, Raspberry Pi 4, and Raspberry Pi 5.
To leverage the capabilities of TPUs, the YOLOv8 models are converted from
PyTorch to TensorFlow Lite (TFLite) format and compiled to run on Raspberry
Pis with TPUs. Please note that to run YOLOv8 on a TPU, we compressed the model by reducing the input image size from 640x640 to 320x320, to ensure it is executable on the TPU. For deployment on the NVIDIA Jetson
Orin Nano, the YOLOv8 models are converted to TensorRT format to optimize
for the GPU.
EfficientDet Lite and SSD models are obtained in TFLite format from Coral [7] and deployed on the Raspberry Pi devices. These models are compiled to run
on TPUs in the respective Raspberry Pi devices. For the NVIDIA Jetson Orin
Nano, EfficientDet Lite and SSD models are converted from TFLite to TensorRT
format for optimized performance on the GPU.
The operating systems used are Raspberry Pi OS (Bookworm, 64-bit) for the
Raspberry Pi devices and Jetson Linux (Ubuntu-based) for the NVIDIA Jetson
Orin Nano. Additional libraries and tools, such as OpenCV for image processing
and FiftyOne for model evaluation, are utilized to facilitate the experiments. To
detect objects in an image, we write Python code and utilize the Flask-RESTful
library to run this code as a service with an API URL. This approach allows us to deploy the object detection functionality as a web service, enabling easy
integration and testing across different edge devices.
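Our services use Flask-RESTful; the same request/response pattern can be sketched with only the standard library (the endpoint path and the stubbed detector below are illustrative, not the paper's actual code):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def detect_objects(image_bytes):
    """Hypothetical stand-in for a real detection model."""
    return {"detections": [{"label": "person", "score": 0.9}], "inference_ms": 42}

class DetectHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        result = detect_objects(body)          # run the model on the posted image
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):              # keep request logging quiet
        pass

server = HTTPServer(("127.0.0.1", 0), DetectHandler)   # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# A client posts image bytes and receives detections plus timing as JSON,
# which is exactly what the Locust load generator consumes.
url = f"http://127.0.0.1:{server.server_port}/detect"
req = urllib.request.Request(url, data=b"fake-image-bytes", method="POST")
with urllib.request.urlopen(req) as resp:
    reply = json.loads(resp.read())
server.shutdown()
print(reply)
```

Returning the inference time inside the JSON response is what lets the load generator compute averages without instrumenting the device itself.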

Table 1: Edge Devices, Models and Frameworks.

Edge Device               RAM   YOLOv8 Framework  EfficientDet Framework  SSD Framework
Raspberry Pi 3 Model B+   1 GB  PyTorch           TFLite                  TFLite
Raspberry Pi 4 Model B    4 GB  PyTorch           TFLite                  TFLite
Raspberry Pi 5            4 GB  PyTorch           TFLite                  TFLite
Pi 3 Model B+ with TPU    1 GB  TFLite            TFLite                  TFLite
Pi 4 Model B with TPU     4 GB  TFLite            TFLite                  TFLite
Pi 5 with TPU             4 GB  TFLite            TFLite                  TFLite
Jetson Orin Nano          4 GB  TensorRT          TensorRT                TensorRT

Experimental Procedure: The procedure for evaluating the object detection models involved several steps, as Fig. 1 presents. First, the base energy consump-
tion for each device was measured by running an energy reader Python script
on a separate device for five minutes without any computational load. Second,
the Locust file was used to send requests sequentially, with each new request
sent immediately after the previous response, for five minutes, during which the
total energy consumption was measured using the energy reader script, and the
average inference time was calculated from the responses along with the number
of requests. All this data was automatically written to a CSV file by the Locust

file. Next, to automate the testing, a bash script on the agent device ran the
object detection service on the edge device, followed by the Locust file. Upon
completion, the script terminated the service and proceeded to the next model,
ensuring consistent testing across all devices. Experiments were repeated three
times for each model on each device, with the average values used for the final
analysis. Finally, the accuracy measurement involved separate Python scripts
using the FiftyOne tool to download the COCO validation dataset, and the ob-
ject detection models were run on this dataset to calculate accuracy metrics,
including Precision, Recall, F1 score, and mean Average Precision, which were
recorded in a CSV file.
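The closed-loop sequential procedure (each request sent only after the previous response arrives) can be sketched as below; the request function is a stub standing in for the HTTP call Locust performs, and the five-minute window is shortened for illustration:

```python
import csv
import io
import time

def send_request():
    """Hypothetical stand-in for posting an image and parsing the JSON response."""
    return {"inference_ms": 25.0}

def run_closed_loop(duration_s: float):
    """Send requests back-to-back for duration_s; report count and mean latency."""
    count, total_ms = 0, 0.0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        total_ms += send_request()["inference_ms"]
        count += 1
    return count, total_ms / count

requests, avg_ms = run_closed_loop(duration_s=0.05)  # the paper uses 5 minutes

# Record the run in CSV, mirroring what the Locust script writes out.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["num_requests", "avg_inference_ms"])
writer.writerow([requests, avg_ms])
print(buf.getvalue().strip())
```

Because requests are sequential, the request count also fixes the denominator NR used in the per-request energy formula.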

[Fig. 1: Experimental Software and Hardware Setup. An agent device runs a Locust file (posting images to the API and writing responses to a CSV file) and an automation test file (running the object detection service, running the Locust file, then terminating the service), while a Bluetooth-connected power meter between the power supply and the edge device is read for energy measurements. The edge devices (Raspberry Pi3, Pi4, Pi5, TPU variants, and Jetson Orin Nano) expose API endpoints for the SSD, SSD Lite, EfficientDet Lite0-2, and YOLOv8 Nano/Small/Medium object detection models.]

3.3 Experimental Results


Energy Consumption: This section discusses the base energy consumption,
total energy consumption per request, and energy consumption per request ex-
cluding the base energy for various devices.
To start with, Fig. 2(a) shows the baseline energy consumption of selected
edge devices in mWh. Comparing the different Raspberry Pi models, Pi3 con-
sumes more energy (270 mWh) than both Pi4 (199 mWh) and Pi5 (217 mWh).
This indicates an improvement in energy efficiency in newer models. However,
when considering the TPU variants, the energy consumption is similar, with
both Pi3 with TPU and Pi4 with TPU consuming roughly 300 mWh, while Pi5
with TPU shows reduced consumption at 261 mWh. Notably, the Orin Nano
device demonstrates the highest baseline energy consumption at 362 mWh.
In addition, measuring the energy consumption per request, excluding the
base energy, for the investigated object detection models on the evaluated edge
devices yields interesting results. Firstly, as Fig. 2(b) presents the outcomes on
Pi3, the Det_lite models exhibit energy consumption ranging from 0.41 mWh
to 0.98 mWh, while SSD_v1 and SSD_lite models consume 0.31 mWh and 0.41

mWh, respectively. The YOLO8 models demonstrate higher energy demands,


spanning from 1.22 mWh to 5.87 mWh. When the TPU is integrated with Pi3,
as Fig. 2(c) displays, the Det_lite models consume between 0.32 mWh and 0.61
mWh, and the SSD_v1 and SSD_lite models show reduced consumption of
0.11 mWh and 0.13 mWh, respectively. The YOLO8 models also show decreased
energy consumption, ranging from 0.23 mWh to 0.43 mWh.
Secondly, Pi4 introduces improved energy efficiency compared to Pi3, with
Det_lite models consuming between 0.14 mWh and 0.33 mWh, and SSD_v1 and
SSD_lite models consuming 0.11 mWh and 0.14 mWh, respectively, as Fig. 2(d)
shows. The YOLO8 models on Pi4 range from 0.77 mWh to 2.92 mWh. With
the addition of the TPU, Pi4’s energy consumption decreases further; Det_lite
models consume between 0.13 mWh and 0.19 mWh, and SSD_v1 and SSD_lite
models consume 0.10 mWh and 0.11 mWh, respectively. The YOLO8 models
range from 0.26 mWh to 0.32 mWh on Pi4 with TPU, as Fig. 2(e) presents.
Furthermore, Pi5 displays energy usage patterns similar to Pi4, as Fig. 2(f) shows: Det_lite models consume between 0.24 mWh and 0.47 mWh, and SSD_v1 and SSD_lite models consume 0.22 mWh and 0.24 mWh, respectively. The YOLO8 models on Pi5 range from 1.02 mWh to 3.58 mWh. Fig. 2(g)
presents the integration of TPU with Pi5, with Det_lite models consuming be-
tween 0.18 mWh and 0.30 mWh, and SSD_v1 and SSD_lite models consuming
0.13 mWh and 0.14 mWh, respectively. The YOLO8 models range from 0.47
mWh to 0.62 mWh on Pi5 with TPU.
Finally, as Fig. 2(h) shows, the Jetson Orin Nano demonstrates the lowest energy
consumption per request across all models, with Det_lite models consuming
between 0.09 mWh and 0.14 mWh, and SSD_v1 and SSD_lite models consuming
0.01 mWh and 0.06 mWh, respectively. The YOLO8 models on Jetson Orin Nano
range from 0.13 mWh to 0.22 mWh.
Key insights: Pi3 devices generally exhibit higher energy consumption com-
pared to Pi4 and Pi5 models, indicating an improvement in energy efficiency in
the newer models. The addition of TPUs consistently reduces the energy con-
sumption for object detection tasks across all Pi models, particularly in Pi4 and
Pi5. However, it is important to note that the addition of TPU has increased
the base energy consumption of these devices by 9%, 46%, and 20% for Pi 3,
4, and 5, respectively. Among all the models tested, YOLO8_m has the highest
energy consumption per request, while SSD_v1 consumes the lowest energy per
request.

Inference Time: This section analyzes the inference times of object detec-
tion models. The measurements, reported in milliseconds, reveal distinct perfor-
mance patterns across these platforms. Beginning with Pi3, the SSD_v1 model
exhibits the lowest inference time at 427 ms among all evaluated models. While
the Det_lite models require longer inference times compared to SSD_v1 and
SSD_lite, they still outperform all variants of the YOLO8 model. Notably, the
YOLO8 models demonstrate the highest inference times, with the maximum
recorded at 12,960 ms. When the Coral USB Accelerator is integrated with the

[Fig. 2: Base energy consumption and energy consumption per request (excluding the base energy), shown alongside total energy per request, for the different edge devices. Panels: (a) Base Energy, (b) Raspberry Pi3, (c) Pi3 + TPU, (d) Raspberry Pi4, (e) Pi4 + TPU, (f) Raspberry Pi5, (g) Pi5 + TPU, (h) Jetson Orin Nano.]

Raspberry Pi3, as shown in Fig. 3, the inference times for SSD and YOLO8
models improve significantly, with SSD_v1 remaining the fastest at 61 ms. In
contrast, Det_lite2 exhibits the highest inference time, taking 1,576 ms.
The results reveal that on Pi4, the SSD_v1 and SSD_lite models achieve
the fastest inference times at 209 ms and 292 ms, respectively. Conversely, the
YOLOv8 models across all versions are slower than the Det_lite0 and Lite1
models, with YOLO8_m representing the slowest at 3671 ms, as shown in Fig. 3.
The addition of the Edge TPU to Pi4 significantly reduces the inference times
of the SSD and YOLO8 models, with the SSD_v1 model achieving the lowest
inference time of 12 ms, while the Det_lite2 model has the highest at 188 ms.
Similarly, on Pi5, the SSD_v1 and SSD_lite models are the fastest, with in-
ference times of 93 ms and 127 ms, respectively. Although the Det_lite models
are slower than the SSD models, they are still faster than the YOLO8 models,
with YOLO8_m exhibiting the highest inference time of 1348 ms. The integra-
tion of the Edge TPU to Pi5 further improves the performance of the SSD and

[Fig. 3: Inference time per request for the different edge devices. Panels: (a) Raspberry Pi3, (b) Pi3 + TPU, (c) Raspberry Pi4, (d) Pi4 + TPU, (e) Raspberry Pi5, (f) Pi5 + TPU, (g) Jetson Orin Nano.]

YOLO8 models, with SSD_v1 achieving the lowest inference time of 10 ms and
Det_lite2 having the highest at 139 ms.
Finally, on Jetson Orin Nano, the YOLO8_n model demonstrates the mini-
mum inference time of 16 ms, while the Det_lite and SSD models have similar
inference times within the range of 20 ms. The YOLO8_m model remains the
slowest at 50 ms, as presented in Fig. 3.
Key insights: SSD_v1 exhibits the most rapid inference times when de-
ployed across various edge devices. The incorporation of TPU, substantially
enhances the performance of the evaluated models. Conversely, YOLO8_m gen-
erally demonstrates the slowest inference times among the tested configurations.

Accuracy: This subsection presents the accuracy measurements. The mean Av-
erage Precision (mAP) on the Raspberry Pi devices varied across model sizes,
as shown in Fig. 4. While the SSD_v1 model has the lowest mAP at 19, the
YOLO8_m model achieves the highest mAP of 44 among all evaluated mod-
els. The Det_lite0, lite1, and lite2 models exhibit medium mAPs ranging from
26 to 33. Both SSD_lite and YOLO8_n have similar mAPs around 30, while

YOLO8_s has a higher mAP of approximately 40. When deploying these mod-
els on the Pi devices equipped with TPU accelerators, the Det_lite and SSD
models, including all their versions, demonstrate comparable mAPs to those ob-
served on the standalone Pi devices, as illustrated in Fig. 4. However, running
the YOLO8 models on Pis with TPU accelerators resulted in a reduction in accuracy, with YOLO8_n having the lowest mAP of 16. Furthermore, the mAPs of the YOLO8 models on Jetson Orin Nano follow a similar accuracy
pattern as on the Raspberry Pi, ranging from 31 to 44, as presented in Fig. 4. In
contrast, the SSD_v1, SSD_lite, Det_lite0, Det_lite1, and Det_lite2 models
exhibit a slight decrease in mAP compared to the Raspberry Pi and Edge TPU
results. The SSD_v1 model has the lowest mAP at 16, while the SSD_lite model
achieves 27. The Det_lite0 model has an accuracy of 23, while the Det_lite1
and Det_lite2 models perform better, with mAPs of 28 and 32, respectively.
Key insights: YOLO8_m demonstrates consistently superior accuracy com-
pared to other evaluated models across various device platforms. Conversely, the
SSD_v1 model often exhibits the lowest mean Average Precision (mAP) among
the tested models. The use of TPU accelerators on the Pi devices yields sim-
ilar accuracy levels for the Det_lite and SSD model families, but results in a
reduction in accuracy for the YOLO8 models. Jetson Orin Nano exhibits com-
parable accuracy patterns for the YOLO8 models to the other setups, but shows
a slightly lower mAP for the remaining models in comparison to the Raspberry
Pi and TPU-equipped configurations.

[Fig. 4: Accuracy (mAP) for the different edge devices. Panels: (a) Raspberry Pi, (b) Pi + TPU, (c) Jetson Orin Nano.]

Energy Consumption vs Inference Time: This section highlights the relationship between energy consumption and inference time for object detection
models on edge devices, essential for optimizing performance, extending battery
life, meeting real-time application needs, and reducing costs. Deploying SSD
models on devices shows that energy consumption and inference time are largely
linearly correlated for various models. As shown in Fig. 5(a-b), while Pi3
performs the worst in both metrics, Pi5 with TPU, Pi4 with TPU, and Jetson
Orin Nano perform equally well, forming a Pareto front. While Pi5 with TPU
has better inference times, Jetson Orin Nano demonstrates better energy
efficiency. Pi3 with TPU comes next, followed by the comparable Pi4 and Pi5,
with Pi3 performing worst. When running the Det_Lite and YOLO8 models, the
edge devices exhibit a linear correlation between the two metrics. Jetson Orin
Nano outperforms the other devices, while Pi3 demonstrates the poorest
performance in terms of both inference time and energy consumption. However,
as Fig. 5(c-h) shows, Pi5 produces outlier results with YOLO8, deviating from
the overall regression trend.

[Figure: energy consumption per request (mWh) vs inference time (ms, log scale) for Pi3, Pi4, Pi5, Pi3_TPU, Pi4_TPU, Pi5_TPU, and Orin Nano; panels (a) SSD_v1, (b) SSD_lite, (c) Det_lite0, (d) Det_lite1, (e) Det_lite2, (f) Yolo8_n, (g) Yolo8_s, (h) Yolo8_m]

Fig. 5: Energy consumption per request (excluding base energy) versus inference
time for various object detection models (a fitted linear regression line is shown,
which appears curved due to the logarithmic scale of the inference time).
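The linear relationship between energy and inference time can be summarized with an ordinary least-squares fit, as in Fig. 5. The sketch below uses hypothetical (device, time, energy) values for illustration, not our measured data:

```python
import numpy as np

# Hypothetical per-request measurements (device, inference time in ms,
# energy in mWh); the real values come from the benchmark runs.
measurements = [
    ("Orin Nano", 25.0, 0.05),
    ("Pi5_TPU", 30.0, 0.06),
    ("Pi4", 400.0, 0.20),
    ("Pi3", 2000.0, 0.32),
]

t = np.array([m[1] for m in measurements])
e = np.array([m[2] for m in measurements])

# Fit energy = a * time + b in linear space; when plotted against a
# logarithmic time axis (as in Fig. 5) the fitted line appears curved.
a, b = np.polyfit(t, e, 1)
residuals = e - (a * t + b)
```

Outliers such as Pi5 running the YOLO8 models show up as large residuals against this fit.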

Energy Consumption vs Accuracy: This section briefly presents the results of
energy consumption versus accuracy for the evaluated object detection models
on edge devices. When deploying the SSD and Det_Lite models, accuracy remains
consistent across all devices, with a minor reduction on the Jetson Orin Nano.
Jetson Orin Nano is the most energy-efficient device, while Pi3 is the least
efficient. For YOLO8, accuracy remains stable across all
devices, except for the Pis with TPUs, which show a significant reduction due to
the compression process required for execution on TPUs. Overall, as Fig. 6
shows, Jetson Orin Nano achieves the best results in terms of accuracy and
energy consumption, whereas Pi3 performs the worst.

[Figure: accuracy (mAP) vs energy consumption per request (mWh); panels (a) SSD_v1, (b) SSD_lite, (c) Det_lite0, (d) Det_lite1, (e) Det_lite2, (f) Yolo8_n, (g) Yolo8_s, (h) Yolo8_m]

Fig. 6: Energy consumption per request (excluding base energy) versus accuracy
for various object detection models
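The energy-accuracy trade-off can be summarized by identifying the Pareto-optimal devices, i.e. those not beaten on both metrics at once. A minimal sketch with hypothetical numbers (the benchmark supplies the real ones):

```python
def pareto_front(points):
    """Return the names of (name, energy, mAP) points not dominated by
    any other point: lower energy is better, higher mAP is better."""
    front = []
    for name, energy, m_ap in points:
        dominated = any(
            e2 <= energy and a2 >= m_ap and (e2 < energy or a2 > m_ap)
            for n2, e2, a2 in points if n2 != name
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical (device, energy in mWh, mAP) values for one model.
devices = [
    ("Orin Nano", 0.05, 36),
    ("Pi5_TPU", 0.06, 37),
    ("Pi4", 0.20, 37),
    ("Pi3", 0.30, 37),
]
```

With these numbers, Pi4 and Pi3 are dominated by Pi5 with TPU (same mAP at lower energy), leaving Orin Nano and Pi5_TPU on the front.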

Inference Time vs Accuracy: This section summarizes the evaluation of inference
time and accuracy for the investigated object detection models on different edge
devices. When deploying the SSD models, accuracy remains consistent across all
evaluated devices, with a slight decrease observed on Jetson Orin Nano. Pi5
with TPU exhibits the fastest inference time, while Pi3 is the slowest. A similar
trend is observed for the Det_Lite models, with Jetson Orin Nano demonstrating
the fastest inference time. For the YOLO8 models, accuracy remains stable
across most devices, except for the Pis with TPUs, which show a significant
reduction in accuracy due to the compression process required for execution on
these platforms. Overall, Jetson Orin Nano achieves
the best performance in terms of accuracy and inference time for the YOLOv8
models, while Pi3 exhibits the poorest results, as shown in Fig. 7.

[Figure: accuracy (mAP) vs inference time (ms, log scale); panels (a) SSD_v1, (b) SSD_lite, (c) Det_lite0, (d) Det_lite1, (e) Det_lite2, (f) Yolo8_n, (g) Yolo8_s, (h) Yolo8_m]

Fig. 7: Inference time versus accuracy for various object detection models

Energy Consumption vs Inference Time vs Accuracy: This section provides a
concise summary of the performance evaluation results for the investigated
object detection models across the edge computing platforms. The 3D plots in
Fig. 8 show that for the SSD models, Pi5 with TPU, Pi4 with TPU, and Jetson
Orin Nano demonstrate comparable capabilities. However, Pi5 with TPU and Pi4
with TPU are the preferred options as they do not compromise accuracy, unlike
the Jetson Orin Nano. For the Det_Lite models, Jetson Orin Nano emerges as the
optimal choice, exhibiting superior energy efficiency and inference speed with
only a minor impact on accuracy. For the YOLO8 models, Jetson Orin Nano is the
most suitable choice among the evaluated devices, delivering the best overall
performance in terms of energy consumption, inference time, and accuracy.
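Picking a device from a three-way trade-off like Fig. 8 can be framed as a weighted score over normalized metrics. The following sketch uses hypothetical numbers and illustrative equal weights; an application would set the weights to match its own priorities (e.g. battery life vs real-time deadlines):

```python
def rank_devices(rows, w_energy=1.0, w_time=1.0, w_acc=1.0):
    """Rank (name, energy mWh, time ms, mAP) rows by a weighted score
    over min-max normalized metrics: lower energy and time, and higher
    accuracy, score better."""
    names = [r[0] for r in rows]
    cols = list(zip(*[r[1:] for r in rows]))  # energy, time, mAP columns

    def norm(col):
        lo, hi = min(col), max(col)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col]

    e_n, t_n, a_n = (norm(c) for c in cols)
    scores = [w_acc * a - w_energy * e - w_time * t
              for e, t, a in zip(e_n, t_n, a_n)]
    return sorted(zip(names, scores), key=lambda x: -x[1])

# Hypothetical numbers for one model; substitute the measured values.
options = [
    ("Orin Nano", 0.6, 60.0, 36),
    ("Pi5_TPU", 0.9, 120.0, 30),
    ("Pi3", 2.5, 9000.0, 37),
]
```

With these numbers and equal weights, Orin Nano ranks first: its small accuracy deficit is outweighed by its energy and latency advantages, mirroring the YOLO8 conclusions above.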
[Figure: 3D plots of accuracy (mAP) vs energy consumption (mWh) vs inference time (ms); panels (a) SSD_v1, (b) SSD_lite, (c) Det_lite0, (d) Det_lite1, (e) Det_lite2, (f) Yolo8_n, (g) Yolo8_s, (h) Yolo8_m]

Fig. 8: Energy consumption per request (excluding base energy) versus inference
time versus accuracy for various object detection models

4 Related Work

This section provides an overview of the most relevant research on object
detection models for edge computing devices and compares our work with existing
studies, as shown in Table 2. To the best of our knowledge, our work is unique
in the breadth of object detection models and edge devices it evaluates
together.
Cantero et al. [4] examine quantization levels and model architectures (SSD,
CenterNet, EfficientDet, Faster R-CNN) on the NXP i-MX8M-PLUS and Google
Coral Dev Board with EdgeTPU, using metrics like warm-up time, inference
time, and accuracy. In contrast, our study measures energy consumption and
evaluates YOLOv8 on Raspberry Pi 3, Pi 4, Pi 5, and Jetson Orin Nano. Also,
Kamath and Renuka [10] examine EfficientDet models with integer quantization
on Raspberry Pi, recommending EfficientDet0 and EfficientDet1 for resource-
constrained devices. Our study includes a broader range of models and also
measures energy consumption.

Kang and Somtham [11] evaluate YOLOv4-Tiny and SSD MobileNet V2
models on devices like Google Coral Dev Board Mini, NVIDIA Jetson Nano,
and Jetson Xavier NX, comparing detection accuracy, inference latency, and
energy efficiency. We included EfficientDet Lite and YOLOv8, and deployed
models on Raspberry Pi as well. Baller et al. [2] present DeepEdgeBench for
assessing DNN performance on devices like Asus Tinker Edge R, Raspberry Pi
4, Google Coral Dev Board, NVIDIA Jetson Nano, and Arduino Nano 33 BLE,
focusing on inference time, power consumption, and accuracy. Our study covers
recent models like EfficientDet Lite and YOLOv8. Moreover, Bulut et al. [3]
assess lightweight YOLO models (YOLOv5-Nano, YOLOX-Nano, YOLOX-Tiny,
YOLOv6-Nano, YOLOv6-Tiny, YOLOv7-Tiny) on NVIDIA Jetson Nano for
traffic safety, evaluating metrics like average precision, inference time, memory
usage, and energy consumption. Our evaluation includes YOLOv8, SSD, and
EfficientDet on Raspberry Pi, Edge TPU, and Jetson Orin Nano.
Chen et al. [5] deploy SSD-MobileNets models on Raspberry Pi 3 with Neural
Compute Sticks (NCS) for enhanced performance. Similarly, Zagitov et al. [21]
assess models (MobileNetV2 SSD, CenterNet MobileNetV2 FPN, EfficientDet,
YOLOv5, YOLOv7, YOLOv7 Tiny, YOLOv8) on Raspberry Pi and NVIDIA
Jetson Nano, using metrics like mAP, latency, and FPS. In contrast, our study
includes energy efficiency and deployment on Edge TPU. In addition, Galliera
and Suri [9] explore integrating deep learning accelerators with IoT devices for
low-latency decision-making, deploying models like YOLOv5 on devices such as
NVIDIA Jetson Nano, Jetson Xavier, Google Coral Dev Board, Google Coral
USB Accelerator, and Intel Movidius Neural Compute Stick 2. However, they
did not measure energy consumption.
Finally, Magalhães et al. [15] evaluate a variety of heterogeneous platforms,
including GPU, TPU, and FPGA, using the RetinaNet ResNet-50 model, while Lema
et al. [12] assess YOLOv3, YOLOv5, and YOLOX models on devices like NVIDIA
Jetson Nano, Jetson AGX Xavier, and Google Coral Dev Board, using the MS COCO
dataset to analyze FPS relative to power consumption and cost. However, these
works do not consider other object detection models such as SSD and
EfficientDet, nor do they investigate these models on the Raspberry Pi.

Table 2: Comparison of Studies Based on Device Architectures and Key Criteria.


Study CPU TPU GPU YOLOv8 EfficientDet SSD Infer Time Energy Cons. mAP
[4] ✓ ✓ ✓ ✓ ✓
[10] ✓ ✓ ✓ ✓
[11] ✓ ✓ ✓ ✓ ✓ ✓
[2] ✓ ✓ ✓ ✓ ✓ ✓ ✓
[3] ✓ ✓ ✓ ✓
[5] ✓ ✓ ✓ ✓
[21] ✓ ✓ ✓ ✓ ✓ ✓ ✓
[9] ✓ ✓ ✓ ✓
[12] ✓ ✓ ✓ ✓ ✓
[15] ✓ ✓ ✓ ✓ ✓
Our Work ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓

5 Conclusions and Future Direction

In this paper, we evaluated the performance of deep learning object detection
models, including YOLOv8 (Nano, Small, Medium), EfficientDet Lite (Lite0,
Lite1, Lite2), and SSD (SSD MobileNet V1, SSDLite MobileDet), on edge devices
like Raspberry Pi 3, 4, and 5 (with/without TPU accelerators) and Jetson Orin
Nano. We developed an object detection application and deployed models across
these devices using frameworks like TensorFlow Lite, Edge TPU, PyTorch, and
TensorRT. We collected the mean Average Precision (mAP) metric and assessed
the models’ performance in terms of inference time and energy consumption.
Our evaluation reveals a trade-off between accuracy, energy consumption, and
inference time. The SSD_v1 model had the lowest energy consumption and fastest
inference time but was the least accurate. Jetson Orin Nano was the fastest and
most energy-efficient device for the YOLOv8 models without compromising
accuracy. However, converting the SSD and EfficientDet_Lite models to the
TensorRT framework reduced their accuracy. The Edge TPU accelerator improved
the performance of the SSD and EfficientDet Lite models without affecting
accuracy, but significantly decreased the accuracy of the YOLOv8 models.
For future work, we will examine different quantized models, such as int8 and
float16, when deploying them on Raspberry Pi devices and Jetson Orin Nano,
as these optimizations might significantly impact the results.
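As context for these planned experiments: int8 post-training quantization maps float tensors to 8-bit integers through an affine scale and zero point. The sketch below illustrates the per-tensor asymmetric arithmetic used by schemes such as TensorFlow Lite's int8 quantization; `quantize_int8` and `dequantize` are illustrative helpers, not part of any framework API.

```python
import numpy as np

def quantize_int8(x):
    """Affine int8 quantization of a float tensor (per-tensor,
    asymmetric): real = (q - zero_point) * scale, with q in [-128, 127]."""
    lo, hi = float(x.min()), float(x.max())
    lo, hi = min(lo, 0.0), max(hi, 0.0)    # range must contain zero
    scale = (hi - lo) / 255.0 or 1.0       # guard against an all-zero tensor
    zero_point = int(round(-128 - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover the approximate float values from the quantized tensor."""
    return (q.astype(np.float32) - zero_point) * scale
```

The round trip introduces an error of at most roughly one quantization step per element, which is the accuracy cost these optimizations trade for smaller models and faster integer inference.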

References
1. Balasubramaniam, A., Pasricha, S.: Object detection in autonomous vehicles: Sta-
tus and open challenges. arXiv preprint arXiv:2201.07706 (2022)
2. Baller, S.P., Jindal, A., Chadha, M., Gerndt, M.: Deepedgebench: Benchmarking
deep neural networks on edge devices. In: 2021 IEEE International Conference on
Cloud Engineering (IC2E). pp. 20–30. IEEE (2021)
3. Bulut, A., Ozdemir, F., Bostanci, Y.S., Soyturk, M.: Performance evaluation of
recent object detection models for traffic safety applications on edge. In: Proceed-
ings of the 2023 5th International Conference on Image Processing and Machine
Vision. pp. 1–6 (2023)
4. Cantero, D., Esnaola-Gonzalez, I., Miguel-Alonso, J., Jauregi, E.: Benchmarking
object detection deep learning models in embedded devices. Sensors 22(11), 4205
(2022)
5. Chen, C.W., Ruan, S.J., Lin, C.H., Hung, C.C.: Performance evaluation of edge
computing-based deep learning object detection. In: Proceedings of the 2018 VII
International Conference on Network, Communication and Computing. pp. 40–43
(2018)
6. Coral: Usb accelerator datasheet. Tech. rep., Google LLC,
https://coral.ai/docs/accelerator/datasheet/ (2019)
7. Coral: Object detection (May 2024), https://coral.ai/models/
object-detection/
8. Foundation, R.P.: About us (May 2024), https://www.raspberrypi.org/about/
9. Galliera, R., Suri, N.: Object detection at the edge: Off-the-shelf deep learning
capable devices and accelerators. Procedia Computer Science 205, 239–248 (2022)

10. Kamath, V., Renuka, A.: Performance analysis of the pretrained efficientdet for
real-time object detection on raspberry pi. In: 2021 International Conference on
Circuits, Controls and Communications (CCUBE). pp. 1–6. IEEE (2021)
11. Kang, P., Somtham, A.: An evaluation of modern accelerator-based edge devices
for object detection applications. Mathematics 10(22), 4299 (2022)
12. Lema, D.G., Usamentiaga, R., García, D.F.: Quantitative comparison and perfor-
mance evaluation of deep learning-based object detection models on edge comput-
ing devices. Integration 95, 102127 (2024)
13. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P.,
Zitnick, C.L.: Microsoft coco: Common objects in context. In: Computer Vision–
ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12,
2014, Proceedings, Part V 13. pp. 740–755. Springer (2014)
14. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: Ssd:
Single shot multibox detector. In: Computer Vision–ECCV 2016: 14th European
Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part
I 14. pp. 21–37. Springer (2016)
15. Magalhães, S.C., dos Santos, F.N., Machado, P., Moreira, A.P., Dias, J.: Bench-
marking edge computing devices for grape bunches and trunks detection using
accelerated object detection single shot multibox deep learning models. Engineer-
ing Applications of Artificial Intelligence 117, 105604 (2023)
16. Nvidia: Nvidia jetson orin (May 2024), https://www.nvidia.com/en-us/
autonomous-machines/embedded-systems/jetson-orin/
17. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified,
real-time object detection. In: Proceedings of the IEEE conference on computer
vision and pattern recognition. pp. 779–788 (2016)
18. Tan, M., Pang, R., Le, Q.V.: Efficientdet: Scalable and efficient object detection. In:
Proceedings of the IEEE/CVF conference on computer vision and pattern recog-
nition. pp. 10781–10790 (2020)
19. Ultralytics: Home (2024), https://docs.ultralytics.com/
20. Voxel51: Fiftyone (May 2024), https://voxel51.com/fiftyone/
21. Zagitov, A., Chebotareva, E., Toschev, A., Magid, E.: Comparative analysis of
neural network models performance on low-power devices for a real-time object
detection task. Computer 48(2) (2024)
