
Abstract

This study compares three YOLOv5 implementations—Standard, Multiprocessing, and
Multithreading—in a GPU-enabled Google Colab environment. Evaluated metrics include
FPS, inference time, CPU/GPU usage, and memory usage. Multithreading achieved the
highest FPS (55.28) and lowest inference time (18.09 ms), while Multiprocessing
underperformed (1.85 FPS, 540.01 ms). The results highlight trade-offs, recommending
Multithreading for speed and Standard for resource efficiency. Practical details and a
replicable framework are provided.

YOLOv5 Parallel Processing Implementation
Comparison

Zawad Ishmam Hriddo


BRAC University

May 2025

1 Introduction
YOLOv5, developed by Ultralytics (YOLOv5 Repository), is a state-of-the-art object
detection model renowned for its balance of speed and accuracy, yet its performance under
parallel processing remains underexplored. It is widely used in real-time applications
such as autonomous driving, surveillance, and robotics. As computational
demands for such tasks increase, optimizing inference through parallelization techniques is
critical. This research compares three implementations of YOLOv5: a standard sequential
approach, a multiprocessing approach leveraging multiple CPU cores, and a multithreading
approach for concurrent processing. The study evaluates these implementations based on per-
formance metrics including Frames Per Second (FPS), average inference time, CPU usage,
memory usage, GPU usage, and GPU memory usage. Conducted in a GPU-enabled Google
Colab environment, the findings provide insights into the trade-offs of each method, guiding
developers in selecting the most suitable approach for specific use cases.

The research also incorporates practical implementation details, including environment setup,
data preparation, and empirical results from a Jupyter Notebook executed in Google Colab.
These details enhance the study by offering a replicable framework for developers to test and
extend the findings.

2 Methodologies
2.1 Standard YOLOv5
The standard implementation serves as the baseline, processing images sequentially, often in
batches, to leverage PyTorch’s built-in optimizations and GPU acceleration. This approach is
straightforward, relying on efficient batching to maximize GPU throughput. However, it may
underutilize CPU resources during GPU-bound inference, as preprocessing and inference occur
sequentially.
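The baseline loop can be sketched as follows. Here `fake_inference` is a hypothetical stand-in for an actual YOLOv5 forward pass, so the sketch stays runnable without PyTorch; the batching structure is the point.

```python
import time

def fake_inference(batch):
    # Stand-in for model(batch) on the GPU; returns one result per image.
    return [f"detections:{name}" for name in batch]

def run_standard(images, batch_size=1):
    """Process images sequentially in batches, as the baseline does."""
    results = []
    start = time.perf_counter()
    for i in range(0, len(images), batch_size):
        results.extend(fake_inference(images[i:i + batch_size]))
    elapsed = time.perf_counter() - start
    return results, len(images) / elapsed  # results and overall FPS
```

Because preprocessing and inference share one loop, the CPU idles whenever the GPU is busy, which is the underutilization noted above.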

2.2 YOLOv5 with Multiprocessing
The multiprocessing implementation uses Python's multiprocessing library (Multiprocessing
Documentation) to create multiple processes, each handling a subset of images or pipeline
stages. By bypassing Python's Global Interpreter Lock (GIL), it enables true parallel
computation across CPU cores. Each process loads its own model instance, increasing
memory usage. Challenges include:

• Process startup overhead.

• Memory overhead due to independent model instances.

• Inter-process communication complexity.

• Potential GPU contention when multiple processes access a single GPU.
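The pattern can be sketched with `multiprocessing.Pool`. The model load here is a placeholder so the sketch is self-contained; in the notebook the initializer would perform a real YOLOv5 load, which is exactly where each worker pays the startup and memory cost listed above.

```python
import multiprocessing as mp

_model = None  # one independent instance per worker process

def init_worker():
    # Real code would load YOLOv5 here (e.g. via attempt_load); this call
    # is the per-process startup and memory overhead discussed above.
    global _model
    _model = "yolov5-instance"

def detect(image_path):
    # Runs inside a worker; _model was set by init_worker in that process.
    return image_path, f"detections-from-{_model}"

def run_multiprocessing(images, workers=2):
    with mp.Pool(processes=workers, initializer=init_worker) as pool:
        return pool.map(detect, images)
```

Note that if all workers share one physical GPU, their inference calls still contend for the same device, which is consistent with the low average GPU utilization reported later.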

2.3 YOLOv5 with Multithreading


The multithreading implementation employs Python's threading library (Threading
Documentation) to create threads within a single process, sharing memory and reducing
overhead compared to multiprocessing. While the GIL limits true parallelism for CPU-bound
tasks, multithreading excels in I/O-bound operations, such as image loading and
preprocessing, and can overlap with GPU computations. Key considerations include:

• Lower memory overhead due to shared memory space.

• Faster thread creation compared to processes.

• Simpler data sharing.

• Challenges in managing thread safety and GPU serialization.
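A minimal sketch of this design, with a placeholder model so it runs without PyTorch; the lock models the GPU serialization mentioned in the last bullet, while the I/O-bound stage is where threads actually overlap.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

model = "shared-yolov5-instance"   # a single model shared by all threads
gpu_lock = threading.Lock()        # inference on one GPU is serialized

def load_and_detect(image_path):
    # I/O-bound stage (file read, preprocessing): the GIL is released
    # during I/O, so threads genuinely overlap here.
    tensor = f"preprocessed-{image_path}"
    # GPU-bound stage: guarded so threads submit to the device one at a time.
    with gpu_lock:
        return f"detections-for-{tensor}"

def run_multithreading(images, num_threads=4):
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(load_and_detect, images))
```

Because loading and preprocessing for the next image proceed while the lock holder is on the GPU, the device is fed more continuously than in the sequential baseline.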

3 Experimental Setup
The experiments were conducted in a Google Colab environment with the following specifica-
tions:

• Hardware: NVIDIA T4 GPU (CUDA-enabled), multi-core CPU (e.g., Intel Xeon).

• Software: Python 3.11, PyTorch, YOLOv5 repository (version 7.0), psutil for resource
monitoring, pynvml (pynvml Documentation) for GPU metrics.

• Dataset: 20 images (JPEG or PNG) from the COCO dataset, stored in a test_images
folder.

• Parameters: Batch size of 1 for standard and multithreading implementations; adjusted
for multiprocessing based on CPU core count. All implementations used CUDA for GPU
acceleration.

3.1 Environment Setup
The YOLOv5 environment was configured using the following commands in Google Colab:

!pip install torch torchvision opencv-python matplotlib psutil
!git clone https://github.com/ultralytics/yolov5
%cd yolov5
!pip install -r requirements.txt
%cd ..

These commands installed necessary libraries and cloned the YOLOv5 repository, ensuring
compatibility with the experimental setup.

3.2 Data Preparation


Test images were sourced from the COCO dataset (COCO Dataset) using:

!wget https://github.com/ultralytics/yolov5/releases/download/v1.0/coco128.zip
!unzip coco128.zip

The images/train2017 folder was used, with 20 images selected to create a manageable
test set for rapid experimentation. This small dataset size facilitated quick evaluation but may
limit insights into scaling behavior with larger datasets.
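One way to assemble that 20-image subset is sketched below; the test_images folder name and the coco128 path are assumptions about the notebook's layout, matching the unzipped archive above.

```python
import glob
import os
import shutil

# Hypothetical selection step: copy the first 20 training images from the
# unzipped coco128 archive into the test folder used by the experiments.
os.makedirs("test_images", exist_ok=True)
selected = sorted(glob.glob("coco128/images/train2017/*.jpg"))[:20]
for path in selected:
    shutil.copy(path, "test_images")
```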

3.3 Practical Considerations


The implementation included robust error handling, such as checking for CUDA availability
and falling back to CPU if necessary. Additionally, the pynvml library was used to track
GPU metrics, with fallbacks in case of import failures, ensuring adaptability across different
environments.
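A sketch of that fallback logic, assuming the same libraries; the zero-returning fallback is illustrative rather than the notebook's exact code, but the shape (try to initialize, degrade gracefully) matches what is described above.

```python
# Pick a device with CPU fallback, and expose GPU metrics only when pynvml
# initializes successfully.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"  # torch absent; inference itself would also need it

try:
    import pynvml
    pynvml.nvmlInit()
    _handle = pynvml.nvmlDeviceGetHandleByIndex(0)

    def gpu_stats():
        """Return (GPU utilization %, GPU memory used in MB)."""
        util = pynvml.nvmlDeviceGetUtilizationRates(_handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(_handle)
        return util.gpu, mem.used / 1024 ** 2
except Exception:  # pynvml missing or no NVIDIA driver available
    def gpu_stats():
        return 0.0, 0.0
```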

4 Performance Metrics
The following metrics were evaluated:

• Frames Per Second (FPS): Number of images processed per second (higher is better).

• Average Inference Time: Time to process one image, measured in milliseconds (lower
is better).

• CPU Usage: Average and maximum CPU utilization (%).

• Memory Usage: Average and maximum system memory used (MB).

• GPU Usage: Average and maximum GPU utilization (%), reflecting CUDA efficiency.

• GPU Memory Usage: Average and maximum GPU memory used (MB).

Metrics were collected using psutil for CPU and memory, and pynvml for GPU metrics,
ensuring accurate resource monitoring during inference.
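The two headline metrics, FPS and average inference time, can be derived from one timing loop; a minimal harness is sketched below with a dummy 2 ms workload standing in for real inference.

```python
import time

def benchmark(infer, images):
    """Time per-image inference; derive FPS and mean latency in ms."""
    start = time.perf_counter()
    for image in images:
        infer(image)
    total = time.perf_counter() - start
    n = len(images)
    return {"fps": n / total, "avg_inference_ms": 1000.0 * total / n}

# Dummy workload: each "inference" sleeps for 2 ms.
stats = benchmark(lambda image: time.sleep(0.002),
                  [f"img{i}.jpg" for i in range(10)])
```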

5 Results
The empirical results from running the implementations on 20 images are summarized in the
following table:

Table 1: Performance Metrics for YOLOv5 Implementations

Metric                          Standard   Multiprocessing   Multithreading
FPS                                28.68              1.85            55.28
Avg Inference Time (ms)            34.87            540.01            18.09
Avg CPU Usage (%)                   67.2              99.6             67.7
Max CPU Usage (%)                   99.9             100.0            100.0
Avg Memory Usage (MB)             1234.6            1324.8           1405.3
Max Memory Usage (MB)             1316.4            1325.5           1411.7
Avg GPU Usage (%)                   19.4               3.2             26.1
Max GPU Usage (%)                   28.0              56.0             94.0
Avg GPU Memory Usage (MB)          622.3             851.7            732.7
Max GPU Memory Usage (MB)          685.9            1545.9            735.9

As shown in Table 1 and Figure 1, these results were visualized using line graphs generated
by a plot_comparisons function, which plotted FPS, CPU usage, memory usage, GPU
usage, average inference time, and GPU memory usage. The line graphs, with distinct colors
and markers for each metric, provided a clear comparison of performance trends across
implementations.
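A hypothetical reconstruction of a plot_comparisons-style helper is shown below, using two of the metrics from Table 1; matplotlib is imported lazily so the sketch degrades gracefully where it is not installed, and the Agg backend suits headless Colab scripts.

```python
results = {
    "Standard":        {"FPS": 28.68, "Avg Inference Time (ms)": 34.87},
    "Multiprocessing": {"FPS": 1.85,  "Avg Inference Time (ms)": 540.01},
    "Multithreading":  {"FPS": 55.28, "Avg Inference Time (ms)": 18.09},
}

def plot_comparisons(results, out_path="comparison.png"):
    try:
        import matplotlib
        matplotlib.use("Agg")  # headless backend
        import matplotlib.pyplot as plt
    except ImportError:
        return None  # plotting unavailable in this environment
    metrics = list(next(iter(results.values())))
    fig, axes = plt.subplots(1, len(metrics), figsize=(5 * len(metrics), 4))
    for ax, metric in zip(axes, metrics):
        names = list(results)
        ax.plot(names, [results[n][metric] for n in names], marker="o")
        ax.set_title(metric)
    fig.tight_layout()
    fig.savefig(out_path)
    return out_path
```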

6 Discussion
6.1 Performance Analysis
• FPS and Inference Time: Multithreading significantly outperforms others, achieving
55.28 FPS and an 18.09 ms inference time. This suggests efficient overlapping of CPU
and GPU tasks, maximizing throughput. The Standard implementation (28.68 FPS, 34.87
ms) is a reliable baseline but is surpassed by Multithreading. Multiprocessing performs
poorly (1.85 FPS, 540.01 ms), likely due to process creation overhead and model loading
in each process.

• CPU Usage: Multiprocessing has the highest average CPU usage (99.6%), indicating
full CPU core utilization but at the cost of efficiency. Multithreading and Standard have
similar average usage (~67%) but reach 100% maximum usage, reflecting occasional CPU
spikes.

• Memory Usage: Multithreading uses the most memory (1405.3 MB avg), followed by
Multiprocessing (1324.8 MB) and Standard (1234.6 MB). Multithreading’s shared mem-
ory model is efficient but still incurs higher usage due to concurrent operations.

• GPU Usage: Multithreading maximizes GPU utilization (26.1% avg, 94.0% max), compared
to Standard (19.4% avg, 28.0% max) and Multiprocessing (3.2% avg, 56.0% max).
Multiprocessing's low average GPU usage indicates inefficient GPU sharing.

• GPU Memory Usage: Multiprocessing has the highest maximum GPU memory usage
(1545.9 MB), likely due to multiple model instances. Multithreading (732.7 MB avg)
and Standard (622.3 MB avg) are more memory-efficient.

Figure 1: Performance Comparison of YOLOv5 Implementations: Line graphs showing FPS,
CPU Usage, Memory Usage, GPU Usage, Average Inference Time, and GPU Memory Usage
for Standard, Multiprocessing, and Multithreading methods.

6.2 Key Insights


• Multithreading is ideal for GPU-bound tasks, leveraging CUDA effectively by overlapping
I/O and inference.

• Multiprocessing is less effective in this GPU setup due to overhead and GPU contention,
but it may perform better in CPU-only environments.

• The Standard implementation balances performance and resource usage, suitable for
resource-constrained systems.

6.3 Practical Implementation
The Jupyter Notebook provided complete Python code for each implementation, including
functions for:

• Model Loading: Using attempt_load from YOLOv5's models.experimental.

• Image Preprocessing: A custom letterbox function to resize and pad images to
640x640.

• Parallel Processing: run_standard_yolov5, run_multiprocessing_yolov5,
and run_multithreading_yolov5 functions, with specific handling for batch
processing, process pools, and thread queues.

The code included robust error handling, such as checking for CUDA availability and handling
missing pynvml, making it adaptable to various environments.
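The notebook's letterbox is OpenCV-based, but the scale-and-pad arithmetic it performs can be sketched without image I/O. This is a simplified square-pad version; the real YOLOv5 letterbox additionally rounds padding to stride multiples.

```python
def letterbox_params(h, w, new_shape=640):
    """Compute the scale and padding a letterbox resize would apply:
    scale to fit inside new_shape x new_shape preserving aspect ratio,
    then pad symmetrically to fill the square."""
    r = min(new_shape / h, new_shape / w)        # uniform scale factor
    new_h, new_w = round(h * r), round(w * r)    # resized dimensions
    pad_h, pad_w = new_shape - new_h, new_shape - new_w
    top, left = pad_h // 2, pad_w // 2
    # Returns scale, (width, height) after resize, and (top, bottom,
    # left, right) padding in pixels.
    return r, (new_w, new_h), (top, pad_h - top, left, pad_w - left)
```

For a 480x640 input, for example, the image is left at scale 1.0 and padded with 80 pixels above and below to reach 640x640.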

6.4 Limitations
• The results are specific to a GPU environment with a small dataset (20 images). Larger
datasets or CPU-only setups may yield different outcomes.

• The batch size and number of threads/processes were not extensively tuned, which could
further optimize performance.

• The single-GPU setup (NVIDIA T4) limits insights into multi-GPU scenarios.

7 Conclusion
This research highlights Multithreading as the most effective YOLOv5 implementation in a
GPU-enabled environment, offering superior FPS (55.28) and low inference time (18.09 ms)
due to efficient GPU utilization. However, it demands higher memory and CPU resources.
The Standard implementation provides a balanced alternative, while Multiprocessing
underperforms due to significant overhead. For real-time applications prioritizing speed,
Multithreading is recommended. In resource-constrained or CPU-only scenarios, the Standard
or Multiprocessing approaches may be more appropriate, depending on specific requirements.
Developers are encouraged to prioritize Multithreading for GPU-intensive tasks and explore
hybrid approaches in future studies.

Future work could explore larger datasets, multi-GPU setups, and parameter tuning to refine
these findings. The practical implementation details and empirical results from the Jupyter
Notebook provide a replicable framework for further experimentation.

8 References
• Ultralytics YOLOv5 Repository for Object Detection

• Python Multiprocessing Library Documentation

• Python Threading Library Documentation

• pynvml Library for NVIDIA GPU Monitoring

• COCO Dataset for Computer Vision Tasks
