
Advances in Science and Technology Research Journal 2023, 17(6), 127–139
https://doi.org/10.12913/22998624/172771
ISSN 2299-8624, License CC-BY 4.0
Received: 2023.06.30; Accepted: 2023.09.22; Published: 2023.11.21

A Comparative Analysis of Image Segmentation Using Classical and Deep Learning Approach

Arsen Plaksyvyi1, Maria Skublewska-Paszkowska1*, Paweł Powroznik1

1 Lublin University of Technology, Faculty of Electrical Engineering and Computer Science, Department of Computer Science, Nadbystrzycka 38D, 20-618 Lublin, Poland
* Corresponding author's e-mail: maria.paszkowska@pollub.pl

ABSTRACT
Segmentation is one of the image processing techniques widely used in computer vision to extract various types of information represented as objects or areas of interest. The development of neural networks has influenced image processing techniques, including the creation of new methods of image segmentation. The aim of this study is to compare classical algorithms and deep learning methods in RGB image segmentation tasks. Two hypotheses were put forward: (1) the quality of segmentation obtained with deep learning methods is higher than with classical methods for RGB images, and (2) an increase in RGB image resolution has a positive impact on segmentation quality. Two traditional segmentation algorithms (Thresholding and K-means) were compared with deep learning approaches (U-Net, SegNet and FCN-8) to verify RGB segmentation quality. Two image resolutions were taken into consideration: 160x240 and 320x480 pixels. Segmentation quality for each algorithm was estimated based on four parameters: Accuracy, Precision, Recall and the Sorensen-Dice ratio (Dice score). The Carvana dataset, containing 5,088 high-resolution images of cars, was applied in the study. The initial set was divided into training, validation and test subsets of 60%, 20% and 20%, respectively. As a result, the best Accuracy, Dice score and Recall for images with a resolution of 160x240 were obtained for U-Net, achieving 99.37%, 98.56%, and 98.93%, respectively. For the same resolution, the highest Precision of 98.19% was obtained for the FCN-8 architecture. For the higher resolution, 320x480, the best mean Accuracy, Dice score and Precision were obtained for the FCN-8 network, reaching 99.55%, 98.96% and 98.85%, respectively. The highest results among the classical methods were obtained by the Threshold algorithm, reaching 80.41% Accuracy, 58.49% Dice score, 67.32% Recall and 52.62% Precision. The results confirm both hypotheses.

Keywords: image segmentation, pattern recognition, artificial neural networks.

INTRODUCTION

The increase in computing power gave an impulse to the development of new methods of image processing. Artificial intelligence technologies, including deep learning, have become the most popular and fastest growing in this area over the past few years [1]. Segmentation, as a method of image processing, was introduced in the 1960s; over the past decade, this technology has undergone significant changes as a result of the development of machine and deep learning [2]. Segmentation involves splitting images into groups of regions, also called image segments. Each of them represents a significant area of the image, thus separating objects or parts of the scene or image [3].

There are many types of image segmentation, but three are the most common: semantic [4], instance [5] and panoptic [6]. Semantic segmentation involves combining pixels based on their belonging to a semantic class. Instance segmentation introduces pixel classifications based on their belonging to an object instance. Unlike semantic segmentation, this method does not determine a class, but only selects an area based on the edges of a single object. Panoptic segmentation, on the other hand, is a combination of the first two methods and makes it possible both to determine the area that belongs to an instance of an object and to classify that area based on its semantic class.

Nowadays segmentation, as well as other image processing methods, is widely used in computer vision [7-9]. This type of algorithm can be divided into several groups: threshold-based segmentation, region-based segmentation, edge-based segmentation, clustering-based segmentation, graph-based segmentation, active contour-based segmentation, and segmentation algorithms that use deep learning neural networks [10]. The first group of algorithms is the oldest; it performs segmentation by finding similarities among pixels with respect to given threshold values. A feature of the next group of algorithms is the principle of image segmentation based on the edges of individual objects. Clustering algorithms, such as the K-means algorithm, iteratively assign pixels to a certain group, creating regions composed of pixels with similar features. Deep learning image segmentation is a technique that relies on the functioning of deep neural networks. This class includes a large number of algorithms; among the most popular are DeepLab (convolutional deep learning networks), the Fully Convolutional Network (FCN), deep learning networks based on the encoder-decoder architecture such as U-Net and SegNet, the Region-based Convolutional Neural Network (R-CNN), and others [11-13].

Segmentation methods and image analyses may be found in many papers. The study presented in [14] contains a detailed description of the deep convolutional neural network architecture for semantic pixel segmentation called SegNet. The authors compare the segmentation precision of different architectures on two datasets, CamVid and SUN RGB-D. A comparison of the computation time and hardware resources required by the different deep learning architectures was also carried out, which confirmed the effectiveness of the presented neural network. Another study concerns a new approach called the Edge-Based Segmentation Network (ESNet) for real-time semantic segmentation in traffic scenes [15]. The authors describe the network and compare it with other artificial intelligence approaches such as FCN-8, PSPNet, SegNet and DeepLab. The main goal was to demonstrate the advantages of the proposed ESNet in image segmentation in terms of computational efficiency without compromising segmentation quality. The Cityscapes dataset was chosen for this study.

The most extensive studies are articles comparing different segmentation methods. The study described in [16] compares the effectiveness of classic segmentation algorithms with U-Net. The specificity of that work is that it used an original dataset consisting of images with various configurations of geometric shapes. Furthermore, it included only a single segmentation technique using a deep neural network, which did not make it possible to carry out a clear comparative analysis or to map the results of the study onto a set of real-world photos. The authors of [17] study the possible correlation between image resolution and the effectiveness of algorithms. They compared the FCN and U-Net algorithms for segmenting road network images at two resolutions: 256x256 and 512x512. Another study, presented in [18], provides a comparison of three neural network models: FCN, U-Net, and DeepLab. The authors aimed to determine the quality of segmentation and the computational efficiency of each method. A specific set of photographs showing a top view captured by the camera was selected for the study. The challenge for the authors was that the images contained groups or individuals captured from different angles. The authors of [19] focus on comparing two types of neural networks: convolutional neural networks (CNN) and fully convolutional neural networks (FCNN). The study was conducted in the context of classifying areas in underwater images of coral reef ecosystems into biologically relevant categories. It includes 6 patch-based CNN models (a special variant of CNN used for image segmentation) and 4 FCNN approaches. The algorithms are compared in terms of pixel accuracy, and the CNN models give better results than the FCNN ones. In [20] the quality of segmentation methods on complex images of immunofluorescence cells is presented. The work compares 5 deep learning methods and 2 classic methods, namely the h-min based watershed algorithm and attributed relational graphs. The authors also proposed and evaluated a new strategy of adding artificial imagery to extend the training set. The effects of various factors, such as image scaling, annotation quality and post-processing methods, on segmentation effectiveness were also compared. Furthermore, the results were compared with those of manual image segmentation performed by experts. The study described in [21] demonstrated a different approach to comparing segmentation methods.

The authors test various modifications of the U-Net model, such as U-Net Vgg16, U-Net InceptionResNetV2 and U-Net DenseNet121, in terms of the quality of segmentation and building extraction from aerial photos of Chicago.

Based on the literature review it can be stated that no study has presented a comparative analysis of deep neural network and classical segmentation methods while also considering the effect of resolution change on segmentation quality. The novelty of this study is the comparison of classical segmentation algorithms with deep learning approaches. The impact of image resolution on segmentation performance, a very important aspect in this type of study, was also verified. Moreover, the research was carried out on a set of RGB images of cars of various shapes and colors, which has not been used in research so far. Therefore, the aim of the research is a comparative analysis of classical methods (K-means clustering, Threshold segmentation) and deep learning methods (U-Net, FCN, SegNet) in RGB image segmentation tasks. The study aims to confirm or reject the following hypotheses:

H1. The quality of segmentation, understood as accuracy, precision, recall and the Sorensen-Dice coefficient, applying deep learning methods is higher than using classical methods for RGB images.

H2. The increase of the RGB image resolution has a positive impact on the segmentation quality.

The Carvana dataset, used in this study, is a dataset of various used cars created for the purpose of selling them online. It has been utilized in much research, including improving CNN performance by adding an n-sigmoid function with a Squeeze-and-Excitation block [22] and improving segmentation performance by introducing the Convolutional Block Attention Module (CBAM) into the U-Net architecture [23]. In response to the Kaggle Carvana Image Masking Competition in 2017 [24], which dealt with extracting cars in high resolution from the background, a study was performed with U-Net and RefineNet using various pre-trained CNN architectures [25]. The training process was accelerated and the network performance was improved.

MATERIAL AND METHODS

Dataset

Images from the Carvana dataset were selected for this study. The analysis relates to the segmentation performance of the selected techniques for two different resolutions. The chosen dataset was created for the Carvana Image Masking Challenge [26] by the Carvana company in 2017. The collection contains 5088 photos of cars in RGB format (Fig. 1) with a resolution of 1280x1918 pixels. 318 used passenger cars of different brands and colors were selected for the photos. They were taken using a rotating photo studio that automatically captured and processed 16 photos of each car from different angles (Fig. 2). All the photos in the set also have a corresponding binary mask marking the area occupied by the car. In this experiment, the dataset was divided into three sets: training – 3056 photos (~60%), validation – 1024 photos (~20%), and testing – 1008 photos (~20%).

Fig. 1. Images and masks from Carvana dataset [26]

Fig. 2. 360-degree vehicle photography for Carvana dataset [27]
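The paper does not publish its data pipeline, but the three subset sizes are all divisible by the 16 views per car, so one plausible reading is that the 60/20/20 split was made per car (191/64/63 of the 318 cars), which would also keep views of the same vehicle from leaking across subsets. The sketch below illustrates such a split; the paths, file naming and random seed are assumptions, not details stated by the authors.

```python
# Hypothetical sketch: a 60/20/20 split of the Carvana set done per car
# (191/64/63 of the 318 cars x 16 views = 3056/1024/1008 images).
# Paths and file naming are assumptions, not the authors' actual code.
import random
from collections import defaultdict
from pathlib import Path

random.seed(42)  # fixed seed so the split is reproducible

by_car = defaultdict(list)
for img in sorted(Path("carvana/train").glob("*.jpg")):
    car_id = img.stem.split("_")[0]  # Kaggle files look like <car_id>_<view>.jpg
    by_car[car_id].append(img)

cars = sorted(by_car)
random.shuffle(cars)
train_cars, val_cars, test_cars = cars[:191], cars[191:255], cars[255:]

train_set = [p for c in train_cars for p in by_car[c]]  # 3056 images
val_set = [p for c in val_cars for p in by_car[c]]      # 1024 images
test_set = [p for c in test_cars for p in by_car[c]]    # 1008 images
```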

Deep learning segmentation methods

U-Net

U-Net (Fig. 3) is a convolutional neural network architecture designed for image segmentation tasks. There are two major components of the network: an encoder and a decoder [28]. The encoder part consists of the repeated use of convolutions, each followed by a rectified linear unit (ReLU), and max pooling operations, which downsample the feature maps and increase the number of feature channels. The decoder path consists of upsampling of the feature map, up-convolution layers followed by concatenation with the correspondingly cropped feature map from the encoder, and convolutions, each followed by a ReLU. The skip connections between the encoder and the decoder allow the segmentation mask to be more accurate. This method makes it possible to detect even very small objects in images, which is why this architecture is commonly used to aid diagnosis, treatment planning and the tracking of disease progress in computer tomography image analysis.

Fig. 3. U-Net architecture [28]
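Since the paper itself contains no code, the following PyTorch sketch may help to fix the idea; it is illustrative only (two encoder stages and arbitrary channel widths instead of the original U-Net's four stages), not the authors' implementation.

```python
# Minimal U-Net-style encoder-decoder sketch in PyTorch; channel widths
# and depth are illustrative, not those of the paper or of the original U-Net.
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    # two 3x3 convolutions, each followed by ReLU, as in the U-Net paper
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    def __init__(self, n_classes=1):
        super().__init__()
        self.enc1 = double_conv(3, 64)
        self.enc2 = double_conv(64, 128)
        self.pool = nn.MaxPool2d(2)               # downsampling
        self.bottleneck = double_conv(128, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)  # up-convolution
        self.dec2 = double_conv(256, 128)         # 256 = 128 upsampled + 128 skip
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = double_conv(128, 64)
        self.head = nn.Conv2d(64, n_classes, 1)   # per-pixel mask logits

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)

mask_logits = MiniUNet()(torch.randn(1, 3, 160, 240))  # e.g. a 160x240 input
```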
FCN

FCN (Fig. 4) is a neural network architecture created for image segmentation tasks [29]. It is an encoder-decoder architecture, similar to U-Net. The FCN includes an encoder in the form of convolutional layers for image downsampling. The encoder usually uses the VGG16 network [30], which is designed for image recognition tasks and has 16 layers (13 convolutional and three fully connected), plus an additional five max-pooling layers. Unlike U-Net, FCN's decoder path is not symmetrical to the encoder. The decoder contains upsampling layers that enable 32x, 16x, or 8x upsampling and allow fusing deep features from the deeper convolutional layers with spatial location information from the shallower layers, resulting in a more precise feature output map.

Fig. 4. FCN architecture [29]
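A minimal sketch of this FCN-8-style fusion scheme is given below (assuming PyTorch; the layer widths are illustrative and a real FCN-8 would tap the pool3/pool4/pool5 stages of VGG16 rather than this toy encoder).

```python
# Sketch of FCN-8-style multi-scale fusion: score maps from 1/8, 1/16 and
# 1/32 resolution stages are upsampled and summed, then brought back to
# input resolution (8x). Illustrative only, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def stage(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.ReLU(inplace=True), nn.MaxPool2d(2))

class MiniFCN8(nn.Module):
    def __init__(self, n_classes=1):
        super().__init__()
        self.s3 = nn.Sequential(stage(3, 64), stage(64, 128), stage(128, 256))  # 1/8
        self.s4 = stage(256, 512)                                               # 1/16
        self.s5 = stage(512, 512)                                               # 1/32
        self.score3 = nn.Conv2d(256, n_classes, 1)  # 1x1 "score" heads
        self.score4 = nn.Conv2d(512, n_classes, 1)
        self.score5 = nn.Conv2d(512, n_classes, 1)

    def forward(self, x):
        f3 = self.s3(x); f4 = self.s4(f3); f5 = self.s5(f4)
        s5 = F.interpolate(self.score5(f5), scale_factor=2)  # 1/32 -> 1/16
        s4 = self.score4(f4) + s5                            # fuse deep + shallow
        s4 = F.interpolate(s4, scale_factor=2)               # 1/16 -> 1/8
        s3 = self.score3(f3) + s4
        return F.interpolate(s3, scale_factor=8)             # 8x back to input

out = MiniFCN8()(torch.randn(1, 3, 320, 480))  # -> (1, 1, 320, 480)
```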
SegNet

SegNet (Fig. 5) is a neural network architecture created for image segmentation [31]. The structure of this network is similar to that of the U-Net and FCN networks. SegNet consists of two parts: an encoder and a decoder. The encoder part consists of 13 convolutional layers corresponding to the 13 convolutional layers of the VGG16 network. Unlike FCN, SegNet does not contain the three fully connected layers. Each encoder layer has a corresponding decoder layer, resulting in a symmetrical network structure. Unlike the U-Net network, SegNet does not transfer the entire feature maps to the decoder; instead, it reuses the max-pooling indices recorded in the encoder to upsample the corresponding decoder input.

Fig. 5. SegNet architecture [31]
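This index-based upsampling can be sketched with PyTorch's MaxUnpool2d as follows (an illustration of the mechanism described in [31], not the authors' code).

```python
# Sketch of SegNet's index-based upsampling: max-pooling indices saved in
# the encoder drive MaxUnpool2d in the decoder, so only indices, not whole
# feature maps, are passed across. Shapes below are illustrative.
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, stride=2, return_indices=True)  # encoder side
unpool = nn.MaxUnpool2d(2, stride=2)                   # decoder side

features = torch.randn(1, 64, 160, 240)
downsampled, indices = pool(features)    # (1, 64, 80, 120) plus the indices
restored = unpool(downsampled, indices)  # sparse (1, 64, 160, 240)
# in a full SegNet, convolutions after each unpooling densify 'restored'
```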
Classical segmentation methods

Threshold segmentation

Threshold segmentation is one of the most popular and simplest image segmentation algorithms [32]. The algorithm determines new pixel values based on one or more threshold values. In binary segmentation, all pixel values below the threshold are set to black and those above it to white. This segmentation algorithm is used for both grayscale and color images in RGB, HSV, or YUV format. In the latter case, new pixel values are determined based on the values of each color channel.
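A minimal sketch of binary thresholding is given below (assuming OpenCV; the input file name and the threshold value of 128 are arbitrary placeholders, not the settings used in the study).

```python
# Minimal binary thresholding sketch with OpenCV (illustrative only).
import cv2

gray = cv2.imread("car.jpg", cv2.IMREAD_GRAYSCALE)  # assumed input file
# pixels above 128 become white (255), the rest black (0)
_, mask = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)
cv2.imwrite("mask.png", mask)
```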

K-means segmentation

The k-means algorithm is a distance-based data clustering algorithm [33]. Segmentation using this algorithm divides an image into a defined number of groups. Successive pixels are assigned to a group, and the average value, also called the group centroid, is determined from their respective pixel values (RGB channel values or grayscale values). Pixels are added to groups based on the distance from the pixel to the centroid of each group. Therefore, the similarity to other pixels is determined by this distance.
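The sketch below shows K-means colour segmentation with OpenCV (illustrative; the number of clusters k = 2, the stopping criteria and the input file are assumptions, not the study's settings).

```python
# Sketch of K-means colour segmentation with OpenCV: each pixel is
# clustered by its (B, G, R) value and repainted with its cluster centroid.
import cv2
import numpy as np

img = cv2.imread("car.jpg")                     # assumed input file
pixels = img.reshape(-1, 3).astype(np.float32)  # one row per pixel

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centroids = cv2.kmeans(pixels, 2, None, criteria, 10,
                                  cv2.KMEANS_RANDOM_CENTERS)

# repaint every pixel with the centroid of its cluster
segmented = centroids[labels.flatten()].astype(np.uint8).reshape(img.shape)
cv2.imwrite("segmented.png", segmented)
```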
Research methodology

Research stand

Tests of the algorithms and of the deep learning models were performed on an ASUS TUF Gaming FX504 laptop with the specifications given in Table 1.

Table 1. Stand specification
CPU 6 cores, 12 threads
Clock speed 2.2 GHz
GPU Base Core Clock: 1290 MHz
GPU Memory 4 GB
RAM 12 GB
Disc SSD 1 TB

Performance measures

Two traditional segmentation algorithms (K-means and Threshold) and three deep learning algorithms (U-Net, FCN, and SegNet) were selected to check and compare the quality of segmentation and to verify the stated research hypotheses. The obtained results were evaluated using four measures: Accuracy (Eq. 1), Precision (Eq. 2), Recall (Eq. 3) and the Sorensen-Dice coefficient, DSC (Eq. 4 and 5). The Accuracy rate measures how often the algorithm performs image segmentation correctly and is expressed as a percentage. The Sorensen-Dice score, also known as the F1 score, is a machine learning evaluation metric that determines the match between predicted segmentations and their corresponding true values.

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)

where: TP – True Positive; TN – True Negative; FP – False Positive; FN – False Negative.

Precision = TP / (TP + FP)    (2)

Recall = TP / (TP + FN)    (3)

DSC = 2|X ∩ Y| / (|X| + |Y|)    (4)

where: X – a set of predicted pixels; Y – a set of reference pixels.

This metric also combines the precision and recall measures:

DSC = 2 × (Precision × Recall) / (Precision + Recall)    (5)
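For illustration, the four measures map directly onto pixel counts of binary masks; the NumPy sketch below follows Eqs. 1-5 (it is an illustration, not the authors' code; pred and truth are assumed to be boolean arrays of equal shape).

```python
# Computing the four measures of Eqs. 1-5 from binary masks with NumPy.
import numpy as np

def segmentation_metrics(pred, truth):
    tp = np.sum(pred & truth)    # car pixels predicted as car
    tn = np.sum(~pred & ~truth)  # background predicted as background
    fp = np.sum(pred & ~truth)   # background predicted as car
    fn = np.sum(~pred & truth)   # car pixels missed
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # Eq. 1
    precision = tp / (tp + fp)                  # Eq. 2
    recall = tp / (tp + fn)                     # Eq. 3
    dice = 2 * tp / (2 * tp + fp + fn)          # Eqs. 4-5 for binary masks
    return accuracy, precision, recall, dice
```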
RESULTS

Segmentation quality for 160x240 resolution

Three neural network models were trained on images with a resolution of 160x240 pixels in the first part of this study. Graphs of the changes in loss and accuracy during the training process for the individual sets of images are shown in Figures 6 and 7.

To determine the quality of segmentation, four metrics were calculated: Accuracy, Precision, Recall and Dice score. The results for the three neural network models are presented in Tables 2-5, showing the maximum, minimum, mean and standard deviation values for each metric, determined on a 20-fold verification basis.

The comparison of the mean values of the individual metrics calculated for the neural networks and for the conventional algorithms is gathered in Table 6.

Table 2. Accuracy for resolution 160x240 in %
Model Maximum value Minimum value Mean value Standard deviation
U-Net 99.39518605 99.36156831 99.37869636 0.006505737
FCN-8 99.31386553 99.29425 99.30257 0.00439
SegNet 98.42643142 98.34756715 98.39049364 0.014620049

Table 3. Dice score for resolution 160x240 in %


Model Maximum value Minimum value Mean value Standard deviation
U-Net 98.55973053 98.55955505 98.55964012 2.96021E-05
FCN-8 98.35466003 98.35455 98.35460 0.00002
SegNet 96.30705261 96.30695343 96.30699577 2.3613E-05

Table 4. Recall for resolution 160x240 in %


Model Maximum value Minimum value Mean value Standard deviation
U-Net 98.94632578 98.91670325 98.93376738 0.006388341
FCN-8 98.56549413 98.50299 98.53718 0.01205
SegNet 98.22743008 98.12816075 98.174576 0.022404156

Table 5. Precision for resolution 160x240 in %


Model Maximum value Minimum value Mean value Standard deviation
U-Net 98.24112535 98.10515674 98.17225146 0.034336691
FCN-8 98.22535624 98.16894 98.19469 0.01504
SegNet 94.64933175 94.35207994 94.49413146 0.055905289

Table 6. The comparison of algorithms for resolution 160x240 in %


Method Accuracy Dice score Recall Precision
U-Net 99.3787 98.5596 98.9338 98.1723
FCN-8 99.3026 98.3546 98.5372 98.1947
SegNet 98.3905 96.307 98.1746 94.4941
Threshold 75.2768 58.4872 67.3221 46.0944
K-Means 62.8176 44.8273 56.3527 30.5582

Fig. 6. Plots of loss for U-Net, FCN and SegNet models (160x240 study)


Fig. 7. Plots of accuracy for U-Net, FCN and SegNet models (160x240 study)

Study of the impact of image resolution on segmentation quality

In the second part of this study, three neural network models were trained on images with a resolution of 320x480 pixels. Due to the change in image resolution, the size of the individual data batches was also reduced from 8 to 2 images. Graphs of the changes in loss and accuracy during the training process for the individual sets of images are shown in Figures 8 and 9 (a sketch of the two training configurations follows Figure 8). In view of the fast attainment of optimal accuracy and loss values, as in the first study, a limit of 10 epochs was set.

Fig. 8. Plots of loss for U-Net, FCN and SegNet models (320x480 study)
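The two training configurations described above might be expressed as follows; this is an illustration only, since the exact pipeline is not published (the torchvision transforms and the memory motivation for the smaller batch are assumptions).

```python
# Illustrative sketch of the two training configurations of this study.
from torchvision import transforms

low_res = transforms.Compose([transforms.Resize((160, 240)),   # first study
                              transforms.ToTensor()])
high_res = transforms.Compose([transforms.Resize((320, 480)),  # second study
                               transforms.ToTensor()])

# four times more pixels per image; the batch size drops from 8 to 2,
# presumably so that training still fits the 4 GB GPU listed in Table 1
batch_size = {(160, 240): 8, (320, 480): 2}
```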


Fig. 9. Plots of accuracy for U-Net, FCN and SegNet models (320x480 study)

To determine the impact of image resolution on segmentation quality, four metrics were calculated: Accuracy, Precision, Recall and Dice score (Tables 7-10). They were compared with the results of the first study. The comparison of the mean values of the individual metrics calculated for the neural networks and for the conventional algorithms is gathered in Table 11. The comparison of the mean values of the individual metrics for the two resolutions, together with the state of the art, is shown in Table 12.
Table 7. Accuracy for resolution 320x480 in %


Model Maximum value Minimum value Mean value Standard deviation
U-Net 99.53352939 99.44035391 99.49113923 0.019060937
FCN-8 99.56414176 99.54859211 99.55717859 0.00406562
SegNet 99.24130222 99.20627384 99.22529872 0.009252595

Table 8. Dice score for resolution 320x480 in %


Model Maximum value Minimum value Mean value Standard deviation
U-Net 98.85855103 98.84001923 98.84826775 0.003995895
FCN-8 98.96173859 98.95807648 98.95945206 0.000914993
SegNet 98.20555878 98.19795227 98.20186691 0.001835556

Table 9. Recall for resolution 320x480 in %


Model Maximum value Minimum value Mean value Standard deviation
U-Net 99.38927243 99.34984778 99.3741179 0.009097651
FCN-8 99.09820294 99.04066392 99.070748 0.011078916
SegNet 98.85431778 98.77989702 98.81829234 0.015595463

Table 10. Precision for resolution 320x480 in %


Model Maximum value Minimum value Mean value Standard deviation
U-Net 98.45008363 98.04797398 98.26105981 0.083711949
FCN-8 98.88313856 98.82378904 98.85661844 0.01407866
SegNet 97.63833631 97.51946833 97.5782649 0.028655288


Table 11. The comparison of algorithms for resolution 320x480 in %


Method Accuracy Dice score Recall Precision
U-Net 99.491 98.848 99.374 98.261
FCN-8 99.557 98.959 99.071 98.856
SegNet 99.225 98.202 98.818 97.578
Threshold 80.411 54.618 58.907 52.618
K-Means 78.484 54.270 55.984 49.352

Table 12. The state-of-the-art comparison


Methods Dataset Accuracy Dice score Recall Precision Study
U-Net with CBAM Kaggle Carvana NA 98.00-98.71% NA NA [23]
U-Net with n-sigmoid activation function Kaggle Carvana 99.67% NA NA NA [22]
U-Net Kaggle Carvana 99.30% NA NA NA [22]
U-Net ResNet Kaggle Carvana NA 99.69 NA NA [25]
Inception-V1 VeRi 80.08 81.00 NA NA [34]
Inception-V1 VehicleID 84.40 86.90 NA NA [34]
VggNet-16 VeRi 80.17 80.43 NA NA [34]
VggNet-16 VehicleID 85.10 89.17 NA NA [34]
ResNet-50 VeRi 80.12 82.46 NA NA [34]
ResNet-50 VehicleID 84.70 87.31 NA NA [34]
MobileNet VeRi 69.54 71.42 NA NA [34]
MobileNet VehicleID 80.29 86.40 NA NA [34]
GhostNet VeRi 80.03 80.14 NA NA [34]
GhostNet VehicleID 84.65 88.38 NA NA [34]
MicroNet VeRi 77.90 79.28 NA NA [34]
MicroNet VehicleID 83.52 88.03 NA NA [34]
AlexNet VeRi 80.51 80.09 NA NA [34]
AlexNet VehicleID 87.90 91.29 NA NA [34]
Triplet+AlexNet VeRi 71.23 75.31 NA NA [34]
Triplet+AlexNet VehicleID 76.09 84.90 NA NA [34]
ASDFL VeRi 82.08 83.23 NA NA [34]
ASDFL VehicleID 88.70 92.24 NA NA [34]
Multi-View VeRi 82.64 NA NA NA [35]
Multi-View VehicleID 66.06 NA NA NA [35]
DFN VeRi 88.14 NA NA NA [36]
DFN VehicleID 77.02 NA NA NA [36]
BIR VeRi 90.46 NA NA NA [37]
BIR VehicleID 77.17 NA NA NA [37]
SGAT VeRi 89.69 NA NA NA [38]
SGAT VehicleID 78.12 NA NA NA [38]
U-Net 160x240 Kaggle Carvana 99.3787 98.5596 98.9338 98.1723 Own
U-Net 320x480 Kaggle Carvana 99.4911 98.8483 99.3741 98.2611 Own
FCN-8 160x240 Kaggle Carvana 99.3026 98.3546 98.5372 98.1947 Own
FCN-8 320x480 Kaggle Carvana 99.5572 98.9595 99.0707 98.8566 Own
SegNet 160x240 Kaggle Carvana 98.3905 96.307 98.1746 94.4941 Own
SegNet 320x480 Kaggle Carvana 99.2253 98.2019 98.8183 97.5783 Own
Threshold 160x240 Kaggle Carvana 75.2768 58.4872 67.3221 46.0944 Own
Threshold 320x480 Kaggle Carvana 77.1690 57.8845 59.0558 49.1739 Own
K-Means 160x240 Kaggle Carvana 62.8176 44.8273 56.3527 30.5582 Own
K-Means 320x480 Kaggle Carvana 55.8672 40.7987 49.6109 24.2774 Own


DISCUSSION

The aim of this study is a comparative analysis of classical methods (K-means clustering, Threshold segmentation) and deep learning methods (U-Net, FCN, SegNet) in RGB image segmentation tasks. Images from the Carvana dataset were selected for this study. The analysis relates to the performance of the selected techniques at two different resolutions: 160x240 and 320x480. Based on the obtained results, the correctness of the H1 hypothesis was proven, while the H2 hypothesis was true only for the deep learning methods. In the case of the classical algorithms, changing to the higher resolution resulted in an increase in segmentation quality for Accuracy and Dice score for both algorithms and a decrease in the other measures.

The segmentation accuracy of images from the Carvana dataset was 99.38%, 99.30% and 98.39% for the U-Net, FCN-8 and SegNet models, respectively, and 75.17% and 62.82% for the Threshold and K-means algorithms (Table 6), which clearly confirms the advantage of the algorithms using neural network models. Changing the resolution resulted in an increase in the accuracy metric for all analyzed models (Table 11). This means that increasing the image resolution improves the segmentation quality. However, for the classical methods, lower values were obtained for Precision and Recall.

The comparison of the results obtained in this study for the deep learning approaches applied to vehicle datasets with the state of the art is presented in Table 12. Three datasets were taken into consideration: Kaggle Carvana, VeRi and VehicleID. On the dataset utilized in this paper, Kaggle Carvana, the U-Net and its modifications were verified [22-23, 25]. The Dice score results obtained for the CBAM solution were in the range of 98.00-98.71%, which is slightly lower than for the U-Net and FCN-8 analyzed in this study at the 320x480 image resolution. The results achieved for U-Net and FCN-8 at the 160x240 image resolution are comparable with the CBAM performance. The approach combining U-Net with ResNet gave the highest Dice score, at the level of 99.69% [25]. The Accuracy obtained for the U-Net architecture in [22] was comparable to that obtained in this study.

In [34] the ASDFL approach was compared with eight neural networks using two vehicle datasets. The obtained Accuracy and Dice score did not exceed 93%. The Multi-View approach [35], the discriminative fine-grained network (DFN) [36] and the Structured Graph Attention Network (SGAT) [38] did not reach 90%. The framework with the background interference removal (BIR) mechanism [37] achieved up to 90.46%. The performance of the methods presented in [34-38] is lower compared to the networks proposed in this study, which are much more adequate for segmentation purposes. The research results confirm the findings of the analyzed papers [16] and [17] and clearly show the performance advantage of the neural networks over the classical algorithms in image segmentation tasks.

CONCLUSIONS

The study presented in this paper provides a thorough analysis of the performance of image segmentation solutions for the Carvana dataset. Both classical methods and those based on deep learning approaches were taken into consideration. K-means and Threshold were applied as classical segmentation algorithms, while U-Net, FCN-8 and SegNet were chosen as deep learning solutions. The most important measures, such as Accuracy, Dice score, Recall and Precision, were applied in order to verify the segmentation performance. All analysed deep learning methods reached very high results for both image resolutions, 160x240 and 320x480. They proved to be a more appropriate choice for image segmentation in comparison to the classical algorithms. Moreover, the state-of-the-art analysis showed that the deep learning approaches proposed in this study are among the best tools for the segmentation of vehicles.

REFERENCES

1. Garcia-Garcia A., Orts-Escolano S., Oprea S., Villena-Martinez V., Martinez-Gonzalez P., Garcia-Rodriguez J. A survey on deep learning techniques for image and video semantic segmentation. Applied Soft Computing 2018; 70: 41-65.
2. Zhang Y.-J. Advances in image and video segmentation. IGI Global, 2006.
3. Stockman G., Shapiro L.G. Computer vision. Prentice Hall PTR, 2001.
4. Li B., Shi Y., Qi Z., Chen Z. A survey on semantic segmentation. In: Proc. of the 2018 IEEE International Conference on Data Mining Workshops, Singapore 2018, 1233-1240.
5. Hafiz A.M., Bhat G.M. A survey on instance segmentation: state of the art. International Journal of Multimedia Information Retrieval 2020; 9(3): 171-189.
6. Kirillov A., He K., Girshick R., Rother C., Dollár P. Panoptic segmentation. In: Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA 2019, 9404-9413.
7. Tilton J.C. Image segmentation analysis for NASA Earth science applications. Capital Science 2010.
8. Aljabri M., AlGhamdi M. A review on the use of deep learning for medical images segmentation. Neurocomputing 2022.
9. Mousavirad S.J., Ebrahimpour-Komleh H. Image segmentation as an important step in image-based digital technologies in smart cities: a new nature-based approach. Information Innovation Technology in Smart Cities 2018; 75-89.
10. Gonzalez R.C., Woods R.E., Eddins S.L. Digital image processing using Matlab. 2004.
11. Yu H., Yang Z., Tan L., Wang Y., Sun W., Sun M., Tang Y. Methods and datasets on semantic segmentation: A review. Neurocomputing 2018; 304: 82-103.
12. Liu D., Soran B., Petrie G., Shapiro L. A review of computer vision segmentation algorithms. Lecture notes 2012, 53.
13. Smołka J. Fast watershed-based dilation. Advances in Science and Technology Research Journal 2014; 8(23): 41-44.
14. Badrinarayanan V., Kendall A., Cipolla R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017; 39(12): 2481-2495.
15. Lyu H., Fu H., Hu X., Liu L. ESNet: Edge-based segmentation network for real-time semantic segmentation in traffic scenes. In: Proc. of the 2019 IEEE International Conference on Image Processing, Taipei, Taiwan 2019, 1855-1859.
16. Karabağ C., Verhoeven J., Miller N.R., Reyes-Aldasoro C.C. Texture segmentation: An objective comparison between five traditional algorithms and a deep-learning U-Net architecture. Applied Sciences 2019; 9(18): 3900.
17. Ozturk O., Saritürk B., Seker D.Z. Comparison of fully convolutional networks (FCN) and U-Net for road segmentation from high resolution imageries. International Journal of Environment and Geoinformatics 2020; 7(3): 272-279.
18. Ahmed I., Ahmad M., Khan F.A., Asif M. Comparison of deep-learning-based segmentation models: Using top view person images. IEEE Access 2020; 8: 136361-136373.
19. King A., Bhandarkar S.M., Hopkinson B.M. A comparison of deep learning methods for semantic segmentation of coral reef survey images. In: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA 2018, 1394-1402.
20. Kromp F., Fischer L., Bozsaky E., Ambros I.M., Dörr W., Beiske K., Ambros P.F., Hanbury A., Taschner-Mandl S. Evaluation of deep learning architectures for complex immunofluorescence nuclear image segmentation. IEEE Transactions on Medical Imaging 2021; 40(7): 1934-1949.
21. Erdem F., Avdan U. Comparison of different U-net models for building extraction from high-resolution aerial imagery. International Journal of Environment and Geoinformatics 2020; 7(3): 221-227.
22. Mulindwa D.B., Du S. An n-sigmoid activation function to improve the squeeze-and-excitation for 2D and 3D deep networks. Electronics 2023; 12(4): 911.
23. Fang X. Research on the application of UNet with convolutional block attention module to semantic segmentation task. In: Proc. of the 2022 5th International Conference on Sensors, Signal and Image Processing, Nanjing, China 2022, 13-16.
24. Shaler B., Gill D., Mark M., McDonald P., Cukierski W. Carvana Image Masking Challenge. Kaggle. https://kaggle.com/competitions/carvana-image-masking-challenge [access September 2023].
25. Xu J., Guo H., Kageza A., Wu S., AlQarni S. Removing background with semantic segmentation based on ensemble learning. In: Proc. of the 2018 11th EAI International Conference on Mobile Multimedia Communications, 187-197.
26. Carvana Image Masking Challenge. https://www.kaggle.com/c/carvana-image-masking-challenge [access 26.05.2023].
27. Carvana Upgrades Its Own Industry-Changing Virtual Vehicle Tool Experience with New Automotive Imaging Technology. https://investors.carvana.com/news-releases/2020/08-19-2020-140123662 [access September 2023].
28. Ronneberger O., Fischer P., Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: Proc. of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany 2015, 234-241.
29. Long J., Shelhamer E., Darrell T. Fully convolutional networks for semantic segmentation. In: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, Massachusetts 2015, 3431-3440.
30. Simonyan K., Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
31. Badrinarayanan V., Kendall A., Cipolla R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017; 39(12): 2481-2495.
32. Niu Z., Li H. Research and analysis of threshold segmentation algorithms in image processing. Journal of Physics: Conference Series, Ningbo, China 2019; 1237(2): 022122.
33. Shan P. Image segmentation method based on K-mean algorithm. EURASIP Journal on Image and Video Processing 2018; (1): 1-9.
34. Qin R., Lv H., Zhang Y., Huangfu L., Huang S. ASDFL: An adaptive super-pixel discriminative feature-selective learning for vehicle matching. Expert Systems 2023; 40(2): e13144.
35. Teng S., Zhang S., Huang Q., Sebe N. Multi-view spatial attention embedding for vehicle re-identification. IEEE Transactions on Circuits and Systems for Video Technology 2020; 31(2): 816-827.
36. Wang Q., Min W., He D., Zou S., Huang T., Zhang Y., Liu R. Discriminative fine-grained network for vehicle re-identification using two-stage re-ranking. Science China Information Sciences 2020; 63(11): 1-12.
37. Wu M., Zhang Y., Zhang T., Zhang W. Background segmentation for vehicle re-identification. In: Proc. of the International Conference on Multimedia Modeling, Daejeon, South Korea 2020, 88-99.
38. Zhu Y., Zha Z.-J., Zhang T., Liu J., Luo J. A structured graph attention network for vehicle re-identification. In: Proc. of the 28th ACM International Conference on Multimedia, Seattle, WA, USA 2020, 646-654.