Advances in Science and Technology Research Journal 2023, 17(6), 127–139
https://doi.org/10.12913/22998624/172771
ISSN 2299-8624, License CC-BY 4.0
Received: 2023.06.30; Accepted: 2023.09.22; Published: 2023.11.21
ABSTRACT

Segmentation is an image processing technique, widely used in computer vision, that extracts various types of information represented as objects or areas of interest. The development of neural networks has influenced image processing techniques, including the creation of new image segmentation methods. The aim of this study is to compare classical algorithms and deep learning methods in RGB image segmentation tasks. Two hypotheses were put forward: (1) the quality of segmentation of RGB images using deep learning methods is higher than using classical methods, and (2) increasing the RGB image resolution has a positive impact on segmentation quality. Two traditional segmentation algorithms (Thresholding and K-means) were compared with deep learning approaches (U-Net, SegNet and FCN-8) to verify RGB segmentation quality. Two image resolutions were taken into consideration: 160x240 and 320x480 pixels. Segmentation quality for each algorithm was estimated based on four parameters: Accuracy, Precision, Recall and the Sorensen-Dice ratio (Dice score). The study used the Carvana dataset, containing 5,088 high-resolution images of cars. The initial set was divided into training, validation and test subsets of 60%, 20% and 20%, respectively. As a result, the best Accuracy, Dice score and Recall for images with a resolution of 160x240 were obtained for U-Net, reaching 99.37%, 98.56% and 98.93%, respectively. For the same resolution, the highest Precision, 98.19%, was obtained for the FCN-8 architecture. For the higher resolution, 320x480, the best mean Accuracy, Dice score and Precision were obtained for the FCN-8 network, reaching 99.55%, 99.95% and 98.85%, respectively. The highest results for the classical methods were obtained for the Threshold algorithm, reaching 80.41% Accuracy, 58.49% Dice score, 67.32% Recall and 52.62% Precision. The results confirm both hypotheses.
…area that belongs to an instance of the object and to classify the area based on the semantic suitability of the class.

Nowadays, segmentation, as well as other image processing methods, is widely used in computer vision [7-9]. These algorithms can be divided into several groups: threshold-based segmentation, region-based segmentation, edge-based segmentation, clustering-based segmentation, graph-based segmentation, active contour-based segmentation, and segmentation algorithms that use deep learning neural networks [10]. The first group of algorithms is the oldest and performs segmentation using a given threshold value, while region-based algorithms involve finding similarities among pixels. A feature of the next group of algorithms is the principle of image segmentation based on the edges of individual objects. Clustering algorithms, such as the K-means algorithm, iteratively assign pixels to a certain group, creating regions composed of pixels with similar features. Deep learning image segmentation is a technique that relies on the functioning of deep neural networks. This class includes a large number of algorithms, among the most popular of which are: DeepLab (convolutional deep learning networks), the Fully Convolutional Network (FCN), deep learning networks based on an encoder-decoder architecture (U-Net and SegNet), the Region-based Convolutional Neural Network (R-CNN) and others [11-13].

Segmentation methods and image analyses may be found in many papers. The study presented in [14] contains a detailed description of the deep convolutional neural network architecture for semantic pixel segmentation called SegNet. The authors compare the segmentation precision of different architectures on two datasets, CamVid and SUN RGB-D. A comparison of the calculation time and hardware resources required by the different deep learning architectures was also carried out, which allowed the effectiveness of the presented neural network to be confirmed. Another study concerns a new approach called the Edge-Based Segmentation Network (ESNet) for real-time semantic segmentation in traffic scenes [15]. The authors describe the network and compare it with other artificial intelligence approaches such as FCN-8, PSPNet, SegNet and DeepLab. The main goal was to demonstrate the advantages of the proposed ESNet in image segmentation in terms of computational efficiency without compromising segmentation quality. The Cityscapes dataset was chosen for this study.

The most extensive studies are articles comparing different segmentation methods. The study described in [16] compares the effectiveness of classic segmentation algorithms with U-Net. The specificity of that work is that it used an original dataset, a set of images with various configurations of geometric shapes. Furthermore, it included only a single segmentation technique using a deep neural network, which made it impossible to carry out a clear comparative analysis and to map the results of the study to a set of real-world photographs. The authors of [17] study the possible correlation between image resolution and the effectiveness of algorithms. They compared the FCN and U-Net algorithms for segmenting road network images at two resolutions: 256x256 and 512x512. Another study, presented in [18], compares three neural network models: FCN, U-Net, and DeepLab. The authors aimed to determine the quality of segmentation and the computational efficiency of each method. A specific set of photographs showing a top view captured by the camera was selected for the study. The challenge for the authors was that the images contained groups or individuals captured from different angles. The authors of [19] focus on comparing two types of neural networks: convolutional neural networks (CNN) and fully convolutional neural networks (FCNN). The study was conducted in the context of classifying areas in underwater images of coral reef ecosystems into biologically relevant categories. It includes 6 patch-based CNN models (a special variant of CNN used for image segmentation) and 4 FCNN approaches. The algorithms are compared in terms of pixel accuracy, and the CNN models give better results than the FCNN ones. In [20] the quality of segmentation methods on complex images of immunofluorescence cells is presented. It compares 5 deep learning methods and 2 classic methods, namely the h-min based watershed algorithm and attributed relational graphs. The authors also proposed and evaluated a new strategy of adding artificial imagery to extend the training set. The effects of various factors, such as image scaling, annotation quality and post-processing methods, on segmentation effectiveness were also compared. Furthermore, the results were compared with those of manual image segmentation performed by experts. The study described in [21] demonstrated a different approach to comparing segmentation methods.
The authors test various modifications of the U-Net model, such as U-Net VGG16, U-Net InceptionResNetV2 and U-Net DenseNet121, in terms of the quality of segmentation and building extraction from aerial photos of Chicago.

Based on the literature review, it can be stated that no study has presented a comparative analysis of deep neural network and classical segmentation methods while also considering the effect of a resolution change on segmentation quality. The novelty of this study is the comparison of classical segmentation algorithms with deep learning approaches. The impact of image resolution on segmentation performance, a very important aspect in this type of study, was also verified. Moreover, the research was carried out on a set of images of cars of various shapes and colors in RGB format, which has not been used in research so far. Therefore, the aim of the research is a comparative analysis of classical methods (K-means clustering, Threshold segmentation) and deep learning methods (U-Net, FCN, SegNet) in RGB image segmentation tasks. The study aims to confirm or reject the following hypotheses:
H1. The quality of segmentation, understood as accuracy, precision, recall and the Sorensen-Dice coefficient, applying deep learning methods is higher than using classical methods for RGB images.
H2. The increase of the RGB image resolution has a positive impact on the segmentation quality.

The Carvana dataset, used in this study, is a successful dataset containing various used cars, created for the purpose of selling them online. It has been utilized in much research, involving, for example, improving CNN performance by adding an n-sigmoid function with a Squeeze-and-Excitation block [22], or improving segmentation performance by introducing the Convolutional Block Attention Module (CBAM) into the U-Net architecture [23]. In response to the Kaggle Carvana Image Masking Competition in 2017 [24], which dealt with extracting high-resolution cars from the background, a study was performed with U-Net and RefineNet using various pre-trained CNN architectures [25]. The training process was accelerated and the network performance was improved.

MATERIAL AND METHODS

Dataset

Images from the Carvana dataset are selected for this study. The analysis relates to the segmentation performance of the selected techniques at two different resolutions. The chosen dataset was created for the Carvana Image Masking Challenge [26] by the Carvana company in 2017. The collection contains 5088 photos of cars in RGB format (Fig. 1) with a resolution of 1280x1918 pixels. 318 used passenger cars of different brands and colors were selected for the photos. They were taken using a rotating photo studio that automatically captured and processed 16 photos of each car from different angles (Fig. 2). All the photos in the set also have a corresponding binary mask marking the area occupied by the car. In this experiment, the dataset was divided into three sets: training – 3056 photos (~60%), validation – 1024 photos (~20%), and testing – 1008 photos (~20%).
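A minimal sketch of such a split is shown below; the paper reports only the subset sizes, so the file layout and the fixed random seed are illustrative assumptions.

```python
# Hypothetical 60/20/20 split of the 5088 Carvana photos; the paths and
# the seed are assumptions, only the subset sizes come from the paper.
import random
from pathlib import Path

images = sorted(Path("carvana/train").glob("*.jpg"))  # 5088 car photos
random.Random(42).shuffle(images)

n_train, n_val = 3056, 1024                  # ~60% and ~20% of the set
train = images[:n_train]
val = images[n_train:n_train + n_val]
test = images[n_train + n_val:]              # the remaining 1008 photos (~20%)
```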
Deep learning segmentation methods

U-Net

U-Net (Fig. 3) is a convolutional neural network architecture designed for image segmentation tasks. There are two major components to the network: an encoder and a decoder [28]. The encoder part consists of the repeated use of convolutions, each followed by a rectified linear unit (ReLU), and a max pooling operation, which results in downsampling and increases the number of feature channels. The decoder path consists of upsampling of the feature map, up-convolution layers followed by concatenation with the correspondingly cropped feature map, and convolutions, each followed by a ReLU. The skip connections between the encoder and decoder allow the segmentation mask to be more accurate. This method makes it possible to detect even very small objects in images, which is why this architecture is commonly used to aid diagnosis, treatment planning and the tracking of disease progress in computed tomography image analysis.

FCN

FCN (Fig. 4) is a neural network architecture created for solving image segmentation tasks [29]. It is an encoder-decoder architecture, similar to U-Net. The FCN includes an encoder in the form of convolutional layers for image downsampling. The encoder usually uses the VGG16 network [30], which is designed for image recognition tasks and has 16 layers (13 convolutional and three fully connected).
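As an illustration of the encoder-decoder pattern with a skip connection described above, a minimal sketch follows. The paper does not publish its layer configuration, so the depth, the channel counts and the choice of PyTorch here are assumptions.

```python
# A minimal U-Net-style encoder-decoder sketch (PyTorch); depths and
# channel counts are illustrative assumptions, not the authors' model.
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # two 3x3 convolutions, each followed by ReLU
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class MiniUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = double_conv(3, 64)     # encoder level 1 (RGB input)
        self.pool = nn.MaxPool2d(2)        # downsampling
        self.enc2 = double_conv(64, 128)   # downsampling doubles the channels
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)  # up-convolution
        self.dec1 = double_conv(128, 64)   # after concatenation with the skip
        self.head = nn.Conv2d(64, 1, 1)    # per-pixel car/background logit

    def forward(self, x):
        s1 = self.enc1(x)                            # kept for the skip connection
        bottom = self.enc2(self.pool(s1))
        up = self.up(bottom)                         # upsampled feature map
        out = self.dec1(torch.cat([up, s1], dim=1))  # skip concatenation
        return self.head(out)

logits = MiniUNet()(torch.randn(1, 3, 160, 240))  # one 160x240 RGB image
```

By contrast, FCN-8 upsamples class predictions and sums contributions from the intermediate pooling layers of its VGG16 encoder rather than concatenating full-resolution feature maps.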
K-means

…(RGB channel values, grayscale values). Pixels are added to groups based on the distance from the pixel to the centroid of each group. Therefore, the similarity of a pixel to other pixels is determined by this distance.
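For comparison with the deep networks, a rough sketch of the two classical baselines is given below; the paper does not specify its implementations, so the use of OpenCV with Otsu's threshold and scikit-learn K-means with k=2 clusters are assumptions.

```python
# Hypothetical sketches of the two classical baselines; the libraries and
# parameters (Otsu threshold, k=2) are assumptions, not the authors' code.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def threshold_segment(rgb):
    # global thresholding of the grayscale image; Otsu selects the threshold
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask > 0

def kmeans_segment(rgb, k=2):
    # each pixel is a point in RGB space, assigned to the nearest centroid
    h, w, _ = rgb.shape
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(
        rgb.reshape(-1, 3).astype(np.float64))
    return labels.reshape(h, w)
```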
Research methodology

Research stand

Algorithm and deep learning model accuracy tests were performed on an ASUS TUF Gaming FX504 laptop with the specifications given in Table 1.

Table 1. Stand specification
CPU         | 6 cores, 12 threads
Clock speed | 2.2 GHz
GPU         | Base Core Clock: 1290 MHz
GPU Memory  | 4 GB
RAM         | 12 GB
Disc        | SSD 1 TB

Performance measures

Two traditional segmentation algorithms (K-means and Threshold) and three deep learning algorithms (U-Net, FCN, and SegNet) were selected to check and compare the quality of segmentation and to verify the stated research hypotheses. The obtained results were evaluated using four measures: Accuracy (Eq. 1), Precision (Eq. 2), Recall (Eq. 3) and the Sorensen-Dice coefficient, DSC (Eq. 4 and 5). The Accuracy rate measures how often the algorithm performs image segmentation correctly; this rate is expressed as a percentage:

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1) \]

where: TP – True Positive; TN – True Negative; FP – False Positive; FN – False Negative.

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \quad (2) \]

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \quad (3) \]

The Sorensen-Dice score, also known as the F1 score, is a machine learning evaluation metric that determines the match between the predicted segmentations and the corresponding true values:

\[ \mathrm{DSC} = \frac{2\,|X \cap Y|}{|X| + |Y|} \quad (4) \]

where: X – the set of predicted pixels; Y – the set of reference pixels.

This metric also combines the precision and recall measures:

\[ \mathrm{DSC} = \frac{2 \times (\mathrm{Precision} \times \mathrm{Recall})}{\mathrm{Precision} + \mathrm{Recall}} \quad (5) \]
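The four measures can be computed directly from a pair of binary masks; a minimal NumPy sketch of Eqs. (1)-(5) is given below (illustrative only, not the authors' evaluation code).

```python
# Minimal NumPy implementation of Eqs. (1)-(5) for binary masks.
import numpy as np

def segmentation_metrics(pred, ref):
    """pred, ref: boolean arrays in which True marks car pixels."""
    pred, ref = np.asarray(pred, bool), np.asarray(ref, bool)
    tp = np.sum(pred & ref)        # car pixels correctly predicted
    tn = np.sum(~pred & ~ref)      # background correctly predicted
    fp = np.sum(pred & ~ref)       # background predicted as car
    fn = np.sum(~pred & ref)       # car pixels missed
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (1)
    precision = tp / (tp + fp)                   # Eq. (2)
    recall = tp / (tp + fn)                      # Eq. (3)
    dice = 2 * tp / (2 * tp + fp + fn)           # Eq. (4): 2|X∩Y| / (|X|+|Y|)
    # Eq. (5) gives the same value: 2*precision*recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "dice": dice}
```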
RESULTS

Segmentation quality for 160x240 resolution

Three neural network models were trained on images with a resolution of 160x240 pixels in the first part of this study. Graphs of the changes in loss and accuracy during the training process for the individual sets of images are shown in Figures 6 and 7.

To determine the quality of segmentation, four metrics were calculated: Accuracy, Precision, Recall and Dice score. The results for the 3 neural network models are presented in Tables 2-5, showing the minimum, maximum, mean and standard deviation values for each metric, determined on a 20-fold verification basis.

The results of the comparison of the mean values of the individual metrics calculated for the neural networks and for the conventional algorithms are gathered in Table 6.

Table 2. Accuracy for resolution 160x240 in %
Model  | Minimum value | Maximum value | Mean value  | Standard deviation
U-Net  | 99.36156831   | 99.39518605   | 99.37869636 | 0.006505737
FCN-8  | 99.29425      | 99.31386553   | 99.30257    | 0.00439
SegNet | 98.34756715   | 98.42643142   | 98.39049364 | 0.014620049

Study of the impact of image resolution on segmentation quality
In the second part of this study, three neural network models were trained on images with a resolution of 320x480 pixels. Due to the change in image resolution, the size of the individual data batches was also changed from 8 to 2 images. Graphs of the changes in loss and accuracy during the training process for the individual sets of images are shown in Figures 8 and 9. In view of the fast attainment of optimal accuracy and loss values, as in the first study, a limit of 10 epochs was set.
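A hedged sketch of the corresponding training configuration is shown below; the paper reports only the resolutions, the batch-size change and the 10-epoch limit, so the loader, optimizer and loss function are assumptions.

```python
# Hypothetical training setup for the two stages; only the batch sizes
# (8 at 160x240, 2 at 320x480) and the 10-epoch limit come from the paper.
import torch
from torch.utils.data import DataLoader

def train(model, dataset, high_res=False, epochs=10):
    # smaller batches at 320x480, presumably to fit the 4 GB GPU memory
    loader = DataLoader(dataset, batch_size=2 if high_res else 8, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters())
    loss_fn = torch.nn.BCEWithLogitsLoss()  # binary car/background masks
    for _ in range(epochs):                 # fixed 10-epoch limit
        for images, masks in loader:        # masks: float tensors, (N, 1, H, W)
            optimizer.zero_grad()
            loss = loss_fn(model(images), masks)
            loss.backward()
            optimizer.step()
```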
To determine the impact of image resolution on segmentation quality, 4 metrics were calculated: Accuracy, Precision, Recall and Dice score (Tables 7-10). They were compared with the results of the first study. The results of the comparison of the mean values of the individual metrics calculated for the neural networks and for the conventional algorithms are gathered in Table 11; the same comparison for the two resolutions is shown in Table 12.
DISCUSSION

The aim of this study is a comparative analysis of classical methods (K-means clustering, Threshold segmentation) and deep learning methods (U-Net, FCN, SegNet) in RGB image segmentation tasks. Images from the Carvana dataset were selected for this study. The analysis relates to the performance of the selected techniques at two different resolutions: 160x240 and 320x480. Based on the obtained results, the correctness of the H1 hypothesis was proven, while the H2 hypothesis was true only for the deep learning methods. In the case of the classical algorithms, changing to a higher resolution resulted in an increase in segmentation quality in terms of Accuracy and Dice score for both algorithms, and a decrease in the other measures.

The segmentation accuracy for images from the Carvana dataset was 99.38%, 99.30% and 98.39% for the U-Net, FCN-8 and SegNet models, respectively, and 75.17% and 62.82% for the Threshold and K-means algorithms (Table 6), which clearly confirms the advantage of the algorithms using neural network models. Changing the resolution resulted in an increase in the Accuracy metric for all analyzed models (Table 11). This means that increasing the image resolution improves segmentation quality. However, for the classical methods, lower values were obtained for Precision and Recall.

The comparison of the results obtained in this study for the deep learning approaches applied to vehicle datasets with the state-of-the-art is presented in Table 12. Three datasets were taken into consideration: Kaggle Carvana, VeRi and VehicleID. On the dataset utilized in this paper, Kaggle Carvana, U-Net and its modifications were verified [22-23, 25]. The Dice score results obtained for the CBAM solution were in the range of 98.00-98.71%, slightly lower than for the U-Net and FCN-8 analyzed in this study at the 320x480 image resolution. The results achieved for U-Net and FCN-8 at the 160x240 image resolution are comparable with the CBAM performance. The approach combining U-Net with ResNet gave the highest Dice score, at the level of 99.69% [25]. The Accuracy obtained for the U-Net architecture in [22] was comparable to that obtained in this study.

In [34] the ASDFL approach was compared with eight neural networks using two vehicle datasets. The obtained Accuracy and Dice score did not exceed 93%. The Multi-View approach [35], the discriminative fine-grained network (DFN) [36] and the Structured Graph Attention Network (SGAN) [38] did not reach 90%. The framework with the background interference removal (BIR) mechanism [37] achieved up to 90.46%. The performance of the methods presented in [34-38] is thus lower than that of the networks proposed in this study, which are much more adequate for segmentation purposes. The research results confirm the results of the analyzed papers [16] and [17] and clearly show the performance advantage of neural networks over classical algorithms in image segmentation tasks.

CONCLUSIONS

The study presented in this paper allows for a thorough analysis of the performance of image segmentation solutions for the Carvana dataset. Both classical methods and those based on deep learning approaches were taken into consideration. K-means and Threshold were applied as classical segmentation algorithms, while U-Net, FCN-8 and SegNet were chosen as deep learning solutions. The most important measures, such as Accuracy, Dice score, Recall and Precision, were applied in order to verify segmentation performance.
All analysed deep learning methods reached very high results for both image resolutions, 160x240 and 320x480, and proved to be the more appropriate choice for image segmentation in comparison to the classical algorithms. Moreover, the state-of-the-art analysis showed that the deep learning approaches proposed in this study are among the best tools for vehicle segmentation purposes.
REFERENCES

1. Garcia-Garcia A., Orts-Escolano S., Oprea S., Villena-Martinez V., Martinez-Gonzalez P., Garcia-Rodriguez J. A survey on deep learning techniques for image and video semantic segmentation. Applied Soft Computing 2018; 70: 41-65.
2. Zhang Y.-J. Advances in image and video segmentation. IGI Global, 2006.
3. Stockman G., Shapiro L.G. Computer vision. Prentice Hall PTR, 2001.
4. Li B., Shi Y., Qi Z., Chen Z. A survey on semantic segmentation. In: Proc. 2018 IEEE International Conference on Data Mining Workshops, Singapore 2018, 1233-1240.
5. Hafiz A.M., Bhat G.M. A survey on instance segmentation: state of the art. International Journal of Multimedia Information Retrieval 2020; 9(3): 171-189.
6. Kirillov A., He K., Girshick R., Rother C., Dollár P. Panoptic segmentation. In: Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA 2019, 9404-9413.
7. Tilton J.C. Image segmentation analysis for NASA Earth science applications. Capital Science 2010.
8. Aljabri M., AlGhamdi M. A review on the use of deep learning for medical images segmentation. Neurocomputing 2022.
9. Mousavirad S.J., Ebrahimpour-Komleh H. Image segmentation as an important step in image-based digital technologies in smart cities: a new nature-based approach. Information Innovation Technology in Smart Cities 2018; 75-89.
10. Gonzalez R.C., Woods R.E., Eddins S.L. Digital Image Processing Using Matlab. 2004.
11. Yu H., Yang Z., Tan L., Wang Y., Sun W., Sun M., Tang Y. Methods and datasets on semantic segmentation: A review. Neurocomputing 2018; 304: 82-103.
12. Liu D., Soran B., Petrie G., Shapiro L. A review of computer vision segmentation algorithms. Lecture notes 2012, 53.
13. Smołka J. Fast watershed-based dilation. Advances in Science and Technology Research Journal 2014; 8(23): 41-44.
14. Badrinarayanan V., Kendall A., Cipolla R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017; 39(12): 2481-2495.
15. Lyu H., Fu H., Hu X., Liu L. ESNet: Edge-based segmentation network for real-time semantic segmentation in traffic scenes. In: Proc. 2019 IEEE International Conference on Image Processing, Taipei, Taiwan 2019, 1855-1859.
16. Karabağ C., Verhoeven J., Miller N.R., Reyes-Aldasoro C.C. Texture segmentation: An objective comparison between five traditional algorithms and a deep-learning U-Net architecture. Applied Sciences 2019; 9(18): 3900.
17. Ozturk O., Saritürk B., Seker D.Z. Comparison of fully convolutional networks (FCN) and U-Net for road segmentation from high resolution imageries. International Journal of Environment and Geoinformatics 2020; 7(3): 272-279.
18. Ahmed I., Ahmad M., Khan F.A., Asif M. Comparison of deep-learning-based segmentation models: Using top view person images. IEEE Access 2020; 8: 136361-136373.
19. King A., Bhandarkar S.M., Hopkinson B.M. A comparison of deep learning methods for semantic segmentation of coral reef survey images. In: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA 2018, 1394-1402.
20. Kromp F., Fischer L., Bozsaky E., Ambros I.M., Dörr W., Beiske K., Ambros P.F., Hanbury A., Taschner-Mandl S. Evaluation of deep learning architectures for complex immunofluorescence nuclear image segmentation. IEEE Transactions on Medical Imaging 2021; 40(7): 1934-1949.
21. Erdem F., Avdan U. Comparison of different U-Net models for building extraction from high-resolution aerial imagery. International Journal of Environment and Geoinformatics 2020; 7(3): 221-227.
22. Mulindwa D.B., Du S. An n-Sigmoid activation function to improve the Squeeze-and-Excitation for 2D and 3D deep networks. Electronics 2023; 12(4): 911.
23. Fang X. Research on the application of UNet with Convolutional Block Attention Module to semantic segmentation task. In: Proc. of the 2022 5th International Conference on Sensors, Signal and Image Processing, Nanjing, China 2022, 13-16.
24. Shaler B., Gill D., Mark M., McDonald P., Cukierski W. Carvana Image Masking Challenge. Kaggle. https://kaggle.com/competitions/carvana-image-masking-challenge [access September 2023].
25. Xu J., Guo H., Kageza A., Wu S., AlQarni S. Removing background with semantic segmentation based on ensemble learning. In: 2018 11th EAI International Conference on Mobile Multimedia Communications, 187-197.
26. Carvana Image Masking Challenge. https://www.kaggle.com/c/carvana-image-masking-challenge [access 26.05.2023].
27. Carvana Upgrades Its Own Industry-Changing Virtual Vehicle Tool Experience with New Automotive Imaging Technology. https://investors.carvana.com/news-releases/2020/08-19-2020-140123662 [access September 2023].
28. Ronneberger O., Fischer P., Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: Proc. 18th International Conference on Medical Image Computing and Computer-Assisted Intervention – MICCAI, Munich, Germany 2015, 234-241.
29. Long J., Shelhamer E., Darrell T. Fully convolutional networks for semantic segmentation. In: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, Massachusetts 2015, 3431-3440.
30. Simonyan K., Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
31. Badrinarayanan V., Kendall A., Cipolla R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017; 39(12): 2481-2495.
32. Niu Z., Li H. Research and analysis of threshold segmentation algorithms in image processing. In: Proc. of Journal of Physics: Conference Series, Ningbo, China 2019, 1237(2), 022122.
33. Shan P. Image segmentation method based on K-mean algorithm. EURASIP Journal on Image and Video Processing 2018; (1): 1-9.
34. Qin R., Lv H., Zhang Y., Huangfu L., Huang S. ASDFL: An adaptive super-pixel discriminative feature-selective learning for vehicle matching. Expert Systems 2023; 40(2): e13144.
35. Teng S., Zhang S., Huang Q., Sebe N. Multi-view spatial attention embedding for vehicle re-identification. IEEE Transactions on Circuits and Systems for Video Technology 2020; 31(2): 816-827.
36. Wang Q., Min W., He D., Zou S., Huang T., Zhang Y., Liu R. Discriminative fine-grained network for vehicle re-identification using two-stage re-ranking. Science China Information Sciences 2020; 63(11): 1-12.
37. Wu M., Zhang Y., Zhang T., Zhang W. Background segmentation for vehicle re-identification. In: Proc. of the International Conference on Multimedia Modeling, Daejeon, South Korea 2020, 88-99.
38. Zhu Y., Zha Z.-J., Zhang T., Liu J., Luo J. A structured graph attention network for vehicle reidentification. In: Proc. of the 28th ACM International Conference on Multimedia, Seattle, WA, USA 2020, 646-654.