Department of Electrical and Computer Engineering
North South University
Junior Design Project
Malaria Cell Segmentation Using Machine Learning &
Watershed Algorithm
Mahdi Mohammed Shibli ID # 1712784042
Md. Navid Bin Islam ID # 1712404642
Section: 9, Group: 6
Faculty Advisor:
Dr. Tanzilur Rahman
Assistant Professor
Department of ECE
Spring 2020
ABSTRACT
Malaria is a life-threatening, mosquito-borne blood disease caused by parasites of the genus
Plasmodium. The parasite enters the bloodstream through the saliva of an infected mosquito when
it bites. It directly infects the red blood cells (RBCs) and causes roughly 400,000 deaths per
year. Conventional malaria detection relies on examining a patient's stained blood
sample under a microscope. The blood smear is placed on a slide, and an expert operator
manually counts the infected RBCs, which demands intense visual and mental concentration. It is
a tiresome and time-consuming process whose precision depends on the skill of the operator.
Machine-based blood smear analysis and detection of malaria-infected cells have opened a new
area for early malaria detection and have shown the potential to overcome the drawbacks of
manual strategies. Because the traditional diagnostic process is laborious and error-prone, in
this report we develop a machine learning-based malaria cell segmentation system using the
watershed algorithm. In our experiments, the proposed method identified about 75% of the cells
in a manually counted sample of 50 images and segmented a blood smear in under a second.
Table of Contents
ABSTRACT
CHAPTER 1: INTRODUCTION
1.1 Malaria
1.2 Malaria Detection Process
1.3 Cell Segmentation
1.4 Machine Learning
1.5 Watershed Algorithm
1.6 Project Aim and Objective
1.7 Motivation
CHAPTER 2: LITERATURE REVIEW
2.1 Existing Literature Explanation
CHAPTER 3: METHODOLOGY
3.1 Workflow
3.2 Data Collection
3.3 Preprocessing
3.4 Watershed Threshold
CHAPTER 4: EXPERIMENT RESULT
4.1 Sample Test Data
4.2 Result Analysis
CHAPTER 5: CONCLUSION
5.1 Discussion
5.2 Summary
5.3 Future Work
REFERENCES
CHAPTER 1: INTRODUCTION
In this chapter, we discuss malaria, why automated segmentation of malaria-infected blood cells
is needed, and our project’s aim, objectives, and motivation.
1.1 Malaria
Malaria is one of the most threatening global health problems, causing suffering and deaths
worldwide, particularly in underdeveloped countries. The disease is caused by Plasmodium
parasites and is transmitted by the bite of an Anopheles mosquito that is already infected with
them. When the mosquito bites, the parasites are released into the human bloodstream. From
there, they travel to the liver of the infected person and mature.
In 2018, there were an estimated 228 million malaria cases and 405,000 deaths. Children under 5
are the most vulnerable group; in that year, they accounted for 67% of the total death toll.
According to the WHO, the African Region carries a disproportionately high share of the global
malaria burden: in 2018, 93% of cases and 94% of deaths occurred in this region [1].
1.2 Malaria Detection Process
Cells infected by Plasmodium parasites can be detected visually by staining the RBCs with
chemical agents [2]. The staining process colors the RBCs but highlights the parasites, so
infected cells can be identified by detecting the highlights. Traditionally, malaria detection
is done by hand, which is a very error-prone way to detect a disease as dangerous as malaria,
because a small error in the detection process can cause someone's death.
Figure 1: Conventional process of malaria diagnosis (a. A patient goes for a test, b. A blood
sample is taken, c. The sample is placed on a slide, d. The sample is stained with contrasting
agents, e. Malaria parasites get highlighted, f. Clinician examines the slides manually) [3]
1.3 Cell Segmentation
The process of identifying individual cells in a blood smear image is called cell segmentation.
It partitions the image so that each cell is separated from the background and from neighboring
cells. Cell segmentation is the most basic and essential step in the analysis of microscopic
cell images. It can be done with many approaches, such as traditional machine learning, deep
learning, and other methods such as morphological operators. Automated cell segmentation is
still challenging; one difficulty is identifying blood cells separately when they overlap. Even
so, with modern technology, automation is the best way of doing cell segmentation.
1.4 Machine Learning
Humans learn from past experience, but conventional computers do not. Computers are strict
logic machines with no common sense, so they need detailed, step-by-step instructions on exactly
what has to be done to accomplish a task. Scripts are therefore written, and computers are
programmed to follow the given instructions. Machine learning changes this: the concept consists
of training computers on past data so that they learn from experience. Machine learning is a
probabilistic approach to solving real-life problems based on previous records, and it reduces
the need for human assistance. Many people use the terms artificial intelligence (AI) and
machine learning interchangeably, but machine learning is a subset, or an application, of AI.
Such a system can learn and improve automatically from experience without being explicitly
programmed.
1.5 Watershed Algorithm
The watershed algorithm is a classical algorithm for separating different objects in an image,
a task termed ‘segmentation’. Put simply, the watershed is a transformation defined on
grayscale images. It is typically used to segment an image when two regions of interest are
close to one another or when the objects in the image touch each other. In blood smears, most of
the blood cells are not adequately scattered, which makes it challenging to segment all the
blood cells in a particular smear. The watershed is used for this overlapping issue since it
uses the connectivity of the pixels in the given image.
Figure 2: Watershed Segmentation of overlapping objects [4].
The distance transform computes, for each foreground pixel, the distance to the nearest
background (zero) pixel. The strategy works very well on rounded objects in binary images: once
the distance map is inverted, the darkest parts of the image are the centers of the objects.
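As an illustration of the distance transform described above, the following is a minimal sketch in plain Python, not the implementation used in this project. The function name `distance_transform` and the toy mask are hypothetical, and the brute-force search is only meant for tiny examples; a real pipeline would rely on an optimized library routine.

```python
import math


def distance_transform(mask):
    """Brute-force Euclidean distance transform (illustrative only).

    For every foreground pixel (value 1), return the distance to the
    nearest background pixel (value 0). Background pixels get 0.0.
    """
    rows, cols = len(mask), len(mask[0])
    background = [(r, c) for r in range(rows) for c in range(cols)
                  if mask[r][c] == 0]
    dist = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                # default=inf covers the corner case of a mask with no background
                dist[r][c] = min(
                    (math.hypot(r - br, c - bc) for br, bc in background),
                    default=math.inf,
                )
    return dist


# Hypothetical toy mask: a single 3x3 blob surrounded by background.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
dist = distance_transform(mask)
for row in dist:
    print(" ".join(f"{d:.1f}" for d in row))
```

The largest value sits at the center of the blob, which is why the peaks of the distance map are good candidates for object centers.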
1.6 Project Aim and Objective
In this age of technology, everything is becoming automated. Still, malaria diagnosis is
performed by traditional procedures. The traditional malaria diagnosis process is error-prone
and time-consuming, and its accuracy depends on the operator’s concentration level and mental
state. Malaria can cause serious harm to the patient and can even cause death, but with faster
treatment the complications can be minimized.
The points below are our project’s aims and objectives:
❖ To find an optimized algorithm that automatically segments malaria blood cells for further
detection.
❖ To increase the accuracy of the segmentation of overlapping cells.
❖ To make a model that is easy to implement in the real world for the benefit of the general
public.
❖ To assist medical practitioners in segmenting the cells efficiently without wasting a lot
of time.
1.7 Motivation
Malaria is prevalent throughout the world, particularly in tropical regions. The motivation of
this project is based on the nature and fatality of this disease. When an infected mosquito
bites a person, the parasites it carries enter the blood and slowly start destroying the red
blood cells. Typically, the first symptoms of malaria are similar to those of the flu or another
common infection. The affected person starts feeling unwell within a few days or weeks after the
mosquito bite. These lethal parasites can also live in the host for over a year without causing
any problem, so a delay in proper treatment can lead to complications and eventually death.
Hence, rapid and reliable malaria tests and detection can save many patients from dying.
Computer vision techniques for malaria diagnosis represent a new area for early malaria
detection. According to the WHO malaria parasite counting protocol, a clinician may have to
count up to 5,000 cells manually [5]. Hence this error-prone and time-consuming visual
inspection process must be replaced with an automated system.
CHAPTER 2: LITERATURE REVIEW
While working on this project, we studied different research papers to learn more about the
approaches we could take to solve this problem. We selected a few of them because of their
similarities to our project and drew ideas from their work.
2.1 Existing Literature Explanation
In [6], Kunwar et al. built an image processing system for the detection and quantification of
Plasmodium parasites in blood smears, and then developed machine learning algorithms to learn,
detect, and determine the types of infected cells. Image acquisition is the first step; images
of malaria-infected smears that were less noisy and free of artifacts were used. Blood smears
were segmented by identifying common properties. Because pixels within a region share similar
intensity, a natural way to segment such regions is thresholding, the separation of light and
dark areas. Thresholding creates a binary image by setting all pixels below some threshold to
zero and all pixels above that threshold to one. Pixels labeled one denote an object, and pixels
labeled zero denote the background. After thresholding, the image is enhanced for further
processing. Erosion and dilation, the fundamental steps of morphological processing, help in
detecting the objects. Two kinds of segmentation were performed: watershed segmentation,
described as a relatively new approach, and color-based segmentation. They tested 40 images.
Although there are some errors, a patient is declared malaria-infected if at least one parasite
is found in the blood smear.
In [7], Weikang Wang, Yi-Jiun Chen, and others used a different strategy that combines
convolutional neural networks (CNNs) and the watershed algorithm. First, a CNN is trained to
learn the Euclidean distance transform (EDT) of the binary masks corresponding to the input
images (the deep distance estimator). A second CNN, a Faster R-CNN (Region-based CNN), is then
trained to detect individual cells in the EDT image (the deep cell detector). Finally, the
watershed algorithm is applied to the outputs of the previous two steps to produce the final
segmentation. The combined method and several pixel-wise classification methods achieved similar
pixel-wise accuracy, but the combined approach achieved higher cell count accuracy than the
others. Pixel-wise classification had difficulty separating connected cells, including cells
connected by blurry boundaries. In addition, the deep distance estimator and the deep cell
detector are easy to train, and they converge quickly.
In [8], Yousef Al-Kofahi, Mirabela Rusu, and others designed a single-channel cell segmentation
algorithm. The work uses a cytoplasm marker, which shows hypo-intense nuclear regions and
hyper-intense cellular regions. In the first step, a deep learning predictive model is trained
on the images of the dataset, using image patches of 160x160 pixels, to predict three different
labels. The second step is deep learning inference, where the unseen image is divided into
176x176 patches. This produces a probability map of nuclei, cytoplasm, and background, and the
patches are then joined together to form the prediction for the full image. In the third step, a
multi-level Laplacian of Gaussian (LoG) blob detector is applied, which enhances the blob-like
nuclei regions at multiple scales. Automated multi-level Otsu thresholding is then used to
extract the binary nuclear mask. For robustness, the segmented nuclei are used as seeds:
together with the background labels identified earlier, they drive the seeded watershed
segmentation.
CHAPTER 3: METHODOLOGY
This chapter gives an overview of the different parts of the work in chronological order. It
mainly discusses the theories, techniques, and step-by-step workflow of the work.
3.1 Workflow
A complete workflow diagram of the proposed method is shown in the figure below.
Figure 3: Overall Workflow of the proposed method.
3.2 Data Collection
The images [9] are in .png or .jpg format. There are three sets consisting of 1364 images
(~80,000 cells) in total, each prepared by a different researcher: from Brazil (Stefanie Lopes),
from Southeast Asia (Benoit Malleret), and a time course (Gabriel Rangel). The blood smears were
stained with Giemsa reagent.
The data consists of two classes of uninfected cells (RBCs and leukocytes) and four classes of
infected cells (gametocytes, rings, trophozoites, and schizonts). The data is heavily
imbalanced: uninfected RBCs make up over 95% of all cells, far outnumbering uninfected
leukocytes and infected cells.
A class label and a set of bounding box coordinates are given for every cell. For all data sets,
infected cells were given a class label by Stefanie Lopes, malaria researcher at the Dr. Heitor
Vieira Dourado Tropical Medicine Foundation hospital, indicating the stage of development, or
were marked as difficult.
3.3 Preprocessing
Preprocessing is applied to the dataset before any algorithm is run in order to enhance the
features of the images. In the first preprocessing step, we convert the single-channel image
into a three-channel RGB image, which helps in the next preprocessing steps.
Figure 5: Splitting the input image and merging (a. R-channel, b. G-channel, c. B-channel, d.
3-channel image)
We then convert the 3-channel image into a grayscale image, which is used for thresholding. For
thresholding, we use Otsu’s binarization. Finally, we filter the resulting binary image using
dilation followed by erosion (a morphological closing) with a 2x2 kernel.
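The thresholding and filtering steps above can be sketched in plain Python as follows. This is a hedged illustration rather than the code used in this project: the function names, the toy image, and the choice to treat bright pixels as foreground are all assumptions (for smears in which the cells are darker than the background, the comparison in `binarize` would be inverted). The 2x2 structuring element is written as a list of offsets, and out-of-bounds neighbors are simply ignored.

```python
def otsu_threshold(gray):
    """Return the 8-bit threshold that maximizes the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, t):
    # Assumption: pixels brighter than t are foreground (1).
    return [[1 if p > t else 0 for p in row] for row in gray]


KERNEL = [(0, 0), (0, 1), (1, 0), (1, 1)]  # offsets of a 2x2 structuring element


def dilate(img):
    rows, cols = len(img), len(img[0])
    return [[max(img[r - dr][c - dc] for dr, dc in KERNEL
                 if r - dr >= 0 and c - dc >= 0)
             for c in range(cols)] for r in range(rows)]


def erode(img):
    rows, cols = len(img), len(img[0])
    return [[min(img[r + dr][c + dc] for dr, dc in KERNEL
                 if r + dr < rows and c + dc < cols)
             for c in range(cols)] for r in range(rows)]


# Hypothetical toy image: a bright 3x3 ring with a one-pixel hole in the middle.
D, B = 20, 200  # dark and bright intensities
gray = [
    [D, D, D, D, D, D],
    [D, B, B, B, D, D],
    [D, B, D, B, D, D],
    [D, B, B, B, D, D],
    [D, D, D, D, D, D],
    [D, D, D, D, D, D],
]
t = otsu_threshold(gray)
binary = binarize(gray, t)
closed = erode(dilate(binary))  # dilation followed by erosion = closing
```

In this toy case the closing fills the one-pixel hole in the ring without enlarging the blob, which is the kind of small gap the filtering step is meant to remove before the watershed is applied.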
3.4 Watershed Threshold
In blood smears, the RBCs are very close to each other and sometimes even overlap. This leads
to miscounting of the RBCs and hence to a health hazard. The watershed transformation is used in
our work because it uses the connectivity of the pixels in the given image. To apply the
watershed transform, we first find the sure background and the sure foreground of the image
produced by the preprocessing steps. The distance transform computes the distance between each
foreground pixel and the nearest background (zero) pixel, which allows us to find the sure
foreground of the image; we use the Euclidean distance for this. The regions that are neither
sure background nor sure foreground are marked as unknown, and from these regions we build the
markers. After obtaining the markers, we apply the watershed algorithm.
Figure 6: Watershed Transform (a. Greyscale, b. Threshold, c. Filtered, d. Sure Background, e.
Distance Transformation, f. Sure foreground, g. Unknown Regions, h. Markers, i. Result of
Watershed Transformation)
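The marker construction described in this section can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's code: the helper names, the toy mask, and the foreground ratio of 0.7 are hypothetical, and the binary mask itself is used as the area that may contain cells instead of a separately dilated sure-background image. The label convention (0 for unknown, 1 for sure background, 2 and above for individual seeds) is also an assumption, chosen because it is the layout commonly passed to a library watershed routine.

```python
import math
from collections import deque


def distance_transform(mask):
    """Brute-force Euclidean distance to the nearest background (0) pixel."""
    rows, cols = len(mask), len(mask[0])
    bg = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c] == 0]
    return [[min((math.hypot(r - y, c - x) for y, x in bg), default=math.inf)
             if mask[r][c] else 0.0
             for c in range(cols)] for r in range(rows)]


def build_markers(mask, fg_ratio=0.7):
    """Label sure background as 1, each sure-foreground seed as 2, 3, ...,
    and leave the unknown region as 0."""
    rows, cols = len(mask), len(mask[0])
    dist = distance_transform(mask)
    peak = max(max(row) for row in dist)
    # Sure foreground: pixels far enough from the background (assumed ratio).
    sure_fg = [[mask[r][c] == 1 and dist[r][c] > fg_ratio * peak
                for c in range(cols)] for r in range(rows)]
    markers = [[1 if mask[r][c] == 0 else 0 for c in range(cols)]
               for r in range(rows)]
    next_label = 2
    for r in range(rows):
        for c in range(cols):
            if sure_fg[r][c] and markers[r][c] == 0:
                # Flood-fill one 4-connected seed with a fresh label.
                markers[r][c] = next_label
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and sure_fg[ny][nx] and markers[ny][nx] == 0):
                            markers[ny][nx] = next_label
                            queue.append((ny, nx))
                next_label += 1
    return markers


# Hypothetical toy mask: two 3x3 blobs joined by a one-pixel-wide bridge.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
markers = build_markers(mask)
for row in markers:
    print(" ".join(str(v) for v in row))
```

Although the two blobs form a single connected region in the mask, they receive two different seed labels, and the bridge between them stays unknown. Deciding which seed each unknown pixel belongs to is the job of the watershed step itself, which would typically be delegated to a library function.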
CHAPTER 4: EXPERIMENT RESULT
This chapter gives an idea of the results of our experiment. It also discusses and analyzes
different results.
4.1 Sample Test Data
After training the model, we ran it on sample test data and checked our results. We split the
images into training and test sets with a ratio of 3:1. In the following figure, the segmented
regions in the output images are outlined in red. The cells of the input blood smears are
successfully segmented. These sample images were taken from the test images.
Figure 7: Some of the test input and output
4.2 Result Analysis
Our experimental results show that the technique segments the blood cells accurately in some
cases, while in others it fails to segment them with satisfactory accuracy.
Figure 8: Some of the output images (a. Densely overlapping cells with light boundaries, b.
Sparsely overlapping cells, c. Sparsely overlapping cells with dark boundaries).
Figure 8.a shows that our technique does not segment cells properly when the cells in the blood
smear overlap and have light boundaries. Here the segmentation method identified 43 out of 69
cells (Figure 9a). In Figure 8.b, the cells are more spread out and overlap less than in Figure
8.a. In this case, the segmentation method detected 73 out of 76 cells (Figure 9b), which is
very accurate. Lastly, in Figure 8.c, the cells are densely overlapping but have dark
boundaries. Here 48 out of 48 cells were detected (Figure 9c).
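To make the three counts above easier to compare, they can be expressed as detection rates (cells identified divided by cells present). The short sketch below only restates the numbers reported in the text; the dictionary keys are descriptive labels introduced here for illustration, and the rate does not account for any falsely detected regions.

```python
# (cells identified, cells present) for the three scenarios reported above
scenarios = {
    "Figure 8.a (light boundaries)": (43, 69),
    "Figure 8.b (sparse overlap)": (73, 76),
    "Figure 8.c (dark boundaries)": (48, 48),
}

rates = {name: 100 * found / total
         for name, (found, total) in scenarios.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% of cells identified")
```

This gives roughly 62.3%, 96.1%, and 100.0% for the three scenarios, respectively.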
Figure 9: Comparison of three scenarios.
The bar chart in Figure 9 compares the three scenarios and shows how our method performed in
each of them. We can observe that the number of identified cells depends on the contrast of the
image.
Figure 10: Comparison of the accuracy of the three scenarios.
The bar chart in Figure 10 shows that our method segmented overlapping cells with high accuracy
when a moderate amount of contrast was present, but the accuracy goes down when the contrast in
the smear is low.
The dataset we used came from ex vivo samples from Plasmodium vivax-infected patients in
Brazil. Seven labels were used to cover all possible cell types: RBC, leukocyte, gametocyte,
ring, trophozoite, and schizont, plus a "difficult" label. RBCs and leukocytes are uninfected
cell types generally found in the blood. Cells were marked as difficult when they did not
clearly belong to any one of the classes, and those cells were ignored in training. The data is
also naturally imbalanced among the object classes.
We also manually measured the accuracy of our model on 50 images. For this, we counted the
total number of cells in those images and the number of cells recognized by the model. We
achieved an accuracy of 75%, which is a satisfactory result and indicates that most of the cells
present in the blood smears were detected successfully. The next chapter compares our
performance with that of other models.
CHAPTER 5: CONCLUSION
In this chapter, we discuss and compare our work with other notable works in this field, the
challenges we faced while working on the project, how we could improve our work, and future
developments.
5.1 Discussion
A few notable works are related to ours. Charpe et al. [10] used watershed thresholding as we
did and achieved an accuracy of 97.7%. Although their dataset was small, containing only 250 RBC
images, their result is excellent. However, because they applied the watershed algorithm to a
different dataset, we cannot compare our model with theirs directly.
The authors of [11] used the same dataset [9] as we did, but their model was different. First,
using traditional machine learning segmentation, their model attained an accuracy of 50% in
segmenting the cells in the images. Then, two-stage classification using Faster R-CNN attained
accuracies of 59% and then 98%, respectively, disregarding background, RBCs, and difficult
cells. They thus achieved a significant improvement over the one-stage classification method as
well as over traditional cell segmentation [11].
In our work, we used 800 images taken from the Kaggle dataset [9], which provides separate
training and test sets; therefore, the evaluation is not biased. We used 600 images from the
training set and 200 images from the test set, achieving an accuracy of 75%. Because we used a
minimal number of features, our method requires neither high computational power nor much time.
5.2 Summary
The death toll due to malaria is increasing day by day. Finding an optimized algorithm for
segmenting the blood cells in blood smear images might help reduce the deaths of patients
suffering from this disease. Our proposed method is more straightforward and optimized than the
conventional detection process. With an accuracy of 75%, and taking less than a second to
segment the cells in a blood smear, our model might be implemented in the real world to segment
blood cells. Increasing the dataset size may make the model more credible, though our dataset is
already larger than those of other works in this field. Minimal machine learning features have
been used in our model to make the cell segmentation process much more straightforward and
computationally friendly.
5.3 Future Work
We want to test our algorithm on different datasets to make it more credible. Though the
achieved accuracy is satisfactory, we would like to compare our machine learning model’s
performance with a deep learning model using a Convolutional Neural Network (CNN) on the same
dataset. In addition, we would like to work on pixel rendering to increase the accuracy of our
model.
REFERENCES
[1] "Fact sheet about Malaria", Who.int, 2020. [Online]. Available:
https://www.who.int/news-room/fact-sheets/detail/malaria. [Accessed: 11- Mar- 2020].
[2] P. Bloland, Drug resistance in malaria. Geneva: World Health Organization, 2001.
[3] "Medical Image Analyses for Malaria Detection", Medium, 2020. [Online]. Available:
https://towardsdatascience.com/medical-image-analyses-for-malaria-detection-fc26dc39793b.
[Accessed: 11- Mar- 2020].
[4] "Watershed segmentation - skimage v0.18.dev0 docs", Scikit-image.org, 2020. [Online].
Available: https://scikit-image.org/docs/dev/auto_examples/segmentation/plot_watershed.html.
[Accessed: 16- May- 2020].
[5] "Strategy, speed and collaboration are essential to eliminate malaria", Who.int, 2020.
[Online]. Available:
https://www.who.int/westernpacific/news/feature-stories/detail/strategy-speed-and-collaboration-are-essential-to-eliminate-malaria.
[Accessed: 19- May- 2020].
[6] S. Kunwar, M. Shrestha and R. Shikhrakar, "Malaria Detection Using Image Processing and
Machine Learning", 2018. [Accessed: 11- Mar- 2020].
[7] W. Wang et al., "Learn to segment single cells with deep distance estimator and deep cell
detector", Computers in Biology and Medicine, vol. 108, pp. 133-141, 2019. Available:
10.1016/j.compbiomed.2019.04.006.
[8] Y. Al-Kofahi, A. Zaltsman, R. Graves, W. Marshall and M. Rusu, "A deep learning-based
algorithm for 2-D cell segmentation in microscopy images", BMC Bioinformatics, vol. 19, no. 1,
2018. Available: 10.1186/s12859-018-2375-z.
[9] "Malaria Bounding Boxes", Kaggle.com, 2020. [Online]. Available:
https://www.kaggle.com/kmader/malaria-bounding-boxes. [Accessed: 17- May- 2020].
[10] K. Charpe, V. Bairagi, S. Desarda and S. Barshikar, "A Novel Method for Automatic Detection
of Malaria Parasite Stage in Microscopic Blood Image", International Journal of Computer
Applications, vol. 128, no. 17, pp. 32-37, 2015. Available: 10.5120/ijca2015906763.
[11] J. Hung et al., "Applying Faster R-CNN for Object Detection on Malaria Images", arXiv.org,
2020. [Online]. Available: https://arxiv.org/abs/1804.09548. [Accessed: 17- May- 2020].