COMPARISON OF MACHINE LEARNING METHODS
FOR BREAST CANCER DIAGNOSIS
PATHAKOTI NAGARJUNA REDDY
ROLL NO: 101122901062
MCA
ABSTRACT
Cancer, in all its types, is a common problem for people throughout the world, and
breast cancer is the most frequent cancer type among women. Any advance in the
diagnosis and prediction of cancer is therefore of capital importance for a healthy
life. Machine learning techniques can make a substantial contribution to the early
diagnosis and prediction of cancer. In this paper, two of the most popular machine learning
techniques are used to classify the Wisconsin Breast Cancer (Original) dataset, and
their classification performance is compared using accuracy, precision, recall and
ROC area. The best performance was obtained by the Support Vector Machine technique,
which had the highest accuracy.
INTRODUCTION
Cancer is the second leading cause of human death worldwide and accounted for
roughly 9.6 million deaths in 2018. Globally, about 1 in 6 deaths is caused by
cancer. Almost 70 percent of cancer deaths occur in low- and middle-income
countries [1]. The most common cancer types among women are breast, lung and
colorectal cancer, which together represent half of all cancer cases. Breast cancer
alone is responsible for thirty percent of all new cancer diagnoses in women [2].
Machine learning (ML) methods make it possible to analyze data and to extract key
characteristics of the relationships and information in a dataset.
EXISTING SYSTEM
We applied SVM and ANN techniques to breast cancer classification to find which
machine learning method performs better. Support Vector Machines (SVMs) were first
described by Vladimir Vapnik, and their good performance has been noted in many
pattern recognition problems. SVMs can show better classification performance than
many other classification techniques. SVM is one of the most popular machine learning
classification techniques used for the prognosis and diagnosis of cancer. In an SVM,
the classes are separated by a hyperplane that is determined by the support vectors,
which are the critical samples from each class. The hyperplane acts as the decision
boundary between the two sample clusters. SVM can be used to classify tumors as
benign or malignant based on, for example, the patient's age and the tumor's size.
An Artificial Neural Network (ANN) is modelled on the biological neuron system and,
in particular, resembles the way the human brain processes information. It consists
of a network of many interconnected nodes, the artificial neurons [12]. ANNs are
able to model powerful non-linear functions.
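The node-and-connection idea described above can be sketched as a tiny feed-forward network in plain Python. This is only an illustration: the layer sizes, the weights and the choice of a sigmoid activation are assumptions, not values taken from this project.

```python
import math

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, then sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def forward(inputs, hidden_layer, output_neuron):
    """Feed inputs through one hidden layer and a single output neuron.

    hidden_layer is a list of (weights, bias) pairs, one per hidden node;
    output_neuron is a single (weights, bias) pair.
    """
    hidden = [neuron(inputs, w, b) for w, b in hidden_layer]
    out_w, out_b = output_neuron
    return neuron(hidden, out_w, out_b)

# Hypothetical weights chosen only for illustration (not trained values).
hidden_layer = [([0.5, -0.4], 0.1), ([-0.3, 0.8], 0.0)]
output_neuron = ([1.2, -0.7], 0.05)
score = forward([1.0, 2.0], hidden_layer, output_neuron)
```

The output is a value between 0 and 1 that could be read as a class score. In a real ANN the weights would be learned from training data rather than written by hand.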
DISADVANTAGES:
Manual diagnosis of this disease takes long hours, and fewer systems are available
to assist with it.
PROPOSED SYSTEM
Machine learning involves predicting and classifying data, and different machine
learning algorithms are employed depending on the dataset. The Support Vector
Machine (SVM) is, in its basic form, a linear model for classification and regression
problems. With kernel functions it can solve both linear and non-linear problems,
and it works well for many practical problems. The idea of SVM is simple: the
algorithm creates a line or a hyperplane that separates the data into classes. The
radial basis function (RBF) kernel is a popular kernel function used in various
kernelized learning algorithms; in particular, it is commonly used in support vector
machine classification. As a simple example, for a classification task with only two
features (like the image above), you can think of a hyperplane as a line that linearly
separates and classifies a set of data. Intuitively, the further from the hyperplane
our data points lie, the more confident we are that they have been correctly
classified. We therefore want our data points to be as far away from the hyperplane
as possible, while still being on the correct side of it.
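Two of the ideas above, the RBF kernel and the distance of a point from the hyperplane, can be written out as a short sketch. The `gamma` value, the example hyperplane and the sample points are assumptions made for illustration only.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * ||x - y||^2). gamma is an assumed setting."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def signed_distance(w, b, x):
    """Signed distance of point x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# Hypothetical two-feature hyperplane: x1 + x2 - 4 = 0
w, b = [1.0, 1.0], -4.0
near = signed_distance(w, b, [2.5, 2.0])  # just on the positive side
far = signed_distance(w, b, [6.0, 6.0])   # well inside the positive side
```

Both points lie on the same side of the line, but `far` is larger than `near`, which matches the intuition that points further from the hyperplane are classified with more confidence. The kernel value is 1 for identical points and falls towards 0 as the points move apart.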
ADVANTAGES
ML techniques can be applied to the early detection and prognosis of
cancer.
SYSTEM ARCHITECTURE
HARDWARE REQUIREMENTS
◼ System : Pentium i3
◼ Hard Disk : 500 GB
◼ Monitor : 14" Colour Monitor
◼ Mouse : Optical Mouse
◼ RAM : 4 GB
SOFTWARE REQUIREMENTS
➢ Operating System : Windows 8/10
➢ Coding Language : Python
MODULES:
◼ Upload Dataset
◼ Pre-processing
◼ Process On Training and Testing Model
◼ SVM Model
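The training and testing module can be sketched as a random split of the records followed by an evaluation of the predictions on the held-out part. The 30 percent test ratio, the fixed seed and the helper names `train_test_split` and `evaluate` are assumptions for illustration. ROC area, which the abstract also names, is left out to keep the sketch short.

```python
import random

def train_test_split(rows, test_ratio=0.3, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Toy data: ten placeholder records and five hand-written predictions.
rows = list(range(10))
train, test = train_test_split(rows)
metrics = evaluate([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Here the positive class (1) would stand for malignant. The model is fitted on `train` only, and the metrics are computed on predictions for `test`, so that the reported performance reflects unseen records.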
Upload Dataset
Upload Dataset is the process of importing raw datasets into the analytical
platform. Data can be acquired from traditional databases (SQL and query browsers),
remote data (web services), text files (scripting languages), NoSQL storage (web
services, programming interfaces), etc. Uploading involves identifying the
datasets, retrieving the data, and querying the data from the dataset. The dataset
used in the project is the Wisconsin Breast Cancer dataset. We used additional tools
to get other information, such as server country with Whois. The final dataset
consists of around 1780 values, which can serve as a training set for machine
learning models.
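A minimal sketch of the upload step is given below. It assumes the commonly distributed comma-separated layout of the Wisconsin Breast Cancer (Original) data: a sample id, nine integer features, and a class code of 2 (benign) or 4 (malignant), with `?` marking a missing value. The sample rows are made up for illustration and are not real patient records.

```python
import csv
import io

# Illustrative rows in the assumed layout; not real patient records.
SAMPLE = """\
1001,5,1,1,1,2,1,3,1,1,2
1002,8,10,10,8,7,10,9,7,1,4
1003,6,8,8,1,3,?,3,7,1,2
"""

def load_records(handle):
    """Parse rows into (features, label); label is 1 for malignant, 0 for benign.

    A '?' entry is kept as None so the pre-processing step can handle it.
    """
    records = []
    for row in csv.reader(handle):
        if not row:
            continue
        features = [None if v == "?" else int(v) for v in row[1:-1]]  # skip id
        label = 1 if row[-1] == "4" else 0
        records.append((features, label))
    return records

records = load_records(io.StringIO(SAMPLE))
```

To read an actual file, the same function can be given an open file handle instead of the in-memory sample, for example `open(path, newline="")` where `path` points at the downloaded data file.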
Pre-Process Data:
Pre-processing of data involves two steps:
Cleaning Data: Data cleaning involves the removal of inconsistent values, duplicate
records, missing values, invalid data and outliers.
Data Munging / Data Wrangling: Data wrangling techniques involve scaling,
transformation, feature selection, dimensionality reduction and data manipulation.
Scaling is performed over the dataset to prevent features with large values from
dominating the results. Transformation reduces the noise and variability present
in the dataset. Features are handpicked so that redundant or irrelevant ones are
removed from the dataset. Dimensionality reduction helps eliminate irrelevant
features and makes the analysis simpler.
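The cleaning and scaling steps above can be sketched as follows. The record format, a list of `(features, label)` pairs with `None` for a missing value, and the choice of min-max scaling are assumptions made for this example. Other scaling methods would fit the description equally well.

```python
def clean(records):
    """Drop records with missing feature values and exact duplicates."""
    seen, cleaned = set(), []
    for features, label in records:
        key = (tuple(features), label)
        if None in features or key in seen:
            continue
        seen.add(key)
        cleaned.append((list(features), label))
    return cleaned

def min_max_scale(records):
    """Rescale every feature column to the range [0, 1]."""
    columns = list(zip(*(features for features, _ in records)))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for features, label in records:
        row = [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(features, lows, highs)
        ]
        scaled.append((row, label))
    return scaled

# Toy records: one has a missing value and one is a duplicate.
raw = [([1, 200], 0), ([5, None], 1), ([1, 200], 0), ([3, 400], 1), ([5, 300], 1)]
prepared = min_max_scale(clean(raw))
```

After scaling, the second feature (values in the hundreds) no longer outweighs the first (values below ten), which is the effect the scaling step is meant to achieve.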
Support Vector Machine:
"Support Vector Machine" (SVM) is a supervised machine learning algorithm
that can be used for both classification and regression challenges. However, it is
mostly used in classification problems. In the SVM algorithm, we plot each data
item as a point in n-dimensional space (where n is the number of features), with
the value of each feature being the value of a particular coordinate. Then, we
perform classification by finding the hyperplane that best separates the two
classes (see the snapshot below).
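To make the hyperplane search concrete, here is a minimal sketch of a linear SVM trained by sub-gradient descent on the regularised hinge loss. This is a simplified stand-in, not the project's implementation: it has no RBF kernel, and the learning rate, regularisation strength, epoch count and toy points are all assumptions.

```python
def train_linear_svm(points, labels, epochs=500, lr=0.01, lam=0.01):
    """Fit w, b by sub-gradient descent on the regularised hinge loss.

    labels must be +1 or -1. epochs, lr and lam are assumed settings.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: push it out.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: only apply regularisation.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy, linearly separable points (hypothetical, two features each).
points = [[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7]]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, x) for x in points]
```

In the breast cancer setting, -1 and +1 would stand for benign and malignant, and each point would carry one coordinate per feature. A library implementation with a kernel would normally be preferred over this hand-written loop.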
USE CASE DIAGRAM:
CONCLUSION
Breast cancer is the most frequent cancer type among women. Any advance in the
diagnosis and prediction of cancer is therefore of capital importance for a healthy
life. In this paper, we have discussed two popular machine learning techniques for
classifying the Wisconsin Breast Cancer dataset.
THANK YOU