
Experiment # 10

The document outlines a lab experiment focused on using Support Vector Machines (SVMs) for data classification in R, detailing objectives, theoretical background, and practical implementation steps. It includes an evaluation sheet for assessing student performance across various knowledge components and provides a structured lab report format. Additionally, it discusses the e1071 package in R, which facilitates the implementation of SVMs, and lists applications of SVMs in real-world scenarios.

Uploaded by

Ali Raza
Copyright © All Rights Reserved

Lab Name: To perform classification of data using classification algorithm named support vector
machines (SVMs) in R.
Course title: Soft Computing and Data mining Lab Total Marks: ___20_________
Practical No. 10 Date of experiment performed: ____________
Course teacher/Lab Instructor: Engr. Muhammad Usman Date of marking: ____________
Student Name:__________________________
Registration no.__________________________

Marking Evaluation Sheet

Knowledge component | Domain | Taxonomy level | Contribution | Max. marks | Obtained marks
1. Student is aware of the requirement and use of the apparatus involved in the experiment. | Psychomotor | Imitation (P1) | 70% | 3 | ____
2. Student has conducted the experiment by practicing the hands-on skills as per instructions. | Psychomotor | Manipulate (P2) | | 11 | ____
3. Student has achieved the required accuracy in performance. | Psychomotor | Precision (P3) | | - | ____
4. Student is aware of the discipline & safety rules and follows them during the experiment. | Affective | Receiving (A1) | 20% | 2 | ____
5. Student has responded well and contributed effectively in the respective lab activity. | Affective | Respond (A2) | | 2 | ____
6. Student understands the use of modern programming languages and software environments for Data Mining (DM). | Cognitive | Understand (C2) | 10% | 2 | ____
Total | | | | 20 | ____
Normalized marks out of 5 | | | | (5) | ____

Signed by Course teacher/ Lab Instructor


EXPERIMENT # 10
To perform classification of data using classification algorithm named support vector
machines (SVMs) in R

PRE LAB TASK

Objective:
1. To become familiar with the classification of data and with support vector machines (SVMs).
2. To become familiar with the e1071 package, which supports the classification of data in the
modern programming language R.
3. To learn how to use e1071 for the classification of data in R.
Theory:

1. Classification of data:
Classification is the task of labeling new examples with the appropriate class. In the field of
machine learning, statistical classification is the technique of identifying which of a set of
categories (sub-populations) an observation (or observations) belongs to. Classification is a
useful tool in machine learning and data mining. In general, the goal of classification is to use
an object's characteristics to identify which class (or group) it belongs to. A. A. Soofi and
Arshad Awan have defined classification in “Soofi, A. A., & Awan, A. (2017). Classification
techniques in machine learning: applications and issues. J. Basic Appl. Sci, 13, 459-465.” as
“Classification is a data mining (machine learning) technique used to predict group membership
for data instances.” Classification is one of the most widely studied problems among researchers
in the machine learning and data mining fields.

Machine learning can be categorised into supervised and unsupervised methods. Classification
is a key category of supervised machine learning techniques. The supervised machine learning
process involves (1) asking a question, (2) gathering data, (3) developing a hypothesis based on
our research, and (4) analysing the data. If the data support the hypothesis, it can be accepted
as a scientific theory; if not, it can be rejected or modified. The goal is to find a model whose
predictions match the previously observed data. The hypothesis space in classification includes
any function that assigns data to classes, but most of these hypotheses are incorrect. Training
and fitting models are necessary to filter out bad hypotheses, and there are several ways to
narrow down the hypothesis space.
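One common way to narrow the hypothesis space in practice is to compare candidate models by cross-validation and keep the one with the lowest estimated error. The sketch below assumes the e1071 package introduced later in this lab and uses its tune() function to choose the SVM cost parameter; the simulated data and the grid of cost values are illustrative assumptions, not part of the original experiment.

```r
# Sketch: narrowing the hypothesis space by cross-validation.
# Assumes e1071 is installed; data and cost grid are illustrative only.
library(e1071)

set.seed(100)
x <- matrix(rnorm(200), 100, 2)          # 100 observations, 2 features
y <- rep(c(-1, 1), c(50, 50))            # two classes
x[y == 1, ] <- x[y == 1, ] + 1           # shift class 1 so the classes differ
dat <- data.frame(x, y = as.factor(y))

# Each cost value defines a different candidate linear SVM (hypothesis);
# tune() estimates each candidate's error by 10-fold cross-validation (its default).
tuned <- tune(svm, y ~ ., data = dat, kernel = "linear",
              ranges = list(cost = c(0.01, 0.1, 1, 10, 100)))

print(summary(tuned))                    # cross-validated error of every candidate
best_cost  <- tuned$best.parameters$cost # the surviving hypothesis
best_model <- tuned$best.model           # refitted on all data with best_cost
```

Only the candidate with the lowest cross-validated error is kept; the others are discarded as worse hypotheses for this data.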

Different classification learning algorithms exist, each focusing on specific kinds of hypotheses.
No single form of classification is appropriate for all data sets; hence a large toolkit of
classification algorithms has been developed. A. A. Soofi and Arshad Awan (2017), in
“Classification Techniques in Machine Learning: Applications and Issues”, list five basic
supervised learning (classification) techniques along with their associated classification
methods (learning algorithms), and describe each algorithm's strengths, weaknesses, potential
applications, and issues with their available solutions.
1.1 Examples:
Examples of classification include assigning a given email to the "spam" or "non-spam" class,
and assigning a diagnosis to a given patient based on observed characteristics of the patient
(sex, blood pressure, presence or absence of certain symptoms, etc.).

Classification and clustering are examples of the more general problem of pattern recognition,
which is the assignment of some sort of output value to a given input value. Other examples
are regression, which assigns a real-valued output to each input; sequence labeling, which
assigns a class to each member of a sequence of values (for example, part-of-speech tagging,
which assigns a part of speech to each word in an input sentence); parsing, which assigns a
parse tree to an input sentence, describing the syntactic structure of the sentence; etc.
2. Support Vector Machines (SVMs):
In machine learning, support vector machines (SVMs) are supervised learning models with
associated learning algorithms that analyze data for classification and regression analysis.
The original algorithm was proposed by Vladimir Naumovich Vapnik in 1963, and the modern
formulation was developed by Vapnik and colleagues at AT&T Bell Laboratories in the 1990s.
SVMs implement the following idea: an SVM maps the input vectors of training examples to
points in space so as to maximise the width of the gap between the two categories. In this
space, an optimal separating hyperplane is constructed. New examples are then mapped into that
same space and predicted to belong to a category based on which side of the gap they fall on.
The original maximum-margin hyperplane algorithm proposed by Vapnik in 1963 constructed
a linear classifier. In addition to performing linear classification, SVMs can efficiently
perform non-linear classification using what is called the kernel trick, which implicitly maps
their inputs into high-dimensional feature spaces. Unlike nonlinear classifiers such as kernel
methods, which map data to a higher-dimensional space, linear classifiers work directly on the
data in the original input space. While linear classifiers cannot handle some linearly
inseparable data, they may be sufficient for data that already lie in a rich, high-dimensional
feature space. An important advantage of linear classification is that training and testing are
much more efficient; therefore, linear classification can be very useful for some large-scale
applications.
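To illustrate why the kernel trick matters, the sketch below fits a linear and a radial (RBF) kernel SVM to two classes that no straight line can separate: simulated points inside a circle versus points outside it. It assumes the e1071 package; the circular data set, its radius and the cost value are illustrative assumptions. The radial kernel should reach a much higher training accuracy than the linear one.

```r
# Sketch: linear vs. radial (RBF) kernel on data that are not linearly separable.
# Assumes e1071 is installed; the data are simulated for illustration.
library(e1071)

set.seed(1)
n <- 200
x <- matrix(runif(2 * n, min = -1, max = 1), n, 2)   # points in the square [-1, 1]^2
inside <- rowSums(x^2) < 0.6^2                       # class depends on distance from origin
dat <- data.frame(x, y = factor(ifelse(inside, "inside", "outside")))

lin <- svm(y ~ ., data = dat, kernel = "linear", cost = 10)
rbf <- svm(y ~ ., data = dat, kernel = "radial", cost = 10)

# Training accuracy: fraction of points whose predicted class equals the true class
acc <- function(model) mean(predict(model, dat) == dat$y)
acc_linear <- acc(lin)
acc_radial <- acc(rbf)
cat("linear:", acc_linear, " radial:", acc_radial, "\n")
```

A straight line can do little better than predicting the majority class here, whereas the radial kernel implicitly works in a space where the circular boundary becomes separable.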
In general, an SVM represents input data objects as points in an n-dimensional space, where
the dimensions represent the various features of the objects. The algorithm then iteratively
searches for a function that represents a hyperplane that can act as a separator between the
regions occupied by the different target output classes. An SVM model is thus a representation
of the input data objects in a geometric space with a clear gap between the groups of points
representing different categories. This division is made by the hyperplane, which is a line in
2D space, a plane in 3D space and, in general, an (n-1)-dimensional hyperplane in n-dimensional
space. The hyperplane splits the space so that it clearly indicates which region is occupied by
which category.
The following is an example of a trained SVM model.

Fig. 1. Support Vector Machines (SVMs) Model


In the figure above, the hyperplane has two parallel dotted lines on either side of it. The
perpendicular distance between these two lines is called the margin; it measures the gap between
the closest data points of the two categories. The data points closest to the hyperplane, which
lie on these lines (or inside the margin), determine the position of the hyperplane. These points
are called support vectors.
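The geometry of Fig. 1 can be stated compactly. The block below gives the standard soft-margin formulation, assuming class labels y_i in {-1, +1}, a weight vector w and a bias b (symbols introduced here for illustration, not taken from the figure); the constant C corresponds to the cost argument of the svm() function in e1071.

```latex
\[
\begin{aligned}
&\text{Hyperplane:} && \mathbf{w}^\top \mathbf{x} + b = 0 \\
&\text{Decision rule:} && \hat{y} = \operatorname{sign}\!\left(\mathbf{w}^\top \mathbf{x} + b\right) \\
&\text{Margin lines (dotted):} && \mathbf{w}^\top \mathbf{x} + b = \pm 1,
  \qquad \text{margin width} = \frac{2}{\lVert \mathbf{w} \rVert} \\
&\text{Training problem:} && \min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;
  \tfrac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \xi_i \\
& && \text{subject to } y_i\!\left(\mathbf{w}^\top \mathbf{x}_i + b\right) \ge 1 - \xi_i,
  \quad \xi_i \ge 0, \quad i = 1, \dots, n
\end{aligned}
\]
```

Minimising the norm of w is the same as maximising the margin width 2/||w||. The support vectors are training points that lie on or inside the margin, i.e. with y_i(w^T x_i + b) <= 1; all other points could be removed without changing w and b. A larger C penalises margin violations (the slack values xi_i) more heavily and usually gives a narrower margin.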
2.1 Applications:
Like many other machine learning algorithms, SVMs have found widespread applications in the
real world and help solve many day-to-day classification problems. Some of these applications
are given below.
1. Handwriting recognition: Many handwriting recognition programs use SVMs to identify
handwritten characters.
2. Image-based searching: SVMs are one way to improve image-based searching.
3. Face detection: Most smartphone cameras now have a face detection feature; an SVM can be
used to separate the faces from the rest of the picture.
4. Bioinformatics: SVMs are used to classify people based on genes and other biological
features.
5. Cancer detection: SVMs can distinguish malignant tumors from benign ones based on their
images.
6. Classification of satellite data: Satellite data such as SAR (synthetic aperture radar) data
can be classified using supervised SVMs.
3. Package e1071:
e1071 is an open-source R package that provides functions for statistical and probabilistic
algorithms such as fuzzy clustering, the naive Bayes classifier, bagged clustering, the
short-time Fourier transform and support vector machines.

When it comes to SVMs, many packages are available in R to implement them; however, e1071 is
one of the most widely used and intuitive. The svm() function of the e1071 package provides an
interface to the LIBSVM library, which makes implementing SVMs quick and simple. It supports
non-linear classification through the kernel trick and can also produce class-probability
estimates (probability = TRUE). It provides the most common kernels: linear, polynomial,
radial basis function (RBF, kernel = "radial") and sigmoid.
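As a sketch of the probabilistic output mentioned above, assuming e1071 and reusing the lab's simulated data so the example runs on its own: probability = TRUE must be set both when fitting and when predicting, and the class probabilities are then returned as the "probabilities" attribute of the prediction. e1071 obtains them by fitting a logistic model to the SVM decision values, so they are estimates rather than exact probabilities.

```r
# Sketch: class-probability estimates from an e1071 SVM.
# Assumes e1071 is installed; data follow the lab's simulated example.
library(e1071)

set.seed(100)
x <- matrix(rnorm(40), 20, 2)
y <- rep(c(-1, 1), c(10, 10))
x[y == 1, ] <- x[y == 1, ] + 1
dat <- data.frame(x, y = as.factor(y))

# probability = TRUE at fit time trains the extra probability model ...
model <- svm(y ~ ., data = dat, kernel = "linear", cost = 10, probability = TRUE)

# ... and at prediction time returns the estimates as an attribute
pred  <- predict(model, dat, probability = TRUE)
probs <- attr(pred, "probabilities")   # one row per observation, one column per class
print(head(probs))
```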
4. Practical Implementation of SVM in R:
Let us now create an SVM model in R to understand it more thoroughly through a practical
implementation. We will be using the e1071 package for this.
The following procedure is used to implement an SVM in R.
• Step 1: Install package e1071:
• Step 2: Load data set and package e1071:
• Step 3: Select columns of the data set:
• Step 4: Encoding the target feature:
• Step 5: Split the data set:
• Step 6: Feature Scaling:
• Step 7: Fitting SVM to training set:
• Step 8: Predicting the test set result:
• Step 9: Making confusion matrix:
• Step 10: Visualising the training set results:
• Step 11: Visualising the test set results:
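The steps above can be sketched end to end as follows. This is a minimal example under stated assumptions: the e1071 and caTools packages are installed, and it uses R's built-in iris data restricted to two species and two petal features (an illustrative choice, not the lab's data set). The variable names are hypothetical.

```r
# Sketch of Steps 1-11 on built-in data (assumes e1071 and caTools are installed).
# Step 1 (run once): install.packages(c("e1071", "caTools"))

# Step 2: load the packages and a data set
library(e1071)
library(caTools)
data(iris)

# Step 3: select two feature columns and two of the three species
dat <- iris[iris$Species != "setosa", c("Petal.Length", "Petal.Width", "Species")]

# Step 4: encode the target as a factor with only the two remaining levels
dat$Species <- droplevels(dat$Species)

# Step 5: split into training (75%) and test (25%) sets, keeping class proportions
set.seed(123)
in_train <- sample.split(dat$Species, SplitRatio = 0.75)
train <- dat[in_train, ]
test  <- dat[!in_train, ]

# Step 6: feature scaling with training-set statistics only
feat    <- c("Petal.Length", "Petal.Width")
centers <- colMeans(train[, feat])
sds     <- apply(train[, feat], 2, sd)
train[, feat] <- scale(train[, feat], center = centers, scale = sds)
test[, feat]  <- scale(test[, feat],  center = centers, scale = sds)

# Step 7: fit a linear SVM (features are already scaled, so scale = FALSE)
model <- svm(Species ~ ., data = train, kernel = "linear", cost = 10, scale = FALSE)

# Step 8: predict the test-set classes
pred <- predict(model, newdata = test)

# Step 9: confusion matrix and accuracy
cm <- table(Predicted = pred, Actual = test$Species)
print(cm)
accuracy <- sum(diag(cm)) / sum(cm)

# Steps 10-11: decision regions for the training and test sets (interactive sessions)
if (interactive()) {
  plot(model, train)
  plot(model, test)
}
```

Computing the scaling statistics on the training set alone, and reusing them for the test set, keeps information from the test set out of the fitted model.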
LAB SESSION

Lab Task:
1. To perform classification of data using classification algorithm named support vector
machines (SVMs) in R.
Apparatus:
• Laptop
• R

Experimental Procedure:

1. How to Set Up R:

1. Start Microsoft Windows.
2. Open the website http://cran.r-project.org to download the installer, or use a pen drive to
access the installer file named R-4.2.2-win.exe.
3. Double-click the ‘R-4.2.2-win.exe’ file to run the setup.
4. Click Next on each setup screen, accepting the default options.
5. Finally, choose Finish to close the installation.

2. Get started with R:

1. Start R by double-clicking the R icon on your desktop. This opens the R GUI window shown
in the image below.

Fig. 2. R Startup GUI window

2. Install the e1071 package.


> install.packages("e1071")

3. Load the data set and the e1071 package. Instead of importing data, let us generate some
two-dimensional data: 20 random observations of 2 variables in the form of a 20 by 2 matrix.
This gives us 20 objects with 2 features each.

> library(e1071)
> set.seed(100)                        # make the random numbers reproducible
> x <- matrix(rnorm(40),20,2)          # 20 observations x 2 features
> y <- rep(c(-1,1),c(10,10))           # class labels: ten -1s, then ten 1s
> x[y == 1,] = x[y == 1,] + 1          # shift class 1 by +1 in both features
> plot(x, col = y + 3, pch = 19)       # colour 2 (red) for -1, colour 4 (blue) for 1

4. Encode the target variable as a factor and combine the features and labels into a data frame.

> data = data.frame(x, y = as.factor(y))

5. Split the data set into a training set and a test set. The caTools package below can be used
for this purpose; however, we do not split the data here because the data set is small.
> install.packages('caTools')
> library(caTools)

6. Feature scaling: because our data are on a relatively small scale, we set the scale argument
of svm() to FALSE (by default, svm() scales each variable to zero mean and unit variance).
7. Create the model by fitting an SVM to the data with the svm() function. Specify the kernel
as linear and the cost as 10.
> data.svm = svm(y ~ ., data = data, kernel = "linear", cost = 10, scale = FALSE)

8. Print the fitted model.
> print(data.svm)

svm(formula = y ~ ., data = data, kernel = "linear", cost = 10, scale = FALSE)


Parameters:
SVM-Type: C-classification
SVM-Kernel: linear
cost: 10
Number of Support Vectors: 5

9. Predict the class labels with the predict() function. Because the data were not split, the
prediction is made on the training set itself, e.g. predict(data.svm, data).
10. Visualise the results by plotting the model with the plot() function.
> plot(data.svm, data)
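For the linear model fitted above, the separating hyperplane can also be recovered explicitly. The sketch below repeats the lab's data generation so it runs on its own. It assumes the e1071 svm object's components coefs, SV, rho, index and tot.nSV, and the fact that, with scale = FALSE and a linear kernel, the decision value is f(x) = sum(w * x) + b, where w is the coefficient-weighted sum of the support vectors and b = -rho. The names w, b and margin_width are illustrative.

```r
# Sketch: recovering w, b and the margin from a linear e1071 SVM.
library(e1071)

# Same simulated data and model as in the lab session
set.seed(100)
x <- matrix(rnorm(40), 20, 2)
y <- rep(c(-1, 1), c(10, 10))
x[y == 1, ] <- x[y == 1, ] + 1
data <- data.frame(x, y = as.factor(y))
data.svm <- svm(y ~ ., data = data, kernel = "linear", cost = 10, scale = FALSE)

# Weight vector: coefficient-weighted sum of the support vectors
w <- drop(t(data.svm$coefs) %*% data.svm$SV)   # named vector (X1, X2)
b <- -data.svm$rho                             # intercept of the decision function
margin_width <- 2 / sqrt(sum(w^2))             # distance between the two dashed lines
sv_rows <- data.svm$index                      # rows of 'data' that are support vectors

# Manual decision values should match those returned by predict()
manual <- drop(as.matrix(data[, c("X1", "X2")]) %*% w + b)
pred <- predict(data.svm, data, decision.values = TRUE)
from_predict <- drop(attr(pred, "decision.values"))

# Step 9: confusion matrix on the training data (the data were not split)
print(table(Predicted = pred, Actual = data$y))

# Hyperplane (f = 0) and margin lines (f = +/-1) on the original scatter plot
if (interactive()) {
  plot(x, col = y + 3, pch = 19)
  points(x[sv_rows, ], pch = 1, cex = 2)                  # circle the support vectors
  abline(a = -b / w[2], b = -w[1] / w[2])                 # w1*x1 + w2*x2 + b = 0
  abline(a = (-b - 1) / w[2], b = -w[1] / w[2], lty = 2)  # f = -1
  abline(a = (-b + 1) / w[2], b = -w[1] / w[2], lty = 2)  # f = +1
}
```

Each line is obtained by solving w1*x1 + w2*x2 + b = c for x2, giving intercept (c - b)/w2 and slope -w1/w2.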

Extra Credit Points:


(Following a similar procedure, and using the data from the PRE LAB TASK session, complete the
tasks provided to you as an exercise.)

EXPERIMENT DOMAIN:

Attribute | Domain | Taxonomy level | Marks distribution
Realization of Experiment (Awareness) | Psychomotor (70%) | P1 | 3
Conducting Experiment (Act) | Psychomotor (70%) | P2 | 5
Data Collection (Use Instrument) | Psychomotor (70%) | P2 | 3
Data Analysis (Perform) | Psychomotor (70%) | P2 | 3
Discipline (Receiving) | Affective (20%) | A1 | 3
Individual Participation (Respond/Contribute) | Affective (20%) | A2 | 1
Understand | Cognitive (10%) | C2 | 2
LAB REPORT
Prepare the Lab Report as below:
TITLE:

OBJECTIVE:

APPARATUS:

PROCEDURE:
(Note: Use all the steps you studied in the LAB SESSION of this lab to write the procedure and to
complete the experiment.)
DISCUSSION:

Q1. List the broad categories of machine learning (ML).

________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

Q2. List the activities involved in the supervised machine learning process.


________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
_______________

Conclusion/Summary
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

Attribute | Domain | Taxonomy level | Marks distribution | Obtained marks
Realization of Experiment (Awareness) | Psychomotor (70%) | P1 | 3 | ____
Conducting Experiment (Act) | Psychomotor (70%) | P2 | 5 | ____
Data Collection (Use Instrument) | Psychomotor (70%) | P2 | 3 | ____
Data Analysis (Perform) | Psychomotor (70%) | P2 | 3 | ____
Discipline (Receiving) | Affective (20%) | A1 | 2 | ____
Individual Participation (Respond/Contribute) | Affective (20%) | A2 | 2 | ____
Understand | Cognitive (10%) | C2 | 2 | ____
