Experiment 4 - Supervised Learning
In this experiment, we will explore supervised learning techniques for regression and
classification tasks. We will use Python and the Scikit-learn library to implement
linear regression, polynomial regression, random forest classifier, and SVM models.
We will also evaluate the models using appropriate evaluation measures.
Scikit-learn installation
To install the Scikit-learn library, you can use the following command:
$ pip install -U scikit-learn
To check your installation, you can use:
$ python -m pip show scikit-learn  # to see which version is installed and where
$ python -m pip freeze  # to see all packages installed in the active virtualenv
$ python -c "import sklearn; sklearn.show_versions()"
Note that in order to avoid potential conflicts with other packages it is strongly
recommended to use a virtual environment (venv) or a conda environment.
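For example, a virtual environment can be created and activated as follows (a minimal sketch for Linux/macOS; on Windows the activation command is sklearn-env\Scripts\activate, and the environment name sklearn-env is just an example):
$ python -m venv sklearn-env  # create a virtual environment named sklearn-env
$ source sklearn-env/bin/activate  # activate it
$ pip install -U scikit-learn  # install scikit-learn inside the environment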
1.1 Regression
Regression is a statistical technique that relates a continuous dependent variable to
one or more independent variables. In this part, we will fit different regression models
on a toy dataset.
1.1.1 Generating examples
We will start by generating a toy dataset. Generate 40 examples using the following
function
f(x) = sin(1.5πx) + ϵ,
where ϵ is sampled from a zero-mean normal distribution with standard deviation 0.1, and
x ∈ [0, 1). Split the samples into two equal sets: a training set and a testing set.
Listing 1.1: Generating examples.
import numpy as np

def true_fun(X):
    return np.sin(1.5 * np.pi * X)

np.random.seed(0)
n_samples = 40

# Draw 40 points uniformly from [0, 1), split them equally into training and
# testing sets, and add Gaussian noise with standard deviation 0.1 to the targets.
X = np.random.rand(n_samples)
X_train = np.sort(X[:n_samples//2])
y_train = true_fun(X_train) + np.random.randn(n_samples//2) * 0.1
X_test = np.sort(X[n_samples//2:])
y_test = true_fun(X_test) + np.random.randn(n_samples//2) * 0.1
Plot both the training and testing examples in a scatter plot and show the true
function curve for the range [0, 1).
Listing 1.2: Plotting the dataset.
import matplotlib.pyplot as plt
x = np.linspace(0, 1, 100)
plt.plot(x, true_fun(x), label="True function")
plt.scatter(X_train, y_train, edgecolor="b", s=20, label="Train Examples")
plt.scatter(X_test, y_test, edgecolor="r", s=20, label="Test Examples")
plt.xlabel("x")
plt.ylabel("y")
plt.xlim((0, 1))
plt.ylim((-1.5, 1.5))
plt.legend(loc="best")
plt.title("Toy Dataset")
plt.show()
You should get a figure similar to Figure 1.1.
1.1.2 Linear regression
Let’s begin with a simple linear regression model. That is, we will fit a line to the training
set of the form
h(x) = w0 + w1 x,
where x is the input feature, w0 is the intercept of the fitted line, and w1 is its slope.
We will use the linear regression implementation from sklearn to get our model. Plot
the fitted line.
Figure 1.1: Toy dataset for regression.
Listing 1.3: Linear regression.
from sklearn.linear_model import LinearRegression

# sklearn expects a 2D array of shape (n_samples, n_features),
# hence the np.newaxis reshaping of the 1D inputs.
linear_regression = LinearRegression()
linear_regression.fit(X_train[:, np.newaxis], y_train)
plt.plot(x, true_fun(x), label="True function")
plt.scatter(X_train, y_train, edgecolor="b", s=20, label="Train Examples")
plt.plot(x, linear_regression.predict(x[:, np.newaxis]), label="Model")
plt.xlabel("x")
plt.ylabel("y")
plt.xlim((0, 1))
plt.ylim((-1.5, 1.5))
plt.legend(loc="best")
plt.title("Linear regression")
plt.show()
You should get a figure similar to Figure 1.2.
Task 1: Compute the mean squared error of the learned linear model on the test set.
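As a hint, a minimal sketch for this task (assuming the linear_regression model and the test split from the previous listings are still in scope) could use mean_squared_error from sklearn.metrics:
from sklearn.metrics import mean_squared_error

# Predict on the held-out test set and compare against the true targets.
y_pred = linear_regression.predict(X_test[:, np.newaxis])
print("Test MSE:", mean_squared_error(y_test, y_pred))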
Figure 1.2: Linear regression.
1.1.3 Polynomial regression
The linear model we got in the previous part is too simple to explain the data. In
this part we will use more complex models. Let’s start with a quadratic function of
the form
h(x) = w0 + w1 x + w2 x².
To get the non-linear basis functions for the quadratic model, we will use
PolynomialFeatures from sklearn.
Listing 1.4: Quadratic model.
from sklearn.preprocessing import PolynomialFeatures

# Expand x into the polynomial basis [1, x, x^2], then fit a linear model on it.
polynomial_features = PolynomialFeatures(2, include_bias=True)
q_model = LinearRegression()
q_model.fit(polynomial_features.fit_transform(X_train[:, np.newaxis]), y_train)

x = np.linspace(0, 1, 100)
plt.plot(x, true_fun(x), label="True function")
plt.scatter(X_train, y_train, edgecolor="b", s=20, label="Samples")
plt.plot(x, q_model.predict(polynomial_features.fit_transform(x[:, np.newaxis])),
         label="Model")
plt.xlabel("x")
plt.ylabel("y")
plt.xlim((0, 1))
plt.ylim((-1.5, 1.5))
plt.legend(loc="best")
plt.title("Degree 2")
plt.show()
The result should be similar to Figure 1.3.
Figure 1.3: Quadratic model for regression.
Task 2: Repeat the previous part with polynomials with degree 4 and 15.
Task 3: Compute the mean squared error of the learned models on the test set. Which
model is the best?
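As a hint, one possible sketch for these tasks (assuming the training and testing sets from Listing 1.1; the use of make_pipeline here is a convenience, not part of the listings above) is:
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline

for degree in [1, 2, 4, 15]:
    # Chain the polynomial expansion and the linear model, then fit and evaluate.
    model = make_pipeline(PolynomialFeatures(degree, include_bias=True),
                          LinearRegression())
    model.fit(X_train[:, np.newaxis], y_train)
    mse = mean_squared_error(y_test, model.predict(X_test[:, np.newaxis]))
    print(f"degree {degree}: test MSE = {mse:.4f}")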
1.2 Classification
Classification is a supervised learning task where the goal is to predict a categorical
(discrete) target label. In this part we will experiment with random forest and SVM
classifiers on a toy dataset.
1.2.1 Generating examples
In this part, we will generate a toy dataset for classification. We are going to use the
make_classification function from sklearn to generate the data and then split them
into training and testing sets using train_test_split.
Run the following code to generate and visualize the dataset.
Listing 1.5: Toy dataset for classification.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate 500 two-dimensional examples from two classes and hold out a third
# of them for testing.
X, y = make_classification(n_samples=500, n_features=2, n_classes=2,
                           n_informative=2, n_redundant=0,
                           n_clusters_per_class=2, random_state=0,
                           shuffle=True, class_sep=1.5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
                                                    random_state=0)

colors = ["b", "r"]
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.set_title("Training Set")
ax2.set_title("Testing Set")
for c in np.unique(y_train):
    ax1.scatter(X_train[y_train == c, 0], X_train[y_train == c, 1],
                edgecolor=colors[c], s=20, label="Samples")
    ax2.scatter(X_test[y_test == c, 0], X_test[y_test == c, 1],
                edgecolor=colors[c], s=20, label="Samples")
plt.show()
After running the code above, you should get something similar to Figure 1.4.
1.2.2 Classification with random forests
A random forest is an ensemble model that fits a number of decision tree classifiers
on various sub-samples of the dataset. Each tree is trained on a bootstrap sample of
the training set to introduce randomness into the trees. Furthermore, when selecting
a feature for a test node during tree construction, only a random subset of the features
is considered for the candidate tests. The final prediction of a random forest is usually
obtained by averaging the predictions of all trees (or by majority vote).
Let’s start by testing a random forest of 2 trees on the dataset we generated in
the previous section. The following code trains a random forest on the training set
and prints both the training and testing accuracy.
Figure 1.4: Toy dataset for classification.
Listing 1.6: Random forest classifier.
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=2,
criterion="entropy",
max_features="sqrt", max_samples=.8,
random_state=0)
clf.fit(X_train, y_train)
print("Training Accuracy: ",clf.score(X_train, y_train))
print("Testing Accuracy: ",clf.score(X_test, y_test))
For classification, accuracy is not the only metric used. There are many other
metrics, such as precision, recall, and F1-score. All of these metrics can be derived
from the confusion matrix. The following code computes the confusion matrix
for the random forest trained in the previous part.
Listing 1.7: Confusion Matrix.
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
y_pred = clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
display_labels=clf.classes_)
disp.plot()
Task 4: In the previous example, compute the precision, recall, F1-score. Discuss
when accuracy is not the most suitable metric.
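As a hint, a minimal sketch (assuming the classifier clf and the test split from the previous listings) could use the corresponding functions from sklearn.metrics:
from sklearn.metrics import precision_score, recall_score, f1_score

y_pred = clf.predict(X_test)
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1-score:", f1_score(y_test, y_pred))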
Task 5: Train different random forests by changing the number of trees from 2 to 15.
Plot the training and testing accuracy of the trained models vs the number of trees.
What do you notice?
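One possible sketch for this task (reusing the same hyper-parameters as Listing 1.6 and varying only the number of trees) is:
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier

n_trees = range(2, 16)
train_acc, test_acc = [], []
for n in n_trees:
    rf = RandomForestClassifier(n_estimators=n, criterion="entropy",
                                max_features="sqrt", max_samples=.8,
                                random_state=0)
    rf.fit(X_train, y_train)
    train_acc.append(rf.score(X_train, y_train))
    test_acc.append(rf.score(X_test, y_test))

plt.plot(n_trees, train_acc, marker="o", label="Training accuracy")
plt.plot(n_trees, test_acc, marker="o", label="Testing accuracy")
plt.xlabel("Number of trees")
plt.ylabel("Accuracy")
plt.legend(loc="best")
plt.show()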
Decision Surface
A decision surface is a plot that shows how a machine learning model divides the feature
space into regions corresponding to the different class labels. The following code visualizes
the decision surface of the random forest model from the previous example. The result
should be similar to Figure 1.5.
Listing 1.8: Decision Surface.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap
from sklearn.ensemble import RandomForestClassifier
cmap = plt.cm.RdBu
plot_step = 0.02 # fine step width for decision surface contours
clf = RandomForestClassifier(n_estimators=2, criterion="entropy",
max_features="sqrt", max_samples=.8,
random_state=0)
clf.fit(X_train, y_train)
# Now plot the decision boundary using a fine mesh as input to a
# filled contour plot
x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
xx, yy = np.meshgrid(
np.arange(x_min, x_max, plot_step),
np.arange(y_min, y_max, plot_step))
# Overlay the (equally weighted) predictions of the individual trees.
estimator_alpha = 1.0 / len(clf.estimators_)
for tree in clf.estimators_:
    Z = tree.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    cs = plt.contourf(xx, yy, Z, alpha=estimator_alpha, cmap=cmap)

colors = ["r", "b"]
for c in np.unique(y_train):
    plt.scatter(X_train[y_train == c, 0], X_train[y_train == c, 1],
                edgecolor=colors[c], s=20, label="Samples")
plt.show()
Figure 1.5: Decision Surface.
Task 6: Draw the decision surface for a set of random forests with different numbers
of trees. Do you notice any pattern?
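As a hint, one way to organize this (the helper function name is only illustrative, and the meshgrid variables xx, yy, cmap from Listing 1.8 are assumed to be in scope) is:
def plot_forest_surface(n_estimators):
    # Hypothetical helper that repeats the contourf-based plot from Listing 1.8.
    forest = RandomForestClassifier(n_estimators=n_estimators, criterion="entropy",
                                    max_features="sqrt", max_samples=.8,
                                    random_state=0)
    forest.fit(X_train, y_train)
    for tree in forest.estimators_:
        Z = tree.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
        plt.contourf(xx, yy, Z, alpha=1.0 / n_estimators, cmap=cmap)
    plt.title(f"Random forest with {n_estimators} trees")
    plt.show()

for n in [2, 5, 10, 15]:
    plot_forest_surface(n)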
1.2.3 Classification with SVM
In this section, we will test another popular classifier, the support vector machine
(SVM). SVM tries to find a separating hyper-plane between the classes with the
maximum margin. The following code trains an SVM classifier with a linear kernel on
our toy dataset.
Listing 1.9: SVM classifier.
from sklearn import svm
clf = svm.SVC(kernel="linear", C=100)
clf.fit(X_train, y_train)
print("Training Accuracy: ",clf.score(X_train, y_train))
print("Testing Accuracy: ",clf.score(X_test, y_test))
Task 7: Draw the decision surface for the SVM model in the previous example.
Task 8: Train an SVM model with an RBF kernel and draw its decision boundary. What
do you notice?
Task 9: Repeat Task 8 but with C equal to 0.5, 1, 100, and 1000. What do you notice?
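As a hint, a minimal sketch for these tasks is given below; the decision surface itself can be drawn with the same meshgrid/contourf approach used above, calling the SVM's predict method instead of the trees'.
from sklearn import svm

for C in [0.5, 1, 100, 1000]:
    # Fit an RBF-kernel SVM for each regularization strength C.
    rbf_clf = svm.SVC(kernel="rbf", C=C)
    rbf_clf.fit(X_train, y_train)
    print(f"C = {C}: train acc = {rbf_clf.score(X_train, y_train):.3f}, "
          f"test acc = {rbf_clf.score(X_test, y_test):.3f}")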
1.3 Hyper-parameters selection
Hyper-parameters are parameters that are set before training starts and are not
directly learnt within estimators. Typical examples include the number of trees and
the maximum depth in random forests, and C and the kernel for SVM.
To select hyper-parameters, we usually search the hyper-parameter space for the
best cross-validation score (or, if the dataset is large enough, we can search for the
parameters with the best score on a separate validation set).
The following code uses grid search with cross-validation to find the best hyper-parameters
for the SVM example in the previous section.
Listing 1.10: Hyper-parameters selection.
from sklearn.model_selection import GridSearchCV
from sklearn import svm
parameters = {"kernel":("linear", "rbf"), "C":[1, 10, 100]}
svc = svm.SVC()
clf = GridSearchCV(svc, parameters, cv=5)
clf.fit(X_train, y_train)
print(clf.best_params_)
Task 10: GridSearchCV has an attribute called cv_results_, which is a dict with
keys as column headers and values as columns. Print it and try to explain the values
that you get.
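As a hint, the dict can be inspected more comfortably as a table (the use of pandas here is an assumption; it is not required by the lab):
import pandas as pd

# Each row corresponds to one hyper-parameter combination tried by the grid search.
results = pd.DataFrame(clf.cv_results_)
print(results[["params", "mean_test_score", "std_test_score", "rank_test_score"]])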
1.4 To Do
This part will be given by the instructor during the lab.