This document describes assignments for a machine learning course covering single and multi-layer perceptrons and linear regression. It analyzes two datasets: the Pima Indian diabetes classification dataset and the Auto-MPG regression dataset. For the perceptrons, learning rates and iteration counts are varied to find the highest accuracy; the multi-layer perceptron performs best with a learning rate of 0.03 and 5000 iterations. Linear regression models miles per gallon using cylinders, displacement and horsepower as independent variables, splitting the auto data into training and test sets and computing the gradient, intercept and error.

CS7602 - MACHINE LEARNING

ASSIGNMENT 1

SUBMITTED BY

JAYASREE LAKSHMI NARAYAN 2016103033


AKHILA G P 2016103503

DATE : 28-01-2019

CONTENTS

1. SINGLE PERCEPTRON
2. MULTI LAYER PERCEPTRON
3. LINEAR REGRESSION (WITH SINGLE VARIABLE)

DATASETS USED

1. PIMA INDIAN DIABETES DATASET (CLASSIFICATION)


2. AUTO-MPG DATASET (REGRESSION)
A DESCRIPTION OF THE DATASETS UNDER STUDY
1. PERCEPTRON

The Jupyter notebook with the code is uploaded to GitHub:
https://github.com/Akhilagp/ML_Assignment.

PROCEDURE:
 The perceptron is based on the concepts of activation and threshold.
 A neuron fires when the output of the activation function is above the set
threshold.
 It has a single layer of neurons with random initial weights.
 PARAMETERS VARIED For Understanding
1. Learning rate
2. Number of iterations
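The procedure above can be sketched as follows. This is a minimal illustrative reconstruction, not the submitted notebook: the function names, the step activation with threshold 0, and the standard perceptron update rule are assumptions.

```python
import numpy as np

def train_perceptron(X, y, lr=0.25, n_iter=500, seed=0):
    """Single-layer perceptron with a step activation.
    The neuron 'fires' (outputs 1) when the weighted sum plus
    bias exceeds the threshold of 0. Weights start random."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            # Perceptron update rule: nudge weights toward the error
            w += lr * (yi - pred) * xi
            b += lr * (yi - pred)
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)
```

On a small linearly separable problem (e.g. the AND function), this sketch converges to perfect training accuracy, which is why higher iteration counts help on separable portions of the data.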

INFERENCE:
 The perceptron does well on the training set of the Pima dataset when the
number of iterations is higher for a particular learning rate.
 A moderate learning rate produces a good result on the preprocessed set.

OUTPUT:
Learning rate | Number of Iterations | Accuracy
--------------|----------------------|-------------
0.01          | 100                  | 0.6197916667
0.01          | 500                  | 0.7057291667
0.01          | 1000                 | 0.703125
0.01          | 2000                 | 0.6979166667
0.03          | 100                  | 0.6276041667
0.03          | 500                  | 0.7083333333
0.03          | 1000                 | 0.703125
0.03          | 2000                 | 0.7083333333
0.1           | 100                  | 0.6380208333
0.1           | 500                  | 0.6770833333
0.1           | 1000                 | 0.671875
0.1           | 2000                 | 0.6770833333
0.25          | 100                  | 0.6432291667
0.25          | 500                  | 0.7213541667
0.25          | 1000                 | 0.6979166667
0.25          | 2000                 | 0.7083333333
0.3           | 100                  | 0.7213541667
0.3           | 500                  | 0.7135416667
0.3           | 1000                 | 0.7083333333
0.3           | 2000                 | 0.671875

A learning rate of 0.25 with 500 iterations gave the highest accuracy recorded for
this particular run. On the test set, the algorithm achieved an accuracy of 78%.
2. MULTI LAYER PERCEPTRON

The Jupyter notebook with the code is uploaded to GitHub:
https://github.com/Akhilagp/ML_Assignment.
PROCEDURE:
 Ten nodes were used in the hidden layer.
 Applying a logistic activation function to the training data, outputs were
obtained and tabulated.
 The dataset was split into a training set (50%), validation set (20%) and test set
(30%).
 PARAMETERS VARIED For a Deeper Insight
1. Learning rate (eta)
2. Number of iterations
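A minimal sketch of such a network is given below: one hidden layer of ten logistic units trained by batch gradient descent on squared error. This is an illustrative reconstruction under those assumptions, not the submitted notebook; the function names and the weight initialization are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, n_hidden=10, eta=0.03, n_iter=5000, seed=0):
    """One-hidden-layer network with logistic activations,
    trained by batch gradient descent on squared error."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    for _ in range(n_iter):
        # Forward pass through both logistic layers
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backward pass: chain rule through the logistic units
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= eta * h.T @ d_out
        b2 -= eta * d_out.sum(axis=0)
        W1 -= eta * X.T @ d_h
        b1 -= eta * d_h.sum(axis=0)
    return W1, b1, W2, b2

def mlp_predict(X, W1, b1, W2, b2):
    h = sigmoid(X @ W1 + b1)
    return (sigmoid(h @ W2 + b2) > 0.5).astype(int).ravel()
```

With a moderate eta this descent converges steadily; with a large eta the weight updates overshoot, which matches the error blow-up seen in the table below for eta of 0.1 and 0.3.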

INFERENCE:
 With a higher learning rate, gradient descent does not converge properly and the
error increases or stays flat.
 With a lower learning rate (<0.1), accuracy is high and the loss is minimized.
 Increasing the number of hidden nodes from 5 to 10 seems to increase the accuracy
of the classifier.

OUTPUT:
To support the inferences made, the algorithm was run for different learning rates
(0.001 < eta < 0.9) and different iteration counts (1000 < it < 9000). The accuracy
and loss for each variation are tabulated below.

Learning rate | Number of Iterations | Accuracy (%)  | Error
--------------|----------------------|---------------|--------------
0.001         | 1000                 | 88.5416666667 | 18.6579579272
0.001         | 2500                 | 88.5416666667 | 17.8883626083
0.001         | 5000                 | 89.84375      | 16.8562355692
0.003         | 1000                 | 90.625        | 16.2648364351
0.003         | 2500                 | 90.8854166667 | 15.0400030387
0.003         | 5000                 | 92.96875      | 13.2560325229
0.01          | 1000                 | 94.2708333333 | 12.0870142879
0.01          | 2500                 | 95.0520833333 | 10.6999256481
0.01          | 5000                 | 95.3125       | 9.2779888677
0.03          | 1000                 | 94.53125      | 10.5823039961
0.03          | 2500                 | 92.4479166667 | 12.8231271803
0.03          | 5000                 | 95.33         | 8.9968897062
0.1           | 1000                 | 77.6041666667 | 39.5647531552
0.1           | 2500                 | 83.8541666667 | 29.0762649929
0.1           | 5000                 | 80.9895833333 | 29.8457619597
0.3           | 1000                 | 74.21875      | 47.2384863936
0.3           | 2500                 | 68.78         | 59.9290039822
0.3           | 5000                 | 68.75         | 59.8750835422

The row corresponding to a learning rate of 0.03 and 5000 iterations shows the
minimum error and maximum accuracy. As the learning rate increases, gradient descent
overshoots and fails to converge, leading to an increasing error. On the test set
the algorithm produced an accuracy of 71-75%.
3. LINEAR REGRESSION

Linear regression is a linear approach to modeling the relationship between a dependent variable
and one or more independent variables.

The error in linear regression is the sum of squared differences between the predicted
and actual values of the dependent variable.

The code is uploaded to GitHub:
https://github.com/Akhilagp/ML_Assignment.

The weights can be adjusted by the normal equation:

weights = (X^T X)^(-1) X^T y
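As a sketch, the normal equation can be implemented directly with NumPy. This is an illustration, not the submitted code; the prepended bias column and function name are assumptions.

```python
import numpy as np

def fit_linear(X, y):
    """Closed-form least squares: weights = (X^T X)^(-1) X^T y.
    A column of ones is prepended so weights[0] is the intercept."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    # Solving the linear system is numerically safer than forming the inverse
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
```

For exactly linear data y = 2 + 3x, this returns weights close to (2, 3): intercept first, then the slope for the single feature.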

PROCEDURE:

 The Auto-MPG dataset is split into training (80%) and test (20%) sets and the regression is
carried out on the input features.

 The features considered were


1. Dependent variable: miles per gallon (mpg)
2. Independent variables: cylinders, displacement and horsepower

 The data is normalized and split.

 The gradient and the intercept for the regression line are obtained
from the scipy.stats module:

gradient, intercept, r_value, p_value, std_err = stats.linregress(xtrain, ytrain)

 The gradient turns out to be negative, implying a negative correlation between the
variables taken.

 PARAMETERS VARIED For Insight:

1. Split size of the training and test sets
2. Independent variables taken for linear regression
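The stats-module call from the procedure can be exercised on synthetic data. The feature values below are made up purely for illustration, with a negative slope mimicking the horsepower-mpg relation observed above.

```python
import numpy as np
from scipy import stats

# Synthetic single-feature data with a perfect negative linear relation,
# standing in for a feature like horsepower against mpg
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = -2.0 * x + 10.0

gradient, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print(gradient, intercept, r_value)  # gradient -2.0, intercept 10.0, r_value -1.0
```

A negative gradient and an r_value near -1 confirm the negative correlation; on the real data the fit is noisier, so |r_value| falls below 1.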

INFERENCES:

The value of the cost function/error is computed and is found to be of the order of
10^-26. The theta/weights matrix returned is a column vector.

[Plots: training and test scatter plots with fitted lines for cylinders vs mpg,
displacement vs mpg, and horsepower vs mpg]
