The aim of this assignment is to learn how to apply machine learning algorithms to data sets. This involves understanding what the data means, handling it, training a model, cross-validation, prediction, and testing.
This dataset contains information collected by the U.S. Census Service concerning housing in the area of Boston, Massachusetts. It was obtained from the StatLib archive and has been used extensively throughout the literature to benchmark algorithms. The data was originally published by Harrison, D. and Rubinfeld, D.L., 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol. 5, pp. 81-102, 1978.
The dataset is small, with only 506 cases. It can be used to predict the median value of a home, which is done here. Each case has 14 attributes:
- CRIM: per capita crime rate by town
- ZN: proportion of residential land zoned for lots over 25,000 sq. ft.
- INDUS: proportion of non-retail business acres per town
- CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise)
- NOX: nitric oxides concentration (parts per 10 million)
- RM: average number of rooms per dwelling
- AGE: proportion of owner-occupied units built prior to 1940
- DIS: weighted distances to five Boston employment centres
- RAD: index of accessibility to radial highways
- TAX: full-value property-tax rate per $10,000
- PTRATIO: pupil-teacher ratio by town
- B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT: % lower status of the population
- MEDV: median value of owner-occupied homes in $1000's
- To implement linear regression with regularization via gradient descent.
- To implement gradient descent with the Lp norm, for 3 different values of p in (1, 2].
- To contrast the performance of linear regression with the Lp norm against the L2 norm for these 3 values of p.
- To verify that gradient descent for the L2 norm gives the same result as the matrix-inversion (closed-form) solution.
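Concretely, the objective assumed throughout (the report does not write it out explicitly) is the Lp-regularized least-squares error, with design matrix $\Phi$, weight vector $w$, targets $y$, regularization strength $\lambda$, and learning rate $\eta$:

$$E(w) = \lVert \Phi w - y \rVert_2^2 + \lambda \lVert w \rVert_p^p, \qquad \nabla E(w) = 2\Phi^\top(\Phi w - y) + \lambda\, p\, \operatorname{sign}(w) \odot |w|^{p-1}, \qquad w \leftarrow w - \eta\, \nabla E(w)$$

where $\operatorname{sign}(\cdot)$ and $|\cdot|^{p-1}$ act elementwise; for p in (1, 2] the gradient of the penalty is well defined everywhere, including at zero.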
All the code is written in a single Python file. The program accepts as input the path of the data directory, which contains two files: `train.csv`, used to train the model, and `test.csv`, for which the output predictions are to be made. The predictions are written to a file named `output.csv` with two comma-separated columns: [ID,Output].
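For illustration, the first few lines of `output.csv` would look like this (the values are hypothetical):

```
ID,Output
0,24.35
1,21.87
2,34.12
```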
- The `NumPy` library is required, so the code begins by importing it
- Import `phi` and `phi_test` from the train and test datasets using NumPy's `loadtxt` function
- Import `y` from the train dataset using the `loadtxt` function
- Concatenate a column of 1s to the right of `phi` and `phi_test`
- Apply min-max scaling to each column of `phi` and `phi_test`
- Apply log scaling to `y`
- Define a function to calculate the change in the error function based on `phi`, `w` and the `p` norm
- Make a dictionary containing filenames as keys and values of `p` as values
- For each item in this dictionary (see the sketch after this list):
  - Set `w` to all 0s
  - Set appropriate values for `lambda` and the step size
  - Calculate the new value of `w`
  - Repeat until the difference between consecutive `w`s is below a threshold
  - Load the values of `id` from the test data file
  - Calculate `y` for the test data using `phi_test`, applying the inverse of the log scaling
  - Save the `id`s and `y` to the filename from the dictionary
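A condensed sketch of these steps follows. The CSV layout (header row, ID column first, MEDV last in `train.csv`), the output filenames, the `p` values, and the hyperparameters are all illustrative assumptions; the actual assignment code may differ.

```python
import numpy as np

# Load phi and y from train.csv, phi_test and ids from test.csv.
# Assumed layout: column 0 = ID, columns 1-13 = attributes, last column = MEDV (train only).
train = np.loadtxt("train.csv", delimiter=",", skiprows=1)
test = np.loadtxt("test.csv", delimiter=",", skiprows=1)
phi, y = train[:, 1:-1], train[:, -1]
ids, phi_test = test[:, 0], test[:, 1:]

# Concatenate a column of 1s (bias term) to the right of phi and phi_test.
phi = np.hstack([phi, np.ones((len(phi), 1))])
phi_test = np.hstack([phi_test, np.ones((len(phi_test), 1))])

# Min-max scale every column using the *training* min/max; constant columns
# (such as the bias) are left untouched to avoid division by zero.
lo, hi = phi.min(axis=0), phi.max(axis=0)
keep = hi > lo
phi[:, keep] = (phi[:, keep] - lo[keep]) / (hi - lo)[keep]
phi_test[:, keep] = (phi_test[:, keep] - lo[keep]) / (hi - lo)[keep]

# Log scaling on y; inverted with exp at prediction time.
y = np.log(y)

def grad(phi, y, w, p, lam):
    # Gradient of ||phi @ w - y||_2^2 + lam * ||w||_p^p (elementwise Lp term).
    return 2 * phi.T @ (phi @ w - y) + lam * p * np.sign(w) * np.abs(w) ** (p - 1)

# Filenames as keys, values of p as values (names and p values are assumptions).
files_p = {"output_p1.csv": 1.25, "output_p2.csv": 1.5,
           "output_p3.csv": 1.75, "output.csv": 2.0}

for fname, p in files_p.items():
    w = np.zeros(phi.shape[1])
    lam, step, tol = 0.1, 1e-4, 1e-6           # assumed; tuned by trial and error
    while True:
        w_new = w - step * grad(phi, y, w, p, lam)
        if np.linalg.norm(w_new - w) < tol:     # stop when consecutive w's converge
            break
        w = w_new
    y_pred = np.exp(phi_test @ w)               # invert the log scaling
    np.savetxt(fname, np.column_stack([ids, y_pred]), delimiter=",",
               header="ID,Output", comments="", fmt=["%d", "%.5f"])
```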
- The columns of `phi` are not in the same range because their units differ, i.e. `phi` is ill-conditioned
- So min-max scaling is applied to each column to bring it into the range 0-1
- The same scaling is required on the columns of `phi_test` (see the snippet below)
- Log scaling was used on `y`; this was determined by trial and error
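In isolation, the key detail is that the test columns are scaled with the training min/max rather than their own statistics. A minimal sketch (the helper name is hypothetical, and constant columns are assumed absent):

```python
import numpy as np

def minmax_with_train_stats(phi, phi_test):
    # Statistics come from the training features only, then are reused on the test set.
    lo, hi = phi.min(axis=0), phi.max(axis=0)
    return (phi - lo) / (hi - lo), (phi_test - lo) / (hi - lo)
```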
- As `p` decreases, the error in `y` decreases
- As `p` decreases, the norm of `w` increases, but this can be countered by increasing `lambda`
- As `p` decreases, the number of iterations required decreases
- If `p` is fixed and `lambda` is increased, the error decreases up to a certain `lambda` and then starts rising
- So `lambda` was tuned by trial and error: starting from 0, `lambda` was increased in small steps until a minimum error was achieved (a sketch of this loop follows)
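A sketch of that tuning loop, reusing `phi` and `y` from the sketch above; the lambda grid, the 80/20 hold-out split, and the mean-absolute-error metric are all assumptions:

```python
import numpy as np

def fit(phi, y, p, lam, step=1e-4, tol=1e-6):
    # Lp gradient descent, as in the sketch above.
    w = np.zeros(phi.shape[1])
    while True:
        g = 2 * phi.T @ (phi @ w - y) + lam * p * np.sign(w) * np.abs(w) ** (p - 1)
        w_new = w - step * g
        if np.linalg.norm(w_new - w) < tol:
            break
        w = w_new
    return w

# Hold out part of the training data so the error can actually rise with lambda.
cut = int(0.8 * len(phi))
phi_tr, y_tr, phi_val, y_val = phi[:cut], y[:cut], phi[cut:], y[cut:]

best_lam, best_err = 0.0, np.inf
for lam in np.arange(0.0, 2.0, 0.05):          # start at 0, increase in small steps
    w = fit(phi_tr, y_tr, p=1.5, lam=lam)
    err = np.mean(np.abs(phi_val @ w - y_val))
    if err < best_err:
        best_lam, best_err = lam, err
    else:
        break                                   # error started rising: past the minimum
```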
- The error from L2 gradient descent was 4.43268, while that from the closed-form solution was 4.52624.
- The errors are comparable, so L2 gradient descent performs on par with the closed-form solution.
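The closed-form solution referred to here is presumably the standard ridge-regression normal equation; a minimal sketch, assuming the bias column is regularized along with the rest:

```python
import numpy as np

def closed_form_ridge(phi, y, lam):
    # Minimizer of ||phi @ w - y||_2^2 + lam * ||w||_2^2:
    #   w = (phi^T phi + lam * I)^(-1) phi^T y
    d = phi.shape[1]
    return np.linalg.solve(phi.T @ phi + lam * np.eye(d), phi.T @ y)
```

Predicting with these weights and comparing against the p = 2 gradient-descent weights is what yields the two error figures quoted above.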