
100 Days of Machine Learning

Feature Engineering :
1. Transformation : missing values , handling categorical values , outlier
detection , feature scaling
2. Construction
3. Selection
4. Extraction

Missing Values : -> mean , median , mode


Categorical Values : -> encoding
Outlier Detection : -> outliers are very harmful to the values the model predicts
Feature Scaling : -> Example : Age vs Salary ; the features have to be rescaled using
techniques like min-max scaling , standardization , normalization , max absolute
scaling

1. Feature Scaling : types -> standardization and normalization

A) Standardization :

Z-score uses the formula  z = (xi - x_mean) / sigma


When you apply it to any column, the mean becomes 0 and the standard deviation becomes 1.
It is scaling with mean centering, followed by scaling by the factor of the standard
deviation, so the size of the scatter is compressed.
Note : what happens to outliers when you use the z-score method ?

In some algorithms standardization does not make a difference, and outliers are not
removed by standardization (they are only rescaled).
Algorithms where it is commonly used : KNN , K-Means , PCA , artificial neural networks ,
gradient descent
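
A minimal sketch of standardization with scikit-learn (assuming numeric X_train / X_test DataFrames ; the variable names are just placeholders) :

Code :
>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler()
>>> scaler.fit(X_train)                        # learn mean and std from the training data only
>>> X_train_scaled = scaler.transform(X_train)
>>> X_test_scaled = scaler.transform(X_test)   # reuse the training mean/std on the test data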

B) Normalization : this has several methods like MinMaxScaling , MaxAbsScaling ,
RobustScaling

B.1) MinMaxScaling => Xi' = (Xi - Xmin) / (Xmax - Xmin)   { here Xmin is the minimum
value of the column and Xmax is the maximum value } ; the range will be [0 , 1]
B.2) Mean Normalization => Xi' = (Xi - Xmean) / (Xmax - Xmin)   { this is mean
centering } ; the range will be [-1 , 1] <= there is no library for this, so you have
to write the code yourself
B.3) MaxAbsScaling => Xi' = Xi / |Xmax|   { this is generally used when the dataset
has many 0 values, i.e. sparse data }
B.4) RobustScaling => Xi' = (Xi - Xmedian) / IQR   { this method is slightly better
than the others when there are outliers }
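
A quick sketch of the normalization scalers in scikit-learn (assuming numeric X_train / X_test ; mean normalization has no built-in scaler, as noted above) :

Code :
>>> from sklearn.preprocessing import MinMaxScaler, MaxAbsScaler, RobustScaler
>>> mm = MinMaxScaler()                    # maps each feature to [0, 1]
>>> X_train_mm = mm.fit_transform(X_train)
>>> X_test_mm = mm.transform(X_test)
>>> rs = RobustScaler()                    # (x - median) / IQR , less sensitive to outliers
>>> X_train_rs = rs.fit_transform(X_train)
>>> X_test_rs = rs.transform(X_test)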

When you are working with data, ask yourself questions like :
? Is feature scaling required ?
? Standardization is used most often, then min-max scaling (e.g. image processing)

@@@@@ Difference between normalization and standardization :


In standardization the mean is subtracted and the result is divided by the standard deviation.
In normalization the data is transformed into a range between 0 and 1.

2. Encoding : it is used for categorical data { nominal -> there is no relationship
between the values in the column (Ex : state) ; ordinal -> there is a relationship
between the values (Ex : review) }

1. Ordinal Encoding : used for ordinal input features when we need the output as integers

2. Label Encoding : used when encoding the target (output) column
When you have different types of categorical columns, we have to build a pipeline
>>> # For Ordinal Encoding
>>> from sklearn.preprocessing import OrdinalEncoder
>>> oe = OrdinalEncoder(categories=[['Poor' , 'Average' , 'Good'] , ['School' ,
'UG' , 'PG']])
>>> oe.fit(X_train)
>>> X_train = oe.transform(X_train)
>>> X_test = oe.transform(X_test)

>>> # label encoding is used to encode the target column as integer labels

>>> from sklearn.preprocessing import LabelEncoder
>>> le = LabelEncoder()
>>> le.fit(y_train)
>>> y_train = le.transform(y_train)
>>> y_test = le.transform(y_test)

3. OneHotEncoding : converts a categorical column into integer (0/1) columns when there
is no order relationship between the values.

Dummy Variable Trap :
When you apply one hot encoding, N columns are created, one per distinct
value in the column ; we then keep only N-1 of them because of collinearity :
the N columns are linearly related to each other, so we drop one and use N-1.

OHE can also be restricted to the most frequent categories only.

Code :
>>> # One hot encoding
>>> pd.get_dummies(df , columns = ['fuel' , 'owner'] , drop_first = True)
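
The same thing can be sketched with scikit-learn's OneHotEncoder (the 'fuel' and 'owner' columns are reused from the pandas example ; on older scikit-learn versions the argument is sparse instead of sparse_output) :

>>> from sklearn.preprocessing import OneHotEncoder
>>> ohe = OneHotEncoder(drop = 'first' , sparse_output = False)   # drop='first' avoids the dummy variable trap
>>> X_train_ohe = ohe.fit_transform(X_train[['fuel' , 'owner']])
>>> X_test_ohe = ohe.transform(X_test[['fuel' , 'owner']])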

3. Column Transformer : when some columns have numeric values with missing data (simple
imputation) Ex : Age ; others have categorical data that needs one hot encoding Ex : city ,
gender ; and another needs ordinal encoding -> review ;
so what do we do in this case ?
The answer is the column transformer.
Step 1 : import all the libraries like numpy , pandas , SimpleImputer ,
OneHotEncoder and OrdinalEncoder
Step 2 : split the data into X_train , X_test , y_train , y_test
Step 3 : now apply the column transformer

>>> from sklearn.compose import ColumnTransformer

>>> transformer = ColumnTransformer(transformers=[
        ('tnf1' , SimpleImputer() , ['Fever']),
        ('tnf2' , OrdinalEncoder(categories=[['mild' , 'strong']]) , ['cough']),
        ('tnf3' , OneHotEncoder(sparse_output = False , drop = 'first') ,
                  ['Gender' , 'City'])          # on older scikit-learn versions use sparse=False
    ] , remainder = 'passthrough')

>>> X_train = transformer.fit_transform(X_train)
>>> X_test = transformer.transform(X_test)      # only transform the test data, never fit on it

4. Pipelines : scikit-learn says that a pipeline chains together multiple steps so
that the output of each step is used as the input of the next step.
Pipelines make it easy to apply the same preprocessing to the train and
test data.
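
A minimal pipeline sketch, chaining the column transformer from above with a model (LogisticRegression is only a placeholder estimator) :

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.linear_model import LogisticRegression
>>> pipe = Pipeline(steps=[
        ('preprocess' , transformer),        # the ColumnTransformer defined above
        ('model' , LogisticRegression())     # placeholder estimator
    ])
>>> pipe.fit(X_train , y_train)              # the preprocessing is fit once and reused
>>> y_pred = pipe.predict(X_test)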
5. Mathematical Transformations :
a) Log transformation
b) Reciprocal transformation
c) Square / square root transformation
d) Box-Cox transformation
e) Yeo-Johnson transformation
Why do we need this kind of function ? Because when you work with data and
models like linear and logistic regression, the data should be close to normally
distributed, and these mathematical functions are used to achieve that.
How to find out if the data is normal ?
=> There are three ways : a) sns.displot  b) df['col'].skew() close to 0 -> normal,
otherwise it is not  c) the QQ plot, which is the most important and reliable
A QQ plot plots the data quantiles against the theoretical quantiles ;
if the data is normal the points lie on the 45 degree line.

Now [ all of these can be accessed by importing scipy.stats as scst ]


a) Log transformation cannot be applied to -ve values ; it is used for right-skewed
data
b) Reciprocal transformation is (1/x)
c) Square transformation is (x^2), for left-skewed data
d) Square root transformation is (sqrt(x))
e) Box-Cox transformation : xi^lambda => there are two
cases  a) lambda != 0 : (xi^lambda - 1) / lambda   b) lambda = 0 : ln(xi)
lambda ranges from -5 to 5 and the transform is only applicable to values greater than 0
f) Yeo-Johnson transformation is the advanced version of the Box-Cox
transformation and is applicable to all numbers (including 0 and negatives)
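
A sketch of these transformations with scikit-learn and scipy (the 'Fare' column is only an example ; PowerTransformer implements both box-cox and yeo-johnson) :

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import scipy.stats as scst
>>> from sklearn.preprocessing import PowerTransformer, FunctionTransformer
>>> log_tf = FunctionTransformer(np.log1p)                   # log transform ; log1p also handles 0
>>> X_train_log = log_tf.fit_transform(X_train[['Fare']])
>>> pt = PowerTransformer(method = 'yeo-johnson')            # use 'box-cox' only if all values > 0
>>> X_train_pt = pt.fit_transform(X_train[['Fare']])
>>> scst.probplot(X_train_pt[: , 0] , dist = 'norm' , plot = plt)   # QQ plot to check normality
>>> plt.show()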

6. Encoding numerical data : means converting numerical data into categorical data.
Why do we use this ? To handle outliers and to improve how the values are spread.
There are two types ->
a) Discretization (binning) => splitting the range into bins
1) unsupervised => 1.1 equal width
1.2 equal frequency   1.3 k-means
2) supervised => 2.1 decision tree binning
3) custom

@@@ Equal Width : bin width = (max - min) / number of bins ; good for outliers , no change in spread


@@@ Quantile Binning : here you specify how many bins you want and each bin contains
roughly the same share of observations (e.g. 10 bins -> 10% each) ; good for outliers ,
works properly on the spread
@@@ K-Means Binning : it makes clusters ; use it when the data is clustered
@@@ Custom Binning : with pandas (pd.cut)
>>> from sklearn.preprocessing import KBinsDiscretizer
>>> kbins_age = KBinsDiscretizer(n_bins = 10 , encode = 'ordinal' ,
strategy = 'quantile')
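
A short usage sketch for the discretizer above (assuming an 'Age' column in X_train / X_test) :

>>> X_train_age = kbins_age.fit_transform(X_train[['Age']])   # learn the bin edges on the train data
>>> X_test_age = kbins_age.transform(X_test[['Age']])
>>> kbins_age.bin_edges_                                       # inspect the learned bin edges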

b) Binarization => a continuous value is converted into a binary value, 0 or 1
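
A minimal sketch with scikit-learn's Binarizer (the 'family_size' column and the threshold 0 are only examples) :

>>> from sklearn.preprocessing import Binarizer
>>> bin_tf = Binarizer(threshold = 0)                                # values > 0 become 1 , the rest 0
>>> X_train_bin = bin_tf.fit_transform(X_train[['family_size']])
>>> X_test_bin = bin_tf.transform(X_test[['family_size']])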

7. Handling Mixed Variables : categorical + numerical data ; there are two types
of mixed data  1.) b5 , c23 ...   2.) 1 , 2 , 3 , A , B ...
Both can be handled by splitting :
1.) B5 and C23 can be split into a categorical part and a numerical part (two columns)
2.) same as above (cat and num columns), but the difference here is that more null
values are produced

Code :

# extract the numeric value


df['number_numerical'] = pd.to_numeric(df["number"] , errors = 'coerce' ,
downcast = 'integer')

# extract the categorical part


df['number_categorical'] = np.where(df['number_numerical'].isnull() ,
df['number'] , np.nan)

# if the cabin column has values like B5 , C33 then the code is


df['cabin_num'] = df['cabin'].str.extract(r'(\d+)')   # capture the numeric value
df['cabin_cat'] = df['cabin'].str[0]                  # capture the first character

# Ticket column with values like A/C 25356 ; the code is

# extract the last bit of ticket as number

df['ticket_num'] = df['Ticket'].apply(lambda s: s.split()[-1])


df['ticket_num'] = pd.to_numeric(df['ticket_num'] , errors = 'coerce' ,
downcast = 'integer')

# extract the first part of ticket as category

df['ticket_cat'] = df['Ticket'].apply(lambda s: s.split()[0])


df['ticket_cat'] = np.where(df['ticket_cat'].str.isdigit(), np.nan,
df['ticket_cat'])

8. How to handle date and time columns :


Why are date and time important ? When you load a column of date/time data it
has the object datatype, so first
# convert it to the datetime data type
date['date'] = pd.to_datetime(date['date'])

1. extract year :
date['date_year'] = date['date'].dt.year

2. extract month
date['date_month_num'] = date['date'].dt.month          => this gives the numerical value
date['date_month_cat'] = date['date'].dt.month_name()   => this gives the categorical value

3. extract days
date['date_days'] = date['date'].dt.day

4. extract day of week
date['date_dow'] = date['date'].dt.dayofweek

5. day of week - name


date['date_dow_name'] = date['date'].dt.day_name()
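
A couple of commonly derived features, sketched with the same 'date' DataFrame (the new column names are only illustrative) :

# weekend flag and days elapsed since each date
date['date_is_weekend'] = (date['date'].dt.dayofweek >= 5).astype(int)
date['days_since'] = (pd.Timestamp.today() - date['date']).dt.days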

9. Handling Missing Data :


There are many options -> a) Remove , b) Imputation (univariate and
multivariate)

A. CCA (Complete Case Analysis) : list-wise deletion


Only delete rows when the data is missing completely at random ; that is why we can
delete them safely. (use when missing data < 5%)

Code :
# to find the % of missing values per column
df.isnull().mean()*100

# after finding that, say, 5 of 13 columns have missing values, the code
# to find the columns with less than 5% missing data is
cols = [var for var in df.columns if df[var].isnull().mean() < 0.05
        and df[var].isnull().mean() > 0]
cols   # this gives me the list of columns

# to check how much data remains after dropping

len(df[cols].dropna()) / len(df)

# to check the dataset shape

new_df = df[cols].dropna()
df.shape , new_df.shape

# to check whether the data distribution changed after deletion
# if it changed noticeably, we have to reverse the deletion
new_df.hist(bins = 50 , density = True , figsize = (12 , 12))
plt.show()

# ! Always do this when you have numerical data : plot the graph,
# it gives an overview
fig = plt.figure()
ax = fig.add_subplot(111)

# original data
df['training_hours'].hist(bins = 50 , ax = ax , density = True ,
color = 'red')

# data after CCA

new_df['training_hours'].hist(bins = 50 , ax = ax , density = True ,
color = 'green' , alpha = 0.8)

@ After plotting the graph, if we see the two histograms overlapping, then the
data is missing at random

# now for categorical columns

temp = pd.concat([
    # % of observations per category , original data
    df['education_level'].value_counts() / len(df),

    # % of observations per category , CCA data
    new_df['education_level'].value_counts() / len(new_df)
] , axis = 1)

B. How to fill in missing numerical data (univariate)


B.1) mean and median
B.2) arbitrary value
B.3) end of distribution
B.4) random value

1) Mean / median : when the distribution is normal, use the mean ;
on left- or right-skewed data, use the median.
WHAT IS THE BENEFIT ? Simplicity, and good performance when
there are few missing values.
Disadvantages : the distribution changes, outliers can be
generated, covariance / correlation changes.
WHEN TO USE ? When data is missing at random and less than
5% is missing.

When you apply the mean or median to a column with missing
values, the variance shrinks.
@@@@@ Fit on X_train and transform on X_test

2) Arbitrary value imputation : fill the gaps with a fixed arbitrary value
(generally also used for categorical data).
Benefits and disadvantages are the same as mean/median
imputation.
# Use this technique when the data is NOT missing at random.

3) End of distribution imputation : it fills the missing values with a
value from the far end of that column's distribution.
There are two cases : a) normally distributed data -> use
(mean +- 3*sigma)
b) skewed data -> use the IQR
proximity rule : Q1 - 1.5*IQR and Q3 + 1.5*IQR , where IQR =
Q3 - Q1

Benefits and disadvantages are the same as above, and we
use it when the data is not missing at random.
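
A minimal sketch of these univariate options with SimpleImputer (the 'Age' column and the constant 99 are only examples) :

>>> from sklearn.impute import SimpleImputer
>>> imp_median = SimpleImputer(strategy = 'median')             # or strategy='mean' for normal data
>>> X_train_age = imp_median.fit_transform(X_train[['Age']])    # fit on train only
>>> X_test_age = imp_median.transform(X_test[['Age']])
>>> # arbitrary / end-of-distribution value via a constant
>>> imp_const = SimpleImputer(strategy = 'constant' , fill_value = 99)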

C. Handling Categorical Data


We can fill in the missing values of a categorical column using two
techniques :
a. Mode : applicable when less than ~10% of the data is missing
b. Missing-category imputation : applicable when more than ~10% of the data is
missing
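
A sketch of both categorical options with SimpleImputer (the 'education_level' column is only an example) :

>>> from sklearn.impute import SimpleImputer
>>> imp_mode = SimpleImputer(strategy = 'most_frequent')                        # mode imputation
>>> imp_missing = SimpleImputer(strategy = 'constant' , fill_value = 'Missing') # own "Missing" category
>>> X_train_cat = imp_mode.fit_transform(X_train[['education_level']])
>>> X_test_cat = imp_mode.transform(X_test[['education_level']])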

##### When the CSV file has more than 50 columns and we only need
4 or 5 of them, we use
df = pd.read_csv("File_name.csv" , usecols = ['ABC' , 'DEF' , 'GHI'])

D. Random Imputation and Missing Indicator :


1. Random imputation : the technique fills each missing value with a value sampled
from the observed values of that column in the dataset.
It is easy to apply with pandas, with little change in variance and distribution.
It is suitable for linear and logistic regression, but not for
tree algorithms.

Disadvantages :
The covariance changes.
The X_train values have to be kept around while using this technique, which consumes
memory on the server, because when new data comes in the imputation samples from this
stored X_train.
Best suited for linear models.

Code :
# for numerical data
df.isnull().mean() * 100   # to check the percentage of missing values

# Now the imputation

X = df.drop(columns = ['Survived'])
y = df['Survived']

X_train , X_test , y_train , y_test = train_test_split(X , y , test_size = 0.2 ,
random_state = 2)

# copy the column into a new column, nothing changed yet

X_train['Age_impute'] = X_train['Age']
X_test['Age_impute'] = X_test['Age']

# now applying the technique : fill the missing entries with random samples
# drawn from the observed (non-null) Age values

X_train.loc[X_train['Age_impute'].isnull() , 'Age_impute'] = \
    X_train['Age'].dropna().sample(X_train['Age_impute'].isnull().sum()).values
X_test.loc[X_test['Age_impute'].isnull() , 'Age_impute'] = \
    X_test['Age'].dropna().sample(X_test['Age_impute'].isnull().sum()).values

# plotting the distribution before and after

sns.kdeplot(X_train['Age'] , label = 'Original')          # kdeplot overlays both curves on one axes
sns.kdeplot(X_train['Age_impute'] , label = 'Imputed')
plt.legend()
plt.show()

# the difference shows up as a change in covariance

X_train[['Fare' , 'Age' , 'Age_impute']].cov()

# When you deploy this on a production server, the model imputes a missing value
# (e.g. Age) from another known column value (e.g. Fare).
# If the sample is fully random, the same Fare value can get a different Age each time,
# which is bad modelling, so to make the imputation reproducible we seed the sample :
sample_value = X_train['Age'].dropna().sample(1 , random_state = int(observation['Fare']))

2. Missing Indicator
Age | Fare | Age_na
 27 |  35  | False
 41 |  55  | False
 NaN|  41  | True
 62 |  22  | False

In this method an extra boolean column (Age_na) is added, and the model can
learn from the difference between rows where Age was present and where it was missing.

Code :
X = df.drop(columns = ['Survived'])
y = df['Survived']
X_train , X_test , y_train , y_test = train_test_split(X , y , test_size = 0.2 ,
random_state = 2)
mi = MissingIndicator()
mi.fit(X_train)

# now the transformation part

X_train_missing = mi.transform(X_train)
X_test_missing = mi.transform(X_test)

X_train['Age_Na'] = X_train_missing
X_test['Age_Na'] = X_test_missing

# now train the model ; the indicator usually improves the model accuracy

# There is also a parameter of SimpleImputer that adds the missing
# indicator automatically :
si = SimpleImputer(add_indicator = True)

3. GridSearchCV : use it to find which imputation method works best for the model (see the sketch below).
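
A sketch of that idea : put the imputer inside a pipeline and let GridSearchCV try each strategy (the step names, model and scoring are only illustrative and assume numeric features) :

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.impute import SimpleImputer
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import GridSearchCV
>>> pipe = Pipeline([('imputer' , SimpleImputer()) , ('model' , LogisticRegression())])
>>> params = {'imputer__strategy' : ['mean' , 'median' , 'most_frequent' , 'constant']}
>>> search = GridSearchCV(pipe , params , cv = 5 , scoring = 'accuracy')
>>> search.fit(X_train , y_train)
>>> search.best_params_          # which imputation strategy worked best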

F. The methods above are univariate ; now we start with
multivariate imputation.
@ In multivariate imputation, the other columns help to fill in
the column that has missing values.

1. KNN Imputation : imputation that uses the values of the nearest
neighbours to estimate the missing value.
Distance used -> a weighted euclidean distance over the observed features :
sqrt(weight * ((a-b)^2 + (c-d)^2 + (e-f)^2))
{ here (a-b), (c-d), ... are the per-feature differences and the weight corrects
for coordinates that are missing }

Advantage => more accurate


Disadvantage => takes more time and the memory consumption is higher

Code :
>>> from sklearn.impute import KNNImputer
>>> knn = KNNImputer(n_neighbors = 5 , weights = 'distance')
>>> X_train_trf = knn.fit_transform(X_train)
>>> X_test_trf = knn.transform(X_test)

# the weights decide how the KNN averaging works :
# 'uniform'  : e.g. k = 2 -> (70 + 50) / 2 = 60 , a plain mean
# 'distance' : each neighbour's value is weighted by the inverse of its distance,
#              which usually gives a more accurate result

2. Iterative imputer (MICE) : use it when the data is Missing At Random ;
it is more accurate, but the disadvantages are slow processing and higher memory consumption.
Here the imputation is done by a machine learning algorithm ; the data is
filled in as follows :

Step 1 : Fill all the NaN values with the mean of their respective columns
Step 2 : Reset the originally missing values of col1 to NaN
Step 3 : Predict the missing values of col1 using the other columns
Step 4 : Reset the originally missing values of col2 to NaN
Step 5 : Predict the missing values of col2 using the other columns
Step 6 : Repeat for col3 and so on

After every iteration there are two sets of imputed values : the previous one
(starting from the mean) and the new regression-based one.
Subtract them, example : iteration0(mean) - iteration1(linear regression) ; if the
difference is 0 or close to 0 (e.g. 0.23) then we have predicted the missing
values of the column accurately ; otherwise continue with iteration1 - iteration2 ,
and so on, until the imputation converges.
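
A minimal MICE-style sketch using scikit-learn's IterativeImputer (the estimator choice is only an example) :

>>> # IterativeImputer is still experimental, so the enabling import is required
>>> from sklearn.experimental import enable_iterative_imputer
>>> from sklearn.impute import IterativeImputer
>>> from sklearn.linear_model import LinearRegression
>>> it_imp = IterativeImputer(estimator = LinearRegression() , max_iter = 10 , random_state = 0)
>>> X_train_trf = it_imp.fit_transform(X_train)
>>> X_test_trf = it_imp.transform(X_test)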

10. Outlier detection and removal :


A. Z-score method : use it when the data points are normally distributed.
Formula : xi' = (xi - mean) / std
If the value falls outside the range -3 to +3 it is considered an
outlier.
We can handle outliers in two ways : a) trimming  b) capping

Code for plotting the distributions :


>>> plt.figure(figsize = (16 , 5))
>>> plt.subplot(1 , 2 , 1)
>>> sns.histplot(df['cgpa'])

>>> plt.subplot(1 , 2 , 2)
>>> sns.histplot(df['placement_exam_marks'])
>>> plt.show()

>>> # Trimming
>>> new_df = df[(df['cgpa'] < 8.80) & (df['cgpa'] > 5.11)]
>>> new_df

>>> # Calculating the z-score

>>> df['cgpa_zscore'] = (df['cgpa'] - df['cgpa'].mean()) /
df['cgpa'].std()

Note : this only works for (approximately) normal distributions.
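
A capping (winsorizing) sketch using the same +-3 sigma limits on the 'cgpa' column :

>>> upper = df['cgpa'].mean() + 3 * df['cgpa'].std()
>>> lower = df['cgpa'].mean() - 3 * df['cgpa'].std()
>>> df['cgpa_capped'] = np.where(df['cgpa'] > upper , upper ,
                         np.where(df['cgpa'] < lower , lower , df['cgpa']))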

B. IQR Method : use it when the data is skewed. The IQR method is
simple and effective, but its accuracy varies depending on your data.
You can adjust the "1.5 times" multiplier to be more or less
strict in identifying outliers.
Just remember, outlier detection should always consider the
context and purpose of your analysis.

C. Percentile method : instead of the IQR fences, choose cut-off percentiles
directly (for example the 1st and 99th percentile) and treat values outside
them as outliers, either trimming them or capping (winsorizing) them to
those percentile values. A sketch of both methods follows below.
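
A sketch of both methods on the 'placement_exam_marks' column (the 1st/99th percentile cut-offs are only an example) :

>>> # IQR method
>>> Q1 = df['placement_exam_marks'].quantile(0.25)
>>> Q3 = df['placement_exam_marks'].quantile(0.75)
>>> IQR = Q3 - Q1
>>> lower , upper = Q1 - 1.5 * IQR , Q3 + 1.5 * IQR
>>> outliers = df[(df['placement_exam_marks'] < lower) | (df['placement_exam_marks'] > upper)]
>>> # percentile method : cap at the 1st and 99th percentiles instead
>>> low_p , high_p = df['placement_exam_marks'].quantile([0.01 , 0.99])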

11. Feature Construction : done manually, on the basis of domain knowledge.


a. Feature splitting : a single feature that packs several attributes together
(like size, colour and material) is split into separate columns,
one per attribute.
This helps us understand the data better and make
better predictions.

b. Curse of Dimensionality : when you use too few or too many features, the
performance of the model will not be optimal.
Ex : image and text data are where the problem shows up.
Why does it happen ? Because with more dimensions the data becomes
sparse, so performance decreases and the computation cost is a disadvantage.
Solution : dimensionality reduction
1. Feature selection (backward and forward
selection)
2. Feature extraction (PCA , LDA and t-SNE)

A) PCA : a technique that transforms a higher-dimensional dataset into a lower
dimension.
Benefits : @ fewer dimensions, so faster computation
@ visualisation

Variance - a statistical quantity that tells how much the data is spread out ;
in PCA the captured variance is what we want to maximise.
Maximising the variance captured gives better performance in PCA.

Covariance -> measures the relationship between the x and y axes.
Covariance matrix -> a special matrix that holds the spread along every axis
and the relationships between the axes.
Matrices -> are linear transformations that change the
coordinate system.
Eigenvector -> a vector whose direction does not change under the
transformation ; only its magnitude changes.
Eigenvalue -> by how much the magnitude of the eigenvector changes.
Eigendecomposition of the covariance matrix -> gives the principal components.

Steps to do PCA :
1.) Mean centering
2.) Find the covariance matrix
3.) Find the eigenvalues / eigenvectors
For 3D data we get 3 components : a1(pc1) , a2(pc2) , a3(pc3)
for 3D to 1D -> keep pc1 only
for 3D to 2D -> keep pc1 + pc2
The fundamental idea is to keep the components that explain the most variance.

Finding the optimal number of principal components (e.g. from the explained variance).

Disadvantages : data lying on a circle, two data clusters that are
mirror images, and other non-linear patterns are not captured well.
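
A minimal PCA sketch with scikit-learn (2 components chosen only for illustration) :

>>> from sklearn.decomposition import PCA
>>> from sklearn.preprocessing import StandardScaler
>>> X_scaled = StandardScaler().fit_transform(X_train)   # step 1 : mean centering (and scaling)
>>> pca = PCA(n_components = 2)                          # keep the first 2 principal components
>>> X_pca = pca.fit_transform(X_scaled)
>>> pca.explained_variance_ratio_                        # how much variance each component keeps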

12. Linear Regression : Types


a. Simple -> 1 input and 1 output
b. Multiple -> more than 1 input column and 1 output
c. Polynomial ->
d. Regularization

a. Simple Linear Regression : the maths is done in the notebook


A from-scratch implementation follows ; gradient descent / SGD regression is used
in higher dimensions.

Code :
class MeraLR:
    def __init__(self):
        self.m = None
        self.b = None

    def fit(self , X_train , y_train):
        num = 0
        den = 0
        for i in range(X_train.shape[0]):
            num = num + ((X_train[i] - X_train.mean()) *
                         (y_train[i] - y_train.mean()))
            den = den + ((X_train[i] - X_train.mean()) *
                         (X_train[i] - X_train.mean()))

        self.m = num / den
        self.b = y_train.mean() - (self.m * X_train.mean())   # b = y_mean - m * x_mean
        print(self.m)
        print(self.b)

    def predict(self , X_test):
        return self.m * X_test + self.b

import numpy as np
import pandas as pd

# importing the data, reading it with pandas, then splitting it into X and y

X = df.iloc[: , 0].values
y = df.iloc[: , 1].values

# splitting into train and test data

lr = MeraLR()
lr.fit(X_train , y_train)   # gives m and b of the line
lr.predict(X_test)

b. Multiple Linear Regression : done in the notebook


c. Polynomial Regression :

13. Gradient Descent :


a. Batch : it is slow because every update uses all the rows ; there is a single
update per epoch for an n-dimensional dataset.

Code :
>>>
class GDRegression :

    def __init__(self , learning_rate = 0.01 , epochs = 100):
        self.coef_ = None
        self.intercept_ = None
        self.lr = learning_rate
        self.epochs = epochs

    def fit(self , X_train , y_train):
        self.intercept_ = 0
        self.coef_ = np.ones(X_train.shape[1])

        for i in range(self.epochs):
            # update all the coefficients and the intercept once per epoch
            y_hat = np.dot(X_train , self.coef_) + self.intercept_
            intercept_der = -2 * np.mean(y_train - y_hat)
            self.intercept_ = self.intercept_ - (self.lr * intercept_der)

            coef_der = -2 * np.dot((y_train - y_hat) , X_train) / X_train.shape[0]
            self.coef_ = self.coef_ - (self.lr * coef_der)
        print(self.intercept_ , self.coef_)

    def predict(self , X_test):
        return np.dot(X_test , self.coef_) + self.intercept_

Stochastic Gradient Descent (SGD) Regression :
Code :
>>>
class SGDRegression :

    def __init__(self , learning_rate = 0.01 , epochs = 100):
        self.coef_ = None
        self.intercept_ = None
        self.lr = learning_rate
        self.epochs = epochs

    def fit(self , X_train , y_train):
        self.intercept_ = 0
        self.coef_ = np.ones(X_train.shape[1])

        for i in range(self.epochs):
            for j in range(X_train.shape[0]):
                idx = np.random.randint(0 , X_train.shape[0])
                # one randomly chosen row, so y_hat is a scalar, not a matrix
                y_hat = np.dot(X_train[idx] , self.coef_) + self.intercept_

                intercept_der = -2 * (y_train[idx] - y_hat)
                self.intercept_ = self.intercept_ - (self.lr * intercept_der)

                coef_der = -2 * np.dot((y_train[idx] - y_hat) , X_train[idx])
                self.coef_ = self.coef_ - (self.lr * coef_der)

    def predict(self , X_test):
        return np.dot(X_test , self.coef_) + self.intercept_

Mini Batch : make batches ; the parameters are updated once per batch, and one epoch
covers all the batches.

Code :
>>>
import random

class MBGDRegression :

    def __init__(self , batch_size , learning_rate = 0.01 , epochs = 100):
        self.coef_ = None
        self.intercept_ = None
        self.lr = learning_rate
        self.epochs = epochs
        self.batch_size = batch_size

    def fit(self , X_train , y_train):
        self.intercept_ = 0
        self.coef_ = np.ones(X_train.shape[1])

        for i in range(self.epochs):
            for j in range(int(X_train.shape[0] / self.batch_size)):
                # pick a random mini batch of row indices
                idx = random.sample(range(X_train.shape[0]) , self.batch_size)

                y_hat = np.dot(X_train[idx] , self.coef_) + self.intercept_
                intercept_der = -2 * np.mean(y_train[idx] - y_hat)
                self.intercept_ = self.intercept_ - (self.lr * intercept_der)

                coef_der = -2 * np.dot((y_train[idx] - y_hat) , X_train[idx])
                self.coef_ = self.coef_ - (self.lr * coef_der)

        print(self.intercept_ , self.coef_)

    def predict(self , X_test):
        return np.dot(X_test , self.coef_) + self.intercept_

Learning Schedule : the learning rate is varied with a learning schedule as the number
of epochs/steps increases (a sketch follows below).
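
A common schedule (used e.g. for SGD) is lr = t0 / (t + t1) ; a small sketch with illustrative constants :

t0 , t1 = 5 , 50
def learning_schedule(t):
    return t0 / (t + t1)          # the learning rate decays as the step count t grows

# inside the SGD loops above, instead of a fixed self.lr :
# self.lr = learning_schedule(i * X_train.shape[0] + j)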

14. Bias-Variance Trade-off : when the model cannot perform well on the training set,
that is called bias ; when there is a high difference between the actual and the
predicted values across different samples, that is called variance.

Overfitting : the model works well on the training data but not on the test data
(low bias and high variance).
Underfitting : the model cannot perform well even on the training data (high bias
and low variance).

Solutions : regularization , bagging and boosting


15. Regularization means adding some extra information (a penalty) to the
machine learning model to overcome overfitting.
There are 3 types
a. Ridge (L2)
b. Lasso (L1)
c. Elastic Net

Code for Ridge (simple linear case) ->


class MeraRidge:
    def __init__(self , alpha = 0.1):
        self.alpha = alpha
        self.m = None
        self.b = None

    def fit(self , X_train , y_train):
        num = 0
        den = 0

        for i in range(X_train.shape[0]):
            num = num + (y_train[i] - y_train.mean()) * (X_train[i] - X_train.mean())
            den = den + (X_train[i] - X_train.mean()) * (X_train[i] - X_train.mean())

        den = den + self.alpha              # the ridge penalty is added to the denominator
        self.m = num / den
        self.b = y_train.mean() - (self.m * X_train.mean())
        print(self.m , self.b)

    def predict(self , X_test):
        return self.m * X_test + self.b

For n-dimensional data (many columns) the same idea gives one weight per column (see the sketch below).
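
As a sanity check, scikit-learn's Ridge gives this kind of result for many columns, one weight per input column (a sketch, assuming numeric X_train / y_train) :

from sklearn.linear_model import Ridge
reg = Ridge(alpha = 0.1)
reg.fit(X_train , y_train)
print(reg.coef_ , reg.intercept_)    # one coefficient per column , plus the intercept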

16. Lasso
