Advanced Regression
In the previous course on Regression, you saw how to predict a dependent
variable from one or more independent variables.
The dependent variable was numeric, and the errors were assumed to be normally distributed.
What if your dependent variable is binary (0 or 1) or count data, and the
error terms do not follow a normal distribution?
Do you think a linear model would still give good predictions?
Not accurate: when count data (number of footfalls, number of pages visited, etc.)
is involved, a linear model can even predict negative counts.
GLM
To overcome some limitations of linear models, we can use Generalized
Linear Models (GLMs).
In GLMs, the response is modeled on the scale in which the data was recorded: instead of transforming the data, a link function relates the mean of the response to the predictors.
GLM Components
GLMs consist of 3 components:
Random component: the distribution of the data, which describes the randomness /
errors.
GLM Representation
The first equation describes the random component; here it is the Gaussian
distribution.
The second equation is the systematic component, which contains the covariates and the
coefficients. This is the linear predictor.
The third equation is the link function, which connects the random component (through its mean) to the linear predictor.
Together, these equations are a generic representation of the Generalized Linear
Model.
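For reference, the three equations for the Gaussian case can be written as below. The notation (index $i$ for observations, $k$ covariates) is an assumption, chosen to match the link-function formula given later in this section.

```latex
\begin{aligned}
Y_i &\sim \mathcal{N}(\mu_i, \sigma^2) && \text{(random component)} \\
\eta_i &= \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} = x_i^T \beta && \text{(systematic component, linear predictor)} \\
g(\mu_i) &= \eta_i, \quad \text{with } g(\mu) = \mu && \text{(link function; identity for the Gaussian case)}
\end{aligned}
```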
Poisson Regression
You will understand logistic and Poisson regression through some applications.
You will first learn the statistical aspects and later look at them from a machine
learning perspective.
Regression Coefficients
The coefficients β0, β1 and β2 are selected in such a way that the likelihood of the observed data is maximized (maximum likelihood estimation).
Odds Ratio
Odds = p(y=1) / p(y=0)
The Logit
The output is a probability value. To separate the 1s from the 0s, you have to choose a
threshold value.
Values above the threshold are labeled 1 and values below it are labeled 0.
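The logit links the probability to the linear predictor: logit(p) = log(p / (1 − p)) = β0 + β1x1 + β2x2, and its inverse (the sigmoid) turns the linear predictor back into a probability. A minimal sketch, assuming hypothetical coefficient values and a threshold of 0.5 (neither comes from the course):

```python
import numpy as np

def sigmoid(z):
    """Map a linear predictor (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Inverse of the sigmoid: log-odds = log(p / (1 - p))."""
    return np.log(p / (1.0 - p))

# Hypothetical coefficients and two feature vectors, for illustration only
beta0, beta1, beta2 = -1.0, 0.8, 0.5
X = np.array([[0.5, 1.0],
              [2.0, 1.5]])

z = beta0 + beta1 * X[:, 0] + beta2 * X[:, 1]  # linear predictor
p = sigmoid(z)                                  # p(y = 1)
odds = p / (1.0 - p)                            # p(y = 1) / p(y = 0)

threshold = 0.5                                 # assumed threshold
labels = (p > threshold).astype(int)            # above -> 1, below -> 0
```

The log of the odds recovers the linear predictor exactly, which is why the coefficients are read on the log-odds scale.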
ROC Curve
sensitivity = TP/(TP+FN)
specificity = TN/(TN+FP)
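The ROC curve plots sensitivity (true positive rate) against 1 − specificity (false positive rate) as the threshold varies. A sketch using scikit-learn's `confusion_matrix`, `roc_curve` and `roc_auc_score`; the labels and probabilities are hypothetical values made up for illustration:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_curve, roc_auc_score

# Hypothetical true labels and predicted probabilities
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7])

def sens_spec(y_true, y_prob, threshold):
    """Sensitivity and specificity of the labels obtained at one threshold."""
    y_pred = (y_prob > threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn), tn / (tn + fp)

sensitivity, specificity = sens_spec(y_true, y_prob, 0.5)  # 0.75, 0.75

# One (fpr, tpr) point per threshold; fpr = 1 - specificity
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
auc = roc_auc_score(y_true, y_prob)  # area under the ROC curve: 0.8125
```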
You have learned how the random component is linked to the linear predictor using a link
function.
A dataset is created with the scores a team got and whether it won or lost the respective game.
This illustrates how the score helps us predict the binary outcome
win/lose.
import pandas as pd
import statsmodels.api as sm
# Each record is (score, win), where win is 1 if the game was won, else 0
Scores = [(200,1),(100,0),(150,1),(320,1),(270,1),(134,0),(322,1),(140,0),(210,0),
(199,0)]
Labels = ['Score','Win']
df = pd.DataFrame.from_records(Scores, columns=Labels)
# Add a constant column so the model also estimates an intercept
X = sm.add_constant(df['Score'])
# Binomial family with its default logit link = logistic regression
glm_binom = sm.GLM(df['Win'], X, family=sm.families.Binomial())
res = glm_binom.fit()
print(res.summary())
Sample Output
The coefficient of Score tells us to what extent the score predicts the likelihood of winning a game:
it is the change in the log-odds of winning for each one-point increase in score.
Sample Data
In the previous example, you saw how to fit a GLM using the statsmodels package.
In this example, you will learn how to fit a logistic regression using scikit-learn.
The previous example took a statistical perspective; the current example gives a
machine learning perspective.
The above code creates a sample dataset for a binary classification problem,
with 2 features and 2 classes.
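A dataset of this kind can be generated with scikit-learn's `make_classification`. A minimal sketch; the parameter values (100 samples, `random_state=101`) are assumptions, not the course's original settings:

```python
from sklearn.datasets import make_classification

# Assumed parameters: 100 samples, 2 informative features, 2 classes
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=2, random_state=101)
```

`X` holds the two features (shape 100 × 2) and `y` the class label (0 or 1) of each sample.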
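# Jupyter magic: renders plots inside the notebook (omit in a plain script)
%matplotlib inline
import matplotlib.pyplot as plt
# Scatter plot of the two features, colored by class label y
plt.scatter(X[:, 0], X[:, 1], marker='o', c=y,
linewidth=0, edgecolor=None)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()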
The code above plots the input data, coloring each point by its class.
Splitting the Data
The code above explains how to fit a logistic regression model and view the
classification report.
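A self-contained sketch of the split, fit and report steps with scikit-learn. It regenerates the assumed sample data so it runs on its own; `test_size=0.33` and `random_state=101` are assumptions (a 0.33 split of 100 samples gives 33 test points, the same total as the confusion matrix shown below):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Assumed sample data: 2 features, 2 classes
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=2, random_state=101)

# Hold out a third of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=101)

clf = LogisticRegression()
clf.fit(X_train, y_train)
y_clf = clf.predict(X_test)                  # predicted class labels
report = classification_report(y_test, y_clf)
print(report)                                # precision, recall, f1 per class
```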
Results of Model
Precision is similar to accuracy, but it looks only at the data points predicted as positive: precision = TP/(TP+FP).
Confusion Matrix
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_clf)
array([[14, 1],
[ 0, 18]])
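The counts in this confusion matrix can be turned into the usual metrics. In scikit-learn's layout, rows are true labels and columns are predicted labels, so the matrix reads [[TN, FP], [FN, TP]]:

```python
import numpy as np

# Confusion matrix from above: [[TN, FP], [FN, TP]]
cm = np.array([[14, 1],
               [0, 18]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # 32 / 33
precision = tp / (tp + fp)        # 18 / 19
recall = tp / (tp + fn)           # 18 / 18, i.e. sensitivity
specificity = tn / (tn + fp)      # 14 / 15
```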
Based on these numbers, the model separates the data into the 2 classes well:
32 of the 33 test points are classified correctly, with only one false positive.
The model also labels 14 of the 15 points that do not belong to the positive
class as negative.
Use the above commands to load the dataset and the required variables.
When the error terms do not follow a normal distribution, we use other types of
regression.
GLM
Link function: $g(\mu_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} = x_i^T \beta$
Systematic component: the set of independent variables X = (X1, X2, …, Xk), combined linearly with the coefficients.
Link Function
Identity link: μ=β0+β1x1
On some occasions, the identity link function is also used in Poisson regression, although the log link is the usual choice.
Here the random component is the Poisson distribution.
For simplicity, with a single independent variable, we can write: log(μ) = α + βx. This
is equivalent to: μ = exp(α + βx) = exp(α)exp(βx).
Interpreting Parameters
To interpret the estimated parameter:
exp(β): with every unit increase in X, the predictor variable has a multiplicative
effect of exp(β) on the mean of Y, that is, μ.
If β = 0, then exp(β) = 1, and the expected count, μ = E(y) = exp(α), and Y and X
are not related.
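If β > 0, then exp(β) > 1, and the expected count μ = E(y) is multiplied by exp(β) for every unit
increase in X; for example, at X = 1 it is exp(β) times larger than at X = 0.
If β < 0, then exp(β) < 1, and the expected count μ = E(y) shrinks by a factor of exp(β) for every unit
increase in X; for example, at X = 1 it is exp(β) times its value at X = 0.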
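A small worked example of the multiplicative effect, assuming hypothetical values α = 0.5 and β = 0.3 for the model log(μ) = α + βx:

```python
import numpy as np

# Hypothetical Poisson regression parameters: log(mu) = alpha + beta * x
alpha, beta = 0.5, 0.3

def expected_count(x):
    """Expected count mu = exp(alpha + beta * x)."""
    return np.exp(alpha + beta * x)

mu0 = expected_count(0.0)     # equals exp(alpha)
mu1 = expected_count(1.0)
ratio = mu1 / mu0             # equals exp(beta), about 1.35

# The same ratio holds for any one-unit step, e.g. from x = 4 to x = 5
ratio_4_to_5 = expected_count(5.0) / expected_count(4.0)
```

Because β > 0 here, each unit increase in x raises the expected count by about 35%.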
For the following code, we will need the pandas, numpy and statsmodels libraries.
import numpy as np
import pandas as pd
import statsmodels.api as sm
dataset = pd.DataFrame({'A':np.random.rand(100)*1000,
'B':np.random.rand(100)*100,
'C':np.random.rand(100)*10,
'target':np.random.randint(0, 5, 100)})
X = dataset[['A','B','C']].copy()  # copy to avoid pandas' SettingWithCopyWarning
X['constant'] = 1  # intercept column
y = dataset['target']
size = 1e5
nbeta = 3
The code separates the dependent and independent variables and adds a constant
column for the intercept. The remaining parameters are also set here.
Model Fitting
fam = sm.families.Poisson()  # Poisson random component, log link by default
pois_glm = sm.GLM(y, X, family=fam)
pois_res = pois_glm.fit()
pois_res.summary()
Using the code above, we fit the model and view the results.
Advanced Models
In this topic you will learn some advanced regression models.
You will understand when to apply, and when not to apply, a specific regression model.
In Bayesian regression, the output is still a value, but the model also returns the entire
probability distribution of the prediction.
Here, both the predicted value and its variance are returned.
With the predicted value as the mean and the square root of the variance as the standard
deviation, the probability distribution can be constructed.
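A sketch using scikit-learn's `BayesianRidge` as one example of a Bayesian regression model (the choice of model and the synthetic data are assumptions). `predict(..., return_std=True)` returns the predictive mean together with its standard deviation, from which a normal distribution can be built:

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
# Synthetic data for illustration: y = 2x + 1 + Gaussian noise
X = rng.uniform(0, 10, size=(200, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 1, size=200)

model = BayesianRidge()
model.fit(X, y)

# Predictive mean and standard deviation for a new point
X_new = np.array([[5.0]])
mean, std = model.predict(X_new, return_std=True)

# Normal distribution with that mean and std, e.g. a 95% interval
low, high = norm.interval(0.95, loc=mean[0], scale=std[0])
```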
Works well if the number of features and observations in the dataset are comparable
Cons
It can be very time-consuming
CART Algorithm
Classification and Regression Trees (CART) are a set of non-linear learning algorithms
that can be used with numerical as well as categorical features.
Here the tree has a set of nodes that split a branch into children.
In turn, each branch can lead to another node or end in a leaf that holds
the forecasted value or the predicted class.
Why Trees?
Performing the prediction task is quick.
The principal task is traversing the tree from the root node to a leaf
node, checking at each node whether the respective feature is above or below the
threshold.
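The traversal can be sketched with a hand-built tree. The dictionary layout below is a hypothetical representation chosen for illustration, not how scikit-learn stores its trees:

```python
# Hypothetical regression tree: internal nodes hold a feature index and a
# threshold; leaves hold the predicted value.
tree = {
    "feature": 0, "threshold": 5.0,
    "left": {"value": 10.0},
    "right": {
        "feature": 1, "threshold": 2.0,
        "left": {"value": 20.0},
        "right": {"value": 30.0},
    },
}

def predict(node, x):
    """Walk from the root to a leaf, comparing one feature at each node."""
    while "value" not in node:
        if x[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["value"]
```

For example, `predict(tree, [7.0, 1.0])` goes right at the root (7.0 > 5.0), then left (1.0 ≤ 2.0), and returns 20.0. Prediction costs one comparison per level, which is why it is quick.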
At each node, a search is performed over all the features and all the levels
(candidate split values) of each feature.
The combination that gives the best variance reduction is marked and selected as the best split.
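A minimal sketch of this exhaustive split search for a regression tree, assuming the criterion is the weighted variance of the two child nodes (lower is better) and that every observed value of a feature is a candidate threshold:

```python
import numpy as np

def best_split(X, y):
    """Try every feature and every observed value as a threshold; return
    (feature, threshold, weighted child variance) of the best split."""
    best = (None, None, np.inf)
    n = len(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = y[X[:, j] <= t]
            right = y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue  # a split must produce two non-empty children
            score = (len(left) * left.var() + len(right) * right.var()) / n
            if score < best[2]:
                best = (j, t, score)
    return best
```

On a tiny dataset where the target jumps between the second and third rows of feature 0, the search picks feature 0 with threshold 2:

```python
X = np.array([[1, 5], [2, 4], [3, 9], [4, 1]])
y = np.array([1.0, 1.1, 5.0, 5.2])
feature, threshold, score = best_split(X, y)
```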
mean_absolute_error(y_test, regr.predict(X_test))
The syntax is similar to applying any other regression model with scikit-learn.
They can be used for both categorical and numeric data types without performing any
kind of normalization.
Cons
The final combination is a non-linear model containing a set of linear models.
During prediction, each of the models makes its own prediction; the results
are then averaged to produce the ensemble prediction.
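A from-scratch sketch of this averaging, assuming bootstrap sampling of rows (rows drawn with replacement) and `LinearRegression` base models; both are illustrative assumptions rather than the course's exact setup:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def bagging_fit(X, y, n_models=25, seed=0):
    """Fit each model on its own bootstrap sample of the rows."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y), size=len(y))  # sample with replacement
        models.append(LinearRegression().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Each model predicts independently; the ensemble averages the results."""
    return np.mean([m.predict(X) for m in models], axis=0)
```

Since each model is fitted and queried independently, the loops are the parts that can be spread over several CPUs.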
Bagging Tip
Training and prediction happen at the individual model level. This makes it possible
to parallelize the operation across multiple CPUs.
Bagging in Python
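from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_absolute_error
# Bagging: 1000 SGD regressors, each trained on a bootstrap sample
# and a random 80% of the features; n_jobs=-1 uses all CPUs
bagging = BaggingRegressor(SGDRegressor(), n_jobs=-1,
n_estimators=1000, random_state=101,
max_features=0.8)
bagging.fit(X_train, y_train)
mean_absolute_error(y_test, bagging.predict(X_test))
# Random forest: bagging applied to decision trees
regr = RandomForestRegressor(n_estimators=100,
n_jobs=-1, random_state=101)
regr.fit(X_train, y_train)
mean_absolute_error(y_test, regr.predict(X_test))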
The sample code above shows how to implement bagging in Python, along with a random forest, which applies bagging to decision trees.
Boosting
Boosting is another way of combining multiple learning models.
In boosting, the models are arranged in a sequence, cascaded with each other, so that
the output of one model feeds into the next.
Boosting Algorithm
During training, the output of the current model is predicted and its errors are computed.
A new model is trained on those errors and inserted at the final stage of the cascade
of trained models.
The output of each stage is the value predicted by the previous stages plus the
learning rate times the prediction of the current stage.
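A from-scratch sketch of this cascade for squared error, where the "errors" are the residuals. Using shallow `DecisionTreeRegressor` models as the weak learners, and the default values below, are assumptions for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_fit(X, y, n_stages=100, learning_rate=0.1, max_depth=2):
    """Gradient boosting for squared error: each new tree is trained on the
    residual errors left by all the stages before it."""
    base = float(np.mean(y))             # stage 0: a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_stages):
        residuals = y - pred             # errors of the cascade so far
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        # new output = previous output + learning_rate * this stage's output
        pred = pred + learning_rate * tree.predict(X)
        trees.append(tree)
    return base, trees, learning_rate

def boost_predict(model, X):
    """Replay the cascade: start from the constant, add each scaled stage."""
    base, trees, learning_rate = model
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred
```

A small learning rate makes each stage take a cautious step, so more stages are needed, which is one reason boosting can be slow to train.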
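from sklearn.ensemble import GradientBoostingRegressor
# 500 boosting stages; each stage's contribution is scaled by learning_rate
regr = GradientBoostingRegressor(n_estimators=500,
learning_rate=0.01,
random_state=101)
regr.fit(X_train, y_train)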
We can build very good and robust models by combining weak models.
The robustness of the solution comes from the stochastic (random) nature of the
model.
Cons
Training takes a long time, and there is a high memory footprint.
The steps in model building can be tricky because of the stochastic nature of the model.
Application Areas
In this course so far you have seen different types of regression models; now you
will learn in what kinds of scenarios these models are applied.
Prediction Problems
Ranking Problems
A Regression Problem
Consider a dataset from the Music Industry.
The descriptors of each song are given, along with the year the song was produced.
Can this data be modeled as a regression problem to predict the year from the
descriptors?
Problem Approach
The answer to the question above is yes: we can predict the
year of production based on the descriptors.
Once the features are extracted, a model can be trained with the features as inputs and the
year of production as output.
The model can be evaluated using the mean absolute error between the actual and predicted
values.
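Mean absolute error is the average of the absolute differences, so here it is measured in years. A sketch with scikit-learn's `mean_absolute_error`; the years are hypothetical values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical actual and predicted production years for four songs
actual = np.array([1985, 1992, 2004, 2011])
predicted = np.array([1988, 1990, 2004, 2006])

mae = mean_absolute_error(actual, predicted)   # (3 + 2 + 0 + 5) / 4 = 2.5 years
manual = np.mean(np.abs(actual - predicted))   # same computation by hand
```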
Classification Problem
The previous problem can also be modeled as a multi-class classification problem.
The output will belong to one of the classes from the range of years provided.
Mean absolute error can still be used to validate the accuracy of the prediction.
Ranking Problem
Consider a dataset with some features related to a car along with a price.
Insurance companies may want to assess, on a given scale, how risky a car is to sell
or buy.
In this scenario, we can use the label ranking loss, a metric that indicates how well
the predicted ranking matches the true one.
Mean absolute error and mean squared error are not applicable in this scenario.
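A sketch with scikit-learn's `label_ranking_loss`, which averages, over samples, the fraction of (relevant, irrelevant) label pairs that the scores put in the wrong order; 0 is a perfect ranking. The cars, risk labels and scores are hypothetical:

```python
from sklearn.metrics import label_ranking_loss

# Hypothetical: 2 cars, 3 risk labels; 1 marks the true label
y_true = [[1, 0, 0],
          [0, 0, 1]]
# Scores the model assigns to each label (higher = ranked higher)
y_score = [[0.75, 0.5, 1.0],
           [1.0, 0.2, 0.1]]

loss = label_ranking_loss(y_true, y_score)
# Car 1: 1 of 2 pairs misordered (0.5); car 2: 2 of 2 (1.0); mean = 0.75
```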
In time series analysis, the data at time t+k can be the target and the data at time t
can be the feature. The concept of autoregression is applied in these scenarios.
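A minimal autoregression sketch: shift the series by k steps to build (feature at t, target at t+k) pairs, then fit an ordinary regression. The synthetic series (each value is 0.8 times the previous one plus 1) and the use of `LinearRegression` are assumptions for illustration:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic series: x[t+1] = 0.8 * x[t] + 1
values = [10.0]
for _ in range(49):
    values.append(0.8 * values[-1] + 1.0)
series = pd.Series(values)

k = 1  # forecast horizon: predict the value k steps ahead
frame = pd.DataFrame({"x_t": series,
                      "y_t_plus_k": series.shift(-k)}).dropna()

model = LinearRegression().fit(frame[["x_t"]], frame["y_t_plus_k"])
# model.coef_[0] recovers 0.8 and model.intercept_ recovers 1.0
```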
You have understood the limitations of linear models and how they can be overcome
with Generalized Linear Models.
You have also learned Poisson regression, its representation, and how to apply it in a real-world
application.