
Generalized Linear Model

By
Rahul Narayanan
Agenda

• Refresher
• Definition of Generalized Linear Model
• What is a Normal Distribution
• What is a Linear Model
• Linear Modelling for Regression (Simple Linear Regression)
• Linear Modelling for Classification (Logistic Regression)
• Generalizing Linear Modelling of Classification and Regression using GLM
• Some GLMs
Refresher :

Types of ML : Supervised (GLMs belong here), Unsupervised, Reinforcement

Task            Response/output/Dependent variable   Example
Classification  Categorical (or) discrete            Yes/No; Survived/Dead; Lion/Tiger/Cheetah etc.
Regression      Continuous (-∞ to +∞)                100.70; 25; -75.25
Quiz :

Question 1 :

Suppose you are working on a weather prediction model, and you would like to predict whether or not it will be raining at 5 pm tomorrow.

Is this a Classification or a Regression problem ?

Ans : Classification

It will rain - 1

It will not rain - 0


Quiz :

Question 2 :

The HR department of an organization wants a salary prediction tool to decide the salary of a new employee based on his/her experience.

Is this a Classification or a Regression problem ?

Ans : Regression

Independent variable --> Experience in years

Dependent variable --> Salary


Quiz :

Question 3 :
Weight of car (kg)   Engine capacity (litre)   Mileage (kmpl)
890                  1.2                       21
1200                 1.6                       19
920                  2.2                       15
700                  1.0                       22

Is this a Classification or a Regression problem ?

Ans : Regression

Independent variable --> Weight of the car, Engine capacity

Dependent variable --> Mileage


Generalized Linear Model

Definition :
The Generalized Linear Model extends the General Linear Model: it allows the dependent variable to be related to a linear combination of the independent variables via a specified link function. Moreover, the model allows the dependent variable to have a non-normal distribution.

There are three components to a GLM :

1. Random Component

2. Systematic Component

3. Link Function
Normal Distribution

Definition :
A Normal Distribution is an arrangement of a dataset in which most of the values cluster in the middle (around the mean) and the rest fall away symmetrically on either side of the mean.

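Examples: height of humans, salaries of employees.

[Figure: bell curve for height, with x-axis ticks 4.6, 4.9, 5.2, 5.5, 5.8, 6.1, 6.4 aligned with -3σ, -2σ, -1σ, µ, +1σ, +2σ, +3σ.]

• 68.2% of values lie within µ ± 1σ
• 95.4% of values lie within µ ± 2σ
• 99.7% of values lie within µ ± 3σ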
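The three percentages above can be checked numerically. The sketch below is not from the slides; it assumes the standard identity that the share of a normal distribution within k standard deviations of the mean is erf(k / √2), and the helper name `share_within` is made up for illustration.

```python
from math import erf, sqrt

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {share_within(k) * 100:.2f}%")
```

The first value comes out slightly above the 68.2% quoted on the slide, which appears to be a truncated figure.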
Linear Model

Definition :
A Linear model is one in which a constant change in input/Independent variable results in a constant change
in output/Dependent variable.
Example 1 (exactly linear):
X   1    3    5    7    9       each step: +2
Y   10   20   30   40   50      each step: +10

Example 2 (approximately linear):
X   1     3    5      7      9       each step: +2
Y   4.8   10   15.3   20.2   25.3    steps: +5.2, +5.3, +4.9, +5.1 (≈ +5)
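The "constant change in, constant change out" idea can be checked by differencing consecutive values. This is a minimal sketch using the two X/Y examples from the slide; the helper name `step_changes` is hypothetical.

```python
def step_changes(values):
    """Differences between consecutive values, rounded to hide float noise."""
    return [round(b - a, 2) for a, b in zip(values, values[1:])]

x = [1, 3, 5, 7, 9]
y_exact = [10, 20, 30, 40, 50]
y_noisy = [4.8, 10, 15.3, 20.2, 25.3]

print(step_changes(x))        # change in the input
print(step_changes(y_exact))  # change in the output, example 1
print(step_changes(y_noisy))  # change in the output, example 2
```

In the second example the output steps are not identical, but they stay close to 5, which is why a linear model is still a reasonable description.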


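Linear Modelling

x   y
0   0
1   2
2   4
3   6
4   8
5   10

[Figure: Y plotted against X for the points above, lying on the line y = 2x.]

Equation of line: y = mx + b, where m is the slope and b is the y-intercept.
For this data: y = 2x + 0

Types of slope: (+)ve, (-)ve, infinite slope, no slope.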
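As a small illustration of y = mx + b, the slope and y-intercept can be recovered from any two points in the table. This is a sketch rather than part of the original deck; `line_through` is a hypothetical helper.

```python
def line_through(p1, p2):
    """Slope m and y-intercept b of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)  # rise over run
    b = y1 - m * x1            # value of y when x = 0
    return m, b

points = [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]
m, b = line_through(points[0], points[-1])
print(f"y = {m}x + {b}")

# Every point in the table should satisfy the same equation.
print(all(m * x + b == y for x, y in points))
```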
Simple Linear Regression (Linear Modelling technique for Regression)

Problem:
As a hotel owner you want to predict the tip amount ($) of a meal for any given bill amount. Therefore one evening you collect data for six meals.

Meal #   Tip amount ($)
1        5
2        17
3        11
4        8
5        14
6        5

Unfortunately, when you begin to look at your data, you realize you only collected data for tip amount and not the meal amount (total bill). So this is the best data you have.
Simple Linear Regression (contd)
Meal #   Tip amount ($)   Residual (tip - ȳ)
1        5                -5
2        17               +7
3        11               +1
4        8                -2
5        14               +4
6        5                -5

[Figure: tip amount plotted against meal #, with the best fit line drawn at the mean.]

With only the tip amounts available, the best fit line is the mean: ȳ = 10, i.e. y = 0x + 10

Sum of squared error (SSE) = (-5)² + 7² + 1² + (-2)² + 4² + (-5)² = 120
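The SSE of the mean-only line can be reproduced in a few lines. This sketch uses the six tip amounts from the slide; the variable names are chosen here for illustration.

```python
tips = [5, 17, 11, 8, 14, 5]

mean_tip = sum(tips) / len(tips)            # the line y = 0x + mean_tip
residuals = [t - mean_tip for t in tips]    # observed minus predicted
sse = sum(r ** 2 for r in residuals)        # sum of squared errors

print(mean_tip, residuals, sse)
```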
Simple Linear Regression (contd)
Total Bill Amount ($)   Tip amount ($)
34                      5
108                     17
64                      11
88                      8
99                      14
51                      5

[Figure: tip amount plotted against bill amount, with the line y = 0x + 10.]
Simple Linear Regression (contd)
[Figure: tip amount plotted against bill amount for the same six meals, with the candidate line y = 0.08x + 6.2.]
Simple Linear Regression (contd)
[Figure: tip amount plotted against bill amount for the same six meals, with the candidate line y = 0.11x + 1.8.]
Simple Linear Regression (contd)
[Figure: tip amount plotted against bill amount for the same six meals, with the line y = 0.14x - 0.81.]
• By tuning the slope and intercept we find the best fit line for our data: SSE = 30.075, down from 120 for the mean-only line
• How do you tune? By using the Gradient Descent algorithm
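The tuning step can be sketched as plain gradient descent on the mean squared error. The slides do not give the details, so the learning rate, the number of steps, and the standardization of the bill amounts (added so that a single learning rate converges quickly) are all assumptions made here, and `fit_line_gd` is a hypothetical name.

```python
bills = [34, 108, 64, 88, 99, 51]
tips = [5, 17, 11, 8, 14, 5]

def fit_line_gd(xs, ys, lr=0.1, steps=2000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    std_x = (sum((x - mean_x) ** 2 for x in xs) / n) ** 0.5
    zs = [(x - mean_x) / std_x for x in xs]  # standardized inputs (assumption)
    w, b = 0.0, 0.0                          # start from a flat line
    for _ in range(steps):
        errors = [w * z + b - y for z, y in zip(zs, ys)]
        grad_w = 2 / n * sum(e * z for e, z in zip(errors, zs))
        grad_b = 2 / n * sum(errors)
        w -= lr * grad_w                     # step against the gradient
        b -= lr * grad_b
    slope = w / std_x                        # convert back to original units
    intercept = b - slope * mean_x
    return slope, intercept

slope, intercept = fit_line_gd(bills, tips)
sse = sum((slope * x + intercept - y) ** 2 for x, y in zip(bills, tips))
print(f"y = {slope:.4f}x + {intercept:.4f}, SSE = {sse:.3f}")
```

On this data the fitted slope and intercept agree with the slide's y = 0.14x - 0.81 once the coefficients are shortened to two decimals, and the SSE matches the 30.075 quoted above.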

How do we interpret y = 0.14x - 0.81?


Logistic Regression (Linear Modelling technique for Classification)
Problem:
We have collected a sample dataset of people's ages and whether they subscribed to a magazine or not. Let's come up with a model that, given a person's age, predicts whether he or she will subscribe to the magazine.

Age in years   Subscribed
18             0
22             0
27             1
31             1
24             0
42             1

Subscribed - 1, Not Subscribed - 0

Can I use the same technique of regression (fitting a line) that we learned so far to solve this?
No.

Why?
• Data is categorical in nature
• Non-Normal Distribution [Binomial distribution]
• No linear relationship between age and subscription

But let's try.
Logistic Regression
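[Figure: Subscribed (0 or 1) plotted against age for the six people in the dataset, with a straight line y = mx + b fitted through the points.]

Reading the output of the line as a probability (p):

Age in years   Probability (p)
18             0.23
22             0.30
27             0.72
31             0.81
24             0.29
42             0.88
38             1.47   X (greater than 1, not a valid probability)
17             -0.20  X (less than 0, not a valid probability)

How do we solve this?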
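The problem with the two rows marked X can be reproduced by fitting an ordinary least-squares line to the six (age, subscribed) pairs. This sketch uses the closed-form least-squares formulas, which is a choice made here; the resulting numbers differ from the illustrative values in the table above, and `least_squares_line` is a hypothetical helper.

```python
ages = [18, 22, 27, 31, 24, 42]
subscribed = [0, 0, 1, 1, 0, 1]

def least_squares_line(xs, ys):
    """Closed-form slope and intercept of the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x

m, b = least_squares_line(ages, subscribed)

def predict(age):
    return m * age + b

# A straight line is unbounded, so some ages give "probabilities" outside [0, 1].
for age in (17, 27, 38):
    print(age, round(predict(age), 3))
```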
Trick Intuition

[Figure: two side-by-side panels plotting SUBSCRIBED against AGE.]
Trick 1
[Figure: two panels plotting SUBSCRIBED against AGE. Left: y = mx + b, which ranges from -∞ to +∞. Right: y = e^{mx + b}, which ranges from 0 to +∞.]

How do you ensure non-negativity of a number?

• Absolute value of a number: |-5| → +ve
• Squaring a number: (-5)² → +ve
• Exponential form of a number: e⁻⁵ → +ve
Trick 2
[Figure: two panels plotting SUBSCRIBED against AGE, with 0.5 marked on the y-axis of the right panel. Left: y = e^{mx + b}, which ranges from 0 to +∞. Right: y = e^{mx + b} / (1 + e^{mx + b}), which ranges from 0 to 1.]

How do you ensure any (positive) number is <= 1?
• By dividing it by a number that is greater than it: 5/(5+1) = 0.833 → <= 1

This is called a Sigmoid Function:
E(Y) => P = e^{mx + b} / (1 + e^{mx + b})
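The two tricks combine into the sigmoid function, sketched below. The slope m and intercept b are hypothetical values picked only to produce a curve over the age range; they are not fitted to the data. The two-branch form is a common way to avoid overflow in exp for large inputs and is mathematically the same as e^z / (1 + e^z).

```python
from math import exp

def sigmoid(z):
    """e^z / (1 + e^z), written to stay numerically stable for large |z|."""
    if z >= 0:
        return 1 / (1 + exp(-z))
    ez = exp(z)
    return ez / (1 + ez)

m, b = 0.5, -13.0  # hypothetical coefficients, for illustration only

for age in (17, 22, 26, 31, 38):
    p = sigmoid(m * age + b)
    print(age, round(p, 3))  # always strictly between 0 and 1
```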
Linear Model Constraint

Linear Modelling technique for Regression

• Normal Distribution
• E(Y) = mx + b, e.g. E(Y) = 0.14x - 0.81

i.e. we can explain the prediction: for every $1 the bill amount increases, we would expect the tip amount to increase by $0.14 (about 15 cents if the unrounded slope of roughly 0.146 is used).

This is the most important constraint of a Linear model: E(Y) must be a linear combination of the independent variables.

Linear Modelling technique for Classification

• Binomial Distribution
• E(Y) = e^{mx + b} / (1 + e^{mx + b})
• E(Y) ≠ mx + b

i.e. we cannot explain the prediction as a linear combination of independent variables.
Generalized Linear Model

Framework for Generalization

Random Component
Explains the distribution of our dependent variable

Link Function
Establishes the relationship between the Random and Systematic components

Systematic Component
Explains the dependent variable as a linear combination of independent variables
Solve Linear Model Constraint using GLM

Linear Modelling technique for Regression

• Normal Distribution
• E(Y) = mx + b, e.g. E(Y) = 0.14x - 0.81

i.e. we can explain the prediction: for every $1 the bill amount increases, we would expect the tip amount to increase by $0.14 (about 15 cents if the unrounded slope of roughly 0.146 is used).

Link Function: Identity Function
I(E(Y)) = mx + b

Linear Modelling technique for Classification

• Binomial Distribution
• E(Y) = e^{mx + b} / (1 + e^{mx + b})
• E(Y) ≠ mx + b

i.e. we cannot explain the prediction itself as a linear combination of independent variables, but we can after applying a link function.

Link Function: Logit Function
Logit(E(Y)) = mx + b
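To see why the logit link restores linearity, recall that logit(p) = ln(p / (1 - p)), the log-odds. The sketch below uses hypothetical coefficients m and b (not fitted to the data) and checks that the log-odds of the sigmoid output equal mx + b, so each extra year of age changes the log-odds by the constant m.

```python
from math import exp, log

def sigmoid(z):
    return 1 / (1 + exp(-z))

def logit(p):
    """Log-odds of a probability p, the inverse of the sigmoid."""
    return log(p / (1 - p))

m, b = 0.3, -8.0  # hypothetical coefficients, for illustration only
ages = [18, 22, 27, 31]

for age in ages:
    p = sigmoid(m * age + b)   # E(Y): not linear in age
    print(age, round(p, 3), round(logit(p), 3), round(m * age + b, 3))
```

The last two printed columns agree: the probability curve is S-shaped, but its logit is a straight line in age.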


Generalized Linear Model

Definition :
The Generalized Linear Model extends the General Linear Model: it allows the dependent variable to be related to a linear combination of the independent variables via a specified link function. Moreover, the model allows the dependent variable to have a non-normal distribution.

There are three components to a GLM :

1. Random Component

2. Systematic Component

3. Link Function
Some of the Generalized Linear Models

• Logistic Regression: Logit(E(Y)) = mx + b
• Probit Regression: Probit(E(Y)) = mx + b
• Poisson Regression: log(E(Y)) = mx + b
• Linear Regression: E(Y) = mx + b, i.e. I(E(Y)) = mx + b (identity link)
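Each model in the list pairs a link function with its inverse: the link maps E(Y) onto the linear scale mx + b, and the inverse maps a linear value back to E(Y). The sketch below assumes the usual definitions (probit as the inverse of the standard normal CDF, taken here from Python's `statistics.NormalDist`); the dictionary `LINKS` and the sample value `eta` are made up for illustration.

```python
from math import exp, log
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# model name -> (link applied to E(Y), inverse link applied to mx + b)
LINKS = {
    "Linear regression (identity)": (lambda mu: mu, lambda eta: eta),
    "Logistic regression (logit)": (lambda mu: log(mu / (1 - mu)),
                                    lambda eta: 1 / (1 + exp(-eta))),
    "Probit regression (probit)": (std_normal.inv_cdf, std_normal.cdf),
    "Poisson regression (log)": (log, exp),
}

eta = 0.4  # an example value of mx + b
for name, (link, inverse) in LINKS.items():
    mu = inverse(eta)  # E(Y) implied by the linear predictor
    print(f"{name}: E(Y) = {mu:.4f}, link(E(Y)) = {link(mu):.4f}")
```

For every model the final column returns the original 0.4, which is the sense in which the link function makes E(Y) linear in x.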
Thank You
