Machine Learning: Parametric Models
Dr. Kritesh K. Gupta
Assistant Professor
Amrita School of Artificial Intelligence,
Amrita Vishwa Vidyapeetham, Coimbatore, India
2nd July 2025
Correlation of the Parameters with Response
[Figure: three scatter plots of Y vs. x illustrating zero, positive, and negative covariance]

• Cov(X1, X2) = 0: X1 and X2 are not correlated
• Cov(X1, X2) > 0: X1 and X2 are positively correlated
• Cov(X1, X2) < 0: X1 and X2 are negatively correlated
Regression Model: Linear Regression
Simple Linear Regression
| X (Weight) | Y (Height) |
|------------|------------|
| 74         | 170        |
| 80         | 180        |

New weight → Model → Predicted height: 75 → 175.5

[Figure: scatter of height vs. weight with the best-fit line; the vertical gap between an observed point $y_i$ and its prediction $\hat{y}_i$ is the error, which the fit minimizes]
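A minimal sketch of this weight → height example, assuming scikit-learn is available; the slide's 175.5 is illustrative, so the exact prediction from a line fit to the two training points will differ slightly.

```python
# Minimal sketch of the slide's weight -> height example (assumes scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[74.0], [80.0]])  # weights from the slide's table
y = np.array([170.0, 180.0])    # corresponding heights

model = LinearRegression().fit(X, y)  # fits the best-fit line
print(model.predict([[75.0]]))        # predicted height for a new weight of 75
```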
Mathematical understanding
$$\hat{y} = \beta_0 + \beta_1 x$$

• $\hat{y}$: predicted height
• $x$: weight
• $\beta_0$: intercept (value of $\hat{y}$ when $x = 0$)
• $\beta_1$: slope $= \dfrac{dy}{dx} = \dfrac{\Delta\,\text{height}}{\Delta\,\text{weight}}$
How is the best-fit line formed?

Cost function:
$$J(\beta_0, \beta_1) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

Objective (to get the best-fit line): optimal selection of $\beta_0$ and $\beta_1$ that minimizes the error.

[Figure: height vs. weight scatter showing the error between observed points and a candidate line]
Let's apply this

Data: $x = \{1, 2, 3\}$, $y = \{1, 2, 3\}$. Model: $\hat{y} = \beta_0 + \beta_1 x$ with $\beta_0 = 0$.

|       | β1 = 0          | β1 = 0.5            | β1 = 1          |
|-------|-----------------|---------------------|-----------------|
| x = 1 | ŷ = 0 + 0·1 = 0 | ŷ = 0 + 0.5·1 = 0.5 | ŷ = 0 + 1·1 = 1 |
| x = 2 | ŷ = 0 + 0·2 = 0 | ŷ = 0 + 0.5·2 = 1   | ŷ = 0 + 1·2 = 2 |
| x = 3 | ŷ = 0 + 0·3 = 0 | ŷ = 0 + 0.5·3 = 1.5 | ŷ = 0 + 1·3 = 3 |
What's happening to the cost function?

$$J(\beta_0, \beta_1) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

| β1  | ŷ for x = 1, 2, 3 | J(β1)                                                                      |
|-----|-------------------|----------------------------------------------------------------------------|
| 0   | 0, 0, 0           | $\frac{1}{3}[(1-0)^2 + (2-0)^2 + (3-0)^2] = \frac{14}{3} \approx 4.66$     |
| 0.5 | 0.5, 1, 1.5       | $\frac{1}{3}[(1-0.5)^2 + (2-1)^2 + (3-1.5)^2] = \frac{3.5}{3} \approx 1.166$ |
| 1   | 1, 2, 3           | $\frac{1}{3}[(1-1)^2 + (2-2)^2 + (3-3)^2] = 0$                             |

[Figure: J(β1) vs. β1, a convex curve with its minimum at β1 = 1]
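The three cost values can be checked with a short script; a minimal sketch in plain NumPy:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def cost(beta1, beta0=0.0):
    """Mean squared error J(beta0, beta1) for the toy dataset."""
    y_hat = beta0 + beta1 * x
    return np.mean((y - y_hat) ** 2)

for b1 in (0.0, 0.5, 1.0):
    print(b1, round(cost(b1), 3))   # 4.667, 1.167, 0.0
```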
Convergence algorithm (optimize the change of βj)

Repeat until convergence:
$$\beta_j := \beta_j - \alpha \, \frac{\partial J(\beta_j)}{\partial \beta_j}$$

where $\alpha$ is the learning rate and $\frac{\partial J(\beta_j)}{\partial \beta_j}$ is the gradient at the current $\beta_j$.

• When the slope is negative: $(\beta_j)_{new} = \beta_j - \alpha(-ve)$, so $(\beta_j)_{new} > (\beta_j)_{old}$ (moves right, toward the minimum)
• When the slope is positive: $(\beta_j)_{new} = \beta_j - \alpha(+ve)$, so $(\beta_j)_{new} < (\beta_j)_{old}$ (moves left, toward the minimum)

[Figure: convex cost curve with an initial weight (βj) descending toward the minimum]
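A minimal gradient-descent sketch on the toy dataset above; the learning rate and iteration cap are arbitrary choices, not from the slides:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

b0, b1 = 0.0, 0.0        # initial weights
alpha = 0.1              # learning rate (arbitrary choice)

for _ in range(1000):    # "repeat until convergence", capped at 1000 steps
    y_hat = b0 + b1 * x
    # partial derivatives of J = (1/n) * sum((y_i - y_hat_i)^2)
    db0 = -2.0 * np.mean(y - y_hat)
    db1 = -2.0 * np.mean((y - y_hat) * x)
    b0 -= alpha * db0
    b1 -= alpha * db1

print(round(b0, 3), round(b1, 3))   # approaches beta0 = 0, beta1 = 1
```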
Multiple Linear Regression

Simple: $\hat{y} = \beta_0 + \beta_1 x$
Multiple: $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

• $\beta_0$: intercept / bias
• $\beta_1, \beta_2$: slopes / weights

What is the generic equation for multiple regression? With $p$ features: $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$

https://medium.com/analytics-vidhya/implementing-gradient-descent-for-multi-linear-regression-from-scratch-3e31c114ae12
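A sketch of multiple regression with two features; the coefficients (1, 2, 3) and the random data are made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical features x1, x2 and a response from y = 1 + 2*x1 + 3*x2
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)   # recovers ~1.0 and ~[2.0, 3.0]
```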
Ridge Regression: L2 Regularization
Ridge Regression (for overcoming overfitting)

Plain SLR cost ($\lambda = 0$ case):
$$J(\beta_0, \beta_1) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \qquad \hat{y}_i = \beta_0 + \beta_1 x$$

Ridge cost with the L2 penalty:
$$J(\beta_0, \beta_1) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \lambda \sum (\text{slope})^2$$

[Figure: a curve fitting the training points tightly but missing the test points, i.e., overfitting]

• High training accuracy: low bias
• Low test accuracy: high variance
λ vs. slope

[Figure: as λ increases, the magnitude of the fitted slope shrinks toward 0]
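This shrinkage can be checked numerically; a sketch using scikit-learn's Ridge, whose `alpha` parameter plays the role of λ (its scaling convention differs slightly from the slide's formula):

```python
import numpy as np
from sklearn.linear_model import Ridge

x = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 3.0])

# as the penalty strength grows, the fitted slope shrinks toward 0
for lam in (0.1, 1.0, 10.0, 100.0):
    slope = Ridge(alpha=lam).fit(x, y).coef_[0]
    print(lam, round(slope, 3))
```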
Lasso Regression (for feature selection): L1 Regularization

$$J(\beta_0, \beta_1) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \lambda \sum |\text{slope}|$$
ElasticNet Regression

$$J(\beta_0, \beta_1) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \lambda_1 \sum |\text{slope}| + \lambda_2 \sum (\text{slope})^2$$

The L1 term performs feature selection; the L2 term penalizes overfitting.
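A sketch contrasting the two penalties on made-up data where only the first of five features matters: ridge shrinks all weights, while lasso drives the irrelevant ones exactly to zero (feature selection). The data and penalty values below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# hypothetical data: only the first of five features affects y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

print(Ridge(alpha=1.0).fit(X, y).coef_)  # all shrunk, none exactly 0
print(Lasso(alpha=0.1).fit(X, y).coef_)  # irrelevant weights driven to 0
```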
Classification Model: Logistic Regression
Can Linear Regression Solve Classification Problem?

| X: Study hours | Y: O/P (Pass: 1, Fail: 0) |
|----------------|---------------------------|
| 2              | 0                         |
| 3              | 0                         |
| 4              | 0                         |
| 5              | 1                         |
| 6              | 1                         |
| 7              | 1                         |

[Figure: fitted line over the 0/1 labels, with a 0.5 threshold on the Y axis]

Decision rule: $\hat{y} = 1$ if $\hat{y} \ge 0.5$; $\hat{y} = 0$ if $\hat{y} < 0.5$
Can Linear Regression Solve Classification Problem?

Same data, plus one outlier:

| X: Study hours | Y: O/P (Pass: 1, Fail: 0) |
|----------------|---------------------------|
| 2              | 0                         |
| 3              | 0                         |
| 4              | 0                         |
| 5              | 1                         |
| 6              | 1                         |
| 7              | 1                         |
| 12             | 1                         |

[Figure: the outlier at X = 12 drags the fitted line, shifting the 0.5 crossing point]

Challenges:
1. Not robust in handling outliers
2. Responses can be > 1 and < 0

Hence, we need a mechanism that can restrict the predicted responses between 0 and 1.
What can solve these challenges?

Pass the linear response through the sigmoid:
$$y = \beta_0 + \beta_1 x, \qquad \sigma(y) = \frac{1}{1 + e^{-y}}$$

[Figure: S-shaped sigmoid curve over the study-hours data, bounded between 0 and 1 with the 0.5 threshold]
Sigmoid Function Nonlinearity

• When $z$ is large and positive ($z \ge 5$): $\sigma(z) \approx 1$
• When $z$ is large and negative ($z \le -5$): $\sigma(z) \approx 0$
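A quick numerical check of this saturation behavior:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(5))    # ~0.993  (large positive z -> output near 1)
print(sigmoid(-5))   # ~0.007  (large negative z -> output near 0)
print(sigmoid(0))    # 0.5     (the decision boundary)
```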
Cost function

Linear Regression:
$$J(\beta_0, \beta_1) = \frac{1}{m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2, \qquad \hat{y}_i = \beta_0 + \beta_1 x$$

Logistic Regression (same MSE, but with a sigmoid prediction):
$$J(\beta_0, \beta_1) = \frac{1}{m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2, \qquad \hat{y}_i = \sigma(y_i) = \frac{1}{1 + e^{-y_i}}$$
Why doesn't gradient descent work in Logistic Regression?

Plugging the sigmoid into the squared-error cost makes $J$ non-convex: the curve has multiple local minima, so gradient descent can get stuck in a local minimum instead of reaching the global one.

[Figure: wavy, non-convex cost curve with several local minima]
Why can't Mean Squared Error be used as the cost function for Logistic Regression?

If the true label $y$ is 1 and the predicted probability is $\sigma(z) = \frac{1}{1 + e^{-z}} = 0.2$, squaring the difference gives $(1 - 0.2)^2 = 0.64$.

The squaring operation intensifies the impact of misclassifications, especially when the predicted class is close to 0 or 1.
Log Loss or Cross Entropy Function

The log of corrected probabilities, in logistic regression, is obtained by taking the natural log (base e) of the predicted probabilities.

$$J = -\sum_{i=1}^{m} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

$$\text{loss} = \begin{cases} -\log(\hat{y}) & \text{if } y = 1 \\ -\log(1 - \hat{y}) & \text{if } y = 0 \end{cases}$$
Log Loss and Model Performance

$$J = -\sum_{i=1}^{m} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

| Sample | True label (yi) | Predicted probability (ŷi) | Log loss           |
|--------|-----------------|----------------------------|--------------------|
| 1      | 1               | 0.8                        | -log(0.8) = 0.223  |
| 2      | 0               | 0.3                        | -log(0.7) = 0.357  |
| 3      | 1               | 0.7                        | -log(0.7) = 0.357  |
| 4      | 0               | 0.2                        | -log(0.8) = 0.223  |
| 5      | 1               | 0.9                        | -log(0.9) = 0.105  |
| 6      | 0               | 0.1                        | -log(0.9) = 0.105  |
| 7      | 1               | 0.4                        | -log(0.4) = 0.916  |
| 8      | 0               | 0.6                        | -log(0.4) = 0.916  |
| 9      | 1               | 0.85                       | -log(0.85) = 0.162 |
| 10     | 0               | 0.15                       | -log(0.85) = 0.162 |

Total Log Loss $= \frac{1}{10}\sum_{i=1}^{10} \text{Log Loss}_i = 0.353$

• Gradient of Log Loss: $\dfrac{\partial \, \text{Log Loss}_i}{\partial \beta_j}$
• Parameter update: $\beta_j := \beta_j - \alpha \, \dfrac{\partial \, \text{Log Loss}_i}{\partial \beta_j}$
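The table's numbers can be reproduced directly:

```python
import numpy as np

# the slide's ten samples: true labels and predicted probabilities
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
y_prob = np.array([0.8, 0.3, 0.7, 0.2, 0.9, 0.1, 0.4, 0.6, 0.85, 0.15])

loss = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
print(loss.round(3))          # per-sample losses from the table
print(round(loss.mean(), 3))  # total log loss ~ 0.353
```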
Multi-class classification (One vs. Rest [OVR])

| X | Y | O1 | O2 | O3 |
|---|---|----|----|----|
| - | - | 1  | 0  | 0  |
| - | - | 0  | 1  | 0  |
| - | - | 0  | 0  | 1  |
| - | - | 1  | 0  | 0  |

• Model M1 → I/P: (X, Y), O/P: {O1}
• Model M2 → I/P: (X, Y), O/P: {O2}
• Model M3 → I/P: (X, Y), O/P: {O3}

[Figure: three decision boundaries M1, M2, M3, one per class, in the (X, Y) feature plane]

For unknown features (X, Y), each model outputs a probability and the highest wins:
• 0.25, 0.20, 0.55 → class 3
• 0.75, 0.15, 0.10 → class 1
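A minimal sketch of the final OVR decision, using the slide's example probabilities:

```python
import numpy as np

# per-model probabilities for one unknown point: M1, M2, M3
scores = np.array([0.25, 0.20, 0.55])
print("predicted class:", np.argmax(scores) + 1)   # class 3 (O3 wins)

scores = np.array([0.75, 0.15, 0.10])
print("predicted class:", np.argmax(scores) + 1)   # class 1 (O1 wins)
```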
Performance Measures
Performance measure for Regression: R-squared

$$R^2 = 1 - \frac{SS_{residual}}{SS_{total}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

[Figure: scatter with the fitted line $\hat{y}_i$ and the mean line $\bar{y}$; residuals are measured against each]
Performance measure for Regression: Adjusted R-squared

| Independent features (P)                                          | R-squared |
|-------------------------------------------------------------------|-----------|
| 1. Size of the house                                              | 0.75      |
| 1. Size of the house, 2. Number of rooms                          | 0.80      |
| 1. Size of the house, 2. Number of rooms, 3. Location             | 0.85      |
| 1. Size of the house, 2. Number of rooms, 3. Location, 4. Gender  | 0.87      |

$$\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(N - 1)}{N - P - 1}$$

• N: number of data points
• P: number of independent features

R-squared never decreases when a feature is added, even an irrelevant one like gender; adjusted R-squared penalizes the extra feature.
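A sketch of both measures; the toy arrays and the N = 50 below are hypothetical, not from the slide:

```python
import numpy as np

def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)       # SS_residual
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # SS_total
    return 1.0 - ss_res / ss_tot

def adjusted_r2(r_squared, n, p):
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)

y     = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([3.2, 4.8, 7.1, 8.9])
print(round(r2(y, y_hat), 4))                  # close to 1 for a good fit
print(round(adjusted_r2(0.85, n=50, p=3), 3))  # slide's R^2 = 0.85, hypothetical N = 50
```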
Performance measure for Regression: Mean Squared Error (MSE)

$$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

Advantages:
• Differentiable
• Has one local/global minimum
• Fast convergence to the minimum

Disadvantages:
• Sensitive to outliers
Performance measure for Regression: Mean Absolute Error (MAE)

$$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$$

Advantages:
• Robust to outliers

Disadvantages:
• Convergence takes time (sub-gradient)
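A small sketch contrasting the two metrics on made-up data with one outlier:

```python
import numpy as np

y     = np.array([3.0, 5.0, 7.0, 20.0])   # last point acts as an outlier
y_hat = np.array([2.5, 5.5, 6.5, 8.0])

mse = np.mean((y - y_hat) ** 2)    # squaring magnifies the outlier's error
mae = np.mean(np.abs(y - y_hat))   # absolute error grows only linearly
print(round(mse, 3), round(mae, 3))
```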
Performance metrics for Classification

• Confusion Matrix

| X1 | X2 | Y | Yhat |
|----|----|---|------|
| -  | -  | 0 | 1    |
| -  | -  | 1 | 1    |
| -  | -  | 0 | 0    |
| -  | -  | 1 | 1    |
| -  | -  | 1 | 1    |
| -  | -  | 0 | 1    |
| -  | -  | 1 | 0    |

|             | Actual 1 | Actual 0 |
|-------------|----------|----------|
| Predicted 1 | TP       | FP       |
| Predicted 0 | FN       | TN       |

• Accuracy
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
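Counting the four cells from the slide's seven (Y, Yhat) pairs:

```python
import numpy as np

y     = np.array([0, 1, 0, 1, 1, 0, 1])   # actual labels from the table
y_hat = np.array([1, 1, 0, 1, 1, 1, 0])   # predicted labels

tp = np.sum((y == 1) & (y_hat == 1))   # 3
tn = np.sum((y == 0) & (y_hat == 0))   # 1
fp = np.sum((y == 0) & (y_hat == 1))   # 2
fn = np.sum((y == 1) & (y_hat == 0))   # 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn, round(accuracy, 3))   # 3 1 2 1 0.571
```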
Performance metrics for Classification

• Precision: of all the samples predicted positive, how many actually are positive
$$\text{Precision} = \frac{TP}{TP + FP}$$

• Recall: of all the actually positive samples, how many were predicted positive
$$\text{Recall} = \frac{TP}{TP + FN}$$
Use Cases

• Example 1: spam [1] vs. not spam [0]
  - A mail is not spam, but the model predicted spam → False Positive. Cannot be afforded (FP).
  - A mail is spam, but the model predicted not spam → False Negative. Does not cause much damage.

• Example 2: disease prediction
  - A patient has diabetes [1], but the model predicted not diabetes [0] → False Negative. Cannot be afforded (FN).
  - A patient doesn't have diabetes, but the model predicted diabetes → False Positive. Does not cause much damage.
Performance metrics for Classification

• F-Score
$$F_\beta = (1 + \beta^2)\,\frac{\text{Precision} \times \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}$$

In the cases where reducing FP and FN are both necessary, we use β = 1:
$$F_1 = 2\,\frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

In the cases where reducing FP is more important than FN, we use β = 0.5:
$$F_{0.5} = (1 + 0.25)\,\frac{\text{Precision} \times \text{Recall}}{0.25 \cdot \text{Precision} + \text{Recall}}$$

In the cases where reducing FN is more important than FP, we use β = 2:
$$F_2 = 5\,\frac{\text{Precision} \times \text{Recall}}{4 \cdot \text{Precision} + \text{Recall}}$$
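A sketch of the three variants, using hypothetical precision/recall values for illustration:

```python
def f_beta(precision, recall, beta):
    """F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

p, r = 0.75, 0.60                    # made-up precision and recall
print(round(f_beta(p, r, 1.0), 3))   # F1: balances FP and FN
print(round(f_beta(p, r, 0.5), 3))   # F0.5: weights precision (FP) more
print(round(f_beta(p, r, 2.0), 3))   # F2: weights recall (FN) more
```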