Machine Learning: Bilal Khan

Machine learning regression analysis is a technique used to model relationships between variables. It allows predicting continuous variable outcomes like sales amounts based on predictor variables like advertisement spending. Common regression types include linear regression for linear relationships, logistic regression for binary classification, and polynomial regression to model nonlinear data using linear models. Support vector regression and decision tree regression are also supervised learning methods applicable to regression problems.

Uploaded by

Osama Inayat

Machine Learning

Follow the link to find voice over PPT

https://drive.google.com/file/d/1f3-7wBUw8VfFCxdNjS0Y_YAvwsuFl-dz/view?usp=sharing
Bilal Khan
Regression Analysis in Machine learning
Regression analysis is a statistical method for modeling the relationship between a
dependent (target) variable and one or more independent (predictor) variables.

More specifically, regression analysis helps us understand how the value of the
dependent variable changes with one independent variable while the
other independent variables are held fixed.

It predicts continuous/real values such as temperature, age, salary, price, etc.

Example: Suppose a marketing company A runs various advertisements every year and
earns sales from them. The list below shows the advertising spend of the company
over the last 5 years and the corresponding sales:

Now the company wants to spend $200 on advertising in 2021 and
wants a prediction of its sales for that year. Solving this
type of prediction problem in machine learning calls for regression analysis.
Regression is a supervised learning technique which helps in finding the correlation
between variables and enables us to predict a continuous output variable based
on one or more predictor variables. It is mainly used for prediction,
forecasting, time series modeling, and investigating cause-and-effect
relationships between variables.
In regression, we fit a line or curve to the given datapoints on a graph of the
variables, and the machine learning model uses this fit to make predictions about
the data. In simple words, regression finds a line or curve that runs
through the datapoints on the target-predictor graph in such a way that
the total vertical distance between the datapoints and the regression line is
minimum. The line need not touch every datapoint. The distance between the
datapoints and the line tells whether a model has captured a strong relationship or not.
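The idea of minimizing the vertical distances can be sketched in a few lines of Python. The datapoints below are hypothetical (they are not the company's advertising figures), and the closed-form least-squares formulas are one common way to find the line. This is an illustration under those assumptions, not the method used in the slides.

```python
import numpy as np

# Hypothetical datapoints (illustrative only, not taken from the slides).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])


def sum_squared_vertical_distances(slope, intercept):
    """Sum of squared vertical distances between the points and a line."""
    residuals = y - (slope * x + intercept)
    return float(np.sum(residuals ** 2))


# Closed-form least-squares estimates for a straight line.
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

best_fit_error = sum_squared_vertical_distances(slope, intercept)
# An arbitrary alternative line, for comparison only.
other_line_error = sum_squared_vertical_distances(1.0, 1.0)

print(f"best-fit line: y = {slope:.2f}x + {intercept:.2f}")
print(f"squared distance, best fit: {best_fit_error:.2f}")
print(f"squared distance, other line: {other_line_error:.2f}")
```

No other straight line gives a smaller total squared distance for these points, which is what "best fit" means here.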

Some examples of regression are:

• Prediction of rain using temperature and other factors
• Determining market trends
• Prediction of road accidents due to rash driving
Terminologies Related to the Regression Analysis

• Dependent Variable: The main factor in regression analysis which we want to
predict or understand is called the dependent variable. It is also called the target
variable.
• Independent Variable: The factors which affect the dependent variable or
which are used to predict its values are called
independent variables, also called predictors.
• Outliers: An outlier is an observation with either a very low or a very
high value in comparison to the other observed values. An outlier may distort the
result, so it should be identified and handled before fitting.
• Multicollinearity: If the independent variables are highly correlated with each
other, the condition is called multicollinearity. It
should be avoided in the dataset, because it makes it hard to rank
the variables by how strongly they affect the target.
• Underfitting and Overfitting: If our algorithm works well with the training
dataset but not with the test dataset, the problem is called overfitting.
If our algorithm does not perform well even with the training dataset, the
problem is called underfitting.
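Underfitting and overfitting are usually detected by comparing the error on the training data with the error on held-out test data. The sketch below assumes hypothetical noisy data drawn from a sine curve and uses polynomial degree as a stand-in for model complexity. Neither choice comes from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: a smooth curve plus a little noise.
x_train = np.linspace(0.0, 1.0, 8)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.2, x_train.size)
x_test = np.linspace(0.05, 0.95, 8)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0.0, 0.2, x_test.size)


def mean_squared_error(coeffs, x, y):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


train_error = {}
test_error = {}
for degree in (1, 3, 7):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_error[degree] = mean_squared_error(coeffs, x_train, y_train)
    test_error[degree] = mean_squared_error(coeffs, x_test, y_test)
    print(degree, round(train_error[degree], 4), round(test_error[degree], 4))
```

The degree-1 model cannot follow the curve, so its error stays high even on the training data (underfitting). The degree-7 model passes through all eight training points, so its training error is close to zero. Whether it has overfitted is judged from its test error, not from its training error.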
Why do we use Regression Analysis?

As mentioned above, regression analysis helps in the prediction of a continuous
variable. There are various scenarios in the real world where we need future
predictions, such as weather conditions, sales, and marketing trends. For
such cases we need a technique that can make accurate predictions, and
regression analysis, a statistical method used in machine learning and data science,
is one such technique. Below are some other reasons for using
regression analysis:

• Regression estimates the relationship between the target and the independent
variables.
• It is used to find trends in data.
• It helps to predict real/continuous values.
• By performing regression, we can estimate which factors are the most
important, which are the least important, and how each factor
affects the target.
Types of Regression

There are various types of regression used in data science and machine
learning. Each type has its own importance in different scenarios, but at the core,
all regression methods analyze the effect of the independent variables on
the dependent variable. Some important types of regression
are discussed below:
Linear Regression

• Linear regression is a statistical regression method used for predictive
analysis.
• It is one of the simplest regression algorithms and
shows the relationship between continuous variables.
• It is used for solving regression problems in machine learning.
• Linear regression shows the linear relationship between the independent variable
(X-axis) and the dependent variable (Y-axis), hence the name linear regression.
• If there is only one input variable (x), it is
called simple linear regression. If there is more than one input variable,
it is called multiple linear regression.
• For example, a linear regression model can predict the salary of an
employee on the basis of years of experience.
Here is the mathematical equation for linear regression:
$Y = aX + b$
Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a = slope and b = intercept (the linear coefficients).
Some popular applications of linear regression are:
• Analyzing trends and sales estimates
• Salary forecasting
• Real estate prediction
• Arriving at ETAs in traffic
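Multiple linear regression can be sketched with an ordinary least-squares solve. The example below assumes a hypothetical salary dataset with two inputs (years of experience and number of certifications). The second input and all figures are invented for illustration and do not appear in the slides.

```python
import numpy as np

# Hypothetical inputs: years of experience and number of certifications.
years = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
certifications = np.array([0.0, 1.0, 0.0, 2.0, 1.0, 3.0])
# Hypothetical salaries (in thousands), generated from a known linear rule
# so that the fitted coefficients can be checked by eye.
salary = 30.0 + 5.0 * years + 2.0 * certifications

# Design matrix: a column of ones for the intercept, then one column per input.
X = np.column_stack([np.ones_like(years), years, certifications])

# Least-squares solution of X @ coefficients ~= salary.
coefficients, *_ = np.linalg.lstsq(X, salary, rcond=None)
intercept, per_year, per_certification = coefficients

# Intercept term, 7 years of experience, 2 certifications.
new_employee = np.array([1.0, 7.0, 2.0])
predicted_salary = float(new_employee @ coefficients)
print(intercept, per_year, per_certification, predicted_salary)
```

With a single input column the same code reduces to simple linear regression, where the two fitted values play the roles of b and a in Y = aX + b.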
Logistic Regression

• Logistic regression is another supervised learning algorithm, used to
solve classification problems. In classification problems, the
dependent variable is in a binary or discrete format such as 0 or 1.
• The logistic regression algorithm works with categorical variables such as 0 or 1,
Yes or No, True or False, Spam or not spam, etc.
• It is a predictive analysis algorithm which works on the concept of probability.
• Logistic regression is a type of regression, but it differs from the linear
regression algorithm in how it is used.
• Logistic regression uses the sigmoid function, also called the logistic function, which
maps any real-valued input to a value between 0 and 1. This sigmoid function is
used to model the data in logistic regression.

The function can be represented as:

$f(x) = \frac{1}{1 + e^{-x}}$

f(x) = output between 0 and 1
x = input to the function
e = base of the natural logarithm
When we provide the input values (data) to the function, it gives an S-shaped curve.

It uses the concept of threshold levels: values above the threshold level are
rounded up to 1, and values below the threshold level are rounded down to 0.

There are three types of logistic regression:

• Binary (0/1, pass/fail)
• Multinomial (cats, dogs, lions)
• Ordinal (low, medium, high)
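The sigmoid function and the threshold rule for the binary case can be sketched as follows. The threshold of 0.5 and the choice to treat a value exactly at the threshold as class 1 are assumptions made for this illustration. The slides do not fix either.

```python
import math


def sigmoid(x):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def classify(x, threshold=0.5):
    """Return 1 if the sigmoid output reaches the threshold, else 0."""
    return 1 if sigmoid(x) >= threshold else 0


inputs = [-2.0, 0.5, 3.0]
probabilities = [sigmoid(x) for x in inputs]
labels = [classify(x) for x in inputs]
print(probabilities, labels)
```

In a fitted logistic regression model, the input x to the sigmoid is itself a linear combination of the features, which is why the method still belongs to the regression family.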
Polynomial Regression
• Polynomial regression is a type of regression which models a non-linear
dataset using a linear model.
• It is similar to multiple linear regression, but it fits a non-linear curve between
the value of x and the corresponding conditional values of y.
• Suppose a dataset consists of datapoints arranged in a
non-linear fashion. Linear regression will not fit such
datapoints well, so to cover them we need polynomial regression.
• In polynomial regression, the original features are transformed into
polynomial features of a given degree and then modeled using a linear
model, which means the datapoints are fitted using a polynomial curve.
• The equation for polynomial regression is derived from the linear regression
equation: the linear regression equation $Y = b_0 + b_1 x$ is extended into the
polynomial regression equation $Y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots + b_n x^n$.
• Here Y is the predicted/target output, $b_0, b_1, \dots, b_n$ are the regression
coefficients, and x is our independent/input variable.
• The model is still linear because it is linear in the coefficients, even though
it contains quadratic and higher-order terms of x.
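The two steps described above (transform the feature into polynomial features, then fit a linear model) can be sketched like this. The data is hypothetical and generated from a known quadratic so that the recovered coefficients are easy to check. The degree of 2 is likewise an assumption for the example.

```python
import numpy as np

# Hypothetical non-linear data generated from y = 1 + 2x + 3x^2.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x + 3.0 * x ** 2

degree = 2
# Transform the single feature x into polynomial features [1, x, x^2].
polynomial_features = np.vander(x, degree + 1, increasing=True)

# Fit an ordinary linear model on the transformed features.
coefficients, *_ = np.linalg.lstsq(polynomial_features, y, rcond=None)
print(coefficients)  # b0, b1, b2
```

The solver never sees anything non-linear: it only finds the weights for the columns 1, x, and x^2, which is the sense in which polynomial regression remains a linear model.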
Support Vector Regression
Support Vector Machine is a supervised learning algorithm which can be used for
regression as well as classification problems. When it is used for regression
problems, it is termed Support Vector Regression (SVR).
Support Vector Regression is a regression algorithm which works for continuous
variables.

Below are some keywords used in Support Vector Regression:

• Kernel: A function used to map lower-dimensional data into higher-dimensional data.
• Hyperplane: In a general SVM it is the separation line between two classes, but in
SVR it is the line which helps to predict the continuous variable and covers most
of the datapoints.
• Boundary lines: The two lines on either side of the hyperplane, which
create a margin for the datapoints.
• Support vectors: The datapoints which lie on or outside the boundary lines.
They determine the position of the hyperplane.
In SVR, we try to determine a hyperplane with a margin such that the
maximum number of datapoints is covered within that margin. The main goal of
SVR is to consider the maximum datapoints within the boundary lines, and
the hyperplane (best-fit line) must contain a maximum number of
datapoints.

In the accompanying figure, the blue line is the hyperplane, and the other two lines are the
boundary lines.
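The role of the boundary lines can be illustrated with the epsilon-insensitive loss used in the standard SVR formulation: datapoints inside the margin cost nothing, and datapoints outside it are penalized by how far they fall beyond the boundary. The name `epsilon`, its value, and the data below are assumptions for this sketch. The slides do not name the margin width.

```python
import numpy as np

# Hypothetical predictions from a fitted hyperplane and the observed targets.
predicted = np.array([1.0, 2.0, 3.0, 4.0])
observed = np.array([1.05, 1.7, 3.08, 4.5])

epsilon = 0.1  # assumed half-width of the margin around the hyperplane

distance = np.abs(observed - predicted)
inside_margin = distance <= epsilon

# Epsilon-insensitive loss: zero inside the margin, linear outside it.
loss = np.maximum(0.0, distance - epsilon)

print(inside_margin)
print(loss)
```

Only the points with non-zero loss pull on the fitted hyperplane, which is why a few datapoints at or beyond the boundary lines end up defining the model.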
Decision Tree Regression
• Decision Tree is a supervised learning algorithm which can be used for solving both
classification and regression problems.
• It can solve problems for both categorical and numerical data.
• Decision tree regression builds a tree-like structure in which each internal node
represents a "test" on an attribute, each branch represents the result of the test, and
each leaf node represents the final decision or result.
• A decision tree is constructed starting from the root node/parent node (dataset), which
splits into left and right child nodes (subsets of the dataset). These child nodes are further
divided into their own child nodes, and themselves become the parent nodes of those nodes.
The accompanying image shows an example of a decision tree in which the model is
trying to predict a person's choice between a sports car and a luxury car.
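A single split of a regression tree (a root node with two leaves) can be sketched as follows. The split criterion (minimum squared error) and the rule that a leaf predicts the mean of its targets are common choices assumed here, and the dataset is hypothetical. The slides do not specify either.

```python
import numpy as np


def best_split(x, y):
    """Find the threshold on x that minimizes the squared error of two leaf means."""
    best_threshold, best_error = None, float("inf")
    values = np.unique(x)
    # Candidate thresholds: midpoints between consecutive distinct values.
    for threshold in (values[:-1] + values[1:]) / 2.0:
        left, right = y[x <= threshold], y[x > threshold]
        error = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if error < best_error:
            best_threshold, best_error = float(threshold), float(error)
    return best_threshold


def predict(x_new, threshold, x, y):
    """Predict with the mean target of the leaf that x_new falls into."""
    leaf = y[x <= threshold] if x_new <= threshold else y[x > threshold]
    return float(leaf.mean())


# Hypothetical one-feature dataset with two clearly separated groups.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 7.0, 20.0, 21.0, 22.0])

threshold = best_split(x, y)
print(threshold, predict(2.0, threshold, x, y), predict(11.0, threshold, x, y))
```

A full tree repeats the same search inside each child node until a stopping rule (for example a maximum depth) is reached.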
Random Forest Regression
• Random forest is one of the most powerful supervised learning algorithms and
is capable of performing regression as well as classification tasks.
• Random forest regression is an ensemble learning method which combines
multiple decision trees and predicts the final output as the average of
the tree outputs.

The combined decision trees are called base models, and the ensemble can be represented
more formally as:

$g(x) = \frac{1}{N}\left(f_0(x) + f_1(x) + f_2(x) + \dots + f_{N-1}(x)\right)$

where each $f_i$ is one decision tree and N is the number of trees.

• Random forest uses the Bagging (Bootstrap Aggregation) technique of
ensemble learning, in which the aggregated decision trees run in parallel and do not
interact with each other.
• With the help of random forest regression, we can reduce overfitting in the
model by training each tree on a random subset of the dataset.
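Bagging can be sketched by training several small trees on bootstrap samples and averaging their outputs. The sketch below assumes one-split trees as base models, a hypothetical dataset, and 25 trees. A full random forest also samples a random subset of features at each split, which is omitted here.

```python
import numpy as np


def fit_stump(x, y):
    """Fit a one-split regression tree; returns (threshold, left_mean, right_mean)."""
    values = np.unique(x)
    if values.size < 2:  # nothing to split on: predict the overall mean
        return float(values[0]), float(y.mean()), float(y.mean())
    best = None
    for threshold in (values[:-1] + values[1:]) / 2.0:
        left, right = y[x <= threshold], y[x > threshold]
        error = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if best is None or error < best[0]:
            best = (error, float(threshold), float(left.mean()), float(right.mean()))
    return best[1:]


def predict_stump(stump, x_new):
    """Return the mean of the leaf that x_new falls into."""
    threshold, left_mean, right_mean = stump
    return left_mean if x_new <= threshold else right_mean


rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical one-feature dataset.
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0])
y = np.array([5.0, 6.0, 7.0, 6.0, 20.0, 21.0, 22.0, 21.0])

n_trees = 25
forest = []
for _ in range(n_trees):
    # Bootstrap sample: draw rows with replacement.
    rows = rng.integers(0, x.size, size=x.size)
    forest.append(fit_stump(x[rows], y[rows]))

x_new = 2.5
tree_outputs = [predict_stump(stump, x_new) for stump in forest]
forest_prediction = sum(tree_outputs) / n_trees  # average of the tree outputs
print(forest_prediction)
```

Each tree sees a slightly different dataset, so their individual errors differ, and averaging the outputs smooths those differences out.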
Ridge Regression

• Ridge regression is one of the most robust versions of linear regression, in which
a small amount of bias is introduced so that we can get better long-term
predictions.
• The amount of bias added to the model is known as the ridge regression
penalty. We compute this penalty term by multiplying lambda by
the sum of the squared weights of the individual features.
• The equation for ridge regression will be:

$\text{Cost} = \sum_{i} (y_i - \hat{y}_i)^2 + \lambda \sum_{j} w_j^2$

where $\hat{y}_i$ is the model's prediction for sample i and $w_j$ is the weight of feature j.

• A general linear or polynomial regression will fail if there is high collinearity
between the independent variables. Ridge regression
can be used to solve such problems.
• Ridge regression is a regularization technique used to reduce the
complexity of the model. It is also called L2 regularization.
• It helps to solve problems where we have more parameters than samples.
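The effect of the ridge penalty under collinearity can be sketched with the closed-form ridge solution. The data is hypothetical, the model has no intercept to keep the example short, and the lambda values are arbitrary. All three are assumptions for illustration.

```python
import numpy as np

# Hypothetical data with two perfectly collinear features (x2 = 2 * x1).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([x1, 2.0 * x1])
y = np.array([5.1, 9.8, 15.2, 19.9, 25.0])

gram = X.T @ X
# With collinear columns the Gram matrix is singular (determinant 0), so the
# ordinary least-squares normal equations have no unique solution.
gram_determinant = float(np.linalg.det(gram))
print(gram_determinant)


def ridge_weights(X, y, lam):
    """Closed-form ridge solution (no intercept): (X^T X + lam * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


weights_small_penalty = ridge_weights(X, y, lam=0.1)
weights_large_penalty = ridge_weights(X, y, lam=100.0)
print(weights_small_penalty, weights_large_penalty)
```

Adding lambda to the diagonal makes the system solvable even though the features are collinear, and a larger lambda shrinks the weights further toward zero.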
Lasso Regression

• Lasso regression is another regularization technique to reduce the complexity of
the model.
• It is similar to ridge regression except that the penalty term contains the
absolute values of the weights instead of their squares.
• Because it uses absolute values, it can shrink a slope exactly to 0, whereas ridge
regression can only shrink it close to 0.
• It is also called L1 regularization. The equation for lasso regression will be:

$\text{Cost} = \sum_{i} (y_i - \hat{y}_i)^2 + \lambda \sum_{j} |w_j|$
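The difference between shrinking a weight exactly to 0 and shrinking it close to 0 can be sketched in a simplified setting. The sketch assumes orthonormal features, a lasso objective of half the squared error plus lambda times the sum of absolute weights, and a ridge objective of half the squared error plus lambda/2 times the sum of squared weights. Under those assumptions the lasso solution is a soft-threshold of the least-squares weights and the ridge solution is a uniform rescaling. The weights themselves are hypothetical.

```python
import numpy as np


def lasso_shrink(weights, lam):
    """Soft-thresholding: move each weight toward 0 by lam, stopping at exactly 0."""
    return np.sign(weights) * np.maximum(np.abs(weights) - lam, 0.0)


def ridge_shrink(weights, lam):
    """Ridge scales every weight by the same factor, so none becomes exactly 0."""
    return weights / (1.0 + lam)


# Hypothetical least-squares weights for three features.
least_squares_weights = np.array([3.0, 0.5, -0.2])
lam = 1.0

lasso_weights = lasso_shrink(least_squares_weights, lam)
ridge_weights = ridge_shrink(least_squares_weights, lam)
print(lasso_weights)
print(ridge_weights)
```

Weights smaller in magnitude than lambda are removed entirely by the lasso step, which is why lasso is often used for feature selection, while the ridge step keeps every feature with a reduced weight.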
