Machine Learning: Bilal Khan
More specifically, regression analysis helps us understand how the value of the
dependent variable changes with respect to one independent variable while the
other independent variables are held fixed.
Now, suppose the company wants to spend $200 on advertisement in the year 2021 and
wants to know the predicted sales for that year. To solve such
prediction problems in machine learning, we need regression analysis.
Regression is a supervised learning technique which helps in finding the correlation
between variables and enables us to predict a continuous output variable based
on one or more predictor variables. It is mainly used for prediction,
forecasting, time-series modeling, and determining cause-and-effect
relationships between variables.
In regression, we plot a graph between the variables that best fits the given
datapoints; using this plot, the machine learning model can make predictions about
the data. In simple words, "Regression shows a line or curve that passes
through the datapoints on the target-predictor graph in such a way that
the vertical distance between the datapoints and the regression line is
minimum." The distance between the datapoints and the line tells whether the model has
captured a strong relationship or not.
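As a sketch of the advertising example above, the following uses scikit-learn's LinearRegression to fit such a line and predict sales for a $200 advertisement budget; the dataset is made up for illustration and is not from the source.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical past data: advertisement budget ($) vs. sales achieved.
ad_budget = np.array([[90], [120], [150], [100], [130]])  # independent variable
sales = np.array([1000, 1300, 1750, 1150, 1400])          # dependent variable

# Fit a line that minimizes the vertical distances to the datapoints.
model = LinearRegression().fit(ad_budget, sales)

# Predict the sales for an advertisement budget of $200 (the 2021 scenario).
print(model.predict([[200]]))
```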
• Regression estimates the relationship between the target and the independent
variable.
• It is used to find the trends in data.
• It helps to predict real/continuous values.
• By performing regression, we can determine the most important factor, the
least important factor, and how each factor affects the others.
Types of Regression
There are various types of regression used in data science and machine
learning. Each type has its own importance in different scenarios, but at the core,
all regression methods analyze the effect of the independent variables on the
dependent variable. Here we discuss some important types of regression,
which are given below:
Linear Regression
• Linear regression models a linear relationship between a continuous output
variable Y and an input variable x with the straight-line equation Y = b0 + b1x
(used again under Polynomial Regression below).
Logistic Regression
• Logistic regression, by contrast, is used for classification. It uses the concept of
threshold levels: values above the threshold level are rounded up to 1, and values
below it are rounded down to 0 (see the sketch after this list). It handles three
kinds of target:
• Binary (0/1, pass/fail)
• Multi-class (cats, dogs, lions)
• Ordinal (low, medium, high)
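A minimal sketch of the threshold idea using scikit-learn's LogisticRegression; the pass/fail dataset and the default 0.5 threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical binary data: hours studied vs. pass (1) / fail (0).
hours = np.array([[1], [2], [3], [4], [5], [6]])
passed = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(hours, passed)

# predict_proba returns class probabilities; predict applies the 0.5
# threshold, mapping probabilities above it to 1 and below it to 0.
print(clf.predict_proba([[3.5]]))
print(clf.predict([[3.5]]))
```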
Polynomial Regression
• Polynomial Regression is a type of regression which models the non-linear
dataset using a linear model.
• It is similar to multiple linear regression, but it fits a non-linear curve between
the values of x and the corresponding conditional values of y.
• Suppose there is a dataset whose datapoints are arranged in a non-linear fashion;
in such a case, linear regression will not best fit those datapoints. To cover such
datapoints, we need polynomial regression.
• In polynomial regression, the original features are transformed into
polynomial features of a given degree and then modeled using a linear
model, which means the datapoints are best fitted using a polynomial line.
• The equation for polynomial regression is also derived from the linear regression
equation; the linear regression equation Y = b0 + b1x is transformed into the
polynomial regression equation Y = b0 + b1x + b2x² + b3x³ + ... + bnxⁿ.
• Here Y is the predicted/target output, b0, b1, ..., bn are the regression
coefficients, and x is our independent/input variable.
• The model is still linear because it remains linear in the coefficients, even
though the quadratic and higher-degree terms make it non-linear in x. A sketch of
this transform-then-fit approach follows.
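A sketch of the transform-then-fit idea with scikit-learn, where PolynomialFeatures builds the x², x³ columns; the degree of 3 and the synthetic cubic data are assumptions.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical non-linear data: y roughly follows a cubic in x.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 20).reshape(-1, 1)
y = 0.5 * x.ravel() ** 3 - x.ravel() + rng.normal(size=20)

# Transform x into [x, x^2, x^3], then fit an ordinary linear model on top:
# the model stays linear in the coefficients b0..b3, not in x.
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(x, y)
print(model.predict([[2.0]]))
```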
Support Vector Regression
Support Vector Machine is a supervised learning algorithm which can be used for
regression as well as classification problems. When we use it for regression
problems, it is termed Support Vector Regression (SVR), a regression algorithm
which works for continuous variables.
Below are some keywords which are used in Support Vector Regression: kernel,
hyperplane, boundary lines, and support vectors. In the usual SVR plot, the middle
line is called the hyperplane, and the two lines on either side of it are known as
the boundary lines.
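A minimal SVR sketch with scikit-learn; the RBF kernel, C, and epsilon values are assumptions, with epsilon setting the width of the tube between the boundary lines around the hyperplane.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical continuous data: a noisy sine curve.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(40, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=40)

# Points falling outside the epsilon tube become support vectors and
# determine the fitted hyperplane.
svr = SVR(kernel="rbf", C=100.0, epsilon=0.1).fit(X, y)
print(svr.predict([[2.5]]))
```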
Decision Tree Regression
• Decision Tree is a supervised learning algorithm which can be used for solving both
classification and regression problems.
• It can solve problems for both categorical and numerical data.
• Decision tree regression builds a tree-like structure in which each internal node
represents a "test" on an attribute, each branch represents the result of the test, and
each leaf node represents the final decision or result.
• A decision tree is constructed starting from the root node/parent node (the dataset),
which splits into left and right child nodes (subsets of the dataset). These child nodes
are further divided into their own child nodes, themselves becoming the parent nodes of
those nodes.
Figure: an example of Decision Tree regression in which the model tries to predict
a person's choice between a sports car and a luxury car.
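A short sketch with scikit-learn's DecisionTreeRegressor; the experience/salary data and the max_depth value are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: years of experience vs. salary (in $1000s).
experience = np.array([[1], [2], [3], [5], [7], [10]])
salary = np.array([30, 35, 40, 55, 70, 95])

# Each internal node tests an attribute (here, an experience threshold);
# each leaf node holds the final predicted value.
tree = DecisionTreeRegressor(max_depth=3).fit(experience, salary)
print(tree.predict([[6]]))
```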
Random Forest Regression
• Random forest is one of the most powerful supervised learning algorithms and is
capable of performing regression as well as classification tasks.
• The Random Forest regression is an ensemble learning method which combines
multiple decision trees and predicts the final output based on the average of
each tree output.
The combined decision trees are called base models, and the forest's prediction can
be represented more formally as the average of their outputs:
g(x) = (f1(x) + f2(x) + ... + fN(x)) / N, where f1, ..., fN are the individual trees.
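A sketch with scikit-learn's RandomForestRegressor, checking by hand that its prediction is the average of the base models' outputs; the n_estimators value and the reused toy data are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Reusing the hypothetical experience/salary data from the tree example.
experience = np.array([[1], [2], [3], [5], [7], [10]])
salary = np.array([30, 35, 40, 55, 70, 95])

# 100 decision trees act as base models; the forest averages their outputs.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(experience, salary)

# g(x): the same average, computed manually over the fitted base models.
by_hand = np.mean([t.predict([[6.0]]) for t in forest.estimators_], axis=0)
print(forest.predict([[6]]), by_hand)
```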
Ridge Regression
• Ridge regression is one of the most robust versions of linear regression, in which
a small amount of bias is introduced so that we can get better long-term
predictions.
• The amount of bias added to the model is known as the Ridge Regression
penalty. We compute this penalty term by multiplying lambda by the squared
weight of each individual feature.
• The equation (cost function) for ridge regression will be:
Cost = Σ(yᵢ − ŷᵢ)² + λ Σ bⱼ², where the first term is the usual squared-error
term of linear regression and the second is the penalty term on the squared weights.
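A minimal sketch with scikit-learn's Ridge, where the alpha parameter plays the role of lambda; alpha=1.0 and the toy data are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical data with two nearly collinear features.
X = np.array([[1, 2.0], [2, 4.1], [3, 5.9], [4, 8.2], [5, 9.9]])
y = np.array([3, 6, 9, 12, 15])

# alpha is the lambda of the penalty term: it multiplies the sum of the
# squared coefficients, shrinking the weights toward zero.
ridge = Ridge(alpha=1.0).fit(X, y)
ols = LinearRegression().fit(X, y)
print("ridge:", ridge.coef_, "ols:", ols.coef_)
```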