Regression Analysis in Machine Learning: Context
Context:
Now, based on the available data, suppose someone asks you how many
college graduates with master's degrees there will be in the year 2018.
The number of college graduates with master's degrees increases almost
linearly with the year, so by simple visual analysis we can estimate
that number to be between 2.0 and 2.1 million. Let's look at the actual
numbers. The graph below plots the same variable from the year 2001 to
the year 2018. Our predicted number was in the ballpark of the actual
value. Because this was a simple problem (fitting a line to data), we
could solve it by eye. This process of fitting a function to a set of
data points is known as regression analysis.
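The line fit described above can be sketched in a few lines of Python. This is a minimal illustration, assuming ordinary least squares as the fitting criterion. The names fit_line and predict are hypothetical helpers, and the numbers are made up for illustration; they are not the graduate counts from the graph.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Synthetic, made-up data (NOT the real graduate counts)
years = [2001, 2002, 2003, 2004, 2005]
values = [1.0, 1.1, 1.2, 1.3, 1.4]

slope, intercept = fit_line(years, values)
forecast = predict(2008, slope, intercept)
print(f"slope={slope:.3f}, forecast for 2008={forecast:.2f}")
```

Extrapolating the fitted line to a later year is the same thing the eye does when it extends the trend in the graph.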
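Mathematically speaking, given data points (x_i, y_i), regression looks
for the parameters beta of a function f_beta that minimize the total
loss, that is, the sum over i of l(f_beta(x_i), y_i).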
Now let's talk about the different ways in which we can carry out
regression. Based on the family of functions (f_beta) and the loss
function (l) used, we can group regression methods into the following
categories.
1. Linear Regression
2. Polynomial Regression
3. Ridge Regression
4. LASSO regression
The figure below tries to visualize this idea on the same example as
above. The data points are fit using both Ridge and LASSO regression,
and the corresponding fits and weights (in ascending order) are
plotted. It can be seen that most of the weights in the LASSO
regression are very close to zero.
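Mathematically speaking, LASSO regression modifies the loss function by
adding an L1 penalty on the weights and solves the following problem:
minimize over beta: sum_i (y_i - f_beta(x_i))^2 + alpha * ||beta||_1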
The constant alpha>0 controls the tradeoff between the fit and the
sparsity of the learned weights. A large value of alpha results in a
poorer fit but a sparser set of learned weights. On the other hand, a
small value of alpha results in a tight fit on the training data points
(which might lead to over-fitting) but a less sparse set of weights.
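The effect of alpha on sparsity can be checked with a small experiment. The sketch below assumes a linear model without an intercept and minimizes (1/(2n)) * sum of squared errors + alpha * (sum of absolute weights) by coordinate descent, which is one common way to solve the LASSO problem; the 1/(2n) scaling is a convention that only rescales alpha. The function names and the synthetic data (five features, of which only the first two matter) are assumptions made for illustration.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=200):
    """Minimize (1/(2n)) * sum (y_i - x_i . w)^2 + alpha * sum |w_j|."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    # Mean squared value of each feature
    z = [sum(row[j] ** 2 for row in X) / n for j in range(d)]
    for _ in range(n_iters):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j
            rho = 0.0
            for row, target in zip(X, y):
                pred_without_j = sum(w[k] * row[k] for k in range(d) if k != j)
                rho += row[j] * (target - pred_without_j)
            rho /= n
            w[j] = soft_threshold(rho, alpha) / z[j]
    return w


# Synthetic data: only the first two of five features influence y
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(50)]
y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.1) for row in X]

weights_by_alpha = {
    alpha: lasso_coordinate_descent(X, y, alpha) for alpha in (0.01, 0.5, 10.0)
}
for alpha, w in weights_by_alpha.items():
    n_zero = sum(1 for wj in w if wj == 0.0)
    print(f"alpha={alpha}: zero weights={n_zero}, w={[round(wj, 3) for wj in w]}")
```

With a small alpha the two informative weights stay close to the values used to generate the data, while a very large alpha drives every weight to exactly zero, which is the poor-fit, fully sparse extreme.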
5. ElasticNet Regression
6. Bayesian Regression
Consider the following example where the data points belong to one of
two categories, {0 (red), 1 (yellow)}, as shown in the scatter plot
below.
[Left] Scatter plot of data points. [Right] Logistic regression trained on the data points, plotted in blue.
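A two-category fit like the one in the caption can be sketched as follows. This is a simplified illustration, assuming one-dimensional inputs and plain gradient descent on the logistic (cross-entropy) loss. The data are made up and are not the points from the scatter plot, and fit_logistic is a hypothetical helper name.

```python
import math


def sigmoid(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, lr=0.1, n_iters=2000):
    """Fit P(label = 1 | x) = sigmoid(w * x + b) by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iters):
        grad_w = grad_b = 0.0
        for x, t in zip(xs, labels):
            err = sigmoid(w * x + b) - t  # gradient of the loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Synthetic 1-D data: category 0 on the left, category 1 on the right
xs = [-3.0, -2.5, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 2.5, 3.0]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

w, b = fit_logistic(xs, labels)
for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f} -> P(category 1) = {sigmoid(w * x + b):.3f}")
```

The fitted curve outputs a probability for category 1, and thresholding it at 0.5 turns the regression output into a category decision.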
https://towardsdatascience.com/a-beginners-guide-to-regression-analysis-in-machine-learning-8a828b491bbf