
Logistic Regression

• Logistic regression is used for binary classification. It applies the sigmoid function to a linear combination of the independent variables and produces a probability value between 0 and 1.
Logistic Regression Classification
The main steps are (a minimal code sketch follows the bullets below):
1. Data preprocessing
2. Fitting logistic regression to the training set
3. Predicting the test results
4. Testing the accuracy of the result
5. Visualizing the result
• Logistic regression predicts the output of a categorical dependent variable, so the outcome is a discrete value.
• The outcome can be Yes or No, 0 or 1, True or False, etc., but instead of returning the exact values 0 and 1, the model gives a probability that lies between 0 and 1.
• Instead of fitting a straight regression line, logistic regression fits an “S”-shaped logistic curve whose outputs are bounded by the two extreme values (0 and 1).
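A minimal sketch of the five steps above, assuming scikit-learn; the dataset (X, y) and the parameter choices are hypothetical placeholders, not from the text:

```python
# Hypothetical example: the data and settings are placeholders for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Data preprocessing: build placeholder data, split, and scale the features
X = np.random.rand(200, 2)                      # placeholder features
y = (X[:, 0] + X[:, 1] > 1).astype(int)         # placeholder binary labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 2. Fitting logistic regression to the training set
clf = LogisticRegression()
clf.fit(X_train, y_train)

# 3. Predicting the test results
y_pred = clf.predict(X_test)

# 4. Testing the accuracy of the result
print("Accuracy:", accuracy_score(y_test, y_pred))

# 5. Visualizing the result (e.g., plotting the decision boundary with matplotlib) is omitted here
```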
Sigmoid Function
• The sigmoid function is a mathematical function used to map predicted values to probabilities.
• It maps any real value to a value in the range 0 to 1. Since the output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, the function forms an “S”-shaped curve.
• In logistic regression, we use a threshold value to turn probabilities into class labels: probabilities above the threshold are mapped to 1, and probabilities below the threshold are mapped to 0.
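A minimal sketch of the sigmoid and the thresholding step; the 0.5 cut-off is an assumed default, not fixed by the text:

```python
import numpy as np

def sigmoid(z):
    # Maps any real value z to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(z)
labels = (probs >= 0.5).astype(int)   # assumed threshold of 0.5
print(probs)    # approximately [0.018 0.269 0.5 0.731 0.982]
print(labels)   # [0 0 1 1 1]
```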
Logistic Regression Equation
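The equation itself is not reproduced in the extracted text; the standard form, with θ the coefficient vector and x the input features, is

\[
h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}
\qquad\Longleftrightarrow\qquad
\log\frac{h_\theta(x)}{1 - h_\theta(x)} = \theta^{T} x .
\]

The left form is the sigmoid applied to a linear combination of the inputs; the right form shows that the log-odds are linear in x.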
Generalized Linear Models

Generalized Linear Models (GLMs) are a class of regression models that can describe a wide range of relationships between a response variable and one or more predictor variables.
Unlike traditional linear regression models, which assume a linear relationship between the response and predictor variables, GLMs allow for more flexible, non-linear relationships by using a different underlying statistical distribution for the response.
Features of GLMs
1.Flexibility: GLMs can model a wide range of relationships between
the response and predictor variables, including linear, logistic,
Poisson, and exponential relationships.
2.Model interpretability: GLMs provide a clear interpretation of the
relationship between the response and predictor variables, as well
as the effect of each predictor on the response.
3.Robustness: GLMs can be robust to outliers and other anomalies in
the data, as they allow for non-normal distributions of the response
variable.
4.Scalability: GLMs can be used for large datasets and complex
models, as they have efficient algorithms for model fitting and
prediction.
5.Ease of use: GLMs are relatively easy to understand and use, especially
compared to more complex models such as neural networks or decision
trees.
6.Hypothesis testing: GLMs allow for hypothesis testing and statistical
inference, which can be useful in many applications where it’s important to
understand the significance of relationships between variables.

7.Regularization: GLMs can be regularized to reduce overfitting and


improve model performance, using techniques such as Lasso, Ridge, or
Elastic Net regression.
8. Model comparison: GLMs can be compared using information criteria
such as AIC or BIC, which can help to choose the best model among a set
of alternatives.
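A minimal sketch of regularized logistic regression (a GLM with a Bernoulli response), assuming scikit-learn; the penalty settings shown correspond to Ridge (L2), Lasso (L1), and Elastic Net, and the data are placeholders:

```python
# Hypothetical example: X and y are synthetic placeholders, not from the text.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X @ np.array([1.5, -2.0, 0.0, 0.0, 0.5]) + rng.normal(size=300) > 0).astype(int)

# Ridge (L2) penalty -- the default for LogisticRegression
ridge = LogisticRegression(penalty="l2", C=1.0).fit(X, y)

# Lasso (L1) penalty -- needs a solver that supports L1, e.g. liblinear or saga
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, y)

# Elastic Net -- mixes L1 and L2; only the saga solver supports it
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=1.0, max_iter=5000).fit(X, y)

print(ridge.coef_, lasso.coef_, enet.coef_, sep="\n")
```

Smaller values of C mean stronger regularization; the L1-based penalties tend to drive uninformative coefficients toward exactly zero.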
Some of the disadvantages of GLMs

• Assumptions: GLMs make certain assumptions about the distribution of the response variable, and these assumptions may not always hold.
• Model specification: Specifying the correct underlying statistical distribution for a GLM can be challenging, and incorrect specification can result in biased or incorrect predictions.
• Overfitting: Like other regression models, GLMs can be prone to overfitting if the model is too complex or has too many predictor variables.
• Limited flexibility: While GLMs are more flexible than traditional linear regression models, they may still not be able to capture more complex relationships between variables, such as interactions or non-linear effects.
• Data requirements: GLMs require a sufficient amount of data to estimate model parameters and make accurate predictions, and may not perform well with small or imbalanced datasets.
• Model assumptions: GLMs rely on certain assumptions about the distribution of the response variable and the relationship between the response and predictor variables, and violation of these assumptions can lead to biased or incorrect predictions.
• Overall, GLMs are a powerful and flexible tool for modeling relationships between response and predictor variables, and are widely used in many fields, including finance, marketing, and epidemiology.
Generalized linear models (GLMs) explain how linear regression and logistic regression are members of a much broader class of models. GLMs can be used to construct models for regression and classification problems by choosing the distribution that best describes the data or labels used to train the model.
Below are some types of data and the corresponding distributions used to construct the model (a fitting sketch follows this list):
1. Binary classification data – Bernoulli distribution
2. Real-valued data – Gaussian distribution
3. Count data – Poisson distribution
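A minimal sketch of fitting GLMs with these three distributions, assuming the statsmodels library; the arrays X, y_binary, y_real, and y_count are placeholder data:

```python
# Hypothetical example: the data are synthetic placeholders for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(100, 2)))                   # add intercept column
y_binary = rng.integers(0, 2, size=100)                          # Bernoulli-type labels
y_real = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=100)   # Gaussian response
y_count = rng.poisson(lam=3.0, size=100)                         # count response

# Binary labels -> Binomial/Bernoulli family (this is logistic regression)
logit_fit = sm.GLM(y_binary, X, family=sm.families.Binomial()).fit()

# Real-valued response -> Gaussian family (this is ordinary linear regression)
linear_fit = sm.GLM(y_real, X, family=sm.families.Gaussian()).fit()

# Count data -> Poisson family
poisson_fit = sm.GLM(y_count, X, family=sm.families.Poisson()).fit()

# AIC is available on each fit, useful for the model comparison mentioned above
print(logit_fit.aic, linear_fit.aic, poisson_fit.aic)
```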
• To understand GLMs, we begin by defining exponential families. Exponential families are a class of distributions whose probability density function (PDF) can be written in the following form:
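The form itself is not reproduced in the extracted text; the standard exponential-family form is

\[
p(y;\eta) = b(y)\,\exp\!\big(\eta^{T} T(y) - a(\eta)\big),
\]

where η is the natural parameter, T(y) the sufficient statistic, a(η) the log-partition function, and b(y) the base measure.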
Linear Regression Model: To show that linear regression is a special case of the GLMs, we consider output labels that are continuous values and therefore follow a Gaussian distribution. So, we have:
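The three equations are not reproduced in the extracted text; in the standard GLM formulation of this argument they are (with θ the parameter vector, x the inputs, and σ² treated as fixed):

\[
y \mid x;\theta \sim \mathcal{N}(\mu, \sigma^{2}),
\qquad
h_\theta(x) = \mathbb{E}[\,y \mid x;\theta\,] = \mu,
\qquad
\eta = \theta^{T} x .
\]

Since the Gaussian mean equals its natural parameter (μ = η), these reduce to h_θ(x) = θᵀx, i.e. ordinary linear regression.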
The first equation above corresponds to the first assumption, that the output labels (or target variables) are members of an exponential family.
The second equation corresponds to the assumption that the hypothesis equals the expected value (mean) of the distribution.
The third equation corresponds to the assumption that the natural parameter and the input features follow a linear relationship.
Logistic Regression Model: To show that logistic regression is a special case of the GLMs, we consider output labels that are binary valued and therefore follow a Bernoulli distribution. So, we have:
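The equations are not reproduced in the extracted text; in the standard formulation (with φ the Bernoulli mean) they are:

\[
y \mid x;\theta \sim \mathrm{Bernoulli}(\phi),
\qquad
h_\theta(x) = \mathbb{E}[\,y \mid x;\theta\,] = \phi,
\qquad
\eta = \theta^{T} x,
\quad\text{where}\quad
\eta = \log\frac{\phi}{1-\phi} .
\]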
From the third assumption, it is proven that:
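The stated result is not reproduced in the extracted text; inverting η = log(φ/(1−φ)) gives the sigmoid, which is exactly the logistic regression hypothesis:

\[
\phi = \frac{1}{1 + e^{-\eta}} = \frac{1}{1 + e^{-\theta^{T} x}} = h_\theta(x).
\]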
The function that maps the natural parameter to the distribution's mean (the canonical parameter) is known as the canonical response function (here, the sigmoid function), and its inverse is known as the canonical link function (here, the logit).

Therefore, using the three assumptions mentioned above, it can be shown that logistic regression and linear regression belong to a much larger family of models known as GLMs.
