Logistic Regression
Logistic Regression Agenda
1. What is Logistic Regression?
2. Linear vs. Logistic Regression
3. Theoretical Foundations
4. Model Training
5. Practical Implementation
6. Advanced Topics
7. Case Studies
8. Conclusion
A statistical method for binary classification, where the outcome is a categorical variable with two possible values.
Use cases: Spam detection, disease diagnosis, customer churn prediction, etc.
Linear vs. Logistic Regression
Theoretical Foundations
3.1 The Logistic Function
The logistic function, or sigmoid function, maps any real-valued number to a value between 0 and 1:
σ(z) = 1 / (1 + e^(−z))
Graphical representation: plot the sigmoid function to show how it squashes extreme values toward 0 and 1.
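A minimal NumPy sketch of the sigmoid, suitable for producing the plot described above:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# The curve passes through 0.5 at z = 0 and saturates toward 0 and 1.
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))
```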
3.2 Odds and Log-Odds
Odds: the ratio of the probability of the event occurring to the probability of it not occurring: odds = p / (1 − p).
Log-Odds (Logit): the natural logarithm of the odds: logit(p) = ln(p / (1 − p)).
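A short illustration of the relationship between the two quantities (the helper names `logit` and `sigmoid` are just for this sketch): the logit is the inverse of the sigmoid.

```python
import math

def logit(p):
    """Log-odds: natural log of p / (1 - p)."""
    return math.log(p / (1.0 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

p = 0.8
print(logit(p))           # log(0.8 / 0.2) = log(4) ≈ 1.386
print(sigmoid(logit(p)))  # recovers 0.8 (up to floating-point error)
```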
3.3 Logistic Regression Model
Model representation: p = σ(wᵀx + b) = 1 / (1 + e^(−(wᵀx + b)))
where:
p : the probability of the positive class
w : the vector of weights
x : the feature vector
b : the bias
3.4 Cost Function
Binary Cross-Entropy Loss:
J(w, b) = −(1/m) Σᵢ [ yᵢ log(pᵢ) + (1 − yᵢ) log(1 − pᵢ) ]
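The binary cross-entropy loss can be sketched directly from its formula (a minimal illustration; the toy labels and predictions are made up):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy loss.

    eps clips predictions away from 0 and 1 so log() stays finite.
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.6])
print(binary_cross_entropy(y_true, y_pred))
```

Confident correct predictions drive the loss toward zero; confident wrong ones are penalized heavily.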
Model Training
4.1 Gradient Descent
An optimization algorithm that minimizes the cost function by iteratively updating the weights and bias.
Update rules (α is the learning rate):
w := w − α ∂J/∂w
b := b − α ∂J/∂b
4.2 Regularization
Purpose: to prevent overfitting by penalizing large weights.
L1 Regularization: Adds a penalty proportional to the absolute value of weights.
L2 Regularization: Adds a penalty proportional to the square of weights.
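The update rules from 4.1 can be sketched as batch gradient descent with an optional L2 penalty (a minimal NumPy illustration; the function name `fit` and the toy data are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, y, alpha=0.1, epochs=2000, l2=0.0):
    """Batch gradient descent for logistic regression.

    alpha is the learning rate; l2 > 0 adds an L2 penalty on the weights
    (the bias is conventionally left unpenalized).
    """
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)                 # predicted probabilities
        grad_w = (X.T @ (p - y)) / m + l2 * w  # ∂J/∂w (plus regularization term)
        grad_b = (p - y).mean()                # ∂J/∂b
        w -= alpha * grad_w                    # w := w - α ∂J/∂w
        b -= alpha * grad_b                    # b := b - α ∂J/∂b
    return w, b

# Toy data: label is 1 when the single feature is positive.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = fit(X, y)
print(sigmoid(X @ w + b))  # low probabilities for the first two rows, high for the last two
```

Setting `l2 > 0` shrinks the learned weight, trading a little training fit for better generalization.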
4.3 Model Evaluation
Confusion Matrix: a table showing true positives, true negatives, false positives, and false negatives.
Performance metrics: accuracy, precision, recall, F1 score, ROC curve and AUC.
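These metrics are all available in scikit-learn (a small sketch with made-up labels and scores; note that AUC is computed from predicted probabilities, not hard labels):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true  = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred  = [0, 0, 1, 1, 1, 0, 1, 0]                   # hard class labels
y_score = [0.1, 0.2, 0.6, 0.9, 0.8, 0.4, 0.7, 0.3]  # predicted probabilities

print(confusion_matrix(y_true, y_pred))  # rows: true class, cols: predicted class
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(roc_auc_score(y_true, y_score))    # uses scores, not hard labels
```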
Practical Implementation
5.1 Data Preparation
Feature Scaling: normalize or standardize features so that all of them contribute to learning on a comparable scale.
Handling Missing Values: strategies to deal with missing data, such as imputation or dropping incomplete rows.
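Both preparation steps can be done with scikit-learn transformers (a minimal sketch; the toy matrix is made up):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy feature matrix with one missing value (np.nan).
X = np.array([[1.0, 200.0],
              [2.0, np.nan],
              [3.0, 400.0]])

# Handling missing values: replace NaN with the column mean.
X_filled = SimpleImputer(strategy="mean").fit_transform(X)

# Feature scaling: zero mean, unit variance per column.
X_scaled = StandardScaler().fit_transform(X_filled)
print(X_scaled.mean(axis=0))  # ≈ 0 for each column
print(X_scaled.std(axis=0))   # ≈ 1 for each column
```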
5.2 Implementing Logistic Regression
Using scikit-learn:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Fit on the training split, then evaluate on held-out test data.
model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
5.3 Hyperparameter Tuning
Grid Search: explore combinations of hyperparameters to find the best model configuration.
Cross-Validation: validate the model's performance on multiple subsets of the data.
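Grid search and cross-validation combine naturally in scikit-learn's GridSearchCV (a sketch on synthetic data; the grid over C, the inverse regularization strength, is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Search over the inverse regularization strength C,
# scoring each candidate with 5-fold cross-validation.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)  # the C value with the best mean CV accuracy
print(search.best_score_)
```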
Advanced Topics
6.1 Multiclass Logistic Regression
One-vs-Rest (OvR): decompose a multiclass problem into multiple binary classification problems.
Softmax Function: a generalization of the sigmoid function that handles multiple classes.
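The softmax can be sketched in a few lines (the max-subtraction is a standard numerical-stability trick); with two classes it reduces to the sigmoid:

```python
import numpy as np

def softmax(z):
    """Generalization of the sigmoid to K classes: outputs are positive and sum to 1."""
    e = np.exp(z - np.max(z))  # subtracting the max avoids overflow
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # one raw score (logit) per class
probs = softmax(scores)
print(probs)        # a valid probability distribution over 3 classes
print(probs.sum())  # sums to 1
```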
6.2 Regularization Techniques
Elastic Net Regularization: Combines L1 and L2 regularization.
Choosing Regularization Parameters: Impact on model complexity and performance.
6.3 Feature Engineering
Interaction Terms: Adding interaction features to capture relationships between features.
Polynomial Features: Creating polynomial terms for capturing non-linear relationships.
Case Studies and Applications
Real-World Examples
Medical Diagnosis: Predicting the presence of diseases.
Marketing: Customer churn prediction.
Finance: Credit scoring.
Challenges & Solutions
Imbalanced Data: techniques such as resampling or using different evaluation metrics to handle class imbalance.
Overfitting: strategies to mitigate overfitting, including cross-validation and regularization.
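Alongside resampling, scikit-learn's `class_weight="balanced"` option reweights the loss inversely to class frequency, which usually raises minority-class recall (a sketch on synthetic imbalanced data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Imbalanced data: roughly 90% negatives, 10% positives.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights each class's loss inversely to its
# frequency, an alternative to resampling the data itself.
plain    = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
balanced = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

print(recall_score(y_te, plain.predict(X_te)))     # minority-class recall
print(recall_score(y_te, balanced.predict(X_te)))  # typically higher
```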