Classification and Regression

MIT School of Computing

Department of Computer Science & Engineering

Third Year Engineering


23CSE3006 -MACHINE LEARNING
Class - T.Y. AIA (SEM-II)
Unit II: Supervised Machine Learning
Name Of the Course Coordinator:
Prof. Aarti Pimpalkar
Team Members
1. Prof. Dr. Nilima Kulkarni
2. Prof. Abhishek Das
3. Prof. Dattatray Kale
4. Prof. Nilesh Kulal

AY 2025-2026 SEM-II
INTRODUCTION TO CLASSIFICATION
Classification is used when you want to categorize data into different classes or groups. For
example, classifying emails as "spam" or "not spam" or predicting whether a patient has a
certain disease based on their symptoms. Here are some common types of classification
models:
1. Decision Tree Classification: Builds a tree where each node represents a test case for an
attribute, and branches represent possible outcomes.
2. Support Vector Machine (SVM): Finds the hyperplane that separates classes with the
maximum margin; it can be used for both classification and regression tasks.
3. K-Nearest Neighbor (KNN): Classifies data points based on the 'k' nearest neighbors using
feature similarity.
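To make the KNN idea above concrete, here is a minimal from-scratch sketch (the function name `knn_predict` and the toy data are illustrative, not from any particular library):

```python
from collections import Counter
import math

def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    points, using Euclidean distance as the feature-similarity measure."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train, labels))
    top_k = [y for _, y in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Toy data: two clusters labelled "A" and "B"
train = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train, labels, (2, 2)))   # → A
print(knn_predict(train, labels, (9, 9)))   # → B
```

A query near the first cluster is assigned "A" because all three of its nearest neighbors carry that label.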
INTRODUCTION TO REGRESSION
Regression algorithms predict a continuous value based on input data. This is
used when you want to predict numbers such as income, height, weight, or
even the probability of something happening (like the chance of rain). Some of
the most common types of regression are:
Simple Linear Regression: Models the relationship between one independent
variable and a dependent variable using a straight line.
Multiple Linear Regression: Predicts a dependent variable based on two or
more independent variables.
Logistic Regression: Despite its name, it is used for classification; it fits an
S-shaped (sigmoid) curve to predict the probability of a categorical outcome.
CLASSIFICATION AND REGRESSION
• Classification and regression are the two primary tasks in supervised
machine learning.
• The key difference lies in the nature of the output: classification deals
with discrete outcomes (e.g., yes/no, categories), while regression handles
continuous values (e.g., price, temperature).
• Both approaches require labeled data for training but differ in their
objectives: classification aims to find decision boundaries that separate
classes, whereas regression focuses on finding the best-fitting line to
predict numerical outcomes.
• Understanding these distinctions helps in selecting the right approach
for a specific machine learning task.
CLASSIFICATION Vs REGRESSION
INTRODUCTION TO REGRESSION
Regression: Regression analysis is a predictive modelling technique that
investigates the relationship between a dependent variable and one or more
independent variables.
INTRODUCTION TO LINEAR REGRESSION
• Linear regression is like drawing a straight line through data points
to predict future outcomes or understand the relationship between
two variables.
• It's used when we want to find a relationship between one thing we
want to predict (called the dependent variable) and one or more
things we use to make that prediction (called independent variables
or predictors).
LINEAR REGRESSION: WORKING
• Imagine you have a bunch of points on a graph.
• Linear regression finds the best-fitting line that goes through those
points.
• Once you have this line, you can use it to make predictions about
future points or understand how changes in one variable might affect
another.
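The "best-fitting line" described above can be computed in closed form with ordinary least squares; a minimal sketch (the helper name `fit_line` and the toy data are our own, for illustration):

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1*x (one predictor)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope: covariance of x and y divided by variance of x
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
         / sum((x - mean_x) ** 2 for x in xs)
    b0 = mean_y - b1 * mean_x        # line passes through the means
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]                # data generated exactly from y = 2x
b0, b1 = fit_line(xs, ys)
print(b0, b1)                        # → 0.0 2.0
```

Once `b0` and `b1` are known, a prediction for a new point is simply `b0 + b1 * x`.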
LINEAR REGRESSION: APPLICATIONS
• Predicting house prices based on factors like size, number of rooms,
location, etc.
• Forecasting sales based on advertising spending, seasonality, or other
factors.
• Understanding how temperature affects ice cream sales.
LINEAR REGRESSION: ADVANTAGES
• Simplicity: Easy to understand and implement.
• Interpretability: Provides insights into the relationship between
variables.
• Speed: Quick to train and make predictions.
LINEAR REGRESSION: DISADVANTAGES
• Assumes Linearity: Assumes that the relationship between variables
is linear, which might not always be the case.
• Sensitivity to Outliers: Outliers (extreme data points) can
significantly impact the model's performance.
• Limited Complexity: Cannot capture complex relationships
between variables without modifications (like polynomial
regression).
INTRODUCTION TO LOGISTIC REGRESSION
• Logistic regression is a type of machine learning algorithm used for
binary classification tasks,
• which means it predicts the probability of an input belonging to one
of two categories.
• Although it is called "regression," it is actually used for classification.
INTRODUCTION TO LOGISTIC REGRESSION
Logistic Regression produces results in a binary format and is used to
predict the outcome of a categorical dependent variable. The outcome is
therefore discrete/categorical, such as yes/no, 0/1, or true/false.
LOGISTIC REGRESSION CURVE
(Figure: the S-shaped sigmoid curve, mapping any input to a probability between 0 and 1.)
LOGISTIC REGRESSION: WORKING
• It models the relationship between a dependent binary variable
(target) and one or more independent variables (features).
• Utilizes the logistic function (sigmoid) to transform predictions into
probabilities between 0 and 1.
• The model makes predictions by calculating the probability that an
input belongs to a particular class.
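A sketch of that prediction step follows; the weights and bias are made-up illustrative values, not a trained model:

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1 / (1 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability that the input belongs to the positive class:
    a linear combination of the features passed through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical weights: z = -1.0 + 1.5*2.0 - 0.5*1.0 = 1.5
p = predict_proba([2.0, 1.0], weights=[1.5, -0.5], bias=-1.0)
print(round(p, 3))                   # → 0.818
print(1 if p >= 0.5 else 0)          # threshold at 0.5 → class 1
```

The 0.5 threshold turns the probability into a hard class label; other thresholds can be used when the costs of the two error types differ.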
LOGISTIC REGRESSION: APPLICATIONS
• Medical Diagnosis: Predicting if a patient has a disease based on
symptoms.
• Marketing: Determining if a customer will buy a product.
• Credit Risk Assessment: Evaluating the risk of default for loans.
• Image Segmentation: Identifying objects in images as part of
computer vision tasks.
LOGISTIC REGRESSION: ADVANTAGES
• Simplicity: Easy to implement and understand.
• Efficiency: Computationally inexpensive and performs well on small
to medium-sized datasets.
• Interpretability: Provides insight into the importance of features on
the outcome.
LOGISTIC REGRESSION: DISADVANTAGES
• Linear Assumption: Assumes a linear relationship between the features
and the log-odds of the outcome, which may not hold in real-world scenarios.
• Limited Complexity: Not suitable for complex patterns in data.
• Sensitivity to Outliers: Influenced by outliers that skew the model's
predictions.
LINEAR Vs LOGISTIC REGRESSION
(Comparison slide: linear regression predicts a continuous value by fitting a straight line; logistic regression predicts the probability of a categorical outcome by fitting a sigmoid curve.)
MULTIPLE LINEAR REGRESSION
Linear regression is a statistical method used for predictive analysis. It models the relationship between a
dependent variable and a single independent variable by fitting a linear equation to the data.
Multiple Linear Regression extends this concept by modelling the relationship between a dependent variable and
two or more independent variables. This technique allows us to understand how multiple features collectively
affect the outcomes.
Steps for Multiple Linear Regression
The steps to perform multiple linear regression are similar to those of simple linear regression, but the
difference comes in the evaluation process. We can use it to find out which factor has the highest influence on
the predicted output and how the different variables are related to each other. The equation for multiple linear
regression is:
y = β0 + β1X1 + β2X2 + ⋯ + βnXn
Where:
• y is the dependent variable
• X1, X2, ⋯, Xn are the independent variables
• β0 is the intercept
• β1, β2, ⋯, βn are the slopes (coefficients)
The goal of the algorithm is to find the best fit line equation that can predict the values based on the independent
variables. A regression model learns from the dataset with known X and y values and uses it to predict y values for
unknown X.
MULTIPLE LINEAR REGRESSION
How Does It Work?
•The algorithm finds the best coefficients (β0, β1, β2, …) by minimizing the
sum of squared errors between the actual Y and the predicted Y.
•This method is called the Least Squares Method.
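For two predictors, the Least Squares Method can be carried out exactly by solving the normal equations (XᵀX)β = Xᵀy. A self-contained sketch using Cramer's rule (the helper names `det3` and `fit_mlr` are our own):

```python
def det3(m):
    """Determinant of a 3x3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_mlr(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 by solving the
    normal equations (X'X)b = X'y with Cramer's rule."""
    n = len(y)
    XtX = [[n,       sum(x1),                    sum(x2)],
           [sum(x1), sum(a * a for a in x1),     sum(a * b for a, b in zip(x1, x2))],
           [sum(x2), sum(a * b for a, b in zip(x1, x2)), sum(b * b for b in x2)]]
    Xty = [sum(y),
           sum(a * c for a, c in zip(x1, y)),
           sum(b * c for b, c in zip(x2, y))]
    D = det3(XtX)
    coeffs = []
    for j in range(3):                   # replace column j with X'y
        M = [row[:] for row in XtX]
        for i in range(3):
            M[i][j] = Xty[i]
        coeffs.append(det3(M) / D)
    return coeffs                        # [b0, b1, b2]

# Toy data generated exactly from y = 20 + 5*x1 + 3*x2
x1 = [1, 2, 3, 4, 5]                     # e.g., hours of study
x2 = [0, 1, 1, 2, 3]                     # e.g., practice tests taken
y = [20 + 5 * a + 3 * b for a, b in zip(x1, x2)]
print([round(c, 6) for c in fit_mlr(x1, x2, y)])   # → [20.0, 5.0, 3.0]
```

Because the toy data are generated exactly from the linear equation, the fit recovers the coefficients exactly; with noisy real data the same procedure returns the coefficients that minimize the sum of squared errors.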

Example Imagine we want to predict a student’s final exam score (Y) using two factors:
•X1 = Hours of Study
•X2 = Number of Practice Tests Taken

Suppose after training the model, we get this equation:

Score = 20 + 5X1 + 3X2
Interpretation:
•Intercept (20): Even if a student studies 0 hours and takes 0 practice tests, they may score
20 marks (base level).
•Coefficient (5): Each additional hour of study increases the score by 5 points.
•Coefficient (3): Each additional practice test increases the score by 3 points.
Prediction Example:
•If a student studies for 4 hours and takes 2 practice tests:
Score = 20 + (5×4) + (3×2) = 20 + 20 + 6 = 46
So, the predicted score = 46 marks.
(Figure: the red points are the actual data (Hours of Study, Practice Tests → Exam Score); the colored plane is the regression surface, the equivalent of the regression line in 3D.)
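Prediction with a fitted multiple regression model is just plugging values into the equation; a short sketch using the example's coefficients:

```python
def predict_score(hours, tests):
    """Fitted model from the example above: Score = 20 + 5*X1 + 3*X2."""
    return 20 + 5 * hours + 3 * tests

print(predict_score(4, 2))   # 4 hours of study, 2 practice tests → 46
print(predict_score(0, 0))   # base level (intercept only) → 20
```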
