
Department of Artificial Intelligence and Data Science
T. Kalaiselvi

231ADC501T
Machine Learning Techniques
UNIT II
SUPERVISED LEARNING AND ENSEMBLE TECHNIQUES
Regression – Simple Linear Regression – Multiple Linear Regression – Logistic Regression – Classification – K Nearest Neighbours Classifier – Naïve Bayes Classifier – Support Vector Machine – Ensemble Techniques – Decision Trees – Random Forest – Bagging – Boosting.


2.1 Regression:
Regression refers to methods used to model and analyze the relationship between a
dependent variable and one or more independent variables. It helps predict or explain the value
of the dependent variable based on the independent variables.
Regression is a type of supervised learning technique in machine learning that models
the relationship between a dependent variable (target/output) and one or more independent
variables (features/inputs). The goal of regression is to predict a continuous output.

📘 Key Concepts
Input: Features (e.g., age, income, hours studied)

Output: Continuous value (e.g., price, salary, temperature)

Objective: Minimize the error between predicted and actual values.

Linear Regression:

A machine learning model that outputs a continuous value is called a regression model.

Linear regression is a statistical method used in machine learning to model the relationship between a
dependent variable and one or more independent variables. It models relationships by fitting a linear
equation to observed data, often serving as a starting point for more complex algorithms and is widely
used in predictive analysis.
• Linear regression is a statistical regression method which is used for predictive analysis.
• It is one of the simplest algorithms; it works on regression problems and shows the relationship between continuous variables.
• It is used for solving regression problems in machine learning.
• Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence the name linear regression.
• If there is only one input variable (x), such linear regression is called simple linear regression. If there is more than one input variable, it is called multiple linear regression.
• As an example of the relationship between variables in a linear regression model, consider predicting the salary of an employee on the basis of years of experience.
Essentially, linear regression models the relationship between a dependent variable (the outcome you
want to predict) and one or more independent variables (the input features you use for prediction) by
finding the best-fitting straight line through a set of data points. This line, called the regression line,
represents the relationship between the dependent variable (the outcome we want to predict) and the
independent variable(s) (the input features we use for prediction). The equation for a simple linear
regression line is defined as:


y = mx + c

where y is the dependent variable, x is the independent variable, m is the slope of the line, and c is the y-intercept. This equation provides a mathematical model for mapping inputs to predicted outputs, with the goal of minimizing the differences between predicted and observed values, known as residuals. By minimizing these residuals, linear regression produces a model that best represents the data.

Conceptually, linear regression can be visualized as drawing a straight line through points on a graph
to determine if there is a relationship between those data points. The ideal linear regression model for
a set of data points is the line that best approximates the values of every point in the data set.
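To make this concrete, here is a minimal Python sketch that fits y = mx + c by least squares; the experience/salary numbers are made up for illustration:

```python
import numpy as np

# Hypothetical data: years of experience (x) and salary in thousands (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([30.0, 35.0, 41.0, 48.0, 52.0])

# Closed-form least-squares estimates of slope m and intercept c
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()

print(f"Fitted line: y = {m:.2f}x + {c:.2f}")
print("Prediction for 6 years of experience:", m * 6 + c)
```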

Some popular applications of linear regression are:


• Analyzing trends and sales estimates
• Salary forecasting
• Real estate prediction
• Arriving at ETAs in traffic.
Linear Regression in Machine Learning
Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a statistical
method that is used for predictive analysis. Linear regression makes predictions for continuous/real or
numeric variables such as sales, salary, age, product price, etc.
The linear regression algorithm shows a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression. Since linear regression shows a linear relationship, it describes how the value of the dependent variable changes with the value of the independent variable. The linear regression model provides a sloped straight line representing the relationship between the variables.

Mathematically, we can represent a linear regression as:


y = a0 + a1x + ε
Here,
y = Dependent variable (target variable)


x = Independent variable (predictor variable)
a0 = Intercept of the line (gives an additional degree of freedom)
a1 = Linear regression coefficient (scale factor applied to each input value)
ε = Random error
The observed values of the x and y variables form the training dataset used to fit the linear regression model.

Types of Linear Regression


Linear regression can be further divided into two types of algorithm:

2.2 Simple Linear Regression:

If a single independent variable is used to predict the value of a numerical dependent variable, then such a linear regression algorithm is called Simple Linear Regression. Simple linear regression models the relationship between a single independent variable and a dependent variable using a straight line. The equation for simple linear regression is y = mx + c, where y is the dependent variable, x is the independent variable, m is the slope of the line, and c is the y-intercept. This method is a straightforward way to get clear insights when dealing with single-variable scenarios. Consider a doctor trying to understand how patient height affects weight. By plotting each variable on a graph and finding the best-fitting line using simple linear regression, the doctor could predict a patient's weight based on their height alone.

Simple Linear Regression is a type of regression algorithm that models the relationship between a dependent variable and a single independent variable. The relationship shown by a Simple Linear Regression model is linear (a sloped straight line), hence the name. The key point in Simple Linear Regression is that the dependent variable must be a continuous/real value. The independent variable, however, can be continuous or categorical.

Simple Linear regression algorithm has mainly two objectives:


• Model the relationship between the two variables, such as the relationship between income and expenditure, or between experience and salary.
• Forecast new observations, such as forecasting weather according to temperature, or the revenue of a company according to its investments in a year.

Simple Linear Regression Model:


The Simple Linear Regression model can be represented using the below equation:
y = a0 + a1x + ε
Where,
a0 = The intercept of the regression line (can be obtained by putting x = 0)
a1 = The slope of the regression line, which tells whether the line is increasing or decreasing
ε = The error term (for a good model it will be negligible)
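The doctor's height/weight example above can be sketched with scikit-learn's LinearRegression; the data below is hypothetical, invented only to show the API:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical heights (cm) and weights (kg), echoing the doctor example
heights = np.array([[150], [160], [165], [170], [180]])  # 2-D: samples x features
weights = np.array([50, 58, 63, 68, 77])

model = LinearRegression().fit(heights, weights)
print("slope a1:", model.coef_[0])
print("intercept a0:", model.intercept_)
print("predicted weight at 175 cm:", model.predict([[175]])[0])
```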

2.3 Multiple Linear Regression

Linear regression is a statistical method used for predictive analysis. It models the relationship between
a dependent variable and a single independent variable by fitting a linear equation to the data.


Multiple Linear Regression extends this concept by modelling the relationship between a dependent
variable and two or more independent variables. This technique allows us to understand how multiple
features collectively affect the outcomes.

Steps for Multiple Linear Regression


Steps to perform multiple linear regression are similar to those of simple linear regression, but the difference comes in the evaluation process. We can use it to find out which factor has the highest influence on the predicted output and how different variables are related to each other.

Equation for multiple linear regression is:

y = β0 + β1X1 + β2X2 + ⋯ + βnXn

Where:
y is the dependent variable
X1, X2, ⋯, Xn are the independent variables
β0 is the intercept
β1, β2, ⋯, βn are the slopes
The goal of the algorithm is to find the best fit line equation that can predict the values based on the
independent variables. A regression model learns from the dataset with known X and y values and uses
it to predict y values for unknown X.
“Multiple Linear Regression is one of the important regression algorithms which models the linear relationship between a single dependent continuous variable and more than one independent variable.”

Example:
Prediction of CO2 emission based on engine size and number of cylinders in a car.

Some key points about MLR:


• For MLR, the dependent or target variable (Y) must be continuous/real, but the predictor or independent variables may be of continuous or categorical form.
• Each feature variable must model a linear relationship with the dependent variable.
• MLR tries to fit a regression line through a multidimensional space of data points.

MLR equation:
In Multiple Linear Regression, the target variable (Y) is a linear combination of multiple predictor variables x1, x2, x3, ..., xn. Since it is an enhancement of Simple Linear Regression, the same form applies, and the multiple linear regression equation becomes:
Y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn ............... (a)
Where,
Y = Output/response variable
b0, b1, b2, b3, ..., bn = Coefficients of the model
x1, x2, x3, x4, ... = Independent/feature variables
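As a minimal sketch of the CO2 example mentioned earlier, the following fits a multiple linear regression on two features (engine size and cylinder count); all numbers are illustrative assumptions, not real measurements:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: engine size (litres) and number of cylinders
X = np.array([[1.6, 4], [2.0, 4], [2.4, 4], [3.0, 6], [3.5, 6], [5.0, 8]])
y = np.array([180, 200, 215, 240, 255, 300])  # CO2 emission (g/km), made up

model = LinearRegression().fit(X, y)
print("coefficients b1..bn:", model.coef_)
print("intercept b0:", model.intercept_)
print("prediction for a 2.5 L, 4-cylinder engine:", model.predict([[2.5, 4]])[0])
```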

Assumptions for Multiple Linear Regression:


• A linear relationship should exist between the Target and predictor variables.
• The regression residuals must be normally distributed.


• MLR assumes little or no multicollinearity (correlation between the independent variables) in the data.
Multiple Linear regression:
If more than one independent variable is used to predict the value of a numerical dependent variable,
then such a Linear Regression algorithm is called Multiple Linear Regression.

As an example, consider a real estate agent who wants to estimate house prices. The agent could use a
simple linear regression based on a single variable, like the size of the house or the zip code, but this
model would be too simplistic, as housing prices are often driven by a complex interplay of multiple
factors. A multiple linear regression, incorporating variables like the size of the house, the
neighborhood, and the number of bedrooms, will likely provide a more accurate prediction model.

Working of Linear regression

Linear regression works by finding the best-fitting line through a set of data points. This process
involves:

1. Selecting the model: In the first step, the appropriate linear equation to describe the relationship between the dependent and independent variables is selected.

2. Fitting the model: Next, a technique called Ordinary Least Squares (OLS) is used to minimize the sum of the squared differences between the observed values and the values predicted by the model. This is done by adjusting the slope and intercept of the line to find the best fit. The purpose of this method is to minimize the error, or difference, between the predicted and actual values. This fitting process is a core part of supervised machine learning, in which the model learns from the training data.

3. Evaluating the model: In the final step, the quality of fit is assessed using metrics such as R-squared, which measures the proportion of the variance in the dependent variable that is predictable from the independent variables. In other words, R-squared measures how well the data actually fits the regression model.

This process generates a machine learning model that can then be used to make predictions based on
new data.
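A compact way to see the OLS fitting step in code is to solve the least-squares problem directly; this sketch uses NumPy's least-squares solver on made-up data:

```python
import numpy as np

# Hypothetical training data: one feature, plus a column of ones for the intercept
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])

# OLS minimizes the sum of squared residuals ||y - X beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```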

Applications of linear regression in ML


In machine learning, linear regression is a commonly used tool for predicting outcomes and
understanding relationships between variables across various fields. Here are some notable examples
of its applications:

Forecasting consumer spending


Income levels can be used in a linear regression model to predict consumer spending. Specifically,
multiple linear regression could incorporate factors such as historical income, age, and employment
status to provide a comprehensive analysis. This can assist economists in developing data-driven
economic policies and help businesses better understand consumer behavioral patterns.

Analyzing marketing impact


Marketers can use linear regression to understand how advertising spend affects sales revenue. By
applying a linear regression model to historical data, future sales revenue can be predicted, allowing
marketers to optimize their budgets and advertising strategies for maximum impact.

Predicting stock prices


In the finance world, linear regression is one of the many methods used to predict stock prices. Using
historical stock data and various economic indicators, analysts and investors can build multiple linear
regression models that help them make smarter investment decisions.

Forecasting environmental conditions


In environmental science, linear regression can be used to forecast environmental conditions. For
example, various factors like traffic volume, weather conditions, and population density can help
predict pollutant levels. These machine learning models can then be used by policymakers, scientists,
and other stakeholders to understand and mitigate the impacts of various actions on the environment.

Advantages of linear regression in ML


Linear regression offers several advantages that make it a key technique in machine learning.

Simple to use and implement


Compared with most mathematical tools and models, linear regression is easy to understand and apply.
It is especially great as a starting point for new machine learning practitioners, providing valuable
insights and experience as a foundation for more advanced algorithms.

Computationally efficient
Machine learning models can be resource-intensive. Linear regression requires relatively low
computational power compared to many algorithms and can still provide meaningful predictive
insights.

Interpretable results
Advanced statistical models, while powerful, are often hard to interpret. With a simple model like linear
regression, the relationship between variables is easy to understand, and the impact of each variable is
clearly indicated by its coefficient.

Foundation for advanced techniques


Understanding and implementing linear regression offers a solid foundation for exploring more
advanced machine learning methods. For example, polynomial regression builds on linear regression
to describe more complex, non-linear relationships between variables.

Disadvantages of linear regression in ML


While linear regression is a valuable tool in machine learning, it has several notable limitations.
Understanding these disadvantages is critical in selecting the appropriate machine learning tool.

Assuming a linear relationship


The linear regression model assumes that the relationship between dependent and independent
variables is linear. In complex real-world scenarios, this may not always be the case. For example, a
person’s height over the course of their life is nonlinear, with the quick growth occurring during
childhood slowing down and stopping in adulthood. So, forecasting height using linear regression could
lead to inaccurate predictions.


Sensitivity to outliers
Outliers are data points that significantly deviate from the majority of observations in a dataset. If not
handled properly, these extreme value points can skew results, leading to inaccurate conclusions. In
machine learning, this sensitivity means that outliers can disproportionately affect the predictive
accuracy and reliability of the model.

Multicollinearity
In multiple linear regression models, highly correlated independent variables can distort the results, a
phenomenon known as multicollinearity. For example, the number of bedrooms in a house and its size
might be highly correlated since larger houses tend to have more bedrooms. This can make it difficult
to determine the individual impact of individual variables on housing prices, leading to unreliable
results.

Assuming a constant error spread

Linear regression assumes that the spread of the differences between the observed and predicted values (the residuals) is constant across all values of the independent variables, a property known as homoscedasticity. If this is not true, the predictions generated by the model may be unreliable. In supervised machine learning, failing to address a non-constant error spread can cause the model to generate biased and inefficient estimates, reducing its overall effectiveness.

Linear regression vs. logistic regression


Linear regression is often confused with logistic regression. While linear regression predicts outcomes
on continuous variables, logistic regression is used when the dependent variable is categorical, often
binary (yes or no). Categorical variables define non-numeric groups with a finite number of categories,
like age group or payment method. Continuous variables, on the other hand, can take any numerical
value and are measurable. Examples of continuous variables include weight, price, and daily
temperature.

Unlike the linear function used in linear regression, logistic regression models the probability of a
categorical outcome using an S-shaped curve called a logistic function. In the example of binary
classification, data points that belong to the “yes” category fall on one side of the S-shape, while the data
points in the “no” category fall on the other side. Practically speaking, logistic regression can be used to
classify whether an email is spam or not, or predict whether a customer will purchase a product or not.
Essentially, linear regression is used for predicting quantitative values, whereas logistic regression is
used for classification tasks.
Linear Regression Line:
A straight line showing the relationship between the dependent and independent variables is called a regression line.
A regression line can show two types of relationship:
• Positive Linear Relationship:
If the dependent variable increases on the Y-axis and independent variable increases on X-axis, then
such a relationship is termed as a Positive linear relationship.

• Negative Linear Relationship:


If the dependent variable decreases on the Y-axis and independent variable increases on the X-axis, then
such a relationship is called a negative linear relationship.

Finding the best fit line:


When working with linear regression, our main goal is to find the best fit line, which means the error between the predicted values and the actual values should be minimized. The best fit line will have the least error. Different values for the weights or coefficients of the line (a0, a1) give different regression lines, so we need to calculate the best values for a0 and a1 to find the best fit line; to calculate this we use a cost function.

Cost function
Different values for the weights or coefficients of the line (a0, a1) give different regression lines, and the cost function is used to estimate the values of the coefficients for the best fit line.
• The cost function optimizes the regression coefficients or weights. It measures how well a linear regression model is performing.
• We can use the cost function to find the accuracy of the mapping function, which maps the input variable to the output variable. This mapping function is also known as the hypothesis function.

For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and the actual values. For the above linear equation, MSE can be calculated as:

MSE = (1/N) Σ (Yi − (a1xi + a0))²

Where,
N = Total number of observations
Yi = Actual value
(a1xi + a0) = Predicted value
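A direct translation of the MSE formula into Python (with illustrative values for x, y, a0 and a1) might look like this:

```python
import numpy as np

def mse(y_actual, y_pred):
    """Mean Squared Error: the average of the squared residuals."""
    return np.mean((y_actual - y_pred) ** 2)

# Hypothetical observations and a candidate line y = a1*x + a0
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])
a0, a1 = 0.1, 2.0
print("MSE:", mse(y, a1 * x + a0))
```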

Residuals: The distance between the actual value and the predicted value is called the residual. If the observed points are far from the regression line, the residuals will be high, and so the cost function will be high. If the scatter points are close to the regression line, the residuals will be small and hence the cost function will be low.
A cost function is an important parameter that determines how well a machine learning model
performs for a given dataset. It calculates the difference between the expected value and
predicted value and represents it as a single real number.

In machine learning, once we train our model, we want to see how well it is performing. Although there are various accuracy functions that tell you how your model is performing, they will not give insights into how to improve it. So we need a function that can find when the model is most accurate, by finding the spot between the undertrained and overtrained model.
In simple terms, "a cost function is a measure of how wrong the model is in estimating the relationship between X (input) and Y (output)." A cost function is sometimes also referred to as a loss function, and it can be estimated by iteratively running the model to compare estimated predictions against the known values of Y.
The main aim of each ML model is to determine parameters or weights that can minimize the cost function.
In other words, to get the optimal solution we need a cost function. It calculates the difference between the actual values and predicted values and measures how wrong the model's predictions are. By minimizing the value of the cost function, we can get the optimal solution.

Types of Cost Function

Regression Cost Function : Mean Error, Mean Square Error, Mean Absolute Error
Binary Classification cost Functions : Cross Entropy
Multi-class Classification Cost Function


Gradient Descent:
• Gradient descent is used to minimize the MSE by calculating the gradient of the cost function.
• A regression model uses gradient descent to update the coefficients of the line by reducing the
cost function.
• It is done by a random selection of values of coefficient and then iteratively update the values to
reach the minimum cost function.

Gradient Descent: Minimizing the cost function

As discussed in the above section, the cost function tells how wrong your model is, and each machine learning model tries to minimize the cost function in order to give the best results. Here comes the role of gradient descent.
"Gradient Descent is an optimization algorithm which is used for optimizing the cost function or error in the model." It enables models to follow the gradient, or direction, that reduces the error towards the least possible error. Here direction refers to how the model parameters should be corrected to further reduce the cost function. The error in your model can be different at different points, and you have to find the quickest way to minimize it, to prevent resource wastage.
Gradient descent is an iterative process in which the model gradually converges towards a minimum value; if the model iterates beyond this point, it produces little or no change in the loss. This point is known as convergence, and at this point the error is least and the cost function is optimized.
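A minimal gradient descent sketch for the linear model y = a1x + a0, assuming a fixed learning rate and made-up data; the gradients are the partial derivatives of the MSE cost with respect to a0 and a1:

```python
import numpy as np

# Hypothetical data roughly following y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.1, 6.9, 9.2, 11.0])

a0, a1 = 0.0, 0.0          # starting coefficients
lr = 0.01                  # learning rate (assumed, not tuned)
for _ in range(5000):      # iterate towards convergence
    y_pred = a1 * x + a0
    # Partial derivatives of MSE with respect to a0 and a1
    grad_a0 = -2 * np.mean(y - y_pred)
    grad_a1 = -2 * np.mean((y - y_pred) * x)
    a0 -= lr * grad_a0     # step against the gradient
    a1 -= lr * grad_a1

print(f"a0 = {a0:.3f}, a1 = {a1:.3f}")
```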

Model Performance:
The Goodness of fit determines how the line of regression fits the set of observations. The process of
finding the best model out of various models is called optimization. It can be achieved by below method:

1. R-squared method:
• R-squared is a statistical method that determines the goodness of fit.
• It measures the strength of the relationship between the dependent and independent variables
on a scale of 0-100%.
• A high value of R-squared indicates less difference between the predicted values and actual values, and hence represents a good model.
• It is also called the coefficient of determination, or the coefficient of multiple determination for multiple regression.
• It can be calculated from the below formula:

R² = 1 − (Sum of squared residuals / Total sum of squares) = 1 − Σ(yi − ŷi)² / Σ(yi − ȳ)²
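Translating the R-squared formula into code (with illustrative actual and predicted values):

```python
import numpy as np

def r_squared(y_actual, y_pred):
    ss_res = np.sum((y_actual - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)  # total sum of squares
    return 1 - ss_res / ss_tot

y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.8, 5.2, 7.1, 8.9])
print("R-squared:", r_squared(y, y_hat))  # close to 1 => good fit
```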

Assumptions of Linear Regression


Below are some important assumptions of Linear Regression. These are some formal checks while
building a Linear Regression model, which ensures to get the best possible result from the given dataset.
• Linear relationship between the features and target:
Linear regression assumes the linear relationship between the dependent and independent variables.
• Small or no multicollinearity between the features:
Multicollinearity means high correlation between the independent variables. Due to multicollinearity, it may be difficult to find the true relationship between the predictors and the target variable; in other words, it is difficult to determine which predictor variable is affecting the target variable and which is not. So the model assumes either little or no multicollinearity between the features or independent variables.
• Homoscedasticity Assumption:


Homoscedasticity is a situation when the error term is the same for all the values of independent
variables. With homoscedasticity, there should be no clear pattern distribution of data in the scatter
plot.
• Normal distribution of error terms:
Linear regression assumes that the error term follows a normal distribution. If the error terms are not normally distributed, then confidence intervals will become either too wide or too narrow, which may cause difficulties in finding the coefficients.
This can be checked using a Q-Q plot. If the plot shows a straight line without any deviation, the errors are normally distributed.
• No autocorrelation: The linear regression model assumes no autocorrelation in the error terms. If there is any correlation in the error terms, it will drastically reduce the accuracy of the model. Autocorrelation usually occurs if there is a dependency between residual errors.

2. 4 Logistic Regression:
• Logistic regression is another supervised learning algorithm which is used to solve classification problems. In classification problems, we have dependent variables in a binary or discrete format, such as 0 or 1.
• The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or not spam, etc.
• It is a predictive analysis algorithm which works on the concept of probability.
• Logistic regression is a type of regression, but it differs from the linear regression algorithm in how it is used.
• Logistic regression uses the sigmoid function, or logistic function, to model the data. The function can be represented as:

f(x) = 1 / (1 + e^(−x))

where f(x) is the output, a value between 0 and 1; x is the input to the function; and e is the base of the natural logarithm.
When we provide the input values (data) to the function, it gives an S-curve.

It uses the concept of threshold levels: values above the threshold level are rounded up to 1, and values below the threshold level are rounded down to 0.

Types of Logistic Regression


Logistic regression can be classified into three main types based on the nature of the dependent
variable:

Binomial Logistic Regression:

This type is used when the dependent variable has only two possible categories. Examples include
Yes/No, Pass/Fail or 0/1. It is the most common form of logistic regression and is used for binary
classification problems.
Multinomial Logistic Regression:
This is used when the dependent variable has three or more possible categories that are not ordered. For example, classifying animals into categories like "cat," "dog" or "sheep." It extends binary logistic regression to handle multiple classes.
Ordinal Logistic Regression:
This type applies when the dependent variable has three or more categories with a natural order or
ranking.

Assumptions of Logistic Regression


Understanding the assumptions behind logistic regression is important to ensure the model is applied correctly. The main assumptions are:

Independent observations: Each data point is assumed to be independent of the others, meaning there should be no correlation or dependence between the input samples.
Binary dependent variable: The dependent variable is assumed to be binary, meaning it can take only two values. For more than two categories, the softmax function is used.
Linearity relationship between independent variables and log odds: The model assumes a linear
relationship between the independent variables and the log odds of the dependent variable which
means the predictors affect the log odds in a linear way.
No outliers: The dataset should not contain extreme outliers as they can distort the estimation of the
logistic regression coefficients.
Large sample size: It requires a sufficiently large sample size to produce reliable and stable results.

Understanding Sigmoid Function


1. The sigmoid function is an important part of logistic regression; it is used to convert the raw output of the model into a probability value between 0 and 1.

2. This function takes any real number and maps it into the range 0 to 1, forming an "S"-shaped curve called the sigmoid curve or logistic curve. Because probabilities must lie between 0 and 1, the sigmoid function is perfect for this purpose.

3. In logistic regression, we use a threshold value, usually 0.5, to decide the class label.

If the sigmoid output is at or above the threshold, the input is classified as Class 1.
If it is below the threshold, the input is classified as Class 0.
This approach helps to transform continuous input values into meaningful class predictions.
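A short sketch of the sigmoid and the 0.5 threshold rule described above (the z values are arbitrary examples):

```python
import numpy as np

def sigmoid(z):
    """Map any real number into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
probs = sigmoid(z)
labels = (probs >= 0.5).astype(int)  # threshold at 0.5
print(probs)   # approximately [0.047 0.378 0.5 0.622 0.953]
print(labels)  # [0 0 1 1 1]
```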

Working of Logistic Regression


The logistic regression model transforms the continuous output of the linear regression function into a categorical output using a sigmoid function, which maps any real-valued set of independent variable inputs into a value between 0 and 1. This function is known as the logistic function.

Suppose we have input features represented as a matrix X, and the dependent variable Y takes only binary values, i.e., 0 or 1.


Then we apply the multi-linear function to the input variables X:

z = w · X + b

Here xi is the ith observation of X, w = [w1, w2, w3, ⋯, wm] is the vector of weights or coefficients, and b is the bias term, also known as the intercept. This can simply be represented as the dot product of the weights and the input, plus the bias.

At this stage, z is a continuous value from the linear regression. Logistic regression then
applies the sigmoid function to z to convert it into a probability between 0 and 1 which can be
used to predict the class.

Now we use the sigmoid function, where the input is z, to find the probability between 0 and 1, i.e., the predicted y:

σ(z) = 1 / (1 + e^(−z))

The sigmoid function converts the continuous value z into a probability between 0 and 1:

σ(z) tends towards 1 as z → ∞
σ(z) tends towards 0 as z → −∞
σ(z) is always bounded between 0 and 1

The probability of belonging to a class can be measured as:

P(y=1) = σ(z)
P(y=0) = 1 − σ(z)
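Putting the pieces together, a minimal forward pass of logistic regression might look like this; the weights and bias are hypothetical values standing in for fitted parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned weights and bias (in practice these are fitted to data)
w = np.array([0.8, -0.4])
b = -0.2

X = np.array([[1.5, 0.3], [0.2, 2.0]])  # two observations, two features
z = X @ w + b              # z = w . X + b for each observation
p_class1 = sigmoid(z)      # P(y = 1) = sigmoid(z)
print("P(y=1):", p_class1)
print("P(y=0):", 1 - p_class1)
print("predicted classes:", (p_class1 >= 0.5).astype(int))
```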


2.5 Classification
Classification teaches a machine to sort things into categories. It learns by looking at examples with labels (like emails marked "spam" or "not spam"). After learning, it can decide which category new items belong to, like identifying whether a new email is spam or not. For example, a classification model might be trained on a dataset of images labeled as either dogs or cats, and it can then be used to predict the class of new and unseen images as dogs or cats based on features such as color, texture and shape.

To visualize classification, imagine a plot where the horizontal axis represents the combined values of color and texture features and the vertical axis represents the combined values of shape and size features.

Each colored dot in the plot represents an individual image, with the color indicating whether the model predicts the image to be a dog or a cat.
The shaded areas in the plot show the decision boundary, which is the line or region that the model uses to decide which category (dog or cat) an image belongs to. The model classifies images on one side of the boundary as dogs and on the other side as cats, based on their features. Basically, the machine looks at the features in the image (like shape, color, or texture) and chooses which animal the picture is most likely to be, based on the training it received.

2.5.1 Types of Classification


When we talk about classification in machine learning, we’re talking about the process of sorting data
into categories based on specific features or characteristics. There are different types of classification
problems depending on how many categories (or classes) we are working with and how they are
organized. There are two main classification types in machine learning:

1. Binary Classification
This is the simplest kind of classification. In binary classification, the goal is to sort the data into two
distinct categories. Think of it like a simple choice between two options. Imagine a system that sorts
emails into either spam or not spam. It works by looking at different features of the email like certain
keywords or sender details, and decides whether it’s spam or not. It only chooses between these two
options.

2. Multiclass Classification


Here, instead of just two categories, the data needs to be sorted into more than two categories. The
model picks the one that best matches the input. Think of an image recognition system that sorts
pictures of animals into categories like cat, dog, and bird.

3. Multi-Label Classification
In multi-label classification, a single piece of data can belong to multiple categories at once. Unlike multiclass classification, where each data point belongs to only one class, multi-label classification allows data points to belong to multiple classes. A movie recommendation system could tag a movie as both action and comedy. The system checks various features (like movie plot, actors, or genre tags) and assigns multiple labels to a single piece of data, rather than just one. Multi-label classification is relevant in specific use cases, but not as crucial for a starting overview of classification.

2.5.2 How does Classification in Machine Learning Work?


Classification involves training a model using a labeled dataset, where each input is paired with its
correct output label. The model learns patterns and relationships in the data, so it can later predict
labels for new, unseen inputs.

In machine learning, classification works by training a model to learn patterns from labeled data, so it
can predict the category or class of new, unseen data.

Working

 Data Collection: You start with a dataset where each item is labeled with the correct class (for
example, "cat" or "dog").
 Feature Extraction: The system identifies features (like color, shape, or texture) that help
distinguish one class from another. These features are what the model uses to make predictions.
 Model Training: The classification algorithm uses the labeled data to learn how to map the features to the correct class. It looks for patterns and relationships in the data.
 Model Evaluation: Once the model is trained, it's tested on new, unseen data to check how
accurately it can classify the items.
 Prediction: After being trained and evaluated, the model can be used to predict the class of new
data based on the features it has learned.


 Iterative improvement: Evaluating a classification model is a key step in machine learning. It helps us check how well the model performs and how well it handles new, unseen data. Depending on the problem and needs, we can use different metrics to measure its performance. If the quality metric is not satisfactory, the ML algorithm or hyperparameters can be adjusted, and the model is retrained. This iterative process continues until satisfactory performance is achieved. In short, classification in machine learning is all about using existing labeled data to teach the model how to predict the class of new, unlabeled data based on the patterns it has learned. A small end-to-end sketch of this workflow follows below.
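As a minimal end-to-end sketch of these steps using scikit-learn (the iris dataset and logistic regression are just convenient stand-ins for any labeled data and classifier):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Data collection: a labeled dataset (iris flowers, three classes)
X, y = load_iris(return_X_y=True)

# 2./3. Features are already extracted; train on the labeled examples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)

# 4. Model evaluation on unseen data
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 5. Prediction for a new, unlabeled sample
print("predicted class:", model.predict([[5.1, 3.5, 1.4, 0.2]])[0])
```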

2.5.3 Classification Algorithms

Now, for the implementation of any classification model it is essential to understand Logistic Regression, which is one of the most fundamental and widely used algorithms in machine learning for classification tasks. There are various types of classifier algorithms. Some of them are:
Linear Classifiers: Linear classifier models create a linear decision boundary between classes. They are simple and computationally efficient.

Linear classification models


 Logistic Regression
 Support Vector Machines having kernel = 'linear'
 Single-layer Perceptron
 Stochastic Gradient Descent (SGD) Classifier

Non-linear Classification models:

Non-linear models create a non-linear decision boundary between classes. They can capture
more complex relationships between input features and target variable. Some of the non-linear
classification models are as follows:

 K-Nearest Neighbours
 Kernel SVM
 Naive Bayes
 Decision Tree Classification
 Ensemble learning classifiers:
 Random Forests,
 AdaBoost,
 Bagging Classifier,
 Voting Classifier,
 Extra Trees Classifier
 Multi-layer Artificial Neural Networks

2.5.4 Applications of Classification:

Classification algorithms are widely used in many real-world applications across various domains,
including:

 Email spam filtering


 Credit risk assessment: Algorithms predict whether a loan applicant is likely to default by
analyzing factors such as credit score, income, and loan history. This helps banks make informed
lending decisions and minimize financial risk.
 Medical diagnosis: Machine learning models classify whether a patient has a certain condition (e.g., cancer or diabetes) based on medical data such as test results, symptoms, and patient history. This aids doctors in making quicker, more accurate diagnoses, improving patient care.
 Image classification: Applied in fields such as facial recognition, autonomous driving, and medical imaging.
 Sentiment analysis: Determining whether the sentiment of a piece of text is positive, negative, or neutral. Businesses use this to understand customer opinions, helping to improve products and services.
 Fraud detection: Algorithms detect fraudulent activities by analyzing transaction patterns and identifying anomalies, which is crucial in protecting against credit card fraud and other financial crimes.
 Recommendation systems: Used to recommend products or content based on past user behavior, such as suggesting movies on Netflix or products on Amazon. This personalization boosts user satisfaction and sales for businesses.

2.6 K Nearest Neighbours Classifier

K-Nearest Neighbors (KNN) is a supervised machine learning algorithm generally used for classification, but it can also be used for regression tasks. It works by finding the "k" closest data points (neighbors) to a given input and makes a prediction based on the majority class (for classification) or the average value (for regression). Since KNN makes no assumptions about the underlying data distribution, it is a non-parametric and instance-based learning method.
K-Nearest Neighbors is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset, and at the time of classification it performs an action on the dataset.
For example, consider a set of data points described by two features, where red diamonds represent Category 1 and blue squares represent Category 2.


A new point is classified as Category 2 when most of its closest neighbors are blue squares; KNN assigns the category based on the majority of nearby points. This shows how KNN predicts the category of a new data point from its closest neighbours.

 The red diamonds represent Category 1 and the blue squares represent Category 2.
 The new data point checks its closest neighbors (circled points).
 Since the majority of its closest neighbors are blue squares (Category 2), KNN predicts that the new data point belongs to Category 2.

2.6.1 'K' in K Nearest Neighbour

In the k-Nearest Neighbours algorithm k is just a number that tells the algorithm how
many nearby points or neighbors to look at when it makes a decision.

Choosing the value of k for KNN Algorithm

 The value of k in KNN decides how many neighbors the algorithm looks at when making a
prediction.
 Choosing the right k is important for good results.
 If the data has lots of noise or outliers, using a larger k can make the predictions more
stable.
 But if k is too large, the model may become too simple and miss important patterns; this is called underfitting.
 So k should be picked carefully based on the data.
Example: Imagine you're deciding what kind of fruit a new fruit is based on its shape and size. You compare it to fruits you already know.

If k = 3, the algorithm looks at the 3 closest fruits to the new one.


If 2 of those 3 fruits are apples and 1 is a banana, the algorithm says the new fruit is an apple
because most of its neighbors are apples.

2.6.2 Distance Metrics Used in KNN Algorithm

KNN uses distance metrics to identify the nearest neighbors; these neighbors are then used for classification and regression tasks. To identify the nearest neighbors we use the distance metrics below:

1. Euclidean Distance
Euclidean distance is defined as the straight-line distance between two points in a plane or space.
You can think of it like the shortest path you would walk if you were to go directly from one point
to another.

2. Manhattan Distance
This is the total distance you would travel if you could only move along horizontal and vertical
lines like a grid or city streets. It’s also called "taxicab distance" because a taxi can only drive along
the grid-like streets of a city.


3. Minkowski Distance
Minkowski distance is a family of distances which includes both Euclidean and Manhattan distances as special cases:

D(x, y) = (Σ |xi − yi|^p)^(1/p)

From the formula above, when p = 2 it becomes the same as the Euclidean distance formula, and when p = 1 it turns into the Manhattan distance formula. Minkowski distance is essentially a flexible formula that can represent either Euclidean or Manhattan distance depending on the value of p.
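The three distance metrics can be expressed through one Minkowski function, as in this small sketch:

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
print("Euclidean (p=2):", minkowski(a, b, 2))  # 5.0
print("Manhattan (p=1):", minkowski(a, b, 1))  # 7.0
```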

2.6.3 Working of KNN algorithm

The K-Nearest Neighbors (KNN) algorithm operates on the principle of similarity: it predicts the label or value of a new data point by considering the labels or values of its K nearest neighbors in the training dataset.
Step 1: Selecting the optimal value of K
K represents the number of nearest neighbors that need to be considered while making a prediction.
Step 2: Calculating distance
To measure the similarity between the target and training data points, Euclidean distance is used. The distance is calculated between each data point in the dataset and the target point.
Step 3: Finding Nearest Neighbors
The k data points with the smallest distances to the target point are the nearest neighbors.
Step 4: Voting for Classification or Taking Average for Regression

When you want to classify a data point into a category like spam or not spam, the KNN
algorithm looks at the K closest points in the dataset. These closest points are called neighbors.
The algorithm then looks at which category the neighbors belong to and picks the one that
appears the most. This is called majority voting.
In regression, the algorithm still looks for the K closest points. But instead of voting for a class in
classification, it takes the average of the values of those K neighbors. This average is the predicted
value for the new point for the algorithm.
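A compact from-scratch sketch of the four KNN steps for classification, using Euclidean distance and majority voting on a made-up two-feature dataset:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest neighbors."""
    distances = np.linalg.norm(X_train - x_new, axis=1)  # Step 2: Euclidean distances
    nearest = np.argsort(distances)[:k]                  # Step 3: k closest points
    votes = Counter(y_train[i] for i in nearest)         # Step 4: majority voting
    return votes.most_common(1)[0][0]

# Hypothetical two-feature dataset with two categories
X_train = np.array([[1, 1], [2, 1], [1, 2], [6, 5], [7, 6], [6, 7]])
y_train = np.array(["Category 1", "Category 1", "Category 1",
                    "Category 2", "Category 2", "Category 2"])

print(knn_predict(X_train, y_train, np.array([6, 6]), k=3))  # Category 2
```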

Naïve Bayes Classifier

Support Vector Machine

Ensemble Techniques

Decision Trees


Random Forest

Bagging – Boosting.

