
Machine Learning Techniques

(KAI-601)
Unit 2: Regression,
Bayesian Learning and
Support Vector Machine

Mr. Waseem Ahmed


Assistant Professor
CSE-AIML
ABES Engineering College
Regression in ML
• Regression is a supervised learning problem: we are given a dataset of
examples in which both the input and output variables are known.
• The task is to learn a function that predicts the value of Y for a given X.
• In regression, Y is always continuous.
• In linear regression, the prediction is made using the best-fitting straight line.
Independent and Dependent Variables
• Here ‘X’ is the independent
variable and ‘Y’ is the dependent
variable, whose value needs to be
predicted.
• For example, time may be X and the
price of a product may be Y, whose
value changes with the time
variable X.
• The dependent variable is always
continuous.
• In this case, the price of the
product is continuous, as it can
take any value within a range.
Positive and Negative slope
Linear Regression Line
Understanding linear regression line
Understanding Linear Regression Algorithm
Logistic regression
• It analyzes relationships between variables.
• It assigns probabilities to discrete outcomes using the sigmoid function, which
converts a numerical score into a probability between 0 and 1.
• The outcome itself is binary (0 or 1), depending on whether the event happens
or not; the model predicts how likely each outcome is.
• For binary predictions, you can divide the population into two groups with a cut-off
of 0.5.
• Everything above 0.5 is assigned to group A, and everything below to group B.
How Does the Logistic Regression Algorithm Work?
An organization wants to determine an employee’s salary increase based on their
performance.
For this purpose, a linear regression algorithm will help them decide.
Plotting a regression line, with the employee’s performance as the independent
variable and the salary increase as the dependent variable, will make their task easier.
What if the organization wants to know whether an employee will get a promotion or not based
on their performance? This is a binary (yes/no) question, so logistic regression is the right tool.
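The promotion question above can be sketched numerically. This is a minimal illustration of the sigmoid and the 0.5 cut-off; the weight `w`, bias `b`, and the performance scores are made-up values, not fitted from any real data.

```python
import math

def sigmoid(z):
    """Squash a real-valued score into the (0, 1) probability range."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical linear score: w * performance + b (w and b are illustrative).
w, b = 1.5, -6.0

def promotion_probability(performance):
    return sigmoid(w * performance + b)

def predict(performance, cutoff=0.5):
    """Group A ('promoted') above the cut-off, group B below it."""
    return "promoted" if promotion_probability(performance) >= cutoff else "not promoted"

print(predict(5.0))  # score = 1.5, P ≈ 0.82 -> promoted
print(predict(3.0))  # score = -1.5, P ≈ 0.18 -> not promoted
```

In a real logistic regression, `w` and `b` would be learned from labelled examples; only the decision rule is shown here.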
Bayes Theorem

Prerequisites of Bayes' Theorem


1. Experiment
An experiment is a planned operation carried out under controlled conditions,
such as tossing a coin, drawing a card, or rolling a die.
2. Sample Space
The results we can get from an experiment are called its possible outcomes, and the
set of all possible outcomes of an experiment is known as its sample space.
S1 = {1, 2, 3, 4, 5, 6} (rolling a die)
S2 = {Head, Tail} (tossing a coin)
3. Event
An event is defined as a subset of the sample space of an experiment;
it is a set of outcomes.
Bayes Theorem
Consider the following example of tossing two coins.
If we toss two coins and look at all the different possibilities, we have the sample space
as {HH, HT, TH, TT}
Probability (X)= Number of favorable outcomes / Total number of possible outcomes
When calculating probabilities, we usually denote a probability by P. Some of
the probabilities in this experiment are as follows:
•The probability of getting two heads = 1/4
•The probability of at least one tail = 3/4
•The probability of the second coin being head given the first coin is tail = 1/2
•The probability of getting two heads given the first coin is a head = 1/2
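The four probabilities above can be verified by enumerating the sample space {HH, HT, TH, TT} directly; a minimal sketch using only the standard library:

```python
from itertools import product
from fractions import Fraction

# Sample space for tossing two coins: HH, HT, TH, TT
space = list(product("HT", repeat=2))

def prob(event):
    """P(event) = number of favorable outcomes / total number of outcomes."""
    favorable = [o for o in space if event(o)]
    return Fraction(len(favorable), len(space))

print(prob(lambda o: o == ("H", "H")))  # two heads -> 1/4
print(prob(lambda o: "T" in o))         # at least one tail -> 3/4

# Conditional probability: P(second coin is head | first coin is tail)
tail_first = [o for o in space if o[0] == "T"]
head_second = [o for o in tail_first if o[1] == "H"]
print(Fraction(len(head_second), len(tail_first)))  # 1/2
```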
Bayes Theorem
Assume that in our experiment of rolling a die there are two events A and B such that:
A = event that an even number is obtained = {2, 4, 6}
B = event that a number greater than 4 is obtained = {5, 6}
Probability of event A: P(A) = number of favorable outcomes / total number of
possible outcomes
P(A) = 3/6 = 1/2 = 0.5
Probability of event B: P(B) = number of favorable outcomes / total number of
possible outcomes
P(B) = 2/6 = 1/3 ≈ 0.333
Bayes Theorem
Conditional Probability:
Conditional probability is the probability of an event A given that another
event B has already occurred (i.e., A given B). This is written P(A|B) and
defined as:
P(A|B) = P(A ∩ B) / P(B)
Since P(A|B) P(B) = P(A ∩ B) = P(B|A) P(A), we obtain Bayes' theorem:
P(A|B) = P(B|A) P(A) / P(B)
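The definition can be checked against the die events A (even number) and B (number greater than 4) defined above, using exact fractions:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}             # sample space of one die roll
A = {x for x in S if x % 2 == 0}   # even number: {2, 4, 6}
B = {x for x in S if x > 4}        # greater than 4: {5, 6}

P_B = Fraction(len(B), len(S))             # 2/6 = 1/3
P_A_and_B = Fraction(len(A & B), len(S))   # A ∩ B = {6} -> 1/6
P_A_given_B = P_A_and_B / P_B              # (1/6) / (1/3) = 1/2
print(P_A_given_B)  # 1/2
```

So given that the roll is greater than 4, the chance that it is even is 1/2 (only 6 of {5, 6} is even).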
Naïve Bayes Classifier
Based on Bayes' theorem, the Naive Bayes classifier computes the conditional probability of a class A given
the observed features B, under the "naive" assumption that the features are conditionally independent given the class.

Suppose we have a dataset of weather conditions and the corresponding target variable "Play".
So using this dataset we need to decide whether we should play or not on a particular day according to the
weather conditions.

To solve this problem, we follow these steps:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
DataSet
Frequency table for the weather conditions:

Weather     Yes   No
Overcast      5    0
Rainy         2    2
Sunny         3    2
Total        10    4
Likelihood table for the weather conditions:

Weather     No            Yes            P(Weather)
Overcast    0             5              5/14 = 0.35
Rainy       2             2              4/14 = 0.29
Sunny       2             3              5/14 = 0.35
All         4/14 = 0.29   10/14 = 0.71
Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 ≈ 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 0.29
P(Sunny) = 0.35
So P(No|Sunny) = 0.5 * 0.29 / 0.35 ≈ 0.41
Since P(Yes|Sunny) > P(No|Sunny),
on a sunny day the player can play the game.
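The calculation above can be reproduced from the frequency table alone. Working with the exact counts (rather than the rounded 0.35/0.71 values) gives P(Yes|Sunny) = 0.6 and P(No|Sunny) = 0.4 exactly:

```python
# Counts taken from the frequency table above (14 days total).
counts = {
    "Overcast": {"Yes": 5, "No": 0},
    "Rainy":    {"Yes": 2, "No": 2},
    "Sunny":    {"Yes": 3, "No": 2},
}
total = 14
n_yes = sum(c["Yes"] for c in counts.values())  # 10
n_no = sum(c["No"] for c in counts.values())    # 4

def posterior(weather):
    """Return (P(Yes|weather), P(No|weather)) via Bayes' theorem."""
    p_weather = (counts[weather]["Yes"] + counts[weather]["No"]) / total
    p_yes = (counts[weather]["Yes"] / n_yes) * (n_yes / total) / p_weather
    p_no = (counts[weather]["No"] / n_no) * (n_no / total) / p_weather
    return p_yes, p_no

p_yes, p_no = posterior("Sunny")
print(round(p_yes, 2), round(p_no, 2))  # 0.6 0.4
```

Because P(Yes|Sunny) > P(No|Sunny), the classifier predicts "play" for a sunny day, matching the result above.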
Bayesian Belief Networks
• A Bayesian Belief Network is a graphical representation of the
probabilistic relationships among a set of random variables.
• It is a "probabilistic graphical model" that represents the
conditional dependencies between random variables
through a Directed Acyclic Graph (DAG).
• Each probability in a Bayesian Belief Network is conditioned on a
node's parents: P(attribute | parents)
(the probability of an attribute given the values of its parent attributes).
• Unlike the Naive Bayes classifier, which treats all attributes as conditionally
independent, a BBN explicitly models the dependencies among attributes.
• The graph of a BBN consists of nodes (variables) and arcs (causal
relationships).
Bayesian Belief Networks
A BBN enables modelling of, and reasoning about, the uncertainty
among these random variables with the help of the dependencies
captured via the arcs of the DAG.
[Figure: example DAGs over variables such as Weather, Rainy, Umbrella Sales, Health, Tea, Green Leaves]
Bayesian Belief Networks
• A BBN works with joint and conditional probabilities.
• The joint probability factorizes over the graph as:
P(X1, X2, ..., Xn) = ∏ i=1..n P(Xi | Parents(Xi))
• where P(Xi | Parents(Xi)) is the probability of each feature
given its parents.
BBN
Find the probability that ‘P1 (JohnCalls)’ is true and ‘P2 (MaryCalls)’ is
true when the alarm ‘A’ rang, but no burglary ‘B’ and no fire ‘F’
occurred.
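Using the factorization P(X1, ..., Xn) = ∏ P(Xi | Parents(Xi)), this query is a single product. The conditional probability values are not given in the slides; the numbers below are the standard textbook (Russell & Norvig burglary-alarm) values, used here only for illustration:

```python
# Assumed textbook CPT values (not from the slides):
P_B = 0.001        # P(Burglary)
P_F = 0.002        # P(Fire); often "Earthquake" in textbooks
P_A_nB_nF = 0.001  # P(Alarm | no burglary, no fire)
P_J_A = 0.90       # P(JohnCalls | Alarm)
P_M_A = 0.70       # P(MaryCalls | Alarm)

# Chain rule over the DAG: each node is conditioned only on its parents.
# P(J, M, A, ¬B, ¬F) = P(J|A) P(M|A) P(A|¬B,¬F) P(¬B) P(¬F)
p = P_J_A * P_M_A * P_A_nB_nF * (1 - P_B) * (1 - P_F)
print(p)  # ≈ 0.00062811
```

Note that JohnCalls and MaryCalls depend only on Alarm, and Alarm only on Burglary and Fire, which is exactly what the DAG structure encodes.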
Support Vector Machine
• SVM is a supervised machine learning algorithm, used to solve
regression and (mainly) classification problems.
• Here, the "vectors" are the training examples with
whose help the classifier is constructed.
• In SVM, a subset of the training data is used to represent the decision
boundary.
• The objective of the Support Vector Machine algorithm is to find
a hyperplane in an N-dimensional space (where N is the number of
features) that distinctly classifies the data points.
• There are many possible hyperplanes that could separate the two classes
of data points.
• Margin: the gap between the two lines through the closest data points of the
different classes. It can be calculated as the perpendicular distance from the
separating line to the support vectors. A large margin is considered good and a
small margin bad.
• The objective is to find the plane with the maximum margin, i.e. the maximum
distance between data points of both classes.
Hyperplanes and Support Vectors
• Hyperplanes are decision boundaries that help classify the data
points. Data points falling on either side of the hyperplane can be
attributed to different classes.
• The dimension of the hyperplane depends upon the number of
features.
• If the number of input features is 2, then the hyperplane is just a
line. If the number of input features is 3, then the hyperplane
becomes a two-dimensional plane.
Hyperplanes in 2D and 3D feature space
Support Vectors
• Support vectors are the data points that lie closest to the hyperplane;
they influence the position and orientation of the hyperplane.
• Using these support vectors, we maximize the margin of the
classifier.
• Deleting a support vector changes the position of the
hyperplane.
• These are the points that help us build the SVM.
SVM Slope / Line Equation
The equation of a straight line is y = mx + c,
or ax + by + c = 0,
or y = wx + b.
In SVM, this equation becomes y = wᵀx + b.
Rearranging ax + by + c = 0 gives
by = -ax - c, i.e. y = -(a/b)x - c/b,
where -a/b is the slope and -c/b is the intercept.
Linear Kernel

Given two vectors x1 and x2, the linear kernel is defined as the dot product of the
two vectors: K(x1, x2) = x1 · x2
Polynomial Kernel
• We can define a polynomial kernel with this equation:
K(x1, x2) = (x1 · x2 + 1)^d
• Here, x1 and x2 are vectors and d represents the degree of the
polynomial.
• K(x1, x2) defines the decision boundary that separates the given classes.
Gaussian Kernel
• The Gaussian kernel is an example of a radial basis function
kernel.
• It can be represented with this equation:
K(x1, x2) = exp(-γ ‖x1 - x2‖²)
• It is used when there is no prior knowledge of the data.
• The value of gamma is typically tuned between 0 and 1.
• We must provide the value of gamma in the code manually.
• A commonly used starting value for gamma is 0.1.
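The three kernels above can be sketched as plain functions. This is a minimal illustration, not a full SVM; the sample vectors x1 and x2 are arbitrary:

```python
import math

def linear_kernel(x1, x2):
    """K(x1, x2) = x1 · x2"""
    return sum(a * b for a, b in zip(x1, x2))

def polynomial_kernel(x1, x2, d=2):
    """K(x1, x2) = (x1 · x2 + 1)^d"""
    return (linear_kernel(x1, x2) + 1) ** d

def gaussian_kernel(x1, x2, gamma=0.1):
    """K(x1, x2) = exp(-gamma * ||x1 - x2||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * sq_dist)

x1, x2 = [1.0, 2.0], [3.0, 1.0]
print(linear_kernel(x1, x2))      # 1*3 + 2*1 = 5.0
print(polynomial_kernel(x1, x2))  # (5 + 1)^2 = 36.0
print(round(gaussian_kernel(x1, x2), 4))  # exp(-0.1 * 5) ≈ 0.6065
```

Each kernel returns the dot product of the two inputs in some (possibly implicit) feature space, which is what lets an SVM draw non-linear boundaries without ever computing the mapping explicitly.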