MODULE 3
CHAPTER 3
3. Basics of Learning Theory
3.1 Introduction to Learning and its Types
3.2 Introduction to Computation Learning Theory
3.3 Design of a Learning System
3.4 Introduction to Concept Learning
3.4.1 Representation of a Hypothesis
3.4.2 Hypothesis Space
3.4.3 Heuristic Space Search
3.4.4 Generalization and Specialization
3.4.5 Hypothesis Space Search by Find-S Algorithm
3.4.6 Version Spaces
CHAPTER 4
4. Similarity-based Learning
4.1 Introduction to Similarity or Instance-based Learning
4.1.1 Differences Between Instance and Model-based Learning
4.2 Nearest-Neighbor Learning
4.3 Weighted K-Nearest-Neighbor Algorithm
4.4 Nearest Centroid Classifier
4.5 Locally Weighted Regression (LWR)
CHAPTER 5
5. Regression Analysis
5.1 Introduction to Regression
5.2 Introduction to Linearity, Correlation, and Causation
5.3 Introduction to Linear Regression
5.4 Validation of Regression Methods
CHAPTER 3
BASICS OF LEARNING THEORY
Let us assume a problem of predicting a label for given input data. Let D be the input dataset with both positive and negative examples, and let y be the output with class 0 or 1. The simple learning model can be given as y = f(X), where f is the target function that maps an input X in D to its label y.
Classic machine learning algorithms examine data inputs according to a predetermined set of rules, finding patterns and relationships that can be used to generate predictions or decisions. Support vector machines, decision trees, and logistic regression are some of the most widely used classical machine learning techniques.
Deep learning algorithms, by contrast, learn hierarchical representations of the input, which enables them to handle more complex and unstructured data such as photos, videos, and natural language.
Adaptive ML is the next generation of traditional ML: even though traditional ML has witnessed significant progress, adaptive systems improve on it by continuously updating their models as new data arrives.
Learning Types
There are different types of learning. Some of the learning methods are as follows:
1. Learn by memorization, also called rote learning or learning by repetition, is done by memorizing without understanding the logic or concept.
2. Learn by examples, also called learning by experience or from previous knowledge acquired at some time, is like finding an analogy: it performs inductive learning from observations to formulate a general concept.
3. Learning by critical thinking, also called deductive learning, deduces new facts or conclusions from related known facts and information.
4. Learning to solve problems is a type of cognitive learning where learning happens in the mind and is possible by devising a methodology to achieve a goal. Here, the learner initially is not aware of the solution or the way to achieve the goal but knows only the goal. The learning happens either directly, starting from the initial state and following the steps to achieve the goal, or indirectly, by inferring the behavior.
Mathematicians and logicians have raised many questions over time about how computers learn and how long learning takes. These questions are the basis of a field called Computational Learning Theory (COLT).
Computational Learning Theory (COLT) is like a guide for understanding how computers learn. It deals with questions such as how computers predict new things, how their learning performance is measured, and how they handle unknown information.
COLT has two key parts: Probably Approximately Correct (PAC) learning, which looks at how hard learning tasks are, and the Vapnik-Chervonenkis (VC) dimension, which measures the capacity of a hypothesis class.
In simpler terms, COLT helps us figure out how computers learn by breaking down the process and measuring their abilities, using concepts from computer science, artificial intelligence, and statistics.
The hypothesis space is the set of all possible hypotheses that approximate the target function f. The subset of the hypothesis space that is consistent with all observed training instances is called the version space.
Heuristic search methods generate a possible hypothesis that can be a solution in the hypothesis space, or a path from the initial state to the goal state. Several commonly used heuristic search methods are hill climbing, constraint satisfaction, best-first search, simulated annealing, the A* algorithm, and genetic algorithms.
There are two ways of learning a hypothesis consistent with all training instances from the large hypothesis space:
1. Specialization – General to Specific learning
2. Generalization – Specific to General learning
This learning methodology (the Find-S algorithm) searches through the hypothesis space for an approximate hypothesis by generalizing the most specific hypothesis.
Example: Consider the training dataset of 4 instances shown in Table 3.2. It contains the details of the performance of students and their likelihood of getting a job offer or not in their final semester. Apply the Find-S algorithm.
Version Spaces
The version space contains the subset of hypotheses from the hypothesis space that is consistent with all training instances in the training dataset.
The principal idea of this learning algorithm (known as List-Then-Eliminate) is to initialize the version space to contain all hypotheses and then eliminate any hypothesis found inconsistent with any training instance.
Initially, the algorithm starts with a version space containing all hypotheses and scans each training instance. The hypotheses that are inconsistent with the training instance are eliminated. Finally, the algorithm outputs the list of remaining hypotheses, all of which are consistent.
This algorithm works fine if the hypothesis space is finite, but it is practically difficult to deploy. Hence, a variation of this idea is introduced in the Candidate Elimination algorithm.
The idea is to add to G all minimal specializations that exclude the negative instance while remaining consistent with the positive instances; a negative instance drives specialization of the general hypothesis boundary G.
If the attribute values of the positive and negative instances are different, then fill that field with the positive instance's value, so that the hypothesis does not classify that negative instance as true. If the attribute values of the positive and negative instances are the same, that attribute cannot exclude the negative instance, so there is no need to update 'G': the attribute value stays a '?'.
We need to take the combination of sets in 'G' and check them against 'S'. Only when the combined set's fields match the fields in 'S' is it included in the version space as a consistent hypothesis. A simplified sketch of the whole procedure follows.
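Below is a simplified Python sketch of the Candidate Elimination algorithm for conjunctive hypotheses over discrete attributes; the '0' symbol marks the maximally specific (null) hypothesis, labels are assumed to be "Yes"/"No", and the example data is hypothetical:

```python
def consistent(h, x):
    # hypothesis h covers instance x if every attribute matches or is '?'
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = ["0"] * n          # maximally specific boundary (null hypothesis)
    G = [["?"] * n]        # maximally general boundary
    for x, label in examples:
        if label == "Yes":
            # generalize S just enough to cover the positive instance
            for i in range(n):
                if S[i] == "0":
                    S[i] = x[i]
                elif S[i] != x[i]:
                    S[i] = "?"
            # drop members of G that no longer cover the positive instance
            G = [g for g in G if consistent(g, x)]
        else:
            # specialize members of G that wrongly cover the negative instance
            new_G = []
            for g in G:
                if not consistent(g, x):
                    new_G.append(g)
                    continue
                for i in range(n):
                    # replace '?' with the S value that excludes the negative
                    if g[i] == "?" and S[i] not in ("?", "0") and S[i] != x[i]:
                        sp = list(g)
                        sp[i] = S[i]
                        new_G.append(sp)
            G = new_G
    return S, G

# Hypothetical run with student/job-offer style attributes
data = [
    (("High", "Yes", "Good"), "Yes"),
    (("Low",  "No",  "Poor"), "No"),
    (("High", "Yes", "Average"), "Yes"),
]
S, G = candidate_elimination(data)
print("S:", S)   # -> S: ['High', 'Yes', '?']
print("G:", G)   # -> G: [['High', '?', '?'], ['?', 'Yes', '?']]
```

Every hypothesis lying between S and G in generality belongs to the version space.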
CHAPTER 4
SIMILARITY-BASED LEARNING
Similarity-based learning is a supervised learning technique that predicts the class label of a test instance by gauging the similarity of the test instance to the training instances. Similarity learning is a branch of machine learning that focuses on training models to recognize the similarity or dissimilarity between data points.
It is also called instance-based learning or just-in-time learning, because this learning mechanism simply stores all the data and uses it only when it needs to classify an unseen instance.
Advantage: processing occurs only when a request to classify an unseen instance is given.
Drawback: it requires large memory to store the data, since a model is not constructed initially from the training data.
Several distance metrics are used to estimate the similarity or dissimilarity between instances, as required for clustering, nearest-neighbor classification, and so on. Popular distance metrics are Hamming distance, Euclidean distance, and Manhattan distance (a sketch of these metrics follows the list below). Some instance-based learning methods are:
a) KNN
b) Variants of KNN
c) Locally weighted regression
d) Learning vector quantization
e) Self-organizing maps
f) RBF networks
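The three distance metrics named above can be sketched in a few lines of Python; the string and point arguments below are illustrative:

```python
def hamming(a, b):
    # number of positions at which the symbols differ
    return sum(x != y for x, y in zip(a, b))

def euclidean(a, b):
    # straight-line distance between two numeric vectors
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    # sum of absolute coordinate differences (city-block distance)
    return sum(abs(x - y) for x, y in zip(a, b))

print(hamming("karolin", "kathrin"))   # -> 3
print(euclidean((0, 0), (3, 4)))       # -> 5.0
print(manhattan((0, 0), (3, 4)))       # -> 7
```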
Nearest-Neighbor Learning
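A nearest-neighbor classifier assigns a test instance the majority class among its k closest training instances. A minimal sketch, assuming Euclidean distance and majority voting over hypothetical two-feature data:

```python
import math
from collections import Counter

def euclidean(a, b):
    # straight-line distance between two feature vectors
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train, test_point, k=3):
    """train is a list of (feature_vector, label) pairs."""
    # take the k training instances closest to the test point
    neighbors = sorted(train, key=lambda pair: euclidean(pair[0], test_point))[:k]
    # majority vote among their labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((3.0, 3.2), "B"), ((3.1, 2.9), "B")]
print(knn_predict(train, (1.1, 1.0), k=3))   # -> 'A'
```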
Problem
Consider the same training dataset given in Table 4.1. Use weighted k-NN and determine the class.
Step 1: Compute the distance between the test instance and each training instance.
Step 2: Sort the distances in ascending order and select the first 3 nearest training data instances to the test instance. The selected nearest neighbors are shown in Table 4.6.
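In weighted k-NN, each of the k nearest neighbors votes with a weight that decreases with its distance. A minimal sketch, assuming inverse-distance weights; since Table 4.1 is not reproduced here, the data below is hypothetical:

```python
import math
from collections import defaultdict

def weighted_knn(train, test_point, k=3):
    """train is a list of (feature_vector, label) pairs."""
    # distance to every training instance, smallest first
    nearest = sorted((math.dist(x, test_point), label) for x, label in train)[:k]
    # each neighbor votes with weight 1/distance
    scores = defaultdict(float)
    for d, label in nearest:
        scores[label] += 1.0 / (d + 1e-9)   # small epsilon avoids division by zero
    return max(scores, key=scores.get)

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((3.0, 3.2), "B"), ((3.1, 2.9), "B")]
print(weighted_knn(train, (2.9, 3.0), k=3))   # -> 'B'
```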
Example: Consider the sample data shown in Table 4.9 with two features x and y. The target classes are 'A' or 'B'. Predict the class using the Nearest Centroid Classifier.
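A minimal sketch of the nearest centroid classifier: compute each class's mean vector and assign the test point to the closest centroid. Since Table 4.9 is not reproduced here, the data below is hypothetical:

```python
import math

def nearest_centroid(train, test_point):
    """train is a list of (feature_vector, label) pairs."""
    # accumulate per-class sums and counts to form centroids
    sums, counts = {}, {}
    for x, label in train:
        if label not in sums:
            sums[label], counts[label] = list(x), 1
        else:
            sums[label] = [s + xi for s, xi in zip(sums[label], x)]
            counts[label] += 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    # assign the class whose centroid is closest to the test point
    return min(centroids, key=lambda c: math.dist(centroids[c], test_point))

data = [((3, 1), "A"), ((5, 2), "A"), ((4, 3), "A"),
        ((7, 6), "B"), ((6, 7), "B"), ((8, 5), "B")]
print(nearest_centroid(data, (6, 5)))   # -> 'B'
```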
CHAPTER 5
REGRESSION ANALYSIS
5.1 Introduction to Regression
Regression analysis is a fundamental concept that consists of a set of machine learning methods that predict a continuous outcome variable (y) based on the value of one or multiple predictor variables (x). Equivalently, the relationship can be represented by Y = f(x). Here, y is called the dependent variable and x is called the independent variable.
Positive Correlation: Two variables are said to be positively correlated when their values move in the same direction; as the value of X increases, so does the value of Y at a constant rate.
Negative Correlation: Variables X and Y are negatively correlated when their values change in opposite directions; as the value of X increases, the value of Y decreases at a constant rate.
Neutral Correlation: There is no relationship between the changes in variables X and Y. In this case, the values are completely random and do not show any sign of correlation.
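The strength and direction of correlation can be quantified with the Pearson correlation coefficient; a from-scratch sketch with illustrative data:

```python
def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of
    standard deviations; ranges from -1 to +1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))   # ->  1.0  (positive correlation)
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))   # -> -1.0  (negative correlation)
```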
Causation
Causation is about a relationship between two variables in which x causes y; this is stated as x implies y. Regression is different from causation. Causation indicates that one event is the result of the occurrence of the other event, i.e., there is a causal relationship between the two events.
An example of a linear relationship is that between the hours spent studying and the grades obtained in a class; a non-linear relationship does not follow such a constant proportion.
Linearity:
Linear Relationship: A linear relationship between variables means that a change in one
variable is associated with a proportional change in another variable. Mathematically, it can
be represented as y = a * x + b, where y is the output, x is the input, and a and b are
constants.
Linear Models: The goal is to find the best-fitting line (or plane, in higher dimensions) through the data points. Linear models are interpretable and work well when the relationship between variables is close to linear.
Limitations: Linear models may perform poorly when the relationship between variables is
non-linear. In such cases, they may underfit the data, meaning they are too simple to capture
the underlying patterns.
Non-Linearity:
Non-Linear Relationship: A non-linear relationship implies that the change in one variable
is not proportional to the change in another variable. Non-linear relationships can take
various forms, such as quadratic, exponential, logarithmic, or arbitrary shapes.
Non-Linear Models: Machine learning models like decision trees, random forests, support
vector machines with non-linear kernels, and neural networks can capture non-linear
relationships. These models are more flexible and can fit complex data patterns.
Benefits: Non-linear models can perform well when the underlying relationships in the data
are complex or when interactions between variables are non-linear. They have the capacity to
capture intricate patterns.
Types of Regression
Linear Regression:
Single Independent Variable: Linear regression, also known as simple linear regression, is
used when there is a single independent variable (predictor) and one dependent variable
(target).
Equation: The linear regression equation takes the form: Y = β0 + β1X + ε, where Y is the
dependent variable, X is the independent variable, β0 is the intercept, β1 is the slope
(coefficient), and ε is the error term.
Purpose: Linear regression is used to establish a linear relationship between two variables
and make predictions based on this relationship. It's suitable for simple scenarios where there's
only one predictor.
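The coefficients can be estimated by ordinary least squares: β1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and β0 = ȳ − β1·x̄. A minimal sketch with hypothetical data:

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = b0 + b1*X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope: covariance of x and y divided by variance of x
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx      # intercept from the means
    return b0, b1

b0, b1 = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b0, b1)              # -> 2.2 0.6, i.e. Y = 2.2 + 0.6*X
```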
Multiple Regression:
Multiple Independent Variables: Multiple regression, as the name suggests, is used when
there are two or more independent variables (predictors) and one dependent variable (target).
Equation: The multiple regression equation extends the concept to multiple predictors: Y = β0
+ β1X1 + β2X2 + ... + βnXn + ε, where Y is the dependent variable, X1, X2, ..., Xn are the
independent variables, β0 is the intercept, β1, β2, ..., βn are the coefficients, and ε is the error
term.
Purpose: Multiple regression allows you to model the relationship between the dependent
variable and multiple predictors simultaneously. It's used when there are multiple factors that
may influence the target variable, and you want to understand their combined effect and
make predictions based on all these factors.
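A minimal sketch of multiple regression using NumPy's least-squares solver; the predictor columns and target values below are hypothetical:

```python
import numpy as np

# Design matrix: a leading column of 1s gives the intercept b0
X = np.array([[1, 2.0, 3.0],
              [1, 1.0, 5.0],
              [1, 4.0, 2.0],
              [1, 3.0, 4.0]])
y = np.array([20.0, 18.0, 25.0, 24.0])

# Solve min ||X b - y||^2 for the coefficient vector [b0, b1, b2]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)
```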
Polynomial Regression:
Use: Polynomial regression is an extension of multiple regression used when the relationship
between the independent and dependent variables is non-linear.
Equation: The polynomial regression equation allows for higher-order terms, such as quadratic
or cubic terms: Y = β0 + β1X + β2X^2 + ... + βnX^n + ε. This allows the model to fit a curve
rather than a straight line.
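Because the powers of X act as extra predictors, polynomial regression can be fitted with ordinary least squares; a minimal sketch using numpy.polyfit with hypothetical data that roughly follows Y = X² + 1:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.9, 10.2, 17.1, 25.8])   # roughly y = x^2 + 1

coeffs = np.polyfit(x, y, deg=2)             # highest power first
print(coeffs)                                 # leading coefficient close to 1
print(np.polyval(coeffs, 6.0))               # prediction for x = 6
```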
Logistic Regression:
Use: Logistic regression is used when the dependent variable is binary (0 or 1). It models the
probability of the dependent variable belonging to a particular class.
Equation: Logistic regression uses the logistic function (sigmoid function) to model
probabilities: P(Y=1) = 1 / (1 + e^(-z)), where z is a linear combination of the independent
variables: z = β0 + β1X1 + β2X2 + ... + βnXn. It transforms this probability into a binary
outcome.
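A minimal sketch of the logistic prediction step, with hypothetical coefficients: compute the linear score z, pass it through the sigmoid, and threshold at 0.5:

```python
import math

def predict_proba(x, betas):
    """betas[0] is the intercept b0; betas[1:] pair with the features."""
    z = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    return 1.0 / (1.0 + math.exp(-z))        # sigmoid maps z to (0, 1)

p = predict_proba([2.0, 1.5], [-1.0, 0.8, 0.4])   # hypothetical coefficients
print(p, "->", 1 if p >= 0.5 else 0)              # -> ~0.77 -> 1
```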
Lasso Regression:
Objective Function: Lasso regression adds an L1 penalty to the linear regression loss function: Lasso = RSS + λΣ|βi|, where RSS is the residual sum of squares, λ is the regularization strength, and |βi| represents the absolute values of the coefficients.
Limitations of Regression
Coefficient of Determination
The coefficient of determination (R², or r-squared) is a statistical measure in a regression model that determines the proportion of variance in the dependent variable that can be explained by the independent variable.
The sum of the squares of the differences between the y-value of a data pair and the mean of y is called the total variation. The variations can then be defined as:
Total variation = Σ(Yi − mean(Y))²
Explained variation = Σ(Ŷi − mean(Y))²
Unexplained variation = Σ(Yi − Ŷi)²
Thus, the total variation is equal to the sum of the explained variation and the unexplained variation. The coefficient of determination r² is the ratio of the explained variation to the total variation.
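Equivalently, r² = 1 − (unexplained variation / total variation). A minimal sketch with hypothetical true and predicted values:

```python
def r_squared(y_true, y_pred):
    """Proportion of the variance in y_true explained by the model."""
    mean_y = sum(y_true) / len(y_true)
    total = sum((y - mean_y) ** 2 for y in y_true)            # total variation
    unexplained = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    return 1 - unexplained / total

print(r_squared([3, 5, 7, 9], [2.8, 5.3, 6.9, 9.1]))   # -> 0.9925
```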
Consider the following training set, shown in Table 5.4, for predicting the sales of the items.
THANK YOU