MODULE 3
CHAPTER 3
3. Basics of Learning Theory
3.1 Introduction to Learning and its Types
3.2 Introduction to Computation Learning Theory
3.3 Design of a Learning System
3.4 Introduction to Concept Learning
3.4.1 Representation of a Hypothesis
3.4.2 Hypothesis Space
3.4.3 Heuristic Space Search
3.4.4 Generalization and Specialization
3.4.5 Hypothesis Space Search by Find-S Algorithm
3.4.6 Version Spaces
CHAPTER 4
4. Similarity-based Learning
4.1 Introduction to Similarity or Instance-based Learning
4.1.1 Differences Between Instance and Model-based Learning
4.2 Nearest-Neighbor Learning
4.3 Weighted K-Nearest-Neighbor Algorithm
4.4 Nearest Centroid Classifier
4.5 Locally Weighted Regression (LWR)
CHAPTER 5
5. Regression Analysis
5.1 Introduction to Regression
5.2 Introduction to Linearity, Correlation, and Causation
5.3 Introduction to Linear Regression
5.4 Validation of Regression Methods
CHAPTER 3
BASICS OF LEARNING THEORY
Let us assume a problem of predicting a label for given input data. Let D be the input dataset with both positive and negative examples, and let y be the output with class 0 or 1. The simple learning model can be given as y = f(X), where f is the target function that maps an input X in D to its label y.
Classic machine learning algorithms examine data inputs according to a predetermined set of rules, finding patterns and relationships that can be used to generate predictions or decisions. Support vector machines, decision trees, and logistic regression are some of the most widely used classical machine learning techniques.
Deep learning algorithms, by contrast, learn hierarchical representations of the input, which enables them to handle more complex and unstructured data such as photos, videos, and natural language.
Adaptive ML is the next generation of traditional ML: even though traditional ML has witnessed significant progress, adaptive systems improve on it by continuously updating their models as new data arrives.
Learning Types
There are different types of learning. Some of the learning methods are as follows:
1. Learn by memorization, also called rote learning or learning by repetition, is done by memorizing without understanding the logic or concept.
2. Learn by examples, also called learning by experience or from previous knowledge acquired at some time, is like finding an analogy: it performs inductive learning from observations to formulate a general concept.
3. Learning by critical thinking, also called deductive learning, deduces new facts or conclusions from related known facts and information.
4. Learning to solve problems is a type of cognitive learning where learning happens in the mind and is possible by devising a methodology to achieve a goal. Here, the learner initially is not aware of the solution or the way to achieve the goal but knows only the goal. The learning happens either directly, starting from the initial state and following the steps to achieve the goal, or indirectly, by inferring the behavior.
Mathematicians and logicians have raised many questions over time about how computers learn and how long learning takes. These questions are the basis of a field called Computational Learning Theory (COLT).
Computational Learning Theory (COLT) is like a guide for understanding how computers learn. It deals with questions such as how computers predict new things, how their learning performance is measured, and how they handle unknown information.
COLT has two key parts: Probably Approximately Correct (PAC) learning, which looks at how hard learning tasks are, and the Vapnik-Chervonenkis (VC) dimension, which measures the capacity of a hypothesis class.
In simpler terms, COLT helps us figure out how computers learn by breaking down the process and measuring their abilities, using concepts from computer science, artificial intelligence, and statistics.
The hypothesis space is the set of all possible hypotheses that approximate the target function f. The subset of the hypothesis space that is consistent with all observed training instances is called the version space.
Heuristic search methods generate a possible hypothesis that can be a solution in the hypothesis space, or a path from the initial state to the goal state. Several commonly used heuristic search methods are hill climbing, constraint satisfaction, best-first search, simulated annealing, the A* algorithm, and genetic algorithms.
There are two ways of learning a hypothesis consistent with all training instances from the large hypothesis space:
1. Specialization – General to Specific learning
2. Generalization – Specific to General learning
This learning methodology (the Find-S algorithm) searches through the hypothesis space for an approximate hypothesis by generalizing the most specific hypothesis.
Example: Consider the training dataset of 4 instances shown in Table 3.2. It contains the details of the performance of students and their likelihood of getting a job offer or not in their final semester. Apply the Find-S algorithm.
Version Spaces
The version space contains the subset of hypotheses from the hypothesis space that is consistent with all training instances in the training dataset.
The principal idea of this learning algorithm (known as List-Then-Eliminate) is to initialize the version space to contain all hypotheses and then eliminate any hypothesis found inconsistent with any training instance.
Initially, the algorithm starts with a version space containing all hypotheses and scans each training instance. The hypotheses that are inconsistent with the training instance are eliminated. Finally, the algorithm outputs the list of remaining hypotheses, all of which are consistent.
This algorithm works fine if the hypothesis space is finite, but it is practically difficult to deploy. Hence, a variation of this idea is introduced in the Candidate Elimination algorithm.
The idea is to add to G all minimal specializations that exclude the negative instance while remaining consistent with the positive instances; a negative instance drives specialization of the general hypothesis boundary G.
If the attribute values of the positive and negative instances are different, then fill that field with the positive instance's value, so that the hypothesis does not classify that negative instance as true. If the attribute values of the positive and negative instances are the same, that attribute cannot exclude the negative instance, so there is no need to update 'G': the attribute value stays a '?'.
We need to take the combination of sets in 'G' and check them against 'S'. Only when the combined set's fields match the fields in 'S' is it included in the version space as a consistent hypothesis. A simplified sketch of the whole procedure follows.
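Below is a simplified Python sketch of the Candidate Elimination algorithm for conjunctive hypotheses over discrete attributes; the '0' symbol marks the maximally specific (null) hypothesis, labels are assumed to be "Yes"/"No", and the example data is hypothetical:

```python
def consistent(h, x):
    # hypothesis h covers instance x if every attribute matches or is '?'
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = ["0"] * n          # maximally specific boundary (null hypothesis)
    G = [["?"] * n]        # maximally general boundary
    for x, label in examples:
        if label == "Yes":
            # generalize S just enough to cover the positive instance
            for i in range(n):
                if S[i] == "0":
                    S[i] = x[i]
                elif S[i] != x[i]:
                    S[i] = "?"
            # drop members of G that no longer cover the positive instance
            G = [g for g in G if consistent(g, x)]
        else:
            # specialize members of G that wrongly cover the negative instance
            new_G = []
            for g in G:
                if not consistent(g, x):
                    new_G.append(g)
                    continue
                for i in range(n):
                    # replace '?' with the S value that excludes the negative
                    if g[i] == "?" and S[i] not in ("?", "0") and S[i] != x[i]:
                        sp = list(g)
                        sp[i] = S[i]
                        new_G.append(sp)
            G = new_G
    return S, G

# Hypothetical run with student/job-offer style attributes
data = [
    (("High", "Yes", "Good"), "Yes"),
    (("Low",  "No",  "Poor"), "No"),
    (("High", "Yes", "Average"), "Yes"),
]
S, G = candidate_elimination(data)
print("S:", S)   # -> S: ['High', 'Yes', '?']
print("G:", G)   # -> G: [['High', '?', '?'], ['?', 'Yes', '?']]
```

Every hypothesis lying between S and G in generality belongs to the version space.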
CHAPTER 4
SIMILARITY-BASED LEARNING
Similarity-based learning is a supervised learning technique that predicts the class label of a test instance by gauging the similarity of the test instance to the training instances. Similarity learning is a branch of machine learning that focuses on training models to recognize the similarity or dissimilarity between data points.
It is also called instance-based learning or just-in-time learning, because this learning mechanism simply stores all the data and uses it only when it needs to classify an unseen instance.
Advantage: processing occurs only when a request to classify an unseen instance is given.
Drawback: it requires large memory to store the data, since a model is not constructed initially from the training data.
Several distance metrics are used to estimate the similarity or dissimilarity between instances, as required for clustering, nearest-neighbor classification, and so on. Popular distance metrics are Hamming distance, Euclidean distance, and Manhattan distance (a sketch of these metrics follows the list below). Some instance-based learning methods are:
a) KNN
b) Variants of KNN
c) Locally weighted regression
d) Learning vector quantization
e) Self-organizing maps
f) RBF networks
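The three distance metrics named above can be sketched in a few lines of Python; the string and point arguments below are illustrative:

```python
def hamming(a, b):
    # number of positions at which the symbols differ
    return sum(x != y for x, y in zip(a, b))

def euclidean(a, b):
    # straight-line distance between two numeric vectors
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    # sum of absolute coordinate differences (city-block distance)
    return sum(abs(x - y) for x, y in zip(a, b))

print(hamming("karolin", "kathrin"))   # -> 3
print(euclidean((0, 0), (3, 4)))       # -> 5.0
print(manhattan((0, 0), (3, 4)))       # -> 7
```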
Nearest-Neighbor Learning
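A nearest-neighbor classifier assigns a test instance the majority class among its k closest training instances. A minimal sketch, assuming Euclidean distance and majority voting over hypothetical two-feature data:

```python
import math
from collections import Counter

def euclidean(a, b):
    # straight-line distance between two feature vectors
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train, test_point, k=3):
    """train is a list of (feature_vector, label) pairs."""
    # take the k training instances closest to the test point
    neighbors = sorted(train, key=lambda pair: euclidean(pair[0], test_point))[:k]
    # majority vote among their labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((3.0, 3.2), "B"), ((3.1, 2.9), "B")]
print(knn_predict(train, (1.1, 1.0), k=3))   # -> 'A'
```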
Problem
Consider the same training dataset given in Table 4.1. Use weighted k-NN and determine the class.
Step 1: Compute the distance between the test instance and each training instance.
Step 2: Sort the distances in ascending order and select the first 3 nearest training data instances to the test instance. The selected nearest neighbors are shown in Table 4.6.
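In weighted k-NN, each of the k nearest neighbors votes with a weight that decreases with its distance. A minimal sketch, assuming inverse-distance weights; since Table 4.1 is not reproduced here, the data below is hypothetical:

```python
import math
from collections import defaultdict

def weighted_knn(train, test_point, k=3):
    """train is a list of (feature_vector, label) pairs."""
    # distance to every training instance, smallest first
    nearest = sorted((math.dist(x, test_point), label) for x, label in train)[:k]
    # each neighbor votes with weight 1/distance
    scores = defaultdict(float)
    for d, label in nearest:
        scores[label] += 1.0 / (d + 1e-9)   # small epsilon avoids division by zero
    return max(scores, key=scores.get)

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((3.0, 3.2), "B"), ((3.1, 2.9), "B")]
print(weighted_knn(train, (2.9, 3.0), k=3))   # -> 'B'
```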
Example: Consider the sample data shown in Table 4.9 with two features x and y. The target classes are 'A' or 'B'. Predict the class using the Nearest Centroid Classifier.
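A minimal sketch of the nearest centroid classifier: compute each class's mean vector and assign the test point to the closest centroid. Since Table 4.9 is not reproduced here, the data below is hypothetical:

```python
import math

def nearest_centroid(train, test_point):
    """train is a list of (feature_vector, label) pairs."""
    # accumulate per-class sums and counts to form centroids
    sums, counts = {}, {}
    for x, label in train:
        if label not in sums:
            sums[label], counts[label] = list(x), 1
        else:
            sums[label] = [s + xi for s, xi in zip(sums[label], x)]
            counts[label] += 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    # assign the class whose centroid is closest to the test point
    return min(centroids, key=lambda c: math.dist(centroids[c], test_point))

data = [((3, 1), "A"), ((5, 2), "A"), ((4, 3), "A"),
        ((7, 6), "B"), ((6, 7), "B"), ((8, 5), "B")]
print(nearest_centroid(data, (6, 5)))   # -> 'B'
```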
CHAPTER 5
REGRESSION ANALYSIS
5.1 Introduction to Regression
Regression analysis is a fundamental concept that consists of a set of machine learning methods that predict a continuous outcome variable (y) based on the value of one or multiple predictor variables (x). Equivalently, the relationship can be represented by Y = f(x). Here, y is called the dependent variable and x is called the independent variable.
Positive Correlation: Two variables are said to be positively correlated when their values move in the same direction; as the value of X increases, so does the value of Y at a constant rate.
Negative Correlation: Variables X and Y are negatively correlated when their values change in opposite directions; as the value of X increases, the value of Y decreases at a constant rate.
Neutral Correlation: There is no relationship between the changes in variables X and Y. In this case, the values are completely random and do not show any sign of correlation.
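The strength and direction of correlation can be quantified with the Pearson correlation coefficient; a from-scratch sketch with illustrative data:

```python
def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of
    standard deviations; ranges from -1 to +1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))   # ->  1.0  (positive correlation)
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))   # -> -1.0  (negative correlation)
```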
Causation
Causation is about a relationship between two variables in which x causes y; this is stated as x implies y. Regression is different from causation. Causation indicates that one event is the result of the occurrence of the other event, i.e., there is a causal relationship between the two events.
An example of a linear relationship is that between the hours spent studying and the grades obtained in a class; a non-linear relationship does not follow such a constant proportion.
Linearity:
Linear Relationship: A linear relationship between variables means that a change in one
variable is associated with a proportional change in another variable. Mathematically, it can
be represented as y = a * x + b, where y is the output, x is the input, and a and b are
constants.
Linear Models: The goal is to find the best-fitting line (or plane, in higher dimensions) through the data points. Linear models are interpretable and work well when the relationship between variables is close to linear.
Limitations: Linear models may perform poorly when the relationship between variables is
non-linear. In such cases, they may underfit the data, meaning they are too simple to capture
the underlying patterns.
Non-Linearity:
Non-Linear Relationship: A non-linear relationship implies that the change in one variable
is not proportional to the change in another variable. Non-linear relationships can take
various forms, such as quadratic, exponential, logarithmic, or arbitrary shapes.
Non-Linear Models: Machine learning models like decision trees, random forests, support
vector machines with non-linear kernels, and neural networks can capture non-linear
relationships. These models are more flexible and can fit complex data patterns.
Benefits: Non-linear models can perform well when the underlying relationships in the data
are complex or when interactions between variables are non-linear. They have the capacity to
capture intricate patterns.
Types of Regression
Linear Regression:
Single Independent Variable: Linear regression, also known as simple linear regression, is
used when there is a single independent variable (predictor) and one dependent variable
(target).
Equation: The linear regression equation takes the form: Y = β0 + β1X + ε, where Y is the
dependent variable, X is the independent variable, β0 is the intercept, β1 is the slope
(coefficient), and ε is the error term.
Purpose: Linear regression is used to establish a linear relationship between two variables
and make predictions based on this relationship. It's suitable for simple scenarios where there's
only one predictor.
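The coefficients can be estimated by ordinary least squares: β1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and β0 = ȳ − β1·x̄. A minimal sketch with hypothetical data:

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = b0 + b1*X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope: covariance of x and y divided by variance of x
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx      # intercept from the means
    return b0, b1

b0, b1 = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b0, b1)              # -> 2.2 0.6, i.e. Y = 2.2 + 0.6*X
```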
Multiple Regression:
Multiple Independent Variables: Multiple regression, as the name suggests, is used when
there are two or more independent variables (predictors) and one dependent variable (target).
Equation: The multiple regression equation extends the concept to multiple predictors: Y = β0
+ β1X1 + β2X2 + ... + βnXn + ε, where Y is the dependent variable, X1, X2, ..., Xn are the
independent variables, β0 is the intercept, β1, β2, ..., βn are the coefficients, and ε is the error
term.
Purpose: Multiple regression allows you to model the relationship between the dependent
variable and multiple predictors simultaneously. It's used when there are multiple factors that
may influence the target variable, and you want to understand their combined effect and
make predictions based on all these factors.
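A minimal sketch of multiple regression using NumPy's least-squares solver; the predictor columns and target values below are hypothetical:

```python
import numpy as np

# Design matrix: a leading column of 1s gives the intercept b0
X = np.array([[1, 2.0, 3.0],
              [1, 1.0, 5.0],
              [1, 4.0, 2.0],
              [1, 3.0, 4.0]])
y = np.array([20.0, 18.0, 25.0, 24.0])

# Solve min ||X b - y||^2 for the coefficient vector [b0, b1, b2]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)
```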
Polynomial Regression:
Use: Polynomial regression is an extension of multiple regression used when the relationship
between the independent and dependent variables is non-linear.
Equation: The polynomial regression equation allows for higher-order terms, such as quadratic
or cubic terms: Y = β0 + β1X + β2X^2 + ... + βnX^n + ε. This allows the model to fit a curve
rather than a straight line.
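Because the powers of X act as extra predictors, polynomial regression can be fitted with ordinary least squares; a minimal sketch using numpy.polyfit with hypothetical data that roughly follows Y = X² + 1:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.9, 10.2, 17.1, 25.8])   # roughly y = x^2 + 1

coeffs = np.polyfit(x, y, deg=2)             # highest power first
print(coeffs)                                 # leading coefficient close to 1
print(np.polyval(coeffs, 6.0))               # prediction for x = 6
```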
Logistic Regression:
Use: Logistic regression is used when the dependent variable is binary (0 or 1). It models the
probability of the dependent variable belonging to a particular class.
Equation: Logistic regression uses the logistic function (sigmoid function) to model
probabilities: P(Y=1) = 1 / (1 + e^(-z)), where z is a linear combination of the independent
variables: z = β0 + β1X1 + β2X2 + ... + βnXn. It transforms this probability into a binary
outcome.
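A minimal sketch of the logistic prediction step, with hypothetical coefficients: compute the linear score z, pass it through the sigmoid, and threshold at 0.5:

```python
import math

def predict_proba(x, betas):
    """betas[0] is the intercept b0; betas[1:] pair with the features."""
    z = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    return 1.0 / (1.0 + math.exp(-z))        # sigmoid maps z to (0, 1)

p = predict_proba([2.0, 1.5], [-1.0, 0.8, 0.4])   # hypothetical coefficients
print(p, "->", 1 if p >= 0.5 else 0)              # -> ~0.77 -> 1
```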
Lasso Regression:
Objective Function: Lasso regression adds an L1 penalty to the linear regression loss function: Lasso = RSS + λΣ|βi|, where RSS is the residual sum of squares, λ is the regularization strength, and |βi| represents the absolute values of the coefficients.
Limitations of Regression
Coefficient of Determination
The coefficient of determination (R², or r-squared) is a statistical measure in a regression model that determines the proportion of variance in the dependent variable that can be explained by the independent variable.
The sum of the squares of the differences between the y-value of a data pair and the mean of y is called the total variation. The variations can then be defined as:
Total variation = Σ(Yi − mean(Y))²
Explained variation = Σ(Ŷi − mean(Y))²
Unexplained variation = Σ(Yi − Ŷi)²
Thus, the total variation is equal to the sum of the explained variation and the unexplained variation. The coefficient of determination r² is the ratio of the explained variation to the total variation.
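Equivalently, r² = 1 − (unexplained variation / total variation). A minimal sketch with hypothetical true and predicted values:

```python
def r_squared(y_true, y_pred):
    """Proportion of the variance in y_true explained by the model."""
    mean_y = sum(y_true) / len(y_true)
    total = sum((y - mean_y) ** 2 for y in y_true)            # total variation
    unexplained = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    return 1 - unexplained / total

print(r_squared([3, 5, 7, 9], [2.8, 5.3, 6.9, 9.1]))   # -> 0.9925
```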
Consider the following training set, shown in Table 5.4, for predicting the sales of the items.
THANK YOU