ML Basics Unit 5
UNIT-V
Reinforcement Learning – Overview – Getting Lost Example - Markov Chain Monte Carlo Methods
– Sampling – Proposal Distribution – Markov Chain Monte Carlo – Hidden Markov Models
Reinforcement Learning
Reinforcement Learning (RL) is a branch of machine learning that focuses on how agents can
learn to make decisions through trial and error to maximize cumulative rewards. RL allows
machines to learn by interacting with an environment and receiving feedback based on their
actions. This feedback comes in the form of rewards or penalties.
Reinforcement learning is a reward/punishment-based learning technique in which a teacher or critic does not guide the learner as in supervised learning, but instead punishes wrong actions and rewards correct ones.
Reinforcement Learning revolves around the idea that an agent (the learner or decision-
maker) interacts with an environment to achieve a goal. The agent performs actions and
receives feedback to optimize its decision-making over time.
Agent: The decision-maker that performs actions.
Environment: The world or system in which the agent operates.
State: The situation or condition the agent is currently in.
Action: The possible moves or decisions the agent can make.
Reward: The feedback or result from the environment based on the agent’s action.
How Reinforcement Learning Works
The RL process involves an agent performing actions in an environment, receiving rewards or
penalties based on those actions, and adjusting its behavior accordingly. This loop helps the
agent improve its decision-making over time to maximize the cumulative reward.
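To make this loop concrete before breaking down the components, here is a minimal sketch of the agent-environment interaction in Python. The toy "corridor" environment, its reward values, and the `step` helper are hypothetical and exist only for this illustration; the agent follows a purely random policy and does no learning yet.

```python
import random

# A minimal sketch of the agent-environment loop described above.
# Hypothetical environment: a 1-D "corridor" where the agent starts at position 0,
# receives +10 for reaching position 5, and pays -1 per step.

def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state + (1 if action == "right" else -1))
    if next_state == 5:
        return next_state, +10, True   # reached the goal
    return next_state, -1, False       # small penalty per step

state, done, total_reward = 0, False, 0
while not done:
    action = random.choice(["left", "right"])   # random policy (no learning yet)
    state, reward, done = step(state, action)
    total_reward += reward                      # cumulative reward the agent tries to maximize

print("Cumulative reward:", total_reward)
```

The components used in this loop are broken down next.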
Here’s a breakdown of RL components:
Policy: A strategy that the agent uses to determine the next action based on the
current state.
Reward Function: A function that provides feedback on the actions taken, guiding the
agent towards its goal.
Value Function: Estimates the future cumulative rewards the agent will receive from
a given state (written out formally after this list).
Model of the Environment: A representation of the environment that predicts future
states and rewards, aiding in planning.
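For reference, the cumulative reward (return) and the value function mentioned above are commonly written with a discount factor $\gamma \in [0, 1]$ as follows; this is a standard formulation added here for clarity, not tied to a specific algorithm in these notes.

```latex
% Discounted return from time t, and the value of state s under policy \pi
G_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^{2} r_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1},
\qquad
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\, G_t \mid s_t = s \,\right]
```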
Reinforcement Learning Example: Navigating a Maze
Imagine a robot navigating a maze to reach a diamond while avoiding fire hazards. The goal is
to find the optimal path with the least number of hazards while maximizing the reward:
Each time the robot moves correctly, it receives a reward.
If the robot takes the wrong path, it loses points.
The robot learns by exploring different paths in the maze. By trying various moves, it
evaluates the rewards and penalties for each path. Over time, the robot determines the best
route by selecting the actions that lead to the highest cumulative reward.
1. Exploration: The robot starts by exploring all possible paths in the maze, taking different
actions at each step (e.g., move left, right, up, or down).
2. Feedback: After each move, the robot receives feedback from the environment:
A positive reward for moving closer to the diamond.
A penalty for moving into a fire hazard.
3. Adjusting Behavior: Based on this feedback, the robot adjusts its behavior to maximize
the cumulative reward, favoring paths that avoid hazards and bring it closer to the
diamond.
4. Optimal Path: Eventually, the robot discovers the optimal path with the least number of
hazards and the highest reward by selecting the right actions based on past experiences.
Getting Lost Example: Navigating a Maze
Imagine you’re lost in a maze (or a robot navigating a maze-like environment). The goal is to
find the exit as quickly as possible. In RL, this scenario is modeled as follows:
1. RL Components in the Maze Example
Agent: You (or the robot) trying to find the exit.
Environment: The maze, with walls, paths, and an exit.
States: Each position in the maze (e.g., a specific intersection or coordinate). For
example, state $S_1$ might be “at the entrance,” and state $S_n$ might be “at a
dead end.”
Actions: Possible moves at each state (e.g., go left, right, forward, or backward).
Rewards: Feedback from the environment:
o Positive reward (e.g., +100) for reaching the exit.
o Small negative reward (e.g., -1) for each step to encourage efficiency.
o Optional: Larger negative reward (e.g., -10) for hitting a wall or dead end.
Policy: The strategy the agent learns to choose actions (e.g., “in state $S_1$, go right
with 80% probability”). Initially random, it improves over time.
2. RL Process in the Maze
The agent learns through trial-and-error:
1. Initialization: Starts at the maze entrance with no prior knowledge (random policy).
2. Interaction:
o In state $S_1$ (entrance), the agent chooses an action (e.g., go right).
o The environment responds with a new state (e.g., $S_2$, a new position) and a
reward (e.g., -1 for a step).
o If the agent hits a wall, it gets a negative reward (-10) and stays in the same state.
3. Learning:
o The agent updates its policy based on rewards using an RL algorithm (e.g., Q-
learning):
Q-learning maintains a table of Q-values, estimating the expected future
reward for each state-action pair.
Example: $Q(S_1, \text{right})$ increases if “going right” leads to the exit faster; a minimal sketch of this kind of update is given below.
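As a rough sketch of the Q-value update just described (not the exact maze from the example), here is a minimal tabular Q-learning loop. The tiny hand-coded maze, the transition table, and the parameter values (learning rate, discount factor, exploration rate) are hypothetical choices made only for this illustration.

```python
import random
from collections import defaultdict

# Hypothetical maze: "G" is the exit (reward +100), every other move costs -1.
# Transitions are encoded by hand for brevity: (state, action) -> (next_state, reward).
transitions = {
    ("S1", "right"): ("S2", -1), ("S1", "down"): ("S3", -1),
    ("S2", "down"): ("G", 100),  ("S2", "left"): ("S1", -1),
    ("S3", "right"): ("G", 100), ("S3", "up"): ("S1", -1),
}

alpha, gamma, epsilon = 0.5, 0.9, 0.2      # learning rate, discount, exploration rate
Q = defaultdict(float)                      # Q-table: (state, action) -> estimated return

def actions(state):
    return [a for (s, a) in transitions if s == state]

for episode in range(500):
    state = "S1"
    while state != "G":
        # Epsilon-greedy action selection: mostly exploit, occasionally explore.
        if random.random() < epsilon:
            action = random.choice(actions(state))
        else:
            action = max(actions(state), key=lambda a: Q[(state, a)])
        next_state, reward = transitions[(state, action)]
        # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
        best_next = max((Q[(next_state, a)] for a in actions(next_state)), default=0.0)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

print(max(actions("S1"), key=lambda a: Q[("S1", a)]))  # learned first move from the entrance
```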
This is the Markov property: the probability of being in state s at time t+1, given all previous states, is the same as the probability of being in state s at time t+1 given only the current state.
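Written out (a standard statement of the first-order Markov assumption, added here for reference), this is:

```latex
P\big(s_{t+1} = s \mid s_t, s_{t-1}, \dots, s_0\big) = P\big(s_{t+1} = s \mid s_t\big)
```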
MCMC Algorithms:
MCMC works by constructing a Markov chain (a sequence of states where each state depends only on the previous one) that has π(x) as its stationary distribution. By simulating this chain for many iterations, the samples produced approximate the target distribution, allowing you to estimate properties like means, variances, or probabilities.
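One standard way to guarantee that $\pi(x)$ is the stationary distribution (and the one exploited by Metropolis-Hastings below) is to make the chain's transition kernel $T(x \to x')$ satisfy detailed balance; the condition is stated here for reference:

```latex
\pi(x)\, T(x \to x') \;=\; \pi(x')\, T(x' \to x) \qquad \text{for all } x, x'
```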
The Metropolis-Hastings (MH) Algorithm is a popular MCMC method that generates samples from a target distribution π(n) by constructing a Markov chain. It uses a proposal distribution P(n*∣n) to suggest new states and accepts or rejects them based on an acceptance probability, ensuring the chain converges to the target distribution:
$$A(n, n^{*}) = \min\!\left(1,\; \frac{\pi(n^{*})\, P(n \mid n^{*})}{\pi(n)\, P(n^{*} \mid n)}\right)$$
Here, P(n ∣ n*) is the probability of proposing n given n* (the reverse proposal), and P(n* ∣ n) is the probability of proposing n* given n.
If the proposal distribution is symmetric (i.e., P(n ∣ n*) = P(n* ∣ n)), the ratio P(n ∣ n*)/P(n* ∣ n) = 1, simplifying the acceptance probability to:
$$A(n, n^{*}) = \min\!\left(1,\; \frac{\pi(n^{*})}{\pi(n)}\right)$$
Steps 8–11: Accept or reject: The proposed state n* is accepted with probability A(n,n∗). If
rejected, the chain stays at n. The proposal distribution influences the acceptance rate
(computed in Step 16), as a poorly chosen proposal can lead to frequent rejections or slow
exploration.
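To make the procedure concrete, here is a minimal Metropolis-Hastings sketch in Python with a symmetric Gaussian random-walk proposal. The target density (an unnormalized standard normal), the proposal scale, and the number of iterations are arbitrary choices for illustration; this is a sketch of the general recipe, not the numbered algorithm listing referenced in the steps above.

```python
import math
import random

def target(x):
    """Unnormalized target density pi(x); here an (unnormalized) standard normal."""
    return math.exp(-0.5 * x * x)

def metropolis_hastings(n_iters=10_000, proposal_scale=1.0, x0=0.0):
    x = x0
    samples, accepted = [], 0
    for _ in range(n_iters):
        # Symmetric Gaussian proposal: P(x* | x) = Normal(x, proposal_scale^2).
        x_star = random.gauss(x, proposal_scale)
        # Symmetric proposal => acceptance ratio reduces to pi(x*) / pi(x).
        acceptance = min(1.0, target(x_star) / target(x))
        if random.random() < acceptance:
            x, accepted = x_star, accepted + 1   # accept: move to the proposed state
        samples.append(x)                        # rejected proposals repeat the current state
    return samples, accepted / n_iters

samples, acceptance_rate = metropolis_hastings()
print("acceptance rate:", round(acceptance_rate, 2))
print("sample mean:", round(sum(samples) / len(samples), 2))  # should be near 0
```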
Proposal Distribution in MH
Several proposal distributions are commonly used in the MH algorithm, including:
Gaussian (Normal) distribution:
This variant, commonly employed as a proposal distribution, is especially effective when the target distribution is unimodal and symmetric. Centred on the current state n with scale σ, it is denoted as
$$P(n^{*} \mid n) = \mathcal{N}\big(n^{*};\, n, \sigma^{2}\big)$$
Cauchy distribution:
Suitable for target distributions with heavy tails. Its density, with location $x_0$ and scale $\gamma$, is
$$f(x) = \frac{1}{\pi \gamma \left[ 1 + \left( \frac{x - x_0}{\gamma} \right)^{2} \right]}$$
Exponential distribution:
Suited for target distributions that are non-negative with a long right tail. Its density, with rate $\lambda$, is
$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$
Student’s t-distribution:
Applied when the target distribution has heavy tails and the Gaussian distribution is not suitable. Its density, with $\nu$ degrees of freedom, is
$$f(x) = \frac{\Gamma\!\left(\tfrac{\nu + 1}{2}\right)}{\sqrt{\nu \pi}\;\Gamma\!\left(\tfrac{\nu}{2}\right)} \left( 1 + \frac{x^{2}}{\nu} \right)^{-\tfrac{\nu + 1}{2}}$$
The selection among the proposal distributions above hinges on the properties of the target distribution (TD) and the desired acceptance rate.
Typically, the proposal distribution should enable the generation of a diverse set of candidate states, but not to an extent that results in an excessively low acceptance rate. A commonly employed strategy involves refining the proposal distribution during the MH algorithm by adjusting parameters such as variance or scale until the acceptance rate aligns with the desired range; a minimal sketch of such tuning is given below.
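As one illustration of this tuning strategy, the sketch below adapts the Gaussian proposal scale between short pilot runs until the empirical acceptance rate falls in a chosen window. It reuses the hypothetical `metropolis_hastings` function from the earlier sketch; the target window (0.2–0.5) and the scaling factors are arbitrary choices made for illustration.

```python
# Simple pilot-run tuning of the proposal scale (reuses metropolis_hastings from above).
scale = 1.0
for _ in range(20):                       # a few short pilot runs
    _, acc_rate = metropolis_hastings(n_iters=2_000, proposal_scale=scale)
    if acc_rate < 0.2:
        scale *= 0.8                      # too many rejections: propose smaller steps
    elif acc_rate > 0.5:
        scale *= 1.2                      # accepting almost everything: propose larger steps
    else:
        break                             # acceptance rate in the desired range
print("tuned proposal scale:", round(scale, 2), "acceptance rate:", round(acc_rate, 2))
```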
Advantages of MCMC
1. Handles Complex Distributions: MCMC can sample from high-dimensional, complex,
or non-standard probability distributions (e.g., posterior distributions in Bayesian
inference) where analytical solutions or direct sampling are impractical.
2. Flexibility: Applicable to a wide range of problems, from statistical inference to
physics, as it doesn’t require specific distributional assumptions.
3. Approximates Posterior Distributions: In Bayesian statistics, MCMC estimates
posterior distributions by generating samples, enabling uncertainty quantification in
model parameters.
4. Scalable to High Dimensions: Effective in high-dimensional spaces (e.g., thousands of
parameters), common in modern machine learning and scientific modeling.
5. Convergence to True Distribution: Given enough iterations, MCMC converges to the
target distribution, ensuring theoretically sound results. Provides reliable samples for
inference, unlike heuristic methods that may not guarantee convergence.
Disadvantages of MCMC
1. Computational Cost: MCMC can be computationally intensive, requiring many iterations
to converge, especially for complex or high-dimensional distributions.
2. Convergence Issues: Convergence is not guaranteed in finite time, and diagnosing
convergence (e.g., using Gelman-Rubin statistics) can be challenging.
Step 6: Decode the most likely sequence of hidden states: Given the observed data, the
Viterbi algorithm is used to compute the most likely sequence of hidden states. This can be
used to predict future observations, classify sequences, or detect patterns in sequential data.
Step 7: Evaluate the model: The performance of the HMM can be evaluated using various
metrics, such as accuracy, precision, recall, or F1 score.
Baum-Welch algorithm
The Baum-Welch algorithm is an expectation-maximization (EM) algorithm used to train Hidden Markov Models (HMMs) by estimating their parameters (initial state probabilities, transition probabilities, and emission probabilities) from unlabeled sequence data, even when the hidden states are unknown. Built on the forward-backward procedure, it iteratively refines these parameters to maximize the likelihood of the observed data; the quantities it iterates are summarized below.
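For reference, the Baum-Welch updates can be summarized as follows, using the forward variables $\alpha_t(i)$ and backward variables $\beta_t(i)$ computed by the forward-backward procedure (standard formulation; the notation is chosen here for illustration).

```latex
% E-step: state and transition posteriors from the forward/backward variables
\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j}\alpha_t(j)\,\beta_t(j)},
\qquad
\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}
                  {\sum_{k}\sum_{l} \alpha_t(k)\, a_{kl}\, b_l(o_{t+1})\, \beta_{t+1}(l)}

% M-step: re-estimated initial, transition, and emission probabilities
\hat{\pi}_i = \gamma_1(i),
\qquad
\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)},
\qquad
\hat{b}_j(k) = \frac{\sum_{t\,:\,o_t = k}\gamma_t(j)}{\sum_{t=1}^{T}\gamma_t(j)}
```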
Viterbi algorithm:
The Viterbi Algorithm is a dynamic programming algorithm used to find the most likely
sequence of hidden states—called the Viterbi path—in a Hidden Markov Model (HMM), given a
sequence of observed events. It’s widely applied in areas like speech recognition, natural
language processing (e.g., part-of-speech tagging), and bioinformatics (e.g., gene prediction).
Purpose of the Viterbi Algorithm
HMMs model systems where observations (e.g., words, sounds) are generated by underlying
hidden states (e.g., grammatical tags, phonemes) that follow a Markov process. The Viterbi
Algorithm solves the decoding problem in HMMs: it identifies the most probable sequence of
hidden states that could have produced a given sequence of observations, maximizing the joint
probability of the state sequence and observations.
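Formally, writing $o_{1:T}$ for the observation sequence, $q_{1:T}$ for a candidate hidden-state sequence, and $\lambda$ for the HMM parameters (initial, transition, and emission probabilities), the decoding problem solved by the Viterbi Algorithm can be stated as (standard notation, shown here for reference):

```latex
q_{1:T}^{*} \;=\; \arg\max_{q_{1:T}} \; P\big(q_{1:T},\, o_{1:T} \mid \lambda\big)
```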
HMM Viterbi Algorithm:
How the Viterbi Algorithm Works: The algorithm uses dynamic programming to efficiently
compute the most likely state sequence without evaluating all possible sequences (which
would be exponential). It builds a trellis (a grid of probabilities over time and states) and
finds the optimal path through it. Here is the step-by-step process:
Initialization:
For each state $i$, initialize $\delta_{i,0}$ as the probability of starting in state $i$ and observing the first observation $o(0)$:
$$\delta_{i,0} = \pi_i \, b_i\big(o(0)\big)$$
where $\pi_i$ is the initial state probability, and $b_i(o(0))$ is the emission probability of observing $o(0)$ in state $i$. Set $\phi_{i,0} = 0$, which tracks the most likely previous state (not needed at $t = 0$).
Forward Recursion (for each time step $t$):
For each possible state $s$ at time $t$:
o Compute $\delta_{s,t}$, the highest probability of being in state $s$ at time $t$, considering all possible previous states $i$:
$$\delta_{s,t} = \max_i \big( \delta_{i,t-1} \, a_{i,s} \big) \, b_s\big(o(t)\big)$$
where $a_{i,s}$ is the transition probability from state $i$ to state $s$, and $b_s(o(t))$ is the emission probability of observing $o(t)$ in state $s$.
o Store the previous state $i$ that maximizes this probability:
$$\phi_{s,t} = \arg\max_i \big( \delta_{i,t-1} \, a_{i,s} \big)$$
Termination: At the final time step $T$, find the most likely end state $q_T^{*}$:
$$q_T^{*} = \arg\max_i \, \delta_{i,T}$$
Backtracking: Trace back the most likely state sequence using the stored $\phi$ values:
$$q_{t-1}^{*} = \phi_{q_t^{*},\, t}$$
Repeat this for $t = T$ down to $t = 1$, building the sequence $q_0^{*}, q_1^{*}, \dots, q_T^{*}$. A minimal implementation sketch of these steps follows.
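The steps above translate directly into a short dynamic-programming routine. The sketch below assumes a discrete HMM with states and observations indexed by integers; the toy two-state model used to exercise it is invented purely for illustration.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely hidden-state sequence for observation indices `obs`.

    pi: initial state probabilities, shape (N,)
    A:  transition probabilities A[i, j] = P(state j | state i), shape (N, N)
    B:  emission probabilities  B[i, k] = P(observation k | state i), shape (N, M)
    """
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))           # delta[t, s]: best path probability ending in s at t
    phi = np.zeros((T, N), dtype=int)  # phi[t, s]: best previous state for s at t

    delta[0] = pi * B[:, obs[0]]                      # initialization
    for t in range(1, T):                             # forward recursion
        for s in range(N):
            scores = delta[t - 1] * A[:, s]
            phi[t, s] = np.argmax(scores)
            delta[t, s] = scores[phi[t, s]] * B[s, obs[t]]

    path = [int(np.argmax(delta[T - 1]))]             # termination
    for t in range(T - 1, 0, -1):                     # backtracking
        path.append(int(phi[t, path[-1]]))
    return path[::-1]

# Toy model (hypothetical numbers): 2 hidden states, 3 observation symbols.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
print(viterbi(pi, A, B, obs=[0, 1, 2]))   # e.g. [0, 0, 1]
```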
Advantages of Hidden Markov Models (HMMs)
1. Effective for Sequential and Time-Series Data: Captures temporal dependencies and
transitions between states, which are common in real-world sequential data.
2. Probabilistic Framework: HMMs use a probabilistic approach to model uncertainty,
with well-defined algorithms (e.g., Forward-Backward, Viterbi, Baum-Welch) for
inference, decoding, and learning.
Provides a robust way to handle noisy or incomplete data, offering probabilities for
predictions rather than deterministic outputs.
3. Flexibility Across Domains: Applicable to diverse fields like speech processing (e.g.,
modeling phonemes), bioinformatics (e.g., gene prediction), and finance (e.g., stock price
modeling).
The general framework can be adapted to various sequential tasks with appropriate
state and observation definitions.
4. Efficient Algorithms: Algorithms like Viterbi (for finding the most likely state
sequence) and Baum-Welch (for parameter estimation) are computationally efficient for
moderate-sized problems.
Enables practical implementation even with large sequences, balancing accuracy and
speed.
5. Handles Variable-Length Sequences: HMMs can process sequences of varying lengths
without requiring fixed-size inputs, unlike some neural network models.
Suitable for real-world data where sequence lengths differ (e.g., varying sentence
lengths in NLP).
Disadvantages of Hidden Markov Models (HMMs)
1. Assumption of Markov Property: HMMs assume that the current state depends only
on the previous state (first-order Markov property), which may not capture long-term
dependencies in complex sequences.
In tasks like language modeling, where context spans multiple steps, HMMs may
underperform compared to models like LSTMs or Transformers.
2. Scalability Issues: Training HMMs on large datasets or with many hidden states can be
computationally expensive, as the Baum-Welch algorithm scales poorly with state space
size.
Less suitable for very large or high-dimensional datasets compared to deep learning
models.
3. Local Optima in Training: The Expectation-Maximization (EM) approach used in
Baum-Welch can converge to local optima, leading to suboptimal model parameters.
May result in lower accuracy if initialization is poor or the model is complex.
4. Limited Expressiveness: HMMs struggle with highly non-linear or complex patterns in
data, as they rely on simple probabilistic transitions and emission distributions (often
Gaussian or discrete).
Deep learning models (e.g., RNNs, Transformers) often outperform HMMs in tasks
requiring complex feature interactions, like advanced NLP or image sequence analysis.
5. Requires Careful Design: Defining the number of hidden states and appropriate
emission distributions requires domain knowledge and experimentation.
Mis-specified models (e.g., too few or too many states) can lead to poor performance,
and tuning is not always straightforward.
Applications of the HMM:
Natural Language Processing (NLP): HMMs are employed across diverse natural language
processing tasks, including named entity recognition, part-of-speech tagging, and machine
translation. Their ability to capture sequential dependencies in language contributes to
the improved accuracy of these applications.
Speech Recognition (SP): HMMs play a crucial role in contemporary speech recognition
systems. They adeptly capture the intricate relationship between phonemes and acoustic
features, facilitating precise speech recognition and transcription.
Financial Time Series Analysis (FTSA): HMMs find utility in modeling and forecasting time series data, such as exchange rates or stock market prices. Through the capture of hidden states and transitions, HMMs provide valuable insights into market trends, facilitating well-informed investment decisions.