AIML Model Answerkey

PART A (10 X 2 = 20 Marks)

Give examples for real-world and toy problems.

A classic toy problem is MNIST handwritten digit classification:

 Problem: Classify 28×28 pixel grayscale images of handwritten digits (0–9).
 Input: Image data (784 features if flattened).
 Output: Predicted digit (0 to 9).
 Activation functions used: ReLU in hidden layers (to introduce non-linearity), and Softmax in the output layer (to produce probabilities over 10 classes).

Real-world problem examples include route finding, medical diagnosis, and spam filtering.

Differentiate Blind Search and Heuristic Search.


Feature | Blind Search (Uninformed Search) | Heuristic Search (Informed Search)
Definition | Searches the solution space without any domain-specific knowledge | Uses problem-specific knowledge (heuristics) to guide the search
Knowledge Used | No additional information other than the problem definition | Uses heuristics to estimate cost or distance to the goal
Examples | Breadth-First Search (BFS), Depth-First Search (DFS), Uniform Cost Search | A* Search, Greedy Best-First Search

Given that P(A) = 0.3, P(A|B) = 0.4 and P(B) = 0.5, compute P(B|A).
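
By Bayes' theorem: P(B|A) = P(A|B) · P(B) / P(A) = (0.4 × 0.5) / 0.3 = 0.2 / 0.3 ≈ 0.67.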


Why does uncertainty arise?
Uncertainty arises because we lack complete knowledge about a system, event, or
environment. This can happen for several reasons, depending on the context (e.g., science,
AI, economics). Here are the main sources of uncertainty:

1. Incomplete Information

We don’t always have access to all relevant data.

2. Measurement Errors

Data collected from sensors or instruments can have noise or inaccuracies.

3. Randomness and Variability

Some systems are inherently random or probabilistic.

4. Model Limitations

Models used to predict or simulate reality are simplifications and can’t capture every
nuance.

5. Ambiguity or Vagueness
Language or definitions may be unclear or open to multiple interpretations.

6. Unknown Future Events

The future is not deterministic in many situations.

What is the main difference between supervised and unsupervised machine learning?

Supervised Learning

 Definition: The model is trained on a labeled dataset, where each input has a
corresponding output (target).
 Goal: Learn a mapping from inputs to known outputs.
 Example: Predicting house prices based on labeled data like size, location, and
known prices.
 Common Algorithms: Linear regression, decision trees, SVM, neural networks.

Unsupervised Learning

 Definition: The model is trained on unlabeled data, with no explicit output labels.
 Goal: Discover hidden patterns or groupings in the data.
 Example: Grouping customers by purchasing behavior without knowing customer
segments in advance.
 Common Algorithms: K-means clustering, PCA, hierarchical clustering,
autoencoders.
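
The contrast shows up directly in code. A minimal sketch, assuming scikit-learn and its bundled Iris dataset: the supervised learner is fitted on (X, y), while the unsupervised one sees only X.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: labels y are provided to the learner.
clf = DecisionTreeClassifier().fit(X, y)

# Unsupervised: only X is provided; structure must be discovered.
km = KMeans(n_clusters=3, n_init=10).fit(X)

print(clf.predict(X[:3]))  # predicted class labels
print(km.labels_[:3])      # discovered cluster assignments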

You’ve built a random forest model with 10000 trees. You got delighted
after getting training error as 0.00. But, the validation error is 34.23.
What is going on? Haven’t you trained your model perfectly?
Training Error = 0.00

 Your model has memorized the training data perfectly: with 10,000 deep trees it has enough capacity to fit every single pattern, even noise.
 Random forests are ensembles of decision trees, and individual trees are prone to overfitting when their depth is unrestricted; fully grown trees can memorize the training set, including its noise.

Validation Error = 34.23

 The model doesn’t generalize well to unseen data.
 It learned the noise in the training set rather than the true underlying patterns.

How to fix it:

 Reduce the number of trees — you don’t need 10,000.
 Limit tree depth or set minimum samples per split/leaf to prevent overfitting.
 Use cross-validation to tune hyperparameters.
 Check feature importance — you may be overfitting on unimportant or noisy features.
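
A minimal sketch of these fixes, assuming scikit-learn (the dataset here is synthetic and purely illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

model = RandomForestClassifier(
    n_estimators=200,     # far fewer than 10,000 trees
    max_depth=10,         # cap depth to curb memorization of noise
    min_samples_leaf=5,   # require several samples per leaf
    random_state=0,
)
# Judge the model by cross-validated score, never by training error alone.
print(cross_val_score(model, X, y, cv=5).mean())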

What is the difference between K-means and KNN?


Feature | K-Means | K-Nearest Neighbors (KNN)
Type | Unsupervised learning | Supervised learning
Purpose | Clustering: group unlabeled data into k clusters | Classification or regression: predict a label based on neighbors
Example | “Group similar things together.” | “Given known examples, predict the label of a new example.”
Which are the three types of ensemble learning?
1. Bagging (Bootstrap Aggregating)
2. Boosting
3. Stacking (Stacked Generalization)

What is stochastic gradient descent and why is it used in the training of neural networks?

Stochastic Gradient Descent (SGD) is an optimization algorithm used to minimize the loss function of machine learning models, especially neural networks. Instead of computing the gradient over the entire training set, SGD updates the parameters using the gradient of a single example (or a small mini-batch) at a time. Each update is therefore cheap, training scales to large datasets, and the noise in the updates can help the optimizer escape poor local minima.
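
A minimal sketch of the idea, fitting y = w·x + b on toy data with one example per update (all settings here are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 100)  # true w = 2, b = 1, plus noise

w, b, lr = 0.0, 0.0, 0.1
for epoch in range(20):
    for i in rng.permutation(len(x)):   # shuffle, then visit one sample at a time
        err = (w * x[i] + b) - y[i]     # gradient is computed from ONE example
        w -= lr * err * x[i]            # cheap parameter update
        b -= lr * err

print(w, b)  # should approach 2 and 1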

Why is ReLU used in deep learning?


ReLU is popular because it’s simple, efficient, reduces vanishing gradients, and helps deep
networks train faster and better.
PART B
1. Enumerate Classical “Water jug Problem”. Describe the state space for
this problem and also give the solution?
The Water Jug Problem typically involves two jugs with different capacities. The objective is to
measure a specific quantity of water by performing operations like filling a jug, emptying a jug, or
transferring water between the two jugs. The problem can be stated as follows:
 You are given two jugs, one with a capacity of X litres and the other with a capacity
of Y litres. You need to measure exactly Z litres of water using these two jugs.
 The allowed operations are:
o Fill one of the jugs.
o Empty one of the jugs.
o Pour water from one jug into another until one jug is either full or empty.
State Space Representation
In AI terms, the Water Jug Problem can be described using a state space representation, where:
Each state is represented by a tuple (a, b), where a is the amount of water in the first jug and b is the
amount of water in the second jug.
The initial state is (0, 0), meaning both jugs are empty.
The goal state is any configuration (a, b) where a or b equals the desired amount Z.
Transitions between states occur when one of the allowed operations is performed.

Search Algorithms to Solve the Water Jug Problem

To solve the Water Jug Problem using AI techniques, we can apply search algorithms like Breadth-First Search (BFS) and Depth-First Search (DFS). These algorithms systematically explore the state space to find the optimal sequence of operations that leads to the goal state.
Breadth-First Search (BFS)

 BFS explores all possible states level by level, ensuring that the shortest path (fewest
operations) is found. It is particularly useful for the Water Jug Problem as it guarantees
finding the optimal solution.
 BFS starts from the initial state (0, 0) and explores all neighboring states, then their
neighbors, and so on until the goal state is found.
Depth-First Search (DFS)

 DFS explores each path from the initial state as deeply as possible before backtracking.
While DFS can find a solution, it does not guarantee the optimal one and may result in
exploring longer paths unnecessarily.
 DFS works well for smaller problems but may struggle with larger state spaces due to
its depth-first nature.

Problem Statement

You are given two jugs:

 A 4-gallon jug
 A 3-gallon jug
 No measuring markers are available on either jug.
 You have an unlimited water supply and can either fill, empty, or transfer water
between jugs.
Goal: Obtain exactly 2 gallons of water in the 4-gallon jug.

A state is represented as (x, y) where:

 x is the amount of water in the 4-gallon jug.


 y is the amount of water in the 3-gallon jug.

Possible Operations (State Transitions):

1. Fill a Jug
o (Fill 4-gallon jug) → (4, y)
o (Fill 3-gallon jug) → (x, 3)
2. Empty a Jug
o (Empty 4-gallon jug) → (0, y)
o (Empty 3-gallon jug) → (x, 0)
3. Pour Water from One Jug to Another
o Pour water from the 4-gallon jug into the 3-gallon jug
o Pour water from the 3-gallon jug into the 4-gallon jug

Step | State (4-gal, 3-gal) | Action
1 | (0, 0) | Start with both jugs empty
2 | (0, 3) | Fill the 3-gallon jug
3 | (3, 0) | Pour all water from the 3-gallon jug into the 4-gallon jug
4 | (3, 3) | Fill the 3-gallon jug again
5 | (4, 2) | Pour water from the 3-gallon jug into the 4-gallon jug until it is full
6 | (0, 2) | Empty the 4-gallon jug
7 | (2, 0) | Pour the remaining 2 gallons from the 3-gallon jug into the 4-gallon jug

State Space Representation as a Graph

Each state (x, y) represents a node, and the edges represent possible transitions based on the allowed
operations. The solution path follows the shortest sequence from the initial state to the goal state.

Graph Representation (Partial View):

(0, 0) → (0, 3) → (3, 0) → (3, 3) → (4, 2) → (0, 2) → (2, 0)
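
A minimal BFS sketch for this state space (variable and function names here are illustrative):

from collections import deque

def water_jug_bfs(cap_x=4, cap_y=3, goal=2):
    """BFS over states (x, y); returns the shortest path to a state with x == goal."""
    start = (0, 0)
    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if x == goal:               # goal: exactly `goal` gallons in the 4-gallon jug
            path, state = [], (x, y)
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        pour_xy = min(x, cap_y - y) # amount movable from jug X into jug Y
        pour_yx = min(y, cap_x - x) # amount movable from jug Y into jug X
        successors = [
            (cap_x, y), (x, cap_y),            # fill a jug
            (0, y), (x, 0),                    # empty a jug
            (x - pour_xy, y + pour_xy),        # pour X -> Y
            (x + pour_yx, y - pour_yx),        # pour Y -> X
        ]
        for s in successors:
            if s not in parent:
                parent[s] = (x, y)
                queue.append(s)
    return None

print(water_jug_bfs())  # [(0,0), (0,3), (3,0), (3,3), (4,2), (0,2), (2,0)]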

2. (i) Solve the following cryptarithmetic problem using the constraint satisfaction search procedure: CROSS + ROADS = DANGER. (8)
(ii) Discuss the constraint satisfaction problem with an algorithm for solving a cryptarithmetic problem: SEND + MORE = MONEY. (7)

(i) Constraint Satisfaction Search for CROSS + ROADS = DANGER

To solve the cryptarithmetic problem

CROSS + ROADS = DANGER

we use the Constraint Satisfaction Search (CSP) procedure. Techniques that prune the search include:

 Forward checking: after assigning a value, remove it from other variables' domains.
 Arc consistency (AC-3): prune inconsistent domains.
 Heuristics:
o Choose the variable with the smallest domain next.
o Choose the least constraining value.

The procedure involves:

1. Defining Variables: Each unique letter represents a unique digit (0-9).


2. Defining Constraints:
o Each letter represents a single unique digit (0-9).
o The sum must satisfy column-wise addition (carry constraints apply).
o The first letter of any number cannot be 0.
3. Solving via backtracking search.

Step 1: Identify Variables

The unique letters in the problem are:

C, R, O, S, A, D, N, G, E

We need to assign each letter a unique digit (0-9).

Step 2: Define Constraints

1. C, R, D cannot be 0 (they are leading digits).


2. Column-wise addition must hold (including carries).

Expanding the equation:

CROSS + ROADS = DANGER

Additional constraints:

 Each digit (0-9) is unique.


 Carries are handled for column-wise addition.

Step 3: Solve using Constraint Satisfaction Search

Using Backtracking & Constraint Propagation, we systematically assign digits and check
constraints.

Solution Found:

By solving manually or using a script, we get:

 96233
+ 62513
--------
158746

So, the digit assignments are:

Letter Digit
C 9
R 6
O 2
S 3
A 5
D 1
N 8
G 7
E 4

(ii) To solve the cryptarithmetic problem

We are given the problem:

SEND + MORE = MONEY

Using the Constraint Satisfaction Search Procedure, we will systematically assign unique
digits to letters while ensuring the sum is correct.

Step 1: Identify Variables

The unique letters involved are:

S, E, N, D, M, O, R, Y

Each letter represents a unique digit (0-9).

Step 2: Define Constraints

1. Each letter must be assigned a unique digit.


2. Leading digits cannot be 0:
o S ≠ 0
o M ≠ 0
3. Column-wise addition must be correct, including carries.

Expanding the equation:

SEND
+ MORE
---------
MONEY

Step 3: Solve Using Constraint Satisfaction Search

We use backtracking and constraint propagation to systematically assign values to each letter while maintaining the constraints.

By solving manually or using a backtracking algorithm, we find:

9567
+ 1085
--------
10652

So, the correct digit assignments are:

Letter Digit
S 9
E 5
N 6
D 7
M 1
O 0
R 8
Y 2
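
A minimal brute-force CSP sketch that checks both puzzles by trying digit permutations. A real CSP solver would add forward checking and arc consistency to prune; with 9 distinct letters, the CROSS + ROADS case takes noticeably longer than SEND + MORE in pure Python.

from itertools import permutations

def value(word, assign):
    """Numeric value of a word under a letter -> digit assignment."""
    return int("".join(str(assign[ch]) for ch in word))

def solve(addends, result):
    letters = sorted(set("".join(addends) + result))
    leading = {w[0] for w in addends + [result]}  # leading letters cannot be 0
    for perm in permutations(range(10), len(letters)):
        assign = dict(zip(letters, perm))
        if any(assign[ch] == 0 for ch in leading):
            continue
        if sum(value(w, assign) for w in addends) == value(result, assign):
            return assign
    return None

print(solve(["SEND", "MORE"], "MONEY"))    # S=9, E=5, N=6, D=7, M=1, O=0, R=8, Y=2
print(solve(["CROSS", "ROADS"], "DANGER"))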

3. Consider the standard burglary–earthquake alarm Bayesian network.
(i) What is the probability that the alarm has sounded but neither a burglary nor an earthquake has occurred, and both John and Mary call?
(ii) What is the probability that John calls?
(iii) What is the probability that there is a burglary, given that John and Mary call?
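
The network and its CPT values are not reproduced above; assuming the standard Russell–Norvig burglary network (P(B) = 0.001, P(E) = 0.002, P(A | ¬B, ¬E) = 0.001, P(J | A) = 0.90, P(M | A) = 0.70), part (i) factorizes along the network:

P(A, ¬B, ¬E, J, M) = P(¬B) · P(¬E) · P(A | ¬B, ¬E) · P(J | A) · P(M | A)
                   = 0.999 × 0.998 × 0.001 × 0.90 × 0.70 ≈ 0.00063.

Parts (ii) and (iii) are answered by summing over the remaining variables and normalizing; under the full textbook CPTs the values are P(J) ≈ 0.052 and P(B | J, M) ≈ 0.284.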
4. Construct a Bayesian Network and define the necessary CPTs for the given
scenario. We have a bag of three biased coins a,b and c with probabilities of
coming up heads of 20%, 60% and 80% respectively. One coin is drawn
randomly from the bag (with equal likelihood of drawing each of the three coins)
and then the coin is flipped three times to generate the outcomes X1, X2 and X3

 Bag contains 3 biased coins:

 Coin a: P(H) = 0.2
 Coin b: P(H) = 0.6
 Coin c: P(H) = 0.8

 One coin is drawn randomly (equal probability 1/3 each).

 The chosen coin is flipped 3 times, producing outcomes X1, X2 and X3.

Step 1: Define the Variables

 C = the chosen coin, with values {a, b, c}.
 X1, X2, X3 = results of the 3 flips; each can be H (heads) or T (tails).

Step 2: Bayesian Network Structure

        C
      / | \
    X1  X2  X3

 The chosen coin C directly influences the results of flips X1, X2, X3.
 X1, X2, X3 are conditionally independent given C.
C P(C)
a 1/3
b 1/3
c 1/3
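
The flips share one conditional CPT, read directly from the problem statement (the tails column is 1 minus the heads entry):

C | P(Xi = H | C) | P(Xi = T | C)
a | 0.2 | 0.8
b | 0.6 | 0.4
c | 0.8 | 0.2

The same table applies to X1, X2 and X3, since the flips are conditionally independent given C.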

5. Assume a disease so rare that it is seen in only one person out of every million. Assume also that we have a test that is effective in that if a person has the disease, there is a 99 percent chance that the test result will be positive; however, the test is not perfect, and there is a one in a thousand chance that the test result will be positive on a healthy person. Assume that a new patient arrives and the test result is positive. What is the probability that the patient has the disease?
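
By Bayes' theorem, with D = disease and + = positive test:

P(D | +) = P(+ | D) · P(D) / [ P(+ | D) · P(D) + P(+ | ¬D) · P(¬D) ]
         = (0.99 × 0.000001) / (0.99 × 0.000001 + 0.001 × 0.999999)
         ≈ 0.00000099 / 0.00100099
         ≈ 0.00099, i.e. roughly 0.1%.

Even with a positive result, the patient almost certainly does not have the disease; false positives on the vast healthy population dominate the rare true positives.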
6. Explain the principle of the gradient descent algorithm. Accompany
your explanation with a diagram. Explain the use of all the terms and
constants that you introduce and comment on the range of values
that they can take.
Gradient Descent is an optimization algorithm used to minimize a cost (or loss) function J(θ) by iteratively moving in the direction of the steepest descent (negative gradient). The update rule is

θ := θ − α · ∂J(θ)/∂θ

where θ is the parameter being optimized and α is the learning rate, a small positive constant (commonly in the range 0.0001 to 0.1). If α is too large, the steps overshoot and the algorithm may diverge; if it is too small, convergence is very slow.
Cost J(θ)
^
| .
| .
| .
| . ← Gradient steps
| .
| .
+----------------------------> θ (Parameter)

θ₀ θ₁ θ₂ θ₃

At each step:

 Compute the slope (gradient).
 Step in the opposite direction.
 Repeat until you converge near the minimum.
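
A minimal sketch of these steps on a one-dimensional cost J(θ) = (θ − 3)², whose minimum is at θ* = 3 (the cost function and constants are illustrative):

theta = 0.0   # initial parameter theta_0
alpha = 0.1   # learning rate: too large diverges, too small crawls

for step in range(50):
    grad = 2 * (theta - 3)        # slope dJ/dtheta at the current point
    theta = theta - alpha * grad  # step in the opposite (downhill) direction

print(theta)  # converges toward the minimizer theta* = 3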

7. Explain the various learning techniques involved in unsupervised learning.

Unsupervised learning involves finding patterns or structures in unlabeled data — meaning the algorithm is given data without explicit outputs or labels and must learn the underlying structure on its own.

Here are the main learning techniques involved in unsupervised learning:

🔹 1. Clustering

Goal: Group similar data points into clusters.

 Description: The algorithm identifies natural groupings in the data based on similarity or distance.
 Popular algorithms:
o K-Means: Partitions data into K clusters by minimizing intra-cluster variance.
o DBSCAN: Groups points that are closely packed and marks outliers.
o Hierarchical Clustering: Builds a tree of clusters using a bottom-up
(agglomerative) or top-down (divisive) approach.

Applications: Market segmentation, social network analysis, image segmentation.
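
A minimal clustering sketch, assuming scikit-learn and synthetic blob data:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# 300 unlabeled points scattered around 3 hidden centers.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)  # the 3 discovered centers
print(kmeans.labels_[:10])      # cluster assignment of the first 10 points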

🔹 2. Dimensionality Reduction

Goal: Reduce the number of variables (features) while preserving important information.
 Description: Transforms high-dimensional data into a lower-dimensional form to
simplify models and visualize structure.
 Popular techniques:
o PCA (Principal Component Analysis): Finds directions (components) that
maximize variance.
o t-SNE / UMAP: Focus on preserving local and global structure in low
dimensions (great for visualization).
o Autoencoders: Neural networks that learn compact feature representations.

Applications: Data visualization, noise reduction, feature extraction.

🔹 3. Anomaly Detection

Goal: Identify data points that significantly deviate from the norm.

 Description: Assumes that most data points are normal, and those that don’t fit are
outliers or anomalies.
 Techniques:
o Statistical methods (e.g., Gaussian modeling)
o Isolation Forests
o Clustering-based approaches (e.g., points far from cluster centers)
o Autoencoders for reconstruction error

Applications: Fraud detection, fault diagnosis, network intrusion detection.

🔹 4. Association Rule Learning

Goal: Discover interesting relationships (rules) between variables in large datasets.

 Description: Finds frequent patterns or associations between variables/items.


 Popular algorithms:
o Apriori
o Eclat
o FP-Growth

Example Rule: “If a person buys bread and butter, they are likely to buy milk.”

Applications: Market basket analysis, recommendation systems.

8. What is a Gaussian process? Explain Gaussian parameter estimates in detail with suitable examples.

A Gaussian Process (GP) is a probabilistic model used in machine learning and statistics
for regression and classification tasks. It’s a non-parametric Bayesian approach, meaning
it defines a distribution over functions without explicitly specifying their form.

🧠 Intuition Behind Gaussian Process:

 Instead of predicting a single output for each input, GPs predict a distribution.
 Think of it as an extension of the Gaussian (Normal) distribution to functions:
o A Gaussian distribution describes uncertainty over values.
o A Gaussian process describes uncertainty over functions.

📊 Formal Definition:

A Gaussian Process is a collection of random variables, any finite number of which have a joint Gaussian distribution. A GP is fully specified by a mean function m(x) and a covariance (kernel) function k(x, x′); the kernel hyperparameters (e.g., length scale, signal variance, noise variance) are the Gaussian parameters, and they are typically estimated by maximizing the log marginal likelihood of the training data.
So, we get both a mean prediction and a confidence interval.

A typical GP regression plot shows the predicted mean curve with a confidence band from the lower limit (mean − 2σ) to the upper limit (mean + 2σ); sample functions drawn from the GP lie inside the band, which narrows near the observed data points and widens away from them.

Why Gaussian Processes are Useful

 Provide uncertainty estimates along with predictions.


 Work well with small datasets.
 Great for Bayesian optimization and active learning.
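
A minimal regression sketch, assuming scikit-learn's GaussianProcessRegressor (the data and kernel settings are illustrative); note that fitting maximizes the log marginal likelihood to estimate the kernel hyperparameters:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Five observations of a toy function.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.sin(X).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
gp.fit(X, y)  # hyperparameters estimated by maximizing marginal likelihood

mean, std = gp.predict(np.array([[2.5]]), return_std=True)
print(mean[0], mean[0] - 2 * std[0], mean[0] + 2 * std[0])  # mean with ±2σ band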

9. Draw the architecture of a single layer perceptron (SLP) and explain its operation. Mention its advantages and disadvantages.

1. Architecture of Single Layer Perceptron

A Single Layer Perceptron (SLP) is the simplest form of a neural network. It consists of:

 Input layer (features)
 Weights (connections with adjustable values)
 Summation unit
 Activation function
 Output (binary or continuous, depending on the task)

Input Layer Output


x1 -------|
|
x2 -------|-------> (Σ wᵢ·xᵢ + b) ---> Activation ---> ŷ (output)
|
x3 -------|
Weights (w₁, w₂, w₃, ...)

 x1, x2, x3, ..., xn: Inputs
 w1, w2, w3, ..., wn: Weights
 b: Bias term
 f(⋅): Activation function (e.g., step, sigmoid)
 ŷ: Output
Advantages of SLP

 Simplicity: Easy to implement and train.


 Fast: Low computational cost.
 Theoretical Foundation: Basis of more complex neural networks.

Disadvantages of SLP

 Can only solve linearly separable problems.


o E.g., can't solve the XOR problem.
 Limited Learning Power: No hidden layers → can't capture complex patterns.
 Rigid Architecture: One layer, one output.
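
A minimal from-scratch sketch of the SLP's operation, training on the linearly separable AND gate with the perceptron learning rule (the learning rate and epoch count are illustrative):

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])   # AND gate targets

w = np.zeros(2)   # weights w1, w2
b = 0.0           # bias
lr = 0.1          # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        y_hat = 1 if np.dot(w, xi) + b > 0 else 0   # step activation
        error = target - y_hat
        w += lr * error * xi                        # perceptron learning rule
        b += lr * error

print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])  # -> [0, 0, 0, 1]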
10. Develop a backpropagation algorithm for a multilayer feedforward neural network consisting of one input layer, one hidden layer and one output layer, from first principles.

1. Network Architecture

Assume:

 n: Number of input neurons
 m: Number of hidden neurons
6. Algorithm Summary

Backpropagation Algorithm Steps:

1. Initialize weights and biases randomly (small values)


2. For each training example:
o Perform a forward pass
o Compute loss
o Backpropagate the errors:
 Compute output error
 Compute hidden layer error
o Update weights and biases
3. Repeat over multiple epochs
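
A minimal from-first-principles sketch of these steps: a 2-3-1 network learning XOR, with sigmoid activations and squared-error loss chosen for illustration:

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(0, 0.5, (2, 3)); b1 = np.zeros((1, 3))  # input -> hidden
W2 = rng.normal(0, 0.5, (3, 1)); b2 = np.zeros((1, 1))  # hidden -> output
lr = 0.5

for epoch in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)       # hidden activations
    y_hat = sigmoid(h @ W2 + b2)   # network output

    # Backward pass (chain rule; sigmoid derivative is a * (1 - a))
    delta_out = (y_hat - y) * y_hat * (1 - y_hat)   # output-layer error
    delta_hid = (delta_out @ W2.T) * h * (1 - h)    # hidden-layer error

    # Gradient-descent updates for weights and biases
    W2 -= lr * h.T @ delta_out; b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * X.T @ delta_hid; b1 -= lr * delta_hid.sum(axis=0)

print(np.round(y_hat.ravel(), 2))  # should approach [0, 1, 1, 0]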

PART C

1. Explain the EM algorithm.

The Expectation-Maximization (EM) algorithm is a powerful iterative optimization algorithm used to find maximum likelihood estimates of parameters in probabilistic models, especially when the data has missing or hidden (latent) variables.

When is EM Used?

 In problems with incomplete data


 In models with latent variables (e.g., Gaussian Mixture Models, Hidden Markov
Models)
 When direct maximization of likelihood is difficult
Real-World Examples

 Clustering using Gaussian Mixture Models (GMM)


 Image segmentation
 Missing data imputation
 Speech recognition using Hidden Markov Models (HMM)

Basic Idea

We can’t directly maximize the likelihood because of missing or hidden data. EM breaks it
into two steps:

1. E-step (Expectation):

Estimate the missing data or latent variables using current parameter estimates.

2. M-step (Maximization):

Update the parameters to maximize the expected log-likelihood found in the E-step.

Repeat until convergence.

EM Algorithm: Step-by-Step

Let:

 X: Observed data
 Z: Latent (hidden) variables
 θ: Parameters to estimate
EM Algorithm on Gaussian Mixture Model (GMM)

Suppose you have data that is generated from a mixture of Gaussians, but you don’t know:

 Which point belongs to which Gaussian


 The parameters (means, variances, weights)

You can use EM to:

1. E-step: Compute probability each data point belongs to each Gaussian


2. M-step: Use these probabilities to re-estimate the means, variances, and mixture
weights

Repeat until parameters converge.
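
A minimal EM sketch for a one-dimensional two-component GMM (the initialization, iteration count and the data itself are illustrative):

import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

pi = 0.5                    # mixture weight of component 1
mu = np.array([-1.0, 1.0])  # initial means
sigma = np.array([1.0, 1.0])

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: responsibility of component 1 for each point
    r1 = pi * gauss(data, mu[0], sigma[0])
    r2 = (1 - pi) * gauss(data, mu[1], sigma[1])
    gamma = r1 / (r1 + r2)

    # M-step: re-estimate weights, means, and variances
    n1, n2 = gamma.sum(), (1 - gamma).sum()
    pi = n1 / len(data)
    mu = np.array([(gamma * data).sum() / n1, ((1 - gamma) * data).sum() / n2])
    sigma = np.sqrt(np.array([
        (gamma * (data - mu[0]) ** 2).sum() / n1,
        ((1 - gamma) * (data - mu[1]) ** 2).sum() / n2,
    ]))

print(pi, mu, sigma)  # means should approach -2 and 3 (up to component order)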

Advantages of EM

 Can handle missing data


 Works well with latent variable models
 Guaranteed to converge (to local maximum)

Limitations

 May converge to local maxima (not global)


 Convergence can be slow
 Requires good initialization

2. Explain the alpha-beta pruning algorithm and the minimax game playing algorithm with an example.

Minimax Algorithm

✅ What is Minimax?

The Minimax algorithm is a decision rule for minimizing the possible loss in a worst-case
scenario. It is used in two-player zero-sum games where:

 One player tries to maximize the score (MAX player)


 The other tries to minimize it (MIN player)

How It Works:

 The game is represented as a tree of possible moves.


 Terminal nodes (leaves) have evaluation scores.
 The algorithm recursively backs up values from the leaves to the root.

              [MAX]
            /   |   \
       [MIN]  [MIN]  [MIN]
       /  \   /  \   /  \
      3    5 2    9 12   5

Steps:

 MIN nodes pick the minimum of their children.
 MAX picks the maximum among the MIN results.


Level 2 (MIN): [min(3,5) = 3], [min(2,9) = 2], [min(12,5) = 5]
Level 1 (MAX): max(3, 2, 5) = 5

Alpha-Beta Pruning

✅ What is Alpha-Beta Pruning?

Alpha-Beta pruning is an optimization over the Minimax algorithm. It prunes parts of the
tree that won't affect the final decision, reducing time complexity.

 Alpha (α): Best value MAX can guarantee


 Beta (β): Best value MIN can guarantee

Pruning Rule:

 If β ≤ α, prune (stop exploring this node), because the opponent will never allow this
branch to happen.

           [MAX]
          /     \
     [MIN]       [MIN]
    /  |  \      /  \
   3   5   6    1    9

The left MIN node evaluates to min(3, 5, 6) = 3, so α = 3 at the MAX root. Exploring the right MIN node, its first leaf gives β = 1; since β ≤ α, the remaining leaf 9 is pruned, because MAX already has a guaranteed better option.
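
A minimal sketch of minimax with alpha-beta pruning over trees like the ones above (leaves are numbers, internal nodes are lists of children):

def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    if not isinstance(node, list):       # leaf: return its evaluation score
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if beta <= alpha:            # MIN will never allow this branch
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if beta <= alpha:                # MAX already has a better option
            break
    return value

tree = [[3, 5], [2, 9], [12, 5]]         # the minimax example tree above
print(alphabeta(tree, True))             # -> 5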

You might also like