CS 188 Spring 2023 Midterm Solutions
First name
Last name
SID
Honor code: “As a member of the UC Berkeley community, I act with honesty, integrity, and respect for others.”
By signing below, I affirm that all work on this exam is my own work, and honestly reflects my own understanding of the
course material. I have not referenced any outside materials (other than one double-sided cheat sheet), nor collaborated
with any other human being on this exam. I understand that if the exam proctor catches me cheating on the exam, that I
may face the penalty of an automatic "F" grade in this class and a referral to the Center for Student Conduct.
Signature:
Point Distribution
Q1. [16 pts] Search: Project Groups
You’re taking a class with 𝑛 students, and you want to find a 4-person project group (you and 3 other students). You decide to
formulate your problem as a search problem, as follows:
• State space: An unordered set of students in the group so far. For example, {you, Inky, Blinky} and {you, Blinky, Inky}
are equivalent states.
• Successor function: You pick someone from the class (who is not already in your group), and add them to your group.
If your group already has 4 students, there are no further successor states.
• Start state: The project group is just you, and nobody else.
• Goal test: Your group has exactly 4 students, and everyone in the group gets along. (You can assume that the goal test is
somehow able to determine whether a group gets along.)
# (𝑛 − 1)
# (𝑛 − 1)⁴
# $\binom{n-1}{3}$
$\binom{n-1}{0} + \binom{n-1}{1} + \binom{n-1}{2} + \binom{n-1}{3}$
# Infinitely many
Recall that each node of the search graph represents a state, so this question is asking how many states are in this problem.
The state space is a set of people in the group so far. This set could contain 1 person (you), or 2 people (you and one
other), or 3 people (you and two others), or 4 people (you and three others). Therefore, we need to count up how many
possible groups (always including yourself) of size up to 4 could be made from the class of 𝑛 students.
There is $\binom{n-1}{0} = 1$ possible 1-person group: it’s the group with just you.
There are $\binom{n-1}{1} = n - 1$ possible 2-person groups: you, plus any of the 𝑛 − 1 other people in class.
There are $\binom{n-1}{2}$ possible 3-person groups: you, plus some (unordered) group of 2 people among the 𝑛 − 1 others in class.
There are $\binom{n-1}{3}$ possible 4-person groups: you, plus some (unordered) group of 3 people among the 𝑛 − 1 others in class.
Note that there are no states with groups with more than 4 people. The successor function will not generate groups with
more than 4 people.
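As a quick sanity check, the closed-form count can be verified by brute force for small class sizes. The sketch below (Python, with illustrative helper names; the choice of small values of 𝑛 is ours, not part of the exam) enumerates every reachable group and compares the total against the sum of binomial coefficients above.

from itertools import combinations
from math import comb

def count_states(n):
    """Enumerate every reachable state: a group containing you plus 0 to 3 classmates."""
    classmates = range(n - 1)
    return sum(1 for k in range(4) for _ in combinations(classmates, k))

def closed_form(n):
    return sum(comb(n - 1, k) for k in range(4))   # C(n-1,0) + C(n-1,1) + C(n-1,2) + C(n-1,3)

for n in range(4, 10):
    assert count_states(n) == closed_form(n)
print("counts match for n = 4..9")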
(c) [2 pts] What is the maximum branching factor for this search tree?
(𝑛 − 1)
# (𝑛 − 1)⁴
# $\binom{n-1}{3}$
# $\binom{n-1}{0} + \binom{n-1}{1} + \binom{n-1}{2} + \binom{n-1}{3}$
# Could be infinite
The branching factor represents the number of actions available from a state.
From the root (just you in the group), there is a choice of 𝑛 − 1 people to add to your group. Therefore, there are 𝑛 − 1
possible actions from the root, and the branching factor from the root is 𝑛 − 1.
Future nodes deeper in the tree will have fewer choices of students to add to your group. For example, at depth 1, where
the group has two people (you and one other), there is only a choice of 𝑛 − 2 people to add to your group. Therefore, the
branching factors of nodes deeper in the tree will be less than 𝑛 − 1, and the maximum branching factor is from the node
at the root.
(d) [2 pts] What is the maximum depth of this search tree, assuming the root node is at depth 0?
# 1
# 2
3
# 𝑛−1
# (𝑛 − 1)⁴
# Could be infinite
Each extra layer of the tree corresponds to an additional action, so the maximum depth of the tree corresponds to the
maximum number of actions that could be taken in this problem.
It takes 3 actions to create a group of 4 people, and after that, there are no more actions or successor states available.
Now, suppose that you want 𝑘 students in your project group (𝑘 < 𝑛).
Also, suppose that any group of 𝑘 students passes the goal test.
(e) [2 pts] As 𝑛 and 𝑘 grow very large, which search algorithm will expand fewer nodes to find a solution?
Depth-first tree search
# Breadth-first tree search
# Not enough information
In this problem, all the solutions to the problem are at depth 𝑘 − 1 in the search tree, because it takes 𝑘 − 1 actions to build
a group of 𝑘 students.
Also, any node at depth 𝑘 − 1 is a solution to the problem, because any group of 𝑘 students is a valid solution.
BFS will look through all the nodes at depth 1 (groups of 2), then all the nodes at depth 2 (groups of 3), then all the nodes
at depth 3 (groups of 4), etc. before ever looking at a node at depth 𝑘 − 1 (group of 𝑘).
On the other hand, DFS will repeatedly expand the deepest node, expanding a single depth-0 node, then a single depth-1
node, then a single depth-2 node, etc. until it pops the first depth 𝑘 − 1 node off the fringe and returns it as a solution.
(f) [2 pts] Approximately how many nodes will be expanded by the search algorithm you selected before it finds a solution?
𝑂(𝑘)
# 𝑂(𝑛)
# 𝑂(𝑛𝑘)
# 𝑂(𝑘ⁿ)
# 𝑂(𝑛ᵏ)
As discussed in the previous subpart, DFS will expand one node at each depth before finding a solution. The depth of the
tree is 𝑂(𝑘), so that’s how many nodes DFS will expand.
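To illustrate the gap concretely, here is a small experiment (a sketch, not part of the official solution) that runs BFS and DFS tree search on the group-formation problem where any group of 𝑘 students passes the goal test, and counts expanded nodes. The values n = 12, k = 5 are arbitrary.

from collections import deque

def successors(group, n):
    """Add any classmate (labeled 1..n-1) not already in the group; student 0 is 'you'."""
    return [group | {s} for s in range(1, n) if s not in group]

def count_expansions(n, k, use_dfs):
    fringe = deque([frozenset({0})])      # start state: a group containing just you
    expanded = 0
    while fringe:
        group = fringe.pop() if use_dfs else fringe.popleft()
        if len(group) == k:               # any group of k students passes the goal test
            return expanded
        expanded += 1
        fringe.extend(successors(group, n))
    return None

n, k = 12, 5
print("BFS expansions:", count_expansions(n, k, use_dfs=False))   # grows quickly with n and k
print("DFS expansions:", count_expansions(n, k, use_dfs=True))    # about k - 1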
Now, suppose that some pairs of students in the class cannot work together because they dislike each other. We’d like to modify
our search problem definition to account for this.
(g) [4 pts] Which of the following changes would, if implemented individually, ensure that any complete search algorithm
still finds a correct solution, if one exists?
■ Modify the goal test to check that no such pairs exist in the state being tested.
□ Modify the successor function so that it only adds students who can work with anybody.
■ Modify the successor function so that it only adds students who can work with anybody already in the group.
□ Require that states be ordered lists rather than sets.
# None of the above. A search algorithm cannot solve this problem. You would need a CSP algorithm.
The additional information about which students cannot work together could be added to the goal test (first option) or the
successor function (third option).
The second option fails because it will prevent the algorithm from finding some solutions. For example, if half the class
refuses to work with the other half and vice versa, a solution still exists in either half if it is big enough. However, the
successor function will be unable to generate any new successors, because everybody in the class has at least one person
they cannot work with (i.e. there is no person that can work with anybody else in class).
Making states ordered lists does not encode any information about which pairs of students cannot work together, so the
last option is false.
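A minimal sketch of the third option is below; it assumes dislikes are given as a set of unordered pairs of students, which is a modeling choice rather than anything specified in the exam.

def successors(group, all_students, dislikes):
    """Yield groups formed by adding one student who gets along with
    everyone already in the group (the third option above)."""
    if len(group) >= 4:
        return                                    # groups are capped at 4 students
    for s in all_students:
        if s in group:
            continue
        if all(frozenset({s, member}) not in dislikes for member in group):
            yield group | {s}

# tiny example: student 0 is 'you'; students 1 and 2 dislike each other
dislikes = {frozenset({1, 2})}
for succ in successors(frozenset({0, 1}), range(5), dislikes):
    print(sorted(succ))      # [0, 1, 3] and [0, 1, 4]; student 2 is never added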
Q2. [18 pts] Search: Color a Dinosaur
Consider a computer screen with a grid of squares.
We start with a blank grid and the mouse cursor in the bottom-left square. At each time step, we have several actions to choose
from, each costing 1:
• Move the mouse cursor to an adjacent square (north, south, east, west).
• Click the mouse to fill in the currently selected square.
For example, starting from the bottom-left, the action sequence East, East, Click, North, Click produces the grid in Figure 1(a).
Figure 1: (a) Configuration after East, East, Click, North, Click; the arrow denotes the mouse cursor. (b) A possible goal configuration; the cursor location is ignored in the goal test. (c) Another possible goal configuration.
For each piece of information listed below, select the part of the search problem definition where that information should be
stored.
The information about how the world looks when the search problem starts is encoded in the start state.
The current location of the mouse cursor changes depending on the sequence of actions we’ve taken so far. This belongs
in the state space.
The list of squares colored so far changes depending on the sequence of actions we’ve taken so far. This belongs in the
state space.
# State space
Goal test
# Successor function
# Start state
In our goal test, we need to check whether the current state (the set of squares colored so far) matches the desired picture (the original list of the squares that need to be colored).
For each heuristic provided, select whether it is admissible, and whether it is consistent.
We need to click at least this many more times, so the true cost must be at least as great as this (since we also need to
add extra actions to move the mouse around).
This heuristic drops by at most 1 after each action; each action costs 1; so the triangle inequality is satisfied and the
heuristic is consistent.
In other words, the difference in heuristics between two adjacent states will never exceed 1. Therefore, the difference in
heuristics between two adjacent states will never overestimate the true cost between those two states (which is always 1).
If there is one unfinished square and the cursor is already in the right location, then the true cost is 1 but the heuristic says
2, so it is not admissible. Since consistent implies admissible, it cannot be consistent either.
(g) [3 pts] ℎ3 = maximum Manhattan distance between any two unfinished squares
We need to at least cover the maximum distance between any two unfinished squares. Therefore this is an underestimate
of the true cost and admissible.
The true action cost is 1. The maximum Manhattan distance between any pair of unfinished squares could drop drastically
when a square is finished, which means that this heuristic might overestimate the true cost of an action and does not satisfy
the triangle inequality.
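A concrete instance of this violation, assuming squares are represented as (x, y) grid coordinates and ℎ3 is computed exactly as defined above (a sketch, not part of the official solution):

def h3(unfinished):
    """Maximum Manhattan distance between any two unfinished squares (0 if fewer than two)."""
    return max((abs(ax - bx) + abs(ay - by)
                for (ax, ay) in unfinished for (bx, by) in unfinished), default=0)

# two unfinished squares 100 apart; the cursor is sitting on one of them
unfinished = {(0, 0), (100, 0)}
print(h3(unfinished))                   # 100
print(h3(unfinished - {(0, 0)}))        # 0 after a single Click, which only costs 1
# the heuristic drops by 100 > 1 in one step, so the triangle inequality fails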
□ ℎ1 dominates ℎ3 .
□ ℎ3 dominates ℎ1 .
■ Neither ℎ1 nor ℎ3 dominates the other.
■ If ℎ1 and ℎ3 are admissible, ℎ∗ dominates max(ℎ1 , ℎ3 ).
# None of the above.
Example where ℎ1 > ℎ3 : Consider a 4x4 block that still needs to be fully colored. Then ℎ1 = 16, but ℎ3 = 6.
Example where ℎ3 > ℎ1 : Suppose we still need to color two squares, spaced 100 squares apart from each other. Then
ℎ1 = 2, but ℎ3 = 100.
Because ℎ1 > ℎ3 is not always true, we cannot say that ℎ1 dominates ℎ3 . Similarly, because ℎ3 > ℎ1 is not always true,
we cannot say that ℎ3 dominates ℎ1 .
If ℎ1 and ℎ3 are both admissible, then max(ℎ1 , ℎ3 ) is admissible. By definition, ℎ∗ is the largest admissible heuristic.
Therefore, we have ℎ∗ ≥ max(ℎ1 , ℎ3 ), which means that ℎ∗ dominates max(ℎ1 , ℎ3 ).
□ Greedy tree search with ℎ4 will always find a solution (possibly not optimal) if one exists.
■ Greedy graph search with ℎ4 will always find a solution (possibly not optimal) if one exists.
■ If the goal is the rectangle in Figure 1(c), greedy tree search with ℎ4 will return the optimal solution.
# None of the above.
The first option is false. Consider a case with two unfinished squares, both very far away. Greedy search might move to
a square, and then avoid clicking it (because that would make ℎ4 increase by a lot), and move towards the other square.
Then, once the cursor reaches the other square, greedy search will again avoid clicking this square (which would increase
ℎ4 by a lot), and start moving back toward the original square. The path explored by greedy search will move back and
forth between the two squares forever and never terminate.
If the goal is to draw a rectangle, this greedy search will immediately move the mouse to the nearest unfinished square,
and then click on that square. Then, this search will move to an adjacent unfinished square (we know there has to be
an adjacent unfinished square because this is a rectangle), and click that square. This search will repeatedly move to an
adjacent unfinished square and click that square, moving around the perimeter of the rectangle until the entire rectangle
is drawn. This strategy of moving to the closest unfinished square, then clicking around the rectangle, happens to be the
optimal solution.
For any general picture, this greedy search may not find the optimal solution. For example, with a solid rectangle with 𝐾
squares, the optimal solution is the distance to the rectangle plus 2𝐾 − 1 actions to color it efficiently (e.g., in concentric
layers). (Side note: We get 2𝐾 − 1 from 𝐾 clicks, and 𝐾 − 1 moves to reach all the unfinished squares, since if you move
efficiently, every move will get you to an unfinished square.) Greedy might color a line across the middle of the rectangle;
then, no matter what, it will eventually have to cross over the already-colored middle line to finish the pattern, which is
inefficient.
Q3. [18 pts] Logic: Map Coloring
We’d like to express the map coloring problem in propositional logic. Consider the following map:
We’d like to assign colors to nodes 𝑋, 𝑌 , and 𝑍 such that no two adjacent nodes have the same color.
Let the proposition symbol 𝑁𝐶 represent node 𝑁 being assigned color 𝐶. For example, 𝑋𝑅 = 𝑡𝑟𝑢𝑒 means that node 𝑋 is colored
Red.
(a) [2 pts] How many symbols are needed to express this problem in propositional logic?
Your answer should be an integer.
9 (3 nodes × 3 colors)
(b) [1 pt] For a general map coloring problem with 𝑛 nodes and 𝑐 colors, how many symbols are needed to express the problem
in propositional logic?
Your answer should be an expression, possibly in terms of 𝑛 and/or 𝑐.
𝑛𝑐
Now we’ll formulate several logical sentences that represent this map coloring problem.
(c) [2 pts] Below is a sentence that represents part of this map coloring problem. Fill in the blanks with ∨ or ∧.
(𝑋𝑅 ______ 𝑋𝐺 ______ 𝑋𝐵 ) ______ (𝑌𝑅 ______ 𝑌𝐺 ______ 𝑌𝐵 ) ______ (𝑍𝑅 ______ 𝑍𝐺 ______ 𝑍𝐵 )
(𝑋𝑅 ∨ 𝑋𝐺 ∨ 𝑋𝐵 ) ∧ (𝑌𝑅 ∨ 𝑌𝐺 ∨ 𝑌𝐵 ) ∧ (𝑍𝑅 ∨ 𝑍𝐺 ∨ 𝑍𝐵 )
(d) [2 pts] What does the sentence in the previous subpart represent? You can answer in 10 words or fewer.
Every node is assigned at least one color.
(e) [2 pts] Write a logical sentence that encodes the constraint “𝑋 and 𝑌 cannot have the same color”. Your sentence does
not need to be in CNF.
¬(𝑋𝑅 ∧ 𝑌𝑅 ) ∧ ¬(𝑋𝐺 ∧ 𝑌𝐺 ) ∧ ¬(𝑋𝐵 ∧ 𝑌𝐵 )
In English: ¬(𝑋𝑅 ∧ 𝑌𝑅 ) says that (𝑋𝑅 ∧ 𝑌𝑅 ), the case where 𝑋 and 𝑌 are both Red, cannot be true. The clauses for “both
can’t be Green” and “both can’t be Blue” follow the same reasoning.
We AND the three clauses together because all three conditions must be true: 𝑋 and 𝑌 can’t both be Red, and they can’t
both be Green, and they can’t both be Blue.
Alternate answer (listing the allowed pairwise assignments to 𝑋 and 𝑌 ):
(𝑋𝑅 ∧ 𝑌𝐺 ) ∨ (𝑋𝐺 ∧ 𝑌𝑅 ) ∨ (𝑋𝑅 ∧ 𝑌𝐵 ) ∨ (𝑋𝐵 ∧ 𝑌𝑅 ) ∨ (𝑋𝐺 ∧ 𝑌𝐵 ) ∨ (𝑋𝐵 ∧ 𝑌𝐺 )
(f) [2 pts] Suppose we’ve additionally added sentences that encode “𝑋 and 𝑍 cannot have the same color,” and “𝑌 and 𝑍
cannot have the same color.”
This problem is still missing some sentence(s). Describe the missing sentence(s) (you can answer in 10 words or fewer):
Note that the sentence in subpart (c) only guarantees that each node takes on at least one color (and not zero colors). It
could be the case that a node takes on two colors, but this sentence is still satisfied: for example, 𝑋𝑅 and 𝑋𝐺 could both
be true. The sentence would still evaluate to true, but this would correspond to 𝑋 being colored both Red and Green,
which isn’t allowed.
Therefore, we need a sentence that says each node can have at most one color (and not more than one color).
(g) [1 pt] How many symbols are assigned to true in an assignment that solves the problem?
Your answer should be an integer.
3
For each of the 3 nodes, exactly one of the symbols (corresponding to one of the colors) will be assigned to true, which
tells us what color is assigned to that node.
Pacman would like to prove the following statement: “If 𝑋 is colored Red in this problem, then 𝑌 will not be colored Red,”
using only a SAT solver.
(h) [3 pts] In addition to the sentences defining the problem from subparts (c), (e), and (f), which other clauses should be part
of the input to the SAT solver to ensure that Pacman gets the right answer? Select all that apply.
■ (𝑋𝑅 )
□ (¬𝑋𝑅 )
■ (𝑌𝑅 )
□ (¬𝑌𝑅 )
□ (¬𝑋𝑅 ∨ ¬𝑌𝑅 )
□ (𝑋𝑅 ∨ 𝑌𝑅 )
# None of the above.
We want to prove that 𝐾𝐵 ⊧ 𝛼, where 𝛼 is (𝑋𝑅 ⟹ ¬𝑌𝑅 ). To do that, we give the SAT solver 𝐾𝐵 ∧ ¬𝛼 and show it’s
unsatisfiable. Now
𝐾𝐵 ∧ ¬𝛼 ≡ 𝐾𝐵 ∧ ¬(𝑋𝑅 ⟹ ¬𝑌𝑅 ) ≡ (𝐾𝐵 ∧ 𝑋𝑅 ∧ 𝑌𝑅 )
so we add both 𝑋𝑅 and 𝑌𝑅 .
Another way to think of this solution is to note that we’re proving the statement via contradiction. In other words, we’re
trying to find a scenario where 𝑋 is colored Red and 𝑌 is also colored Red. This corresponds to the two clauses selected
in the solution. If the SAT solver is unable to find such a contradiction, then we know that the statement is true. However,
if the SAT solver does find a scenario where 𝑋 is Red and 𝑌 is Red, then we’ve found a contradiction and the statement
must be false.
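For this small knowledge base, the refutation recipe can be checked by brute force over all 2⁹ assignments. The sketch below assumes the KB contains the at-least-one-color, at-most-one-color, and all three adjacency constraints described earlier; the helper names are illustrative, not official course code.

from itertools import product

nodes, colors = "XYZ", "RGB"
symbols = [(n, c) for n in nodes for c in colors]          # the 9 proposition symbols

def satisfies_kb(assign):
    """KB: each node has exactly one color, and adjacent nodes differ in color."""
    for node in nodes:
        if sum(assign[(node, c)] for c in colors) != 1:    # at-least-one and at-most-one
            return False
    for a, b in [("X", "Y"), ("X", "Z"), ("Y", "Z")]:      # all three edges of the map
        if any(assign[(a, c)] and assign[(b, c)] for c in colors):
            return False
    return True

# refutation: KB with X_R and Y_R added should be unsatisfiable
satisfiable = any(
    satisfies_kb(assign) and assign[("X", "R")] and assign[("Y", "R")]
    for values in product([False, True], repeat=len(symbols))
    for assign in [dict(zip(symbols, values))]
)
print("KB and X_R and Y_R satisfiable?", satisfiable)      # False, so KB entails (X_R implies not Y_R)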
■ A correct SAT representation of a discrete, finite CSP has exactly the same number of satisfying assignments
as the CSP has distinct solutions.
# None of the above.
The process we used to convert the map coloring problem to a SAT problem can be generalized to all discrete, finite
CSPs. We create symbols representing every possible assignment to every variable. Then we add sentences saying that
each variable can only take on exactly one value. Finally, we write sentences with the symbols that encode the constraints
between variables. We know that it will always be possible to encode constraints with the symbols because in the worst
case, we could explicitly write the constraint by listing out all possible assignments that satisfy the constraint, and then
express this list of possible assignments as a logical sentence.
To represent a SAT problem as a CSP, we make the symbols into variables. Each variable takes on either true or false as
values. Then, we encode the clauses in the SAT problem as constraints in the CSP.
If we use the process described above to represent a CSP as a SAT problem, then each solution to the CSP corresponds
to exactly one satisfying assignment of the SAT problem. In other words, for each satisfying assignment we find, we can
read off that assignment to get exactly one assignment to all the variables that satisfies all constraints, i.e. one solution to
the CSP.
Q4. [15 pts] CSPs: The Zookeeper
You are a newly appointed zookeeper, and your first task is to find rooms for all of the animals.
The zoo has three animals: the Iguana (𝐼), Jaguar (𝐽 ), and Koala (𝐾).
Each animal needs to be assigned to one of four rooms: the North room (N), East room (E), South room (S), or West room (W),
subject to the following constraints:
1. The jaguar cannot share a room with any other animal.
2. The iguana and koala must be in different rooms.
3. The koala can only be in the East room or the South room.
(a) [2 pts] Consider the first constraint: “The jaguar cannot share a room with any other animal.”
Can this constraint be expressed using only binary constraints on the three variables 𝐼, 𝐽 , 𝐾?
# Yes, it can be expressed as 1 binary constraint.
Yes, it can be expressed as 2 different binary constraints.
# Yes, it can be expressed as 4 different binary constraints.
# No, this is necessarily a unary constraint.
# No, this is necessarily a higher-order constraint.
Recall that a binary constraint relates two variables (in this problem, two animals). We can write this constraint as two
different binary constraints: The jaguar cannot share a room with the iguana. The jaguar cannot share a room with the
koala. 𝐽 ≠ 𝐼 and 𝐽 ≠ 𝐾.
(b) [3 pts] Suppose we enforce unary constraints, and then assign the jaguar to the South room. The remaining values in each
domain would be:
Iguana: North, East, South, West
Jaguar: South
Koala: East, South
In the table below, mark each value that would be removed by running forward-checking after this assignment.
Forward checking removes South from the Iguana’s domain and South from the Koala’s domain, since the jaguar cannot share a room with either animal.
(c) [3 pts] Regardless of your answer to the previous subpart, suppose we’ve done some filtering and the following values
remain in each variable’s domain:
Iguana: East, West
Jaguar: South
Koala: East
Are all the arcs in this CSP consistent?
# Yes, all arcs are consistent.
# No, only 𝐾 → 𝐼 is not consistent.
No, only 𝐼 → 𝐾 is not consistent.
# No, 𝐾 → 𝐼 and 𝐼 → 𝐾 are both not consistent.
# No, only 𝐽 → 𝐼 is not consistent.
# No, only 𝐼 → 𝐽 is not consistent.
# No, 𝐽 → 𝐼 and 𝐼 → 𝐽 are both not consistent.
The 𝐼 → 𝐾 arc is not consistent. If we assign Iguana to East, there is no valid assignment to Koala that would satisfy the
constraints (because of constraint 2).
All of the other arcs in this CSP are consistent. We can go through and check the arcs listed as options in this subpart:
𝐾 → 𝐼 is consistent because if we assign East at 𝐾, we can still assign West at 𝐼.
𝐽 → 𝐼 is consistent because if we assign South at 𝐽 , we can still assign East or West at 𝐼.
𝐼 → 𝐽 is still consistent because if we assign East at 𝐼, South is a valid assignment at 𝐽 . Also, if we assign West at 𝐼,
South is a valid assignment at 𝐽 .
For completeness, here are the remaining arcs:
𝐽 → 𝐾 is consistent. If we assign South at 𝐽 , it’s fine to assign East at 𝐾.
𝐾 → 𝐽 is consistent. If we assign East at 𝐾, it’s fine to assign South at 𝐽 .
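These checks can be mechanized. The sketch below (illustrative names, not official course code) tests every arc over the filtered domains from this subpart; the single constraint function works because constraints 1 and 2 together force every pair of animals into different rooms.

def consistent(x, vx, y, vy):
    """Binary constraints: the jaguar rooms alone (J != I, J != K) and I != K."""
    return vx != vy   # in this CSP every pair of animals must end up in different rooms

domains = {"I": {"E", "W"}, "J": {"S"}, "K": {"E"}}

def arc_consistent(x, y):
    """Arc x -> y holds if every value of x has at least one supporting value of y."""
    return all(any(consistent(x, vx, y, vy) for vy in domains[y]) for vx in domains[x])

for x in domains:
    for y in domains:
        if x != y:
            print(f"{x} -> {y}:", "consistent" if arc_consistent(x, y) else "NOT consistent")
# only I -> K comes out NOT consistent, matching the answer above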
The constraints, repeated here for your convenience:
1. The jaguar cannot share a room with any other animal.
2. The iguana and koala must be in different rooms.
3. The koala can only be in the East room or the South room.
(d) [2 pts] Regardless of your answer to the previous subpart, suppose we start over and just enforce the third constraint. Then
the remaining values in each domain are:
Iguana: North, East, South, West
Jaguar: North, East, South, West
Koala: East, South
What does the minimum remaining values (MRV) heuristic suggest doing next?
MRV suggests assigning a value to the Koala next, since its domain has the fewest remaining values (2, compared with 4 for the iguana and the jaguar).
(e) [2 pts] Again, consider the CSP after just enforcing the third constraint:
Which assignment would the least constraining value (LCV) heuristic prefer?
Assign North to Jaguar.
# Assign East to Jaguar.
# LCV is indifferent between these two assignments.
LCV suggests assigning the value that deletes the fewest values in the domains of other variables.
Assuming we use forward-checking to filter, assigning North to Jaguar causes Iguana to lose North as an option. Assigning
East to Jaguar causes both Iguana and Koala to lose East as an option.
Assigning North to Jaguar deletes the fewest values, so LCV gives this assignment higher priority.
Intuitively, North is a less constraining option than East for 𝐽 , because North leaves 𝐾 with two options, while East only
leaves 𝐾 with one option. Both North and East for 𝐽 leave 𝐼 with three options.
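A small sketch of the LCV counting argument, assuming forward checking prunes the assigned value from the Iguana’s and Koala’s domains (the Jaguar conflicts with both animals); the function name is illustrative.

domains = {"I": {"N", "E", "S", "W"},
           "J": {"N", "E", "S", "W"},
           "K": {"E", "S"}}

def pruned_if_jaguar_gets(value):
    """Values forward checking would remove from the other domains if J = value
    (the jaguar conflicts with both the iguana and the koala)."""
    return sum(1 for other in ("I", "K") if value in domains[other])

for value in ("N", "E"):
    print("Jaguar =", value, "prunes", pruned_if_jaguar_gets(value), "value(s)")
# Jaguar = N prunes 1 (North from the iguana); Jaguar = E prunes 2 (East from both)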
Recall that a binary constraint relates two variables (in this problem, two animals). We can write this constraint as three
different binary constraints: ¬(𝐽 = 𝑤 ∧ 𝐼 = 𝑤) and ¬(𝐽 = 𝑤 ∧ 𝐾 = 𝑤) and ¬(𝐾 = 𝑤 ∧ 𝐼 = 𝑤).
In other words, we go through all pairs of variables, and enforce that no pair of variables can both be assigned to West.
There are 3 possible pairs of variables that need to be considered in this problem.
“The iguana and jaguar cannot both be in the West room.”
“The jaguar and koala cannot both be in the West room.”
“The iguana and koala cannot both be in the West room.”
Q5. [16 pts] Games
Consider the following game tree where upward triangle nodes are max nodes, and downward triangle nodes are min nodes:
[Game tree: the root 𝐴 is a min node with max-node children 𝐵 and 𝐶. 𝐵 has min-node children 𝐷 and 𝐸; 𝐶 has min-node child 𝐹 . The terminal children are 𝑌 and 5 under 𝐷, 3 under 𝐸, and 6 and 14 under 𝐹 .]
𝐷 = 4, 𝐸 = 3, 𝐹 = 6
𝐵 = 4, 𝐶 = 6
𝐴=4
(b) [3 pts] What range of values for 𝑌 would cause the minimax value at the root 𝐴 to be equal to 𝑌 ?
Your answer should be an inequality using ≤, such as 𝑌 ≤ 188, or 188 ≤ 𝑌 ≤ 288, or 388 ≤ 𝑌 ≤ 388, or “impossible”
(if there are no such values for 𝑌 ).
3≤𝑌 ≤5
A useful way to work through these questions is to consider different values of 𝑌 , and eventually generalize to different
ranges of 𝑌 . For example, you could start with a very negative value like 𝑌 = −1000 and see if the minimax value at 𝐴
is equal to 𝑌 . Then, you could start generalizing by noting that every value of 𝑌 < 3 actually results in the same behavior
(there was nothing special about −1000 compared to any other negative number; the important thing was that it was less
than all the other terminal node values).
If 𝑌 < 3, then the value at 𝐷 will be equal to 𝑌 . Therefore, we have 𝐷 < 3. Then, the max node at 𝐵 will choose the
branch to 𝐸 (to get value 3) and never choose the branch 𝐷 (which gets value < 3). In summary, if 𝑌 < 3, the branch to
𝑌 will not be chosen, and the value at the root cannot be 𝑌 .
The only way for the max node at 𝐵 to choose the branch to 𝐷 over the branch to 𝐸 is if 𝐷 ≥ 3. Therefore, we at least
have 𝑌 ≥ 3. (In other words, since we ruled out all 𝑌 < 3 values in the previous paragraph, we’ve narrowed down our
range to 𝑌 ≥ 3.)
The next interesting value to consider is 𝑌 = 5, since this is the point at which different branches start getting chosen. (In
other words, 𝑌 = 3, 𝑌 = 3.001, 𝑌 = 4, etc. will not change the branches chosen.) In particular, when 𝑌 exceeds 5, 𝐷
starts to choose the 5 branch instead of the 𝑌 branch. In summary, if 𝑌 > 5, then 𝐷 will choose the 5 branch instead of 𝑌 ,
which prevents 𝑌 from being the value at the root. Therefore, we also have 𝑌 ≤ 5.
At this point, we’ve narrowed down our range to 3 ≤ 𝑌 ≤ 5. The last step is to verify that all the values in this range
actually cause the root value to equal 𝑌 .
From above, we already reasoned that 3 ≤ 𝑌 ≤ 5 will cause the value at 𝐵 to be 𝑌 . We know that 𝐶 = 6, so if 3 ≤ 𝑌 ≤ 5,
the min node at 𝐴 will choose 𝐵 over 𝐶. This confirms that the root value at 𝐴 will be 𝑌 (and not 6 from choosing 𝐶),
and we’re done.
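The range can also be confirmed numerically. The sketch below reconstructs the tree from the values used in this solution (𝐷 = min(𝑌 , 5), 𝐸 = 3, 𝐹 = min(6, 14), 𝐵 = max(𝐷, 𝐸), 𝐶 = 𝐹 , 𝐴 = min(𝐵, 𝐶)) and scans candidate values of 𝑌 ; the scan range is arbitrary.

def root_value(y):
    """Minimax value at the root A, reconstructed from this solution:
    D = min(y, 5); E = 3; F = min(6, 14); B = max(D, E); C = F; A = min(B, C)."""
    d, e, f = min(y, 5), 3, min(6, 14)
    b, c = max(d, e), f
    return min(b, c)

# scan candidate values of Y and keep those where the root value equals Y
matches = [y for y in (x / 2 for x in range(-10, 31)) if root_value(y) == y]
print(min(matches), max(matches))   # 3.0 5.0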
(c) [4 pts] Assume that 𝑌 = 4. If we run minimax with alpha-beta pruning, visiting nodes from left to right, which terminal
nodes will we not visit because of pruning? Select all that apply.
□ 𝑌
□ 5
□ 3
□ 6
□ 14
None of the above (we visit all terminal nodes).
One useful way to work through alpha-beta pruning is to think: if hypothetically, this terminal node had a very positive
or very negative value, would that affect the branches chosen in the tree? If the value at the terminal node could change
the branches chosen, then we have to check that terminal node and we cannot prune it.
We have to visit all the terminal nodes in the left-most branch (𝑌 and 5) to determine the value of 𝐷 first. Without checking
this first set of terminal nodes, we don’t have any bounds on the min or max nodes to prune with.
We have to visit the terminal node 3. This value could have been something very positive like 1000, which would have
changed the value at 𝐵. (𝐵 would have chosen to go to 𝐸 instead of 𝐷.)
We have to visit the terminal nodes 6 and 14, because these could have been very negative values like −1000, which would
have changed the value at 𝐴. (𝐴 would have chosen to go to 𝐶 instead of 𝐵.)
The quickest way to note that this is true is to note that 𝐴 must eventually take on one of the terminal node values, since
we’re just repeatedly applying mins and maxes to the terminal node values.
Therefore, 𝐴 is upper-bounded by the largest of the terminal node values, which happens to be max(𝑌 , 14).
(e) [2 pts] The root of this tree is a min node. You would like to represent this same game with a new tree that has a max
node at the root.
Select all changes that you would need to make in the new tree.
■ Convert all min nodes to max nodes.
■ Convert all max nodes to min nodes.
□ Add a layer of chance nodes to the new tree.
■ Multiply each of the terminal node values by −1.
# None of the above - it is impossible to represent this game with a new tree with a max node at the root.
This game has two agents: one agent is represented by the min nodes, and the other agent is represented by the max nodes.
This game is zero-sum. In a zero-sum game, the min and max agents have exactly opposite utilities. We denote this in
the game tree by writing out the utilities for one player (the max agent) at the terminal nodes. Then, we assume that the
max agent picks higher numbers and the min agent picks lower numbers. This correctly represents the min agent’s utility
because the min agent’s utility is the negative of the numbers at the terminal nodes, and maximizing the negative of the
numbers shown is equivalent to minimizing the numbers shown. As a concrete example: the min agent prefers 3 over 5.
This represents the fact that “3” on the tree is actually worth −3 to the min agent, and “5” on the tree is actually worth −5
to the min agent. Therefore, the min agent prefers the node labeled 3 because it gives the min agent higher utility.
There was nothing special about which agent we chose to be the max agent and which agent we chose to be the min agent;
the important thing was just that their utilities are opposites of each other. We can reinterpret the min agent as a max agent,
and reinterpret the max agent as a min agent, without changing the game being represented. This involves converting all
min nodes to max nodes, and converting all max nodes to min nodes.
In the original tree, the min agent was trying to minimize the numbers in the tree. Because we’ve swapped this agent to a
max agent looking to maximize the numbers in the tree, we can multiply all the numbers in the tree by −1 to preserve the
same ordering of preferences. (As a concrete example: in the original tree, the min agent preferred 3 over 5. In the new
tree, this is now a max agent that prefers −3 over −5.)
Multiplying the numbers by −1 also preserves the same ordering of preferences for the max agent who has now been
swapped into a min agent. In the original tree, the max agent preferred 5 over 3. In the new tree, this is now a min agent
that prefers −5 over −3.
(f) [4 pts] Suppose the max nodes 𝐵 and 𝐶 are replaced with chance nodes that select actions uniformly at random. What
range of values for 𝑌 would cause the value at the root 𝐴 to be equal to 𝑌 ?
Your answer should be an inequality using ≤, such as 𝑌 ≤ 188, or 188 ≤ 𝑌 ≤ 288, or 388 ≤ 𝑌 ≤ 388, or “impossible”
(if there are no such values for 𝑌 ).
3≤𝑌 ≤3
Solution 1:
For 𝑌 > 5, 𝐷 = 5, 𝐵 = 4, and 𝐴 = 4, so there is no value of 𝑌 such that 𝐴 = 𝑌 .
For 𝑌 ≤ 5, 𝐷 = 𝑌 and 𝐵 = (𝑌 +3)∕2. This is always less than 6, so 𝐴 = (𝑌 +3)∕2. For 𝐴 = 𝑌 , we must have 𝑌 = (𝑌 +3)∕2,
from which we obtain 𝑌 = 3.
Solution 2:
First, note that because 𝐵 and 𝐶 are chance nodes, it is no longer the case that 𝐴 is always exactly equal to one of the
terminal node values. In particular, 𝐵 is always going to be the average of 3 and the value at 𝐷. Therefore, we either have
𝐵 = (3 + 5)∕2, if 5 is selected at 𝐷, or we have 𝐵 = (3 + 𝑌 )∕2, if 𝑌 is selected at 𝐷.
In the case of 𝐵 = (3 + 5)∕2 = 4, the value at 𝐴 will also be 4. We’re looking for cases where the value at 𝐴 is equal to
𝑌 , so we check if it’s possible that 𝑌 = 4 here. However, in this case, 𝑌 could not have been 4, because we assumed that
𝐷 chose 5 over 𝑌 (and that would not happen if 𝑌 = 4). You could also just plug 𝑌 = 4 into the tree and notice that the
root value is not 4.
In the case of 𝐵 = (3 + 𝑌 )∕2, the value at 𝐴 could also potentially be (3 + 𝑌 )∕2 (depending on how this value compares
to 6, the value at 𝐶). As before, we check if it’s possible that 𝑌 = (3 + 𝑌 )∕2. Solving this equation gives us 𝑌 = 3.
Plugging this into the tree, we see that 𝐴 indeed chooses 𝐵 = 3 over 𝐶 = 6.
Finally, we can confirm that only one value of 𝑌 works, not a range. If we changed 𝑌 even a little bit to 𝑌 = 3.001, then
the value at 𝐵 (and later, 𝐴) would be the average (3 + 3.001)∕2, which is not equal to 𝑌 = 3.001.
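The same numerical check works for the chance-node version; the sketch below assumes 𝐵 averages its two children and 𝐶’s value is just 𝐹 = 6, matching Solution 2.

def root_value_chance(y):
    """Value at A when B and C are uniform chance nodes (Solution 2):
    D = min(y, 5); B = (D + 3) / 2; C = F = min(6, 14); A = min(B, C)."""
    b = (min(y, 5) + 3) / 2
    c = min(6, 14)
    return min(b, c)

matches = [y for y in (x / 4 for x in range(-20, 41)) if root_value_chance(y) == y]
print(matches)   # [3.0]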
Q6. [17 pts] MDPs: Flying Pacman
Pacman is in a 1-dimensional grid with squares labeled 0 through 𝑛, inclusive, as shown below:
0 1 2 3 4 5 … n-1 n
Pacman’s goal is to reach square 𝑛 as cheaply as possible. From state 𝑛, there are no more actions or rewards available.
At any given state, if Pacman is not in 𝑛, Pacman has two actions to choose from:
• Run: Pacman deterministically advances to the next state (i.e. from state 𝑖 to state 𝑖 + 1). This action costs Pacman $1.
• Fly: With probability 𝑝, Pacman directly reaches state 𝑛. With probability 1 − 𝑝, Pacman is stuck in the same state. This
action costs Pacman $2.
(a) [3 pts] Fill in the blank boxes below to define the MDP. 𝑖 represents an arbitrary state in the range {0, … , 𝑛 − 1}.
𝑠    𝑎    𝑠′    𝑇 (𝑠, 𝑎, 𝑠′ )    𝑅(𝑠, 𝑎, 𝑠′ )
𝑖    Run    𝑖 + 1    1.0    −1
𝑖    Fly    𝑖    1 − 𝑝    −2
𝑖    Fly    𝑛    𝑝    −2
𝑠′ is the resulting successor state after taking an action from a certain state. From state 𝑖, taking Fly either keeps you stuck
in state 𝑖 (row 2), or takes you directly to state 𝑛 (the blank in row 3).
𝑇 (𝑠, 𝑎, 𝑠′ ) represents the probability of starting in 𝑠, taking action 𝑎, and landing in 𝑠′ .
Row 1: The action Run is deterministic, so the probability of being in 𝑖, taking action Run, and landing in 𝑖 + 1 is 100%
or 1.0.
Rows 2-3: Being in 𝑖 and taking action Fly lands you in state 𝑖 with probability 1 − 𝑝, and lands you in state 𝑛 with
probability 𝑝.
Note that 𝑅(𝑠, 𝑎, 𝑠′ ) is negative because we’re trying to minimize costs. Representing costs as negative reward forces an
MDP (which maximizes rewards) to minimize those costs.
As a concrete example: this MDP prefers reward −1 over reward −2 because −1 > −2. This is consistent with the problem
statement, where Pacman would prefer to pay $1 over paying $2.
If we left the rewards positive, this MDP would prefer reward +2 over reward +1. This would be inconsistent with
Pacman’s preference of spending as little money as possible.
Let 𝜋𝑅 denote the policy of always selecting Run, and 𝜋𝐹 denote the policy of always selecting Fly.
Compute the values of these two policies. Your answer should be an expression, possibly in terms of 𝑛, 𝑝, and/or 𝑖.
(b) What is 𝑉 𝜋𝑅 (𝑖)?
𝑖 − 𝑛
This expression represents the expected reward of starting in state 𝑖 and taking action Run indefinitely.
If we start in 𝑖 and Run indefinitely, we will deterministically move to 𝑖 + 1, 𝑖 + 2, … until we arrive in state 𝑛. This takes
𝑛 − 𝑖 Run actions.
Each Run action incurs a reward of −1, and there’s no discount (𝛾 = 1), so the total reward of following this policy is
(−1) ⋅ (𝑛 − 𝑖) = 𝑖 − 𝑛.
(c) [2 pts] What is 𝑉 𝜋𝐹 (𝑖)?
Hint: Recall that the mean of a geometric distribution with success probability 𝑝 is $\sum_{k=1}^{\infty} k(1-p)^{k-1} p = 1/p$.
−2∕𝑝
We’re asked to calculate the expected total reward of starting in state 𝑖 and taking action Fly indefinitely.
At each time step, with probability 𝑝, we land in state 𝑛, and with probability 1 − 𝑝, we stay in the same place. On average,
using the formula given, it will take 1∕𝑝 Fly actions before one succeeds and we reach state 𝑛. Each of those actions costs
−2, and there’s no discount, so the total reward of following this policy is (−2) ⋅ (1∕𝑝) = −2∕𝑝.
Formally, to see how the formula results in 1∕𝑝 Fly actions on average, we can write out all the possible outcomes of
trying Fly indefinitely:
Case I: Our first Fly action succeeds (takes us to 𝑛). This happens with probability 𝑝, and results in a total of 1 action
taken.
Case II: Our first Fly action fails (we’re stuck), but our second Fly action succeeds. This happens with probability (1−𝑝)𝑝,
where the 1 − 𝑝 comes from failing once and the 𝑝 comes from succeeding. This results in a total of 2 actions taken.
Case III: We fail twice, and succeed on our third try. This happens with probability (1 − 𝑝)²𝑝, and results in 3 actions
taken.
Case IV: We fail three times, and succeed on our fourth try. This happens with probability (1 − 𝑝)³𝑝, and results in 4
actions taken.
There are infinitely many cases, corresponding to failing more and more times before we succeed.
In total, the expected number of tries before we succeed comes from multiplying each possible number of actions with
the probability of requiring that number of actions:
[1 ⋅ 𝑝] + [2 ⋅ (1 − 𝑝)𝑝] + [3 ⋅ (1 − 𝑝)²𝑝] + [4 ⋅ (1 − 𝑝)³𝑝] + …
Writing this as an infinite sum actually yields the exact formula provided to us:
$\sum_{k=1}^{\infty} k(1-p)^{k-1} p = 1/p$
This tells us that on average, it takes 1∕𝑝 Fly actions before one succeeds. This should match your intuition: for example,
if Fly succeeds 10% of the time, we expect to try 10 times on average before succeeding once.
Notice that the total reward is independent of 𝑖 (there’s no 𝑖 in the final expression). This should match your intuition that
no matter which state you’re in, the value of taking action Fly indefinitely is the same, because the Fly action behaves
exactly the same regardless of which state you’re in (either you’re stuck in the same place, or you directly reach 𝑛).
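The closed-form value can also be sanity-checked by simulation. The sketch below estimates the total reward of always flying by Monte Carlo (the choice 𝑝 = 0.25 and the trial count are arbitrary) and compares it to −2∕𝑝.

import random

def simulate_always_fly(p, trials=100_000, seed=0):
    """Monte Carlo estimate of the total (undiscounted) reward of flying until reaching n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        while True:
            total -= 2                  # every Fly attempt costs $2
            if rng.random() < p:        # with probability p we land in state n and stop
                break
    return total / trials

p = 0.25
print(simulate_always_fly(p), -2 / p)   # both are approximately -8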
(d) [4 pts] Given the results of the two previous subparts, we can now find the optimal policy for the MDP.
Which of the following are true? Select all that apply. (Hint: consider what value of 𝑖 makes 𝑉 𝜋𝑅 (𝑖) and 𝑉 𝜋𝐹 (𝑖) equal.)
Note: ⌈𝑥⌉ is the smallest integer greater than or equal to 𝑥.
Without looking at the answer choices, you can also use intuition to realize the same point. For low values of 𝑖, the value
of Run is low (because you’re far from the end), and for higher values of 𝑖, the value of Run gets higher (because you’re
closer to the end). However, no matter which 𝑖 you’re at, the value of Fly is the same (as we discussed in the previous
subpart).
——
Mathematically, 𝑉 𝜋𝑅 (𝑖) = 𝑖 − 𝑛 as a function of 𝑖 is increasing linearly; for higher values of 𝑖, 𝑉 𝜋𝑅 (𝑖) increases as well.
You could draw this as an upward-sloping line on a graph of state 𝑖 (x-axis) vs. value (y-axis).
However, 𝑉 𝜋𝐹 (𝑖) = −2∕𝑝 as a function of 𝑖 is a constant (it doesn’t depend on 𝑖). On the same graph of state vs. value,
you’d have a flat line.
The upwards-sloping line crosses the flat line at 𝑖 = 𝑛 − 2∕𝑝.
In summary: for higher states 𝑖 ≥ ⌈𝑛 − 2∕𝑝⌉, we have 𝑉 𝜋𝑅 (𝑖) > 𝑉 𝜋𝐹 (𝑖), which means that it’s better to Run in these states.
For lower states 𝑖 < ⌈𝑛 − 2∕𝑝⌉, we have 𝑉 𝜋𝑅 (𝑖) < 𝑉 𝜋𝐹 (𝑖), which means that it’s better to Fly in these states.
This should match your intuition: if you’re close to the end, Run is better. If you’re very far away, Fly is better.
——
The final thing we have to check is the 𝑝 < 2∕𝑛 condition that the answer choices suggest.
If we plug in 𝑝 = 2∕𝑛, we get 𝑖 = ⌈𝑛 − 2∕(2∕𝑛)⌉ = ⌈𝑛 − 𝑛⌉ = 0.
If we plug in some lower value like 𝑝 = 1∕𝑛, we get 𝑖 = ⌈𝑛 − 2∕(1∕𝑛)⌉ = ⌈𝑛 − 2𝑛⌉ = −2𝑛.
What does this 𝑖 value represent? It represents the first state where Run becomes better than Fly. But if 𝑝 < 2∕𝑛, we start
getting negative values of 𝑖! For example, when 𝑝 = 1∕𝑛, the first state where Run is better than Fly is −2𝑛, which isn’t
a valid state! In other words, for all states greater than −2𝑛, it’s better to Run than Fly. This actually means that for all
states in the MDP (0 through 𝑛 − 1, which are all non-negative and thus greater than −2𝑛), Run is better than Fly.
This should match your intuition: if the probability of succeeding with Fly is way too small, then it’s better to always Run,
no matter where you’re at.
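The threshold can also be recovered by running value iteration directly on this MDP; the sketch below uses arbitrary values 𝑛 = 20 and 𝑝 = 0.25, for which ⌈𝑛 − 2∕𝑝⌉ = 12.

import math

def value_iteration(n, p, iters=5000):
    """Value iteration for the Run/Fly MDP with gamma = 1; states 0..n, V(n) = 0."""
    V = [0.0] * (n + 1)
    for _ in range(iters):
        V = [max(-1 + V[i + 1],                      # Run: pay $1, move to i+1
                 -2 + p * V[n] + (1 - p) * V[i])     # Fly: pay $2, reach n with probability p
             for i in range(n)] + [0.0]
    return V

n, p = 20, 0.25
V = value_iteration(n, p)
policy = ["Run" if -1 + V[i + 1] >= -2 + (1 - p) * V[i] else "Fly" for i in range(n)]
print(policy.index("Run"), math.ceil(n - 2 / p))     # both 12: Fly below state 12, Run from 12 on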
Regardless of your answers to the previous parts, consider the following modified transition and reward functions (which may
not correspond to the original problem). As before, once Pacman reaches state 𝑛, no further actions or rewards are available.
For each modified MDP and discount factor, select whether value iteration will converge to a finite set of values.
(e) [2 pts] 𝛾 = 1
𝑠 𝑎 𝑠′ 𝑇 (𝑠, 𝑎, 𝑠′ ) 𝑅(𝑠, 𝑎, 𝑠′ )
𝑖 Run 𝑖+1 1.0 +5
𝑖 Fly 𝑖+1 1.0 +5
In these questions, it helps to recall that the value of a state is the expected, discounted sum of rewards for starting in that
state and acting optimally. Value iteration is trying to compute a value for every state in the MDP.
Value iteration converges if the value at every state is finite. If the value at some state is infinite, then value iteration will
never converge.
——
In this subpart, 𝑛 is an absorbing state: no matter how you act in this MDP, you eventually have to reach 𝑛 in a finite
number of time steps. Therefore, you only have a limited number of time steps to accumulate reward in this MDP, so the
expected discounted sum of rewards in this MDP is finite.
Intuitively, Pacman is only allowed to move forward, so Pacman will inevitably reach 𝑛 and run out of rewards to collect.
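A quick numerical check (a sketch, with an arbitrary 𝑛 = 6): the values settle at 𝑉 (𝑖) = 5(𝑛 − 𝑖), which is finite for every state, so value iteration converges here.

def value_iteration_forward(n, iters=100):
    """gamma = 1; both actions move from i to i+1 with reward +5; state n is terminal."""
    V = [0.0] * (n + 1)
    for _ in range(iters):
        V = [5 + V[i + 1] for i in range(n)] + [0.0]
    return V

print(value_iteration_forward(6))   # [30.0, 25.0, 20.0, 15.0, 10.0, 5.0, 0.0]: all finite, so VI converges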
(f) [2 pts] 𝛾 = 1
𝑠 𝑎 𝑠′ 𝑇 (𝑠, 𝑎, 𝑠′ ) 𝑅(𝑠, 𝑎, 𝑠′ )
𝑖 Run 𝑖+1 1.0 +5
𝑖 Fly 𝑖 − 1 1.0 +5
In this MDP, Pacman can collect +5 reward forever (for example, by alternating Run and Fly so that it never reaches 𝑛), and there is no discounting, so the values of those states are infinite and value iteration will not converge.