Artificial Intelligence: Representation and Problem Solving
Probabilistic Reasoning (2): Bayesian Networks
Recap
Probability Models
Joint probability distribution of random variables
Probabilistic Inference
Compute Marginal probability or Conditional probability
Chain Rule, Independence, Bayes’ Rule
Full joint distribution is hard to estimate and too big
to represent explicitly
Outline
Bayesian Networks
Independence in Bayes’ Net
Construct a Bayes’ Net
Exact Inference in Bayes’ Net
Applications of Bayes’ Net
Bayesian Network
Bayesian Network (Bayes’ net) Overview
A compact way to represent knowledge in an uncertain domain
Describe full joint probability distributions using simple, local
distributions
A probabilistic graphical model
A directed acyclic graph
Each node corresponds to a random variable
Each directed edge X → Y represents "X is a parent of Y" (roughly, "X directly influences Y")
Each node has a conditional probability distribution given its parents
(Example graph: Weather and Sleeping Quality are parents of Way to Commute to Work.)
Example: Alarm
I'm at work when both of my neighbors, John and Mary, call to say my burglar alarm is ringing. Sometimes it's set off by minor earthquakes. Is there a burglary?
How do we model this scenario? How can we represent our knowledge in such a domain with uncertainty?
Example: Alarm
Random Variables: Burglary (B), Earthquake (E), Alarm (A), JohnCalls (J), MaryCalls (M)
Domain of each variable: {true, false}
Knowledge base: Full joint probability distribution
How big is the table?
B E A J M | Joint Probability
T T T T T |
T T T T F |
T T T F T |
… … … … … | …
Task: Compute P(Burglary | JohnCalls = true, MaryCalls = true)
Example: Alarm
Recall independence: if all the variables were mutually independent, the joint distribution would factor into a product of single-variable distributions and could be represented much more efficiently!
Are all the random variables independent in this example?
(Full joint probability table over B, E, A, J, M, shown again.)
Example: Alarm
However, there are some intuitive independence relationships
based on our causal knowledge!
Causal knowledge –
A burglary can set the alarm off
An earthquake can set the alarm off
The alarm can cause Mary to call
The alarm can cause John to call
(Network: Burglary → Alarm, Earthquake → Alarm, Alarm → JohnCalls, Alarm → MaryCalls)
Example: Alarm
B and E are independent of each other
Conditioned on the value of A, J is independent of B and E
Similar independence assumptions hold for M
Conditioned on the value of A, J and M are independent of each other
Burglary (B), Earthquake (E), Alarm (A), JohnCalls (J), MaryCalls (M)
Example: Alarm
Given these independence relationships,
P(B, E, A, J, M) = P(B) P(E) P(A | B, E) P(J | A) P(M | A)
We don't need to fill in the full joint probability table anymore to represent our knowledge!
Only need to provide these conditional probabilities: P(B), P(E), P(A | B, E), P(J | A), P(M | A)
Is this better or worse?
Example: Alarm
How many numbers do we need here? 1 + 1 + 4 + 2 + 2 = 10. Recall we need 2^5 − 1 = 31 for the original full joint table.
Burglary (B), Earthquake (E), Alarm (A), JohnCalls (J), MaryCalls (M)
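For instance, a minimal sketch (not on the slide) that reproduces this count from the network structure; the dict-of-parent-lists representation is assumed here for illustration:

```python
# Parent lists for the alarm network; every variable is binary.
parents = {
    "B": [], "E": [],        # Burglary, Earthquake: no parents
    "A": ["B", "E"],         # Alarm depends on Burglary and Earthquake
    "J": ["A"], "M": ["A"],  # JohnCalls, MaryCalls depend on Alarm
}

# A binary node needs one independent number per assignment of its parents.
bn_numbers = sum(2 ** len(ps) for ps in parents.values())
full_joint_numbers = 2 ** len(parents) - 1  # independent entries of the full table

print(bn_numbers)          # 1 + 1 + 4 + 2 + 2 = 10
print(full_joint_numbers)  # 31
```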
Example: Alarm
Enrich the network with more links: more realistic, less compact
Bayesian Network
Bayesian Network: A compact way to represent
knowledge in an uncertain domain
(Full joint probability table over B, E, A, J, M, shown again.)
Bayesian Network
Bayesian Network: Describe full joint probability
distributions using simple, local distributions
Global semantics: P(X1, …, Xn) = ∏i P(Xi | Parents(Xi))
(Full joint probability table over B, E, A, J, M, shown again.)
Have to be equivalent!
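As a concrete illustration, a minimal Python sketch of the global semantics for the alarm network. The CPT values are the standard Russell & Norvig numbers for this example; they are assumed here since the slides do not list them:

```python
# Local distributions (CPTs) for the alarm network; all variables are binary.
P_B = {True: 0.001, False: 0.999}                    # P(B)
P_E = {True: 0.002, False: 0.998}                    # P(E)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=true | A)

def joint(b, e, a, j, m):
    """Global semantics: P(B=b, E=e, A=a, J=j, M=m) as a product of local conditionals."""
    p = P_B[b] * P_E[e]
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

print(joint(False, False, True, True, True))  # one entry of the full joint table
```

Any entry of the full joint table can be recovered this way, so the five small tables and the full joint distribution carry the same information.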
Bayesian Network
Bayesian Network: Describe full joint probability
distributions using simple, local distributions
What do the local distributions mean? Each CPT entry P(Xi | Parents(Xi)) gives the probability of a value of Xi for one assignment of values to its parents
(Full joint probability table over B, E, A, J, M, shown again.)
Have to be equivalent!
Bayesian Network
Bayesian Network: A probabilistic graphical model
A directed acyclic graph
Node – random variable; Edge – parent-child relationship
Conditional probability distribution
Often represented by CPT (conditional probability table)
A Bayes’ Net = topology (graph) + local conditional probabilities
Quiz 1
At least how many entries are needed for a general CPT (conditional probability table) for the node "Way to Commute to Work"?
A: 18
B: 12
C: 6
D: 3
Network: Weather and Sleeping Quality are parents of Way to Commute to Work
Weather: Rainy/Snowy 0.2, Sunny 0.3, Other 0.5
Sleeping Quality: High 0.2, Low 0.3
Way to Commute to Work: {Bus, Walk, Bike}
Quiz 1
Weather: Rainy/Snowy 0.2, Sunny 0.3, Other 0.5
Sleeping Quality: High 0.2, Low 0.3
Way to Commute to Work: {Bus, Walk, Bike}
CPT for Way to Commute to Work (the Bike rows are omitted because each conditional distribution sums to 1):
Weather  Sleeping Quality  Way to Commute  Probability
R/S      H                 Bus             0.7
R/S      H                 Walk            0.25
R/S      L                 Bus             0.8
R/S      L                 Walk            0.15
Sunny    H                 Bus             0.2
Sunny    H                 Walk            0.5
Sunny    L                 Bus             0.3
Sunny    L                 Walk            0.3
Other    H                 Bus             0.4
Other    H                 Walk            0.2
Other    L                 Bus             0.6
Other    L                 Walk            0.2
So 3 × 2 × (3 − 1) = 12 entries suffice: answer B.
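For completeness, a small sketch (not from the slides) showing how the omitted Bike entries can be derived from the 12 stored numbers, which is why 12 independent entries are enough:

```python
# Stored CPT entries from the slide; (Weather, SleepingQuality, Commute) -> probability.
stored = {
    ("R/S", "H", "Bus"): 0.7,   ("R/S", "H", "Walk"): 0.25,
    ("R/S", "L", "Bus"): 0.8,   ("R/S", "L", "Walk"): 0.15,
    ("Sunny", "H", "Bus"): 0.2, ("Sunny", "H", "Walk"): 0.5,
    ("Sunny", "L", "Bus"): 0.3, ("Sunny", "L", "Walk"): 0.3,
    ("Other", "H", "Bus"): 0.4, ("Other", "H", "Walk"): 0.2,
    ("Other", "L", "Bus"): 0.6, ("Other", "L", "Walk"): 0.2,
}
# Each conditional distribution sums to 1, so the Bike probability is determined.
for w in ("R/S", "Sunny", "Other"):
    for s in ("H", "L"):
        bike = 1 - stored[(w, s, "Bus")] - stored[(w, s, "Walk")]
        print(w, s, "Bike", round(bike, 2))
```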
Another Perspective of Bayes’ Net
Assume you have no “causal knowledge” but
someone gives you the full joint probability table
You observe there is a valid factorization of the full
joint probability distribution
P(B, E, A, J, M) = θ1(B) θ2(E) θ3(A | B, E) θ4(J | A) θ5(M | A)
You further observe that such factorization can be
represented using a DAG
You can prove the θ's equal the corresponding conditional probabilities
Now you get a Bayes' Net
Outline
Bayesian Networks
Independence in Bayes’ Net
Construct a Bayes’ Net
Exact Inference in Bayes’ Net
Applications of Bayes’ Net
Independence in Bayes’ Net
Given a Bayes’ Net, which variables are independent?
Each node is conditionally independent of its non-
descendants given its parents Local semantics
Given Parents(X), the node X is independent of its non-descendants
Independence in Bayes’ Net
Each node is conditionally independent of all others
given its Markov blanket: parents + children +
children’s parents
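A small sketch (not on the slide) that computes a Markov blanket from parent lists, using the alarm network as an example; the dict-of-parents representation is assumed for illustration:

```python
def markov_blanket(parents, node):
    """Markov blanket = parents + children + children's other parents."""
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for c in children:
        blanket |= set(parents[c])   # co-parents of each child
    blanket.discard(node)
    return blanket

alarm = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
print(markov_blanket(alarm, "A"))  # {'B', 'E', 'J', 'M'}
print(markov_blanket(alarm, "B"))  # {'E', 'A'}: E is a co-parent of the child A
```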
Example
List all the independence relationships
Local Semantics: Each node is conditionally independent of its non-descendants
given its parents
Each node is conditionally independent of all others given its Markov blanket:
parents + children + children’s parents
Quiz 2
Which of the following statements of independence are true given the Bayes' Net, based on local semantics and Markov blankets?
(The slide shows a Bayes' Net over the nodes A, B, C, D, E, F, G, H and five candidate statements A–E.)
Local Semantics: Each node is conditionally independent of its non-descendants given its parents
Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents
Outline
Bayesian Networks
Independence in Bayes’ Net
Construct a Bayes’ Net
Exact Inference in Bayes’ Net
Applications of Bayes’ Net
Is Bayes’ Net Expressive Enough?
Any full joint probability table can be represented by
a Bayes’ Net
Is Bayes’ Net Unique?
One (full joint probability distribution)-to-many
(Bayes’ Net) mapping
(The slide shows an alternative Bayes' Net over E, B, J, M, A representing the same joint distribution.)
Construct a Bayes’ Net
Construct a (ideally simple) Bayes’ Net systematically
As a knowledge engineer or domain expert
1. Choose an ordering of variables X1, …, Xn
2. For i = 1 to n
Add Xi to the network
Select a minimal subset of {X1, …, X_{i−1}}, denoted Parents(Xi), such that P(Xi | X_{i−1}, …, X1) = P(Xi | Parents(Xi))
Add edges from the nodes in Parents(Xi) to Xi, and write down the conditional probability table (CPT) P(Xi | Parents(Xi))
This process guarantees P(X1, …, Xn) = ∏i P(Xi | Parents(Xi))
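The guarantee follows from the chain rule together with the choice of Parents(Xi) in step 2; a short derivation of this standard argument:

```latex
\begin{align*}
P(X_1,\dots,X_n)
  &= \prod_{i=1}^{n} P(X_i \mid X_{i-1},\dots,X_1)
     && \text{chain rule over the chosen ordering} \\
  &= \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i))
     && \text{step 2 picks } \mathrm{Parents}(X_i) \text{ so the two conditionals agree}
\end{align*}
```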
Construct a Bayes’ Net
Construct a (ideally simple) Bayes’ Net systematically
Ordering of variables matters
Exploit domain knowledge to determine the ordering: intuitively, the parents of a node should contain all the nodes that directly influence it
(Full joint probability table over B, E, A, J, M, shown again.)
Outline
Bayesian Networks
Independence in Bayes’ Net
Construct a Bayes’ Net
Exact Inference in Bayes’ Net
Applications of Bayes’ Net
Probabilistic Inference in Bayes’ Net
Recap: Probabilistic inference:
No evidence: Marginal probability
With evidence: Posterior / Conditional probability
Inference with full joint probability distribution:
Marginalization, Bayes’ Rule
Exact Inference in Bayes’ Net: (1) Enumeration; (2)
Variable Elimination
General Inference Procedure
Partition the set of random variables in the model
Evidence variables E, and let e be the list of observed values for them
Remaining unobserved / hidden variables Y
Query variables X
The query can be answered by
P(X | e) = α Σ_{y ∈ Y} P(X, e, y)
Inference in Bayes’ Net
P(X | e) = P(X, e) / P(e) = α P(X, e) = α Σ_{y ∈ Y} P(X, e, y)
Inference with full joint probability distribution table
available: Read the joint probability from the table
Inference in Bayes’ Net: compute joint probability
through conditional probability table
P(X1, …, Xn) = ∏i P(Xi | Parents(Xi))
Example: Alarm
I'm at work when both of my neighbors, John and Mary, call to say my burglar alarm is ringing. Sometimes it's set off by minor earthquakes. Is there a burglary?
𝑃 ( 𝑏| 𝑗 , 𝑚 ) =?
P(X | e) = α P(X, e) = α Σ_{y ∈ Y} P(X, e, y)
Example: Alarm
P(b | j, m) = α P(b) Σ_{e' ∈ {e, ¬e}} P(e') Σ_{a' ∈ {a, ¬a}} P(a' | b, e') P(j | a') P(m | a')
Evaluate through depth-first recursion of the corresponding expression tree (shown on the slide)
Top-down depth-first evaluation: multiply the values along each path, sum at the branching nodes
Example: Alarm
P(b | j, m) = α P(b) Σ_{e' ∈ {e, ¬e}} P(e') Σ_{a' ∈ {a, ¬a}} P(a' | b, e') P(j | a') P(m | a') = α × 0.00059224
P(¬b | j, m) = α P(¬b) Σ_{e' ∈ {e, ¬e}} P(e') Σ_{a' ∈ {a, ¬a}} P(a' | ¬b, e') P(j | a') P(m | a') = α × 0.0014919
Normalize: P(b | j, m) = 0.00059224 / (0.00059224 + 0.0014919) ≈ 0.284, and P(¬b | j, m) ≈ 0.716
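A minimal Python sketch of this computation, continuing the joint() sketch (and the assumed textbook CPT values) from the global-semantics slide: it sums out the hidden variables E and A for each value of B and then normalizes.

```python
from itertools import product

# Evidence: j = m = true. Sum out the hidden variables E and A for each value of B.
unnorm = {b: sum(joint(b, e, a, True, True)
                 for e, a in product([True, False], repeat=2))
          for b in (True, False)}
alpha = 1 / sum(unnorm.values())                  # normalization constant
posterior = {b: alpha * p for b, p in unnorm.items()}

print(unnorm[True], unnorm[False])  # ~0.00059224 and ~0.0014919, matching the slide
print(posterior)                    # roughly {True: 0.284, False: 0.716}
```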
Exact Inference in Bayes’ Net: Enumeration
Exact Inference in Bayes’ Net: Variable Elimination
Avoid repeated computation of subexpressions in the
enumeration algorithm
Similar to dynamic programming
P(b | j, m) = α P(b) Σ_{e' ∈ {e, ¬e}} P(e') Σ_{a' ∈ {a, ¬a}} P(a' | b, e') P(j | a') P(m | a')
P(¬b | j, m) = α P(¬b) Σ_{e' ∈ {e, ¬e}} P(e') Σ_{a' ∈ {a, ¬a}} P(a' | ¬b, e') P(j | a') P(m | a')
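A minimal Python sketch of the same query with the repeated work factored out, again reusing the CPT dictionaries from the global-semantics sketch: the hidden variables are eliminated one at a time, and each intermediate factor is computed only once.

```python
def eliminate_burglary(j=True, m=True):
    # Factor over (B, E) obtained by summing out A, with the J and M evidence folded in.
    f_A = {}
    for b in (True, False):
        for e in (True, False):
            f_A[(b, e)] = sum(
                (P_A[(b, e)] if a else 1 - P_A[(b, e)])
                * (P_J[a] if j else 1 - P_J[a])
                * (P_M[a] if m else 1 - P_M[a])
                for a in (True, False))
    # Factor over B obtained by summing out E.
    f_E = {b: sum(P_E[e] * f_A[(b, e)] for e in (True, False)) for b in (True, False)}
    unnorm = {b: P_B[b] * f_E[b] for b in (True, False)}
    alpha = 1 / sum(unnorm.values())
    return {b: alpha * p for b, p in unnorm.items()}

print(eliminate_burglary())  # matches the enumeration result, roughly {True: 0.284, False: 0.716}
```

Here the factor over (B, E) is built once and shared by the computations for b and ¬b, which is exactly the subexpression the enumeration approach recomputes.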
Outline
Bayesian Networks
Independence in Bayes’ Net
Construct a Bayes’ Net
Exact Inference in Bayes’ Net
Applications of Bayes’ Net
Bayes’ Net as a Model of Real World
Bayes’ Net represents knowledge in an uncertain
domain
View it as a way to model the real world based on
domain knowledge
Is your model (Bayes’ Net) for a real-world problem
correct? Not necessarily.
Bayes’ Net as a Model of Real World
"All models are wrong“
A common aphorism in statistics
Generally attributed to the statistician George Box
"Essentially, all models are
wrong, but some are useful".
https://en.wikipedia.org/wiki/All_models_are_wrong
55 Fei Fang 06/14/2024
Use of Bayes’ Net
Diagnosis: P(cause | symptom)?
Prediction: P(symptom | cause)?
Classification: P(class | data)
Decision-making (given a cost function)
Use of Bayes’ Net
Russell and Norvig
Summary
Bayes’ Net
Graphical model
Decompose full joint probability distributions into interpretable,
simple, local distributions
Independence in Bayes’ Net
Local semantics
Markov Blanket
Construct a Bayes' Net
Exact Inference in Bayes’ Net
Applications of Bayes’ Net
Backup Slides
Conditional Independence
Example graph (1)
Conditional Independence
Example graph (2)
Conditional Independence
Example graph (3)
D-Separation for Conditional Independence
X ⟂ Y | Z holds in general if and only if all the paths from any node in X to any node in Y are blocked
A path is blocked if and only if it includes a node such that either one of the following statements is true:
The arrows on the path meet head-to-tail or tail-to-tail at the node, and the node is in the set Z
The arrows on the path meet head-to-head at the node, and neither the node nor any of its descendants is in the set Z
(Illustrations on the slide: tail-to-tail at a node, head-to-tail at a node, head-to-head at a node.)
D-Separation for Conditional Independence
How to tell if X ⟂ Y | Z based on D-Separation?
Set dsep ← true
For each path from X to Y (Loop 1)
    Set blocked ← false
    For each node n on the path from X to Y (Loop 2)
        If the arrows meet head-to-tail or tail-to-tail at n
            If n ∈ Z
                blocked ← true
                Break (Loop 2)
        If the arrows meet head-to-head at n
            Set observed ← false
            For each node d ∈ {n} ∪ {descendants of n} (Loop 3)
                If d ∈ Z
                    observed ← true
                    Break (Loop 3)
            If observed = false
                blocked ← true
                Break (Loop 2)
    If blocked = false
        dsep ← false
        Break (Loop 1)
Return dsep
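A runnable Python sketch of this procedure (the dict-of-parents graph representation is an assumption; enumerating simple undirected paths is fine for small networks):

```python
def descendants(graph, node):
    """All descendants of node in a DAG given as {node: [parents]}."""
    children = {n: [c for c, ps in graph.items() if n in ps] for n in graph}
    out, stack = set(), [node]
    while stack:
        for c in children[stack.pop()]:
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def paths(graph, x, y, visited=None):
    """All simple undirected paths from x to y."""
    visited = (visited or []) + [x]
    if x == y:
        yield visited
        return
    neighbors = set(graph[x]) | {n for n, ps in graph.items() if x in ps}
    for n in neighbors:
        if n not in visited:
            yield from paths(graph, n, y, visited)

def blocked(graph, path, z):
    """Check whether one path is blocked by the observed set z."""
    for i in range(1, len(path) - 1):
        prev, node, nxt = path[i - 1], path[i], path[i + 1]
        head_to_head = prev in graph[node] and nxt in graph[node]  # prev -> node <- nxt
        if head_to_head:
            if node not in z and not (descendants(graph, node) & z):
                return True
        elif node in z:  # head-to-tail or tail-to-tail at an observed node
            return True
    return False

def d_separated(graph, x, y, z):
    return all(blocked(graph, p, set(z)) for p in paths(graph, x, y))

alarm = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
print(d_separated(alarm, "J", "M", {"A"}))  # True: tail-to-tail at the observed node A
print(d_separated(alarm, "B", "E", set()))  # True: head-to-head at A, and A is unobserved
print(d_separated(alarm, "B", "E", {"M"}))  # False: M is a descendant of the collider A
```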