Class announcements
• Recitations Th, F 4 PM – 46-3189
– This week: review of basic Bayes
• PSet 1 out today, due Oct 3. Other psets due
approximately every two weeks thereafter.
• Classes next week are virtual: We will have a guest
lecture from Vikash Mansinghka on Thursday that
you can watch asynchronously, and I may give one
virtual lecture (depending on where we end up
today).
Plan for today
Basic Bayesian cognition
– The number game
The number game
– 60: diffuse similarity
– 60, 80, 10, 30: rule ("multiples of 10")
– 60, 52, 57, 55: focused similarity (numbers near 50-60)
Main phenomena to explain:
– Generalization can appear either similarity-based (graded) or rule-based (all-or-none).
– Learning from just a few positive examples.
A single unifying account of (number) concept learning?
• We’re going to use this to introduce Bayesian
approaches, but first consider ...
– The “naïve programmer” approach?
– The “modern neural network” approach?
Traditional (algorithmic level) cognitive models
• Multiple representational systems: rules and
similarity
– Categorization, language (past tense), reasoning
• Questions this leaves open:
– How does each system work? How far, and in what ways, should a learner
generalize as a function of the examples observed?
• Which rule to choose?
– E.g., X = {60, 80, 10, 30}: multiples of 10 vs. even numbers?
• Which similarity metric?
– E.g., X = {60, 53} vs. {60, 20}?
– Why these two systems?
– When and why does a learner switch between them?
Reverse-engineering a cognitive system:
Marr’s three levels
• Level 1: Computational theory
– What are the inputs and outputs to the computation,
what is its goal, and what is the logic by which it is
carried out?
• Level 2: Representation and algorithm
– How is information represented and processed to
achieve the computational goal?
• Level 3: Hardware implementation
– How is the computation realized in physical or
biological hardware?
Bayesian model
• H: Hypothesis space of possible concepts:
– h1 = {2, 4, 6, 8, 10, 12, …, 96, 98, 100} (“even numbers”)
– h2 = {10, 20, 30, 40, …, 90, 100} (“multiples of 10”)
– h3 = {2, 4, 8, 16, 32, 64} (“powers of 2”)
– h4 = {50, 51, 52, …, 59, 60} (“numbers between 50 and 60”)
– ...
Representational interpretations for H:
– Candidate rules
– Features for similarity
– “Consequential subsets” (Shepard, 1987)
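For concreteness, the hypotheses above can be written down directly as sets of numbers, i.e. as extensions over 1-100 (a sketch in Python; the variable names are ours):

```python
# Extensions of the example hypotheses over the numbers 1-100.
h1 = set(range(2, 101, 2))      # "even numbers" (50 elements)
h2 = set(range(10, 101, 10))    # "multiples of 10" (10 elements)
h3 = {2, 4, 8, 16, 32, 64}      # "powers of 2"
h4 = set(range(50, 61))         # "numbers between 50 and 60"
```

Treating each hypothesis as a subset of 1-100 is what lets candidate rules, similarity features, and "consequential subsets" all live in one hypothesis space.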
Three hypothesis subspaces for number
concepts
• Mathematical properties (24 hypotheses):
– Odd, even, square, cube, prime numbers
– Multiples of small integers
– Powers of small integers
• Raw magnitude (5050 hypotheses):
– All intervals of integers with endpoints between 1 and
100.
• Approximate magnitude (10 hypotheses):
– Decades (1-10, 10-20, 20-30, …)
Bayesian model
• H: Hypothesis space of possible concepts:
– Mathematical properties: even, odd, square, prime, . . . .
– Approximate magnitude: {1-10}, {10-20}, {20-30}, . . . .
– Raw magnitude: all intervals between 1 and 100.
• X = {x1, . . . , xn}: n examples of a concept C.
• Evaluate hypotheses given data:
p(h | X) = p(X | h) p(h) / Σ_{h' ∈ H} p(X | h') p(h')
– p(h) [“prior”]: domain knowledge, pre-existing biases
– p(X|h) [“likelihood”]: statistical information in examples.
– p(h|X) [“posterior”]: degree of belief that h is the true extension of C.
Likelihood: p(X|h)
• Size principle: Smaller hypotheses receive greater
likelihood, and exponentially more so as n increases.
p(X | h) = [1 / size(h)]^n   if x1, …, xn ∈ h
         = 0                 if any xi ∉ h
• Captures the intuition of a “representative” sample, versus
a “suspicious coincidence”.
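A minimal sketch of this likelihood (the function name is ours, not from the original model code):

```python
def size_principle_likelihood(examples, hypothesis):
    """p(X | h) under strong sampling: each example is drawn
    independently and uniformly from the extension of h."""
    if not all(x in hypothesis for x in examples):
        return 0.0  # any example outside h rules it out
    return (1.0 / len(hypothesis)) ** len(examples)

multiples_of_10 = set(range(10, 101, 10))   # size 10
even_numbers = set(range(2, 101, 2))        # size 50
X = [60, 80, 10, 30]

# The smaller hypothesis gets exponentially greater likelihood:
# (1/10)^4 = 1e-4 versus (1/50)^4 = 1.6e-7.
```

With one example the two likelihoods differ by a factor of 5; with four examples, by a factor of 625 — the "exponentially more so as n increases" part of the size principle.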
Illustrating the size principle
[Figure: a grid of the even numbers 2-100, with h1 = even numbers (50 elements) and h2 = multiples of 10 (10 elements) marked, shown with successively larger example sets.]
– With few examples: data slightly more of a coincidence under h1.
– With more examples: data much more of a coincidence under h1.
Likelihood: p(X|h)
• Size principle: Smaller hypotheses receive greater
likelihood, and exponentially more so as n increases.
p(X | h) = [1 / size(h)]^n   if x1, …, xn ∈ h
         = 0                 if any xi ∉ h
• Captures the intuition of a “representative” sample, versus
a “suspicious coincidence”.
• A special case of the law of “conservation of belief”:
Σ_x p(X = x | Y = y) = 1
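As a quick numeric check of conservation of belief in the single-example case (a sketch; the choice of hypothesis is arbitrary):

```python
# For one example, the size-principle likelihood defines a proper
# distribution over x: p(x | h) = 1/size(h) for x in h, else 0.
h = set(range(10, 101, 10))  # "multiples of 10"
p = {x: (1 / len(h) if x in h else 0.0) for x in range(1, 101)}
total = sum(p.values())  # should be 1: belief is conserved
```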
Prior: p(h)
• Choice of hypothesis space embodies a strong prior:
effectively, p(h) ~ 0 for many logically possible but
conceptually unnatural hypotheses.
• Do we need this? Why not allow all logically possible
hypotheses, with uniform priors, and let the data sort
them out (via the likelihood)?
Prior: p(h)
• Choice of hypothesis space embodies a strong prior:
effectively, p(h) ~ 0 for many logically possible but
conceptually unnatural hypotheses.
• Prevents overfitting by highly specific but unnatural
hypotheses, e.g. “multiples of 10 except 50 and 70”.
e.g., X = {60, 80, 10, 30}:
p(X | multiples of 10) = (1/10)^4 = 0.0001
p(X | multiples of 10 except 50, 70) = (1/8)^4 ≈ 0.00024
Posterior: p(h | X) = p(X | h) p(h) / Σ_{h' ∈ H} p(X | h') p(h')
• X = {60, 80, 10, 30}
• Why prefer “multiples of 10” over “even
numbers”? p(X|h).
• Why prefer “multiples of 10” over “multiples of
10 except 50 and 70”? p(h).
• Why does a good generalization need both high
prior and high likelihood? p(h|X) ~ p(X|h) p(h)
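Putting the pieces together, a toy posterior computation over three hypotheses shows both effects. The prior weights below are invented illustrative numbers, not values from the original model:

```python
def posterior(X, hypotheses, prior):
    """p(h | X) proportional to p(X | h) p(h), size-principle likelihood."""
    def likelihood(h):
        if not all(x in h for x in X):
            return 0.0
        return (1.0 / len(h)) ** len(X)
    scores = {name: likelihood(h) * prior[name]
              for name, h in hypotheses.items()}
    Z = sum(scores.values())  # normalizing constant
    return {name: s / Z for name, s in scores.items()}

mult10 = frozenset(range(10, 101, 10))
hypotheses = {
    "multiples of 10": mult10,
    "even numbers": frozenset(range(2, 101, 2)),
    "multiples of 10 except 50, 70": mult10 - {50, 70},
}
# Illustrative prior: the unnatural exception-laden hypothesis gets low weight.
prior = {"multiples of 10": 0.45,
         "even numbers": 0.45,
         "multiples of 10 except 50, 70": 0.10}

post = posterior([60, 80, 10, 30], hypotheses, prior)
# The likelihood favors "multiples of 10" over "even numbers";
# the prior rescues it from "multiples of 10 except 50, 70".
```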
Prior: p(h)
• Choice of hypothesis space embodies a strong prior:
effectively, p(h) ~ 0 for many logically possible but
conceptually unnatural hypotheses.
• Prevents overfitting by highly specific but unnatural
hypotheses, e.g. “multiples of 10 except 50 and 70”.
• p(h) encodes relative weights of alternative theories:
H: Total hypothesis space
– p(H1) = 1/5, p(H2) = 3/5, p(H3) = 1/5
– H1: Math properties (24 hypotheses): even numbers, powers of two, multiples of three, …; each p(h) = p(H1) / 24
– H2: Raw magnitude (5050 hypotheses): 10-15, 20-32, 37-54, …; each p(h) = p(H2) / 5050
– H3: Approx. magnitude (10 hypotheses): 10-20, 20-30, 30-40, …; each p(h) = p(H3) / 10
Prior: p(h)
• Choice of hypothesis space embodies a strong prior:
effectively, p(h) ~ 0 for many logically possible but
conceptually unnatural hypotheses.
• Prevents overfitting by highly specific but unnatural
hypotheses, e.g. “multiples of 10 except 50 and 70”.
• p(h) encodes relative plausibility of alternative theories:
– Mathematical properties: p(h) ~ 1/120
– Approximate magnitude: p(h) ~ 1/50
– Raw magnitude: p(h) ~ 1/8500 (on average)
• Also degrees of plausibility within a theory,
e.g., for magnitude intervals of size s:
[Figure: prior probability p(s) as a function of interval size s.]
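The per-hypothesis priors quoted above follow from dividing each subspace's weight uniformly among its hypotheses; a quick check (assuming uniform weighting within subspaces, which the "on average" caveat for raw magnitude relaxes):

```python
# Count the interval hypotheses: all [lo, hi] with 1 <= lo <= hi <= 100.
n_intervals = sum(1 for lo in range(1, 101) for hi in range(lo, 101))

# Subspace weights and sizes from the slides.
subspaces = {
    "math": {"p_sub": 1 / 5, "n_hyp": 24},
    "approx magnitude": {"p_sub": 1 / 5, "n_hyp": 10},
    "raw magnitude": {"p_sub": 3 / 5, "n_hyp": n_intervals},
}
per_hyp = {k: v["p_sub"] / v["n_hyp"] for k, v in subspaces.items()}
# math: 1/120; approx: 1/50; raw: (3/5)/5050, roughly 1/8400
```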
Generalizing to new objects
From hypotheses to predictions:
How do we compute the probability that C
applies to some new object y, given the posterior
p(h|X)?
Hypothesis averaging
In general, we have the law of total probability:
p(A = a) = Σ_z p(A = a | Z = z) p(Z = z)
p(A = a | B = b) = Σ_z p(A = a | Z = z, B = b) p(Z = z | B = b)
…especially useful if A and B are independent conditioned on Z:
p(A = a | B = b) = Σ_z p(A = a | Z = z) p(Z = z | B = b)
Hypothesis averaging
Another example: what is the probability that the republican will
win the election, given that the weather man predicts rain?
p(Republican wins | Weather report: “Rain storm”) =
Σ_{w ∈ weather conditions} p(Republican wins | W = w) p(W = w | Weather report: “Rain storm”)
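The election example can be checked numerically; all probabilities below are invented purely for illustration:

```python
# Conditioning on the latent variable W (actual weather), which screens
# off the report from the election outcome.
p_w_given_report = {"rain": 0.7, "sun": 0.3}   # p(W = w | report)
p_win_given_w = {"rain": 0.6, "sun": 0.4}      # p(win | W = w)

p_win_given_report = sum(p_win_given_w[w] * p_w_given_report[w]
                         for w in p_w_given_report)
# 0.6 * 0.7 + 0.4 * 0.3 = 0.54
```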
Generalizing to new objects
Hypothesis averaging:
Compute the probability that C applies to some
new object y by averaging the predictions of all
hypotheses h, weighted by p(h|X):
p(y ∈ C | X) = Σ_{h ∈ H} p(y ∈ C | h) p(h | X)
where p(y ∈ C | h) = 1 if y ∈ h, and 0 if y ∉ h, so this reduces to
p(y ∈ C | X) = Σ_{h ⊇ {y, X}} p(h | X)
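A sketch of hypothesis averaging over a two-hypothesis toy space (the names and the uniform prior are ours, for illustration only):

```python
def p_in_concept(y, X, hypotheses, prior):
    """p(y in C | X): sum of p(h | X) over hypotheses h containing y,
    with the size-principle likelihood."""
    def likelihood(h):
        return (1.0 / len(h)) ** len(X) if all(x in h for x in X) else 0.0
    scores = {n: likelihood(h) * prior[n] for n, h in hypotheses.items()}
    Z = sum(scores.values())
    return sum(s / Z for n, s in scores.items() if y in hypotheses[n])

hypotheses = {
    "multiples of 10": frozenset(range(10, 101, 10)),
    "even numbers": frozenset(range(2, 101, 2)),
}
prior = {"multiples of 10": 0.5, "even numbers": 0.5}
X = [60, 80, 10, 30]
# 50 is in both hypotheses, so p(50 in C | X) = 1;
# 52 is only in "even numbers", so its probability is small but
# nonzero -- generalization is graded, not all-or-none.
```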
Examples: 16
Examples: 16, 8, 2, 64
Examples: 16, 23, 19, 20
[Figure: for each example set — {60}, {60, 80, 10, 30}, {60, 52, 57, 55}, {16}, {16, 8, 2, 64}, {16, 23, 19, 20} — human generalization judgments alongside Bayesian model predictions.]
Summary of the Bayesian model
• How do the statistics of the examples interact with
prior knowledge to guide generalization?
posterior ∝ likelihood × prior
• Why does generalization appear rule-based or
similarity-based?
hypothesis averaging + size principle
broad p(h|X): similarity gradient
narrow p(h|X): all-or-none rule
Summary of the Bayesian model
• How do the statistics of the examples interact with
prior knowledge to guide generalization?
posterior ∝ likelihood × prior
• Why does generalization appear rule-based or
similarity-based?
hypothesis averaging + size principle
broad p(h|X): many h of similar size, or very few examples (e.g., just one)
narrow p(h|X): one h much smaller than the rest
Model variants
1. Bayes with “weak sampling” (no size principle):
posterior ∝ likelihood × prior, with hypothesis averaging, but the likelihood carries no size information:
p(X | h) ∝ 1 if x1, …, xn ∈ h
         = 0 if any xi ∉ h
2. Maximum a posteriori (MAP) / maximum likelihood / subset principle (no hypothesis averaging):
posterior ∝ likelihood × prior, with the size principle, but generalizing from only the single best hypothesis:
p(y ∈ C | X) = 1 if y ∈ h*, where h* = argmax_{h ∈ H} p(h | X)
             = 0 if y ∉ h*
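The variants differ only in the likelihood used and in how predictions are read out of the posterior; a side-by-side sketch on a toy space (hypothetical names, uniform prior):

```python
hypotheses = {
    "multiples of 10": frozenset(range(10, 101, 10)),
    "even numbers": frozenset(range(2, 101, 2)),
}
prior = {n: 0.5 for n in hypotheses}
X = [60, 80, 10, 30]

def posterior(likelihood):
    scores = {n: likelihood(h) * prior[n] for n, h in hypotheses.items()}
    Z = sum(scores.values())
    return {n: s / Z for n, s in scores.items()}

# Strong sampling (size principle) vs. weak sampling (0/1 consistency).
strong = lambda h: (1 / len(h)) ** len(X) if all(x in h for x in X) else 0.0
weak = lambda h: 1.0 if all(x in h for x in X) else 0.0

def generalize(y, post):       # full Bayes: hypothesis averaging
    return sum(p for n, p in post.items() if y in hypotheses[n])

def generalize_map(y, post):   # MAP / subset principle: winner-take-all
    best = max(post, key=post.get)
    return 1.0 if y in hypotheses[best] else 0.0

post_weak = posterior(weak)      # both consistent hypotheses tie at 0.5
post_strong = posterior(strong)  # mass concentrates on "multiples of 10"
```

Weak sampling cannot prefer the smaller consistent hypothesis; MAP on top of strong sampling predicts all-or-none generalization with no similarity gradient.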
[Figure: human generalization compared with the full Bayesian model, Bayes with weak sampling (no size principle), and MAP / subset principle (no hypothesis averaging).]
Taking stock
• A model of high-level, knowledge-driven inductive reasoning
that makes strong quantitative predictions with minimal free
parameters.
(r2 > 0.9 for mean judgments on 180 generalization stimuli, with 3 free
numerical parameters)
• Explains qualitatively different patterns of generalization
(rules, similarity) as the output of a single general-purpose
rational inference engine.
– Marr level 1 (Computational theory) explanation of phenomena that
have traditionally been treated only at Marr level 2 (Representation
and algorithm).
Looking forward
• Can we see these ideas at work in more natural cognitive
functions, not just toy problems and games?
– What differently structured hypothesis spaces, likelihood
functions, or priors might be needed?
• Can we move from ‘weak rational analysis’ to ‘strong
rational analysis’ in the priors, as with the likelihood?
– “Weak”: behavior consistent with some reasonable prior.
– “Strong”: behavior consistent with the “correct” prior given the
structure of the world.
• Can we work with more flexible priors, not just restricted to
a small subset of all logically possible concepts?
– Would like to be able to learn any concept, even very complex ones,
given enough data (a non-dogmatic prior).
• Can we describe formally how these hypothesis spaces and
priors are generated by abstract knowledge or theories?
• Can we explain how people learn these rich priors?
Learning more natural concepts
[Figure: three pictures labeled “horse”, and three examples of a novel word, “tufa” — learning a new word from a few positive examples.]
Learning rectangle concepts
Weighting different rectangle
hypotheses based on the size principle:
p(X | h) = [1 / size(h)]^n   if x1, …, xn ∈ h
         = 0                 if any xi ∉ h
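A sketch of that weighting for axis-aligned rectangle hypotheses, where size(h) is the rectangle's area (the function and variable names are ours):

```python
def rect_likelihood(points, rect):
    """p(X | h) for h = (x0, x1, y0, y1): strong sampling from the
    rectangle's area, zero if any example falls outside it."""
    x0, x1, y0, y1 = rect
    if not all(x0 <= px <= x1 and y0 <= py <= y1 for px, py in points):
        return 0.0
    area = (x1 - x0) * (y1 - y0)
    return (1.0 / area) ** len(points)

points = [(2, 2), (3, 4), (4, 3)]
tight = (2, 4, 2, 4)    # area 4, just covers the examples
loose = (0, 10, 0, 10)  # area 100
# (1/4)^3 vs. (1/100)^3: the tight rectangle is exponentially favored,
# so generalization contracts toward the examples as n grows.
```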
Generalization gradients
[Figure: generalization gradients for full Bayes, the subset principle (MAP Bayes), and Bayes without the size principle (0/1 likelihoods).]
Modeling word learning (Xu & Tenenbaum, 2007)
[Figure: children’s generalizations compared with Bayesian concept learning over a tree-structured hypothesis space.]
Exploring different models
• Priors, likelihoods derived from simple assumptions.
What about more complex cases?
• Different likelihoods?
– Suppose the examples are sampled by a different process,
such as active learning, or active pedagogy.
• Different priors?
– More complex language-like hypothesis spaces, allowing
exceptions, compound concepts, and much more…