Expectations and sums of variables
You are expected to know some probability theory, including expectations/averages. This
sheet reviews some of that background.
1 Probability Distributions / Notation
The notation on this sheet follows MacKay’s textbook, available online here:
https://www.inference.org.uk/itila/book.html
An outcome, x, comes from a discrete set or ‘alphabet’ A_X = {a_1, a_2, ..., a_I}, with corresponding probabilities P_X = {p_1, p_2, ..., p_I}.
Examples:
A standard six-sided die has A_X = {1, 2, 3, 4, 5, 6} with corresponding probabilities P_X = {1/6, 1/6, 1/6, 1/6, 1/6, 1/6}.
A Bernoulli distribution, which has probability distribution

    P(x) = p        if x = 1,
           1 − p    if x = 0,        (1)
           0        otherwise,

has alphabet A_X = {1, 0} with P_X = {p, 1 − p}.
2 Expectations
An expectation is a property of a probability distribution, defined by a probability-weighted
sum. The expectation of some function, f, of an outcome, x, is:

    E_{P(x)}[f(x)] = ∑_{i=1}^{I} p_i f(a_i).    (2)
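To make the definition concrete, here is a minimal NumPy sketch of equation (2), using the six-sided die from Section 1; the choice f(x) = x² is just an illustration, not from the note:

    import numpy as np

    # Alphabet and probabilities for a fair six-sided die (Section 1).
    a = np.array([1, 2, 3, 4, 5, 6])
    p = np.full(6, 1/6)

    # E[f(x)] as the probability-weighted sum in equation (2), with f(x) = x^2.
    f_of_a = a**2
    print(np.sum(p * f_of_a))   # 91/6 ≈ 15.17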
Often the subscript P(x) is dropped from the notation because the reader knows under which distribution the expectation is being taken. Notation can vary considerably, and details are often dropped. You might also see E[f], 𝔼[f], or ⟨f⟩, which all mean the same thing.
The expectation is sometimes a useful representative value of a random function value. The
expectation of the identity function, f ( x ) = x, is the ‘mean’, which is one measure of the
centre of a distribution.
The expectation is a linear operator:
    E[f(x) + g(x)] = E[f(x)] + E[g(x)]   and   E[c f(x)] = c E[f(x)].    (3)
These properties are apparent if you explicitly write out the summations.
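You can also check them numerically; a small sketch continuing the die example above, where the functions f and g and the constant c are arbitrary choices:

    import numpy as np

    a = np.array([1, 2, 3, 4, 5, 6])
    p = np.full(6, 1/6)
    E = lambda f: np.sum(p * f(a))   # expectation under the die distribution

    f = lambda x: x**2
    g = lambda x: 3*x + 1
    c = 2.5

    # Both properties in equation (3) hold exactly:
    assert np.isclose(E(lambda x: f(x) + g(x)), E(f) + E(g))
    assert np.isclose(E(lambda x: c * f(x)), c * E(f))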
The expectation of a constant with respect to x is the constant:
    E[c] = c ∑_{i=1}^{I} p_i = c,    (4)
because probability distributions sum to one (‘probabilities are normalized’).
The expectation of a product of independent outcomes separates into a product of expectations:

    E[f(x) g(y)] = E[f(x)] E[g(y)].    (5)

This holds when x and y are independent.
Exercise 1: prove this. (Answers at the end of the note.)
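If you want to sanity-check the identity numerically (not a proof), you can build the joint distribution of two independent variables as an outer product and compare both sides; the alphabets and functions below are arbitrary illustrations:

    import numpy as np

    # Two independent discrete variables: a fair die and a Bernoulli(0.3).
    ax, px = np.array([1, 2, 3, 4, 5, 6]), np.full(6, 1/6)
    ay, py = np.array([0, 1]), np.array([0.7, 0.3])

    f = lambda x: x**2
    g = lambda y: 2*y - 1

    # Independence: the joint probability is the outer product p(x)p(y).
    joint = np.outer(px, py)
    lhs = np.sum(joint * np.outer(f(ax), g(ay)))    # E[f(x)g(y)]
    rhs = np.sum(px * f(ax)) * np.sum(py * g(ay))   # E[f(x)] E[g(y)]
    assert np.isclose(lhs, rhs)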
3 The mean
The mean of a distribution over a number is simply the ‘expected’ value of the numerical outcome.
    ‘Expected Value’ = ‘mean’ = µ = E[x] = ∑_{i=1}^{I} p_i a_i.    (6)
For a six-sided die:
    E[x] = (1/6)×1 + (1/6)×2 + (1/6)×3 + (1/6)×4 + (1/6)×5 + (1/6)×6 = 3.5.    (7)
In everyday language I wouldn’t say that I ‘expect’ to see 3.5 as the outcome of throwing a die... I expect to see an integer! However, 3.5 is the ‘expected value’ as it is commonly defined. Similarly, a single Bernoulli outcome will be a zero or a one, but its ‘expected’ value is a fraction,

    E[x] = p×1 + (1 − p)×0 = p,    (8)

the probability of getting a one.
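A quick Monte Carlo illustration: the average of many simulated Bernoulli outcomes approaches p (the sample size and p = 0.3 below are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    p = 0.3
    x = rng.random(100_000) < p   # 100,000 Bernoulli(p) outcomes as True/False
    print(x.mean())               # sample mean close to p = 0.3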
Change of units: I might have a distribution over heights measured in metres, for which I
have computed the mean. If I multiply the heights by 100 to obtain heights in centimetres,
the mean in centimetres can be obtained by multiplying the mean in metres by 100. Formally:
E[100 x ] = 100 E[ x ].
4 The variance
The variance is also an expectation, measuring the average squared distance from the mean:
    var[x] = σ² = E[(x − µ)²] = E[x²] − E[x]²,    (9)
where µ = E[ x ] is the mean.
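Both expressions for the variance give the same answer; a quick numerical check on the six-sided die from the running example:

    import numpy as np

    a = np.array([1, 2, 3, 4, 5, 6])
    p = np.full(6, 1/6)

    mu = np.sum(p * a)               # mean, equation (6)
    var1 = np.sum(p * (a - mu)**2)   # E[(x - mu)^2]
    var2 = np.sum(p * a**2) - mu**2  # E[x^2] - E[x]^2
    assert np.isclose(var1, var2)    # both equal 35/12 ≈ 2.92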
Exercise 2: prove that E[(x − µ)²] = E[x²] − E[x]².
Exercise 3: show that var[cx] = c² var[x].
Exercise 4: show that var[x + y] = var[x] + var[y], for independent outcomes x and y.
Exercise 5: Given outcomes distributed with mean µ and variance σ², how could you shift and scale them to have mean zero and variance one?
Change of units: If the outcome x is a height measured in metres, then x² has units of m²; x² is an area. The variance also has units of m², so it cannot be represented on the same scale as the outcome, because it has different units. If you multiply all heights by 100 to convert to centimetres, the variance is multiplied by 100². Therefore, the relative size of the mean and the variance depends on the units you use, and so often isn’t meaningful.
Standard deviation: The standard deviation σ, the square root of the variance, does have the
same units as the mean. The standard deviation is often used as a measure of the typical
distance from the mean. Often variances are used in intermediate calculations because
they are easier to deal with: it is variances that add (as in Exercise 4 above), not standard
deviations.
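A small simulation illustrating this point, with two independent Gaussian variables (the standard deviations 2 and 3 are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 2.0, size=1_000_000)   # std 2, so var[x] = 4
    y = rng.normal(0.0, 3.0, size=1_000_000)   # std 3, so var[y] = 9

    print(np.var(x + y))   # ≈ 13 = 4 + 9: variances add
    print(np.std(x + y))   # ≈ 3.61 = sqrt(13), not 2 + 3 = 5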
5 Sums of independent variables: “random walks”
A drunkard starts at the centre of an alleyway, with exits at each end. He takes a sequence of random staggers either to the left or right along the alleyway. His position after N steps is k_N = ∑_{n=1}^{N} x_n, where the outcomes, {x_n}, the staggering motions, are drawn from a distribution with zero mean and finite variance σ². For example, A_X = {−1, +1} with P_X = {1/2, 1/2}, which has E[x_n] = 0 and var[x_n] = 1.
If the drunkard started in the centre of the alleyway, will he ever escape? If so, roughly how long will it take? (If you don’t already know, have a think...)
The expected, or mean, position after N steps is E[k_N] = N E[x_n] = 0. This doesn’t mean we don’t think the drunkard will escape. There are ways of escaping both left and right; it’s just ‘on average’ that he’ll stay in the middle.
The variance of the drunkard’s position is var[k_N] = N var[x_n] = Nσ². The standard deviation of the position is then std[k_N] = √N σ, which is a measure of the width of the distribution over the displacement from the centre of the alleyway. If we double the length of the alley, then it will typically take four times the number of random steps to escape.
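A short simulation of this walk, using the A_X = {−1, +1} step distribution above (the numbers of trials and steps are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    trials = 10_000
    for N in [100, 400, 1600]:
        steps = rng.choice([-1, +1], size=(trials, N))   # staggers left/right
        k_N = steps.sum(axis=1)                          # positions after N steps
        print(N, k_N.mean(), k_N.std())                  # mean ≈ 0, std ≈ sqrt(N)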
Worthwhile remembering: the typical magnitude of the sum of N independent zero-mean variables scales with √N. The individual variables need to have finite variance, and ‘typical magnitude’ is measured by standard deviation. Sometimes you might have to work out the σ for your problem, or do other detailed calculations. But sometimes the scaling of the width of the distribution is all that really matters.
Corollary: the typical magnitude of the mean of N independent zero-mean variables with finite variance scales with 1/√N.
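The corollary in code: across many simulated sets of N steps (same step distribution as above), the spread of the sample mean shrinks like 1/√N:

    import numpy as np

    rng = np.random.default_rng(0)
    for N in [100, 400, 1600]:
        means = rng.choice([-1, +1], size=(10_000, N)).mean(axis=1)
        print(N, means.std())   # ≈ 1/sqrt(N): about 0.1, 0.05, 0.025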
6 Solutions
As always, you are strongly recommended to work hard on a problem yourself before looking
at the solutions. As you transition into doing research, there won’t be any answers, and you
have to build confidence in getting and checking your own answers.
Exercise 1: For independent outcomes x and y, p(x, y) = p(x)p(y) and so
E[f(x)g(y)] = ∑_x ∑_y p(x)p(y) f(x)g(y) = ∑_x p(x)f(x) ∑_y p(y)g(y) = E[f(x)] E[g(y)].
Exercise 2: E[(x − µ)²] = E[x² + µ² − 2xµ] = E[x²] + µ² − 2µE[x] = E[x²] − µ².
Exercise 3: var[cx] = E[(cx)²] − E[cx]² = E[c²x²] − (cE[x])² = c²(E[x²] − E[x]²) = c² var[x].
Exercise 4: var[x + y] = E[(x + y)²] − E[x + y]² = E[x²] + E[y²] + 2E[xy] − (E[x]² + E[y]² + 2E[x]E[y]) = var[x] + var[y], if E[xy] = E[x]E[y], which is true when x and y are independent variables.
Exercise 5: z = (x − µ)/σ has mean 0 and variance 1. The division is by the standard deviation, not the variance. You should now be able to prove this result for yourself.
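A numerical illustration of this shift-and-scale, using arbitrary skewed samples (a gamma distribution, chosen purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.gamma(shape=2.0, scale=3.0, size=1_000_000)  # mean 6, variance 18

    z = (x - x.mean()) / x.std()   # shift by the mean, scale by the std
    print(z.mean(), z.var())       # ≈ 0 and ≈ 1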
What to remember: using the expectation notation where possible, rather than writing out
the summations or integrals explicitly, makes the mathematics concise.
MLPR:w0f Iain Murray and Arno Onken, http://www.inf.ed.ac.uk/teaching/courses/mlpr/2020/