Introduction to Bayesian Statistics
Richard Yi Da Xu
School of Computing & Communication, UTS
January 31, 2017
Random variables
- Pre-university: a number is just a fixed value. Now: a random variable X is not fixed; it takes values according to a probability distribution.

When we talk about probabilities:

- When X is a continuous random variable, it has a probability density function (pdf).
- When X is a discrete random variable, it has a probability mass function (pmf).

p(x) = p(X = x) denotes the probability that the random variable X is equal to a fixed number x, e.g., the probability that the number of machine learning participants equals 20.
Mean or Expectation
- Discrete case:
  $\mu = E(X) = \frac{1}{N} \sum_{i=1}^{N} x_i$
- Continuous case:
  $\mu = E(X) = \int_{x \in S} x \, p(x) \, dx$
- We can also measure the expectation of a function of X:
  $E(f(X)) = \int_{x \in S} f(x) \, p(x) \, dx$
  For example (a small sampling sketch follows below),
  $E(\cos(X)) = \int_{x \in S} \cos(x) \, p(x) \, dx \qquad E(X^2) = \int_{x \in S} x^2 \, p(x) \, dx$
- What about $f(E(X))$? We will discuss this later, when we meet Jensen's inequality in Expectation-Maximization.
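To see these definitions in action, here is a minimal Monte Carlo sketch; the choice of X ~ N(0, 1) is an illustrative assumption, not from the slides. Expectations become sample averages.

import numpy as np

# A minimal Monte Carlo sketch: approximate E(cos(X)) and E(X^2) by
# sample averages, assuming (for illustration) X ~ N(0, 1).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1_000_000)

print(np.cos(x).mean())  # ~ exp(-1/2) = 0.6065 for a standard normal
print((x ** 2).mean())   # ~ 1.0, the second moment of N(0, 1)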
Variance: an intuitive explanation

- You have data X = {2, 3, 3, 2, 1, 4}, i.e., $x_1 = 2, x_2 = 3, \ldots, x_6 = 4$.
- You have the mean:
  $\mu = \frac{2 + 3 + 3 + 2 + 1 + 4}{6} = 2.5$
- The variance is then (a quick numerical check follows this list):
  $\mathrm{VAR}(\text{data}) = \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$
- Division by N is intuitive: otherwise, more data would imply more variance.
- Also think about what kind of values VAR and $\sigma$ can take; we will later look at what kind of distribution is required for them.
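A quick sanity check of the two quantities above (a sketch, not part of the original slides):

# Mean and variance of the slide's data
data = [2, 3, 3, 2, 1, 4]
mu = sum(data) / len(data)                          # 2.5
var = sum((x - mu) ** 2 for x in data) / len(data)  # ~0.917
print(mu, var)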
Two alternative expressions

People sometimes use:

- You have data X = {2, 3, 3, 2, 1, 4}, i.e., $x_1 = 2, x_2 = 3, \ldots, x_6 = 4$.

$$\begin{aligned}
\mathrm{VAR}(X) = \sigma^2 &= \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \\
&= \frac{1}{N} \sum_{i=1}^{N} x_i^2 - \frac{1}{N} \sum_{i=1}^{N} 2 x_i \mu + \frac{1}{N} \sum_{i=1}^{N} \mu^2 \\
&= \frac{1}{N} \sum_{i=1}^{N} x_i^2 - 2\mu \underbrace{\frac{1}{N} \sum_{i=1}^{N} x_i}_{\mu} + \mu^2 \\
&= \left( \frac{1}{N} \sum_{i=1}^{N} x_i^2 \right) - \mu^2
\end{aligned}$$

Other times, people use:

- You have data X = {1, 2, 3, 4}, and $P(X = 1) = \frac{1}{6}$, $P(X = 2) = \frac{2}{6}$, $P(X = 3) = \frac{2}{6}$ and $P(X = 4) = \frac{1}{6}$ (the empirical pmf of the data on the left).

$$\text{Discrete:}\ \mathrm{VAR}(X) = \sigma^2 = \sum_{x \in X} (x - \mu)^2 p(x) \qquad \text{Continuous:}\ \mathrm{VAR}(X) = \sigma^2 = \int_{x \in X} (x - \mu)^2 p(x) \, dx$$

$$\begin{aligned}
\mathrm{VAR}(X) &= \sum_{x \in X} \left( x^2 - 2\mu x + \mu^2 \right) p(x) \\
&= \sum_{x \in X} x^2 p(x) - 2\mu \underbrace{\sum_{x \in X} x \, p(x)}_{\mu} + \mu^2 \underbrace{\sum_{x \in X} p(x)}_{1} \\
&= \sum_{x \in X} x^2 p(x) - \mu^2
\end{aligned}$$

It is easy to verify that both sides are the same.
Numerical example

First version:

- X = {2, 3, 3, 2, 1, 4}, i.e., $x_1 = 2, x_2 = 3, \ldots, x_6 = 4$.

$$\begin{aligned}
\mathrm{VAR}(X) &= \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \\
&= \frac{1}{6} \left[ (2 - 2.5)^2 + (3 - 2.5)^2 + (3 - 2.5)^2 + (2 - 2.5)^2 + (1 - 2.5)^2 + (4 - 2.5)^2 \right] \\
&\approx 0.917
\end{aligned}$$

Second version:

- X = {1, 2, 3, 4}, and $P(X = 1) = \frac{1}{6}$, $P(X = 2) = \frac{2}{6}$, $P(X = 3) = \frac{2}{6}$ and $P(X = 4) = \frac{1}{6}$.

$$\begin{aligned}
\mathrm{VAR}(X) = \sigma^2 &= \sum_{x \in X} (x - \mu)^2 p(x) \\
&= (1 - 2.5)^2 \tfrac{1}{6} + (2 - 2.5)^2 \tfrac{2}{6} + (3 - 2.5)^2 \tfrac{2}{6} + (4 - 2.5)^2 \tfrac{1}{6} \\
&\approx 0.917
\end{aligned}$$

Both sides are the same.
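A short verification that the two expressions agree (a sketch; the pmf is the empirical distribution of the data):

data = [2, 3, 3, 2, 1, 4]
pmf = {1: 1/6, 2: 2/6, 3: 2/6, 4: 1/6}
mu = sum(data) / len(data)

v1 = sum(x ** 2 for x in data) / len(data) - mu ** 2   # (1/N) sum x_i^2 - mu^2
v2 = sum((x - mu) ** 2 * p for x, p in pmf.items())    # sum (x - mu)^2 p(x)
print(v1, v2)                                          # both ~0.9167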
An important fact about the variance

$$\begin{aligned}
\mathrm{VAR}(X) = E\left[ (X - E(X))^2 \right] &= \int_{x \in S} (x - \mu)^2 p(x) \, dx \\
&= \int_{x \in S} x^2 p(x) \, dx - 2\mu \int_{x \in S} x \, p(x) \, dx + \mu^2 \int_{x \in S} p(x) \, dx \\
&= E(X^2) - (E(X))^2
\end{aligned}$$

Think of VAR(X) as the "mean-subtracted" second-order moment of the random variable X.
Joint distributions
- The following is a table of the joint density Pr(X, Y):

             Y = 0    Y = 1    Y = 2    Total
    X = 0    0        3/15     3/15     6/15
    X = 1    2/15     6/15     0        8/15
    X = 2    1/15     0        0        1/15
    Total    3/15     9/15     3/15     1

- This table shows Pr(X, Y), or Pr(X = x, Y = y).
- For example, p(X = 1, Y = 1) = 6/15 (a numerical check follows the exercises).
- Exercise: what is the probability that X = 2, Y = 1?
- Exercise: what is the probability that X = 3, Y = 2?
- Exercise: what is the value of
  $\sum_{i=0}^{2} \sum_{j=0}^{2} \Pr(X = i, Y = j)$?
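The joint table is easy to manipulate as an array; a minimal sketch (the numpy layout is my own choice):

import numpy as np

# The joint table Pr(X, Y): rows are X = 0, 1, 2; columns are Y = 0, 1, 2.
P = np.array([[0, 3, 3],
              [2, 6, 0],
              [1, 0, 0]]) / 15

print(P[1, 1])        # Pr(X = 1, Y = 1) = 6/15 = 0.4
print(P.sum())        # sum over all i, j of Pr(X = i, Y = j)
print(P.sum(axis=0))  # marginal Pr(Y) = [3/15, 9/15, 3/15]
print(P.sum(axis=1))  # marginal Pr(X) = [6/15, 8/15, 1/15]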
Marginal distributions
             Y = 0    Y = 1    Y = 2    Total
    X = 0    0        3/15     3/15     6/15
    X = 1    2/15     6/15     0        8/15
    X = 2    1/15     0        0        1/15
    Total    3/15     9/15     3/15     1

- Using the sum rule, the marginal distribution is given by:
  $\Pr(X) = \sum_{y \in S_y} \Pr(x, y) \qquad \text{or} \qquad p(X) = \int_{y \in S_y} p(x, y) \, dy$
- For example:
  $\Pr(Y = 1) = \sum_{i=0}^{2} p(X = i, Y = 1) = \frac{3}{15} + \frac{6}{15} + \frac{0}{15} = \frac{9}{15}$
- Exercise: what are Pr(X = 2) and Pr(X = 1)?
Conditional distributions
             Y = 0    Y = 1    Y = 2    Total
    X = 0    0        3/15     3/15     6/15
    X = 1    2/15     6/15     0        8/15
    X = 2    1/15     0        0        1/15
    Total    3/15     9/15     3/15     1

- Conditional density:
  $p(X|Y) = \frac{p(X, Y)}{p(Y)} = \frac{p(Y|X) p(X)}{p(Y)} = \frac{p(Y|X) p(X)}{\sum_X p(Y|X) p(X)}$
- What about p(X | Y = y)? Pick an example:
  $p(X = 1 | Y = 1) = \frac{p(X = 1, Y = 1)}{p(Y = 1)} = \frac{6/15}{9/15} = \frac{2}{3}$
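Conditioning is just renormalizing a slice of the joint table; a minimal sketch (same array layout as before):

import numpy as np

# Conditioning on Y = 1 renormalizes the Y = 1 column of the joint table.
P = np.array([[0, 3, 3],
              [2, 6, 0],
              [1, 0, 0]]) / 15

print(P[:, 1] / P[:, 1].sum())   # Pr(X | Y = 1) = [1/3, 2/3, 0]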
Conditional distributions: Exercise
             Y = 0    Y = 1    Y = 2    Total
    X = 0    0        3/15     3/15     6/15
    X = 1    2/15     6/15     0        8/15
    X = 2    1/15     0        0        1/15
    Total    3/15     9/15     3/15     1

- The formula for the conditional density:
  $p(X|Y) = \frac{p(X, Y)}{p(Y)} = \frac{p(Y|X) p(X)}{p(Y)} = \frac{p(Y|X) p(X)}{\sum_X p(Y|X) p(X)}$
- Exercise: what is p(X = 2 | Y = 1)?
- Exercise: what is p(X = 1 | Y = 2)?
Independence
If X and Y are independent:

- p(X|Y) = p(X)
- p(X, Y) = p(X) p(Y)
- The two facts are related: when X and Y are independent,
  $p(X|Y) = \frac{p(X, Y)}{p(Y)} = \frac{p(X) p(Y)}{p(Y)} = p(X)$

X and Y are NOT independent:

             Y = 0    Y = 1    Y = 2    Total
    X = 0    0        3/15     3/15     6/15
    X = 1    2/15     6/15     0        8/15
    X = 2    1/15     0        0        1/15
    Total    3/15     9/15     3/15     1

X and Y are independent (each cell is the product of its row and column marginals):

             Y = 0     Y = 1     Y = 2     Total
    X = 0    18/225    54/225    18/225    6/15
    X = 1    24/225    72/225    24/225    8/15
    X = 2    3/225     9/225     3/225     1/15
    Total    3/15      9/15      3/15      1
Conditional Independence
- Imagine we have three random variables: X, Y and Z.
- Once we know Z, knowing Y does NOT tell us any additional information about X.
- Therefore:
  $\Pr(X | Y, Z) = \Pr(X | Z)$
- This means that X is conditionally independent of Y given Z.
- If $\Pr(X | Y, Z) = \Pr(X | Z)$, then what about $\Pr(X, Y | Z)$?

$$\begin{aligned}
\Pr(X, Y | Z) = \frac{\Pr(X, Y, Z)}{\Pr(Z)} &= \frac{\Pr(X | Y, Z) \Pr(Y, Z)}{\Pr(Z)} \\
&= \Pr(X | Y, Z) \Pr(Y | Z) \\
&= \Pr(X | Z) \Pr(Y | Z)
\end{aligned}$$
An example of Conditional Independence
We will study dynamic models later.

[Graphical model: a chain of hidden states $x_{t-1} \to x_t \to x_{t+1}$, with each hidden state $x_t$ emitting an observation $y_t$.]

From this model, we can see:
$p(x_t | x_1, \ldots, x_{t-1}, y_1, \ldots, y_{t-1}) = p(x_t | x_{t-1})$
$p(y_t | x_1, \ldots, x_{t-1}, x_t, y_1, \ldots, y_{t-1}) = p(y_t | x_t)$

For now, ask whether a given variable is the only item that "blocks" the path between two (or more) variables.
Another Example: Bayesian Linear Regression
We have data pairs:

- Input: $X = x_1, \ldots, x_N$
- Output: $Y = y_1, \ldots, y_N$

Each pair $x_i$ and $y_i$ is related through the model equation:
$y_i = f(x_i | w) + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2)$

- Input alone isn't going to tell you the model parameter: $p(w|X) = p(w)$
- Output alone isn't going to tell you the model parameter: $p(w|Y) = p(w)$
- Obviously: $p(w|X, Y) \neq p(w)$

Posterior over the parameter w (a concrete sketch follows below):
$p(w | x, y) = \frac{p(y | w, x) \, p(w | x) \, p(x)}{p(y | x) \, p(x)} = \frac{p(y | w, x) \, p(w)}{p(y | x)} = \frac{p(y | w, x) \, p(w)}{\int_w p(y | x, w) \, p(w) \, dw}$
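In the conjugate Gaussian case (linear f, Gaussian prior on w) this posterior has a closed form; a minimal sketch, where the basis, hyper-parameters, and data are all illustrative assumptions:

import numpy as np

# Sketch: posterior p(w | x, y) for y_i = w^T phi(x_i) + eps_i,
# eps_i ~ N(0, sigma^2), with prior w ~ N(0, I / alpha).
rng = np.random.default_rng(0)
N, sigma, alpha = 50, 0.3, 1.0
x = rng.uniform(-1, 1, N)
w_true = np.array([0.5, -1.0])            # hypothetical "true" weights
Phi = np.column_stack([np.ones(N), x])    # basis phi(x) = [1, x]
y = Phi @ w_true + rng.normal(0, sigma, N)

# Gaussian prior x Gaussian likelihood => Gaussian posterior N(m_N, S_N)
S_N = np.linalg.inv(alpha * np.eye(2) + Phi.T @ Phi / sigma ** 2)
m_N = S_N @ Phi.T @ y / sigma ** 2
print(m_N)   # posterior mean; close to w_true with enough data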
Expectation of Joint probabilities
Given that (X, Y) is a two-dimensional random variable:

- Continuous case:
  $E[f(X, Y)] = \int_{y \in S_y} \int_{x \in S_x} f(x, y) \, p(x, y) \, dx \, dy$
- Discrete case:
  $E[f(X, Y)] = \sum_{i=1}^{N_i} \sum_{j=1}^{N_j} f(X = i, Y = j) \, p(X = i, Y = j)$
Numerical example

p(X, Y):

             Y = 1    Y = 2    Y = 3
    X = 1    0        3/15     3/15
    X = 2    2/15     6/15     0
    X = 3    1/15     0        0

f(X, Y):

             Y = 1    Y = 2    Y = 3
    X = 1    6        7        8
    X = 2    3        6        2
    X = 3    1        8        6

$$\begin{aligned}
E[f(X, Y)] &= \sum_{i=1}^{N_i} \sum_{j=1}^{N_j} f(X = i, Y = j) \, p(X = i, Y = j) \\
&= 6 \times 0 + 7 \times \tfrac{3}{15} + 8 \times \tfrac{3}{15} + 3 \times \tfrac{2}{15} + 6 \times \tfrac{6}{15} \\
&\quad + 2 \times 0 + 1 \times \tfrac{1}{15} + 8 \times 0 + 6 \times 0 \\
&= \tfrac{88}{15} \approx 5.87
\end{aligned}$$
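The double sum is just the elementwise product of the two tables, summed; a one-line check (a sketch):

import numpy as np

# E[f(X, Y)] as the sum of the elementwise product of the two tables
p = np.array([[0, 3, 3], [2, 6, 0], [1, 0, 0]]) / 15   # p(X, Y)
f = np.array([[6, 7, 8], [3, 6, 2], [1, 8, 6]])        # f(X, Y)
print((f * p).sum())   # 88/15 ~ 5.867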
Conditional Expectation
This is a useful property for later (the tower property):

$$\begin{aligned}
E[E(Y | X)] &= \int_X E(Y | X) \, p(x) \, dx \\
&= \int_X \left[ \int_Y y \, p(y | x) \, dy \right] p(x) \, dx = \int_X \int_Y y \, p(y, x) \, dy \, dx \\
&= \int_Y y \left[ \int_X p(y, x) \, dx \right] dy \\
&= \int_Y y \, p(y) \, dy = E(Y)
\end{aligned}$$
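A discrete sanity check of this property on the earlier joint table (a sketch; in the discrete case the integrals become sums):

import numpy as np

# Check E[E(Y|X)] = E(Y) on the joint table from the earlier slides
# (rows X = 0, 1, 2; columns Y = 0, 1, 2).
P = np.array([[0, 3, 3], [2, 6, 0], [1, 0, 0]]) / 15
y_vals = np.array([0, 1, 2])

E_Y = (P.sum(axis=0) * y_vals).sum()          # direct: sum_y y p(y)
pX = P.sum(axis=1)                            # marginal p(x)
E_Y_given_X = (P * y_vals).sum(axis=1) / pX   # E(Y | X = x) for each x
print(E_Y, (E_Y_given_X * pX).sum())          # both print 1.0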
Bayesian Predictive distribution
Let's put marginal distributions and conditional independence to the test:

- Very often in machine learning, you want to compute the probability of new data $y^*$ given training data Y, i.e., $p(y^* | Y)$. You assume there is some model that explains both Y and $y^*$. The model parameter is θ:
  $p(y^* | Y) = \int_\theta p(y^* | \theta) \, p(\theta | Y) \, d\theta$
- Exercise: explain why the above works.
Revisit Bayes Theorem
Instead of using arbitrary random variable symbols, we now use:

- θ for the model parameter
- $X = x_1, \ldots, x_n$ for the dataset:

$$\underbrace{p(\theta | X)}_{\text{posterior}} = \frac{\overbrace{p(X | \theta)}^{\text{likelihood}} \ \overbrace{p(\theta)}^{\text{prior}}}{\underbrace{p(X)}_{\text{normalization constant}}} = \frac{p(X | \theta) \, p(\theta)}{\int_\theta p(X | \theta) \, p(\theta) \, d\theta}$$
An Intrusion Detection System (IDS) Example
The setting: imagine that out of all TCP connections (say, millions), 1% are intrusions:

- When there is an intrusion, the probability that the system sends an alarm is 87%.
- When there is no intrusion, the probability that the system sends an alarm is 6%.
- Prior probability: 1% of connections are intrusions, so
  $p(\theta = \text{intrusion}) = 0.01 \qquad p(\theta = \text{no intrusion}) = 0.99$
- Likelihood probability:
  - Given an intrusion occurs, the probability that the system sends an alarm is 87%:
    $p(X = \text{alarm} | \theta = \text{intrusion}) = 0.87 \qquad p(X = \text{no alarm} | \theta = \text{intrusion}) = 0.13$
  - Given there is no intrusion, the probability that the system sends an alarm is 6%:
    $p(X = \text{alarm} | \theta = \text{no intrusion}) = 0.06 \qquad p(X = \text{no alarm} | \theta = \text{no intrusion}) = 0.94$
Posterior
- We are interested in the posterior probability Pr(θ|X).
- There are two possible values for the parameter θ and two possible observations X.
- Therefore, there are four rates we need to compute:
  - True Positive: when the system sends an alarm, the probability that an intrusion occurred:
    Pr(θ = intrusion | X = alarm)
  - False Positive: when the system sends an alarm, the probability that there is no intrusion:
    Pr(θ = no intrusion | X = alarm)
  - True Negative: when the system sends no alarm, the probability that there is no intrusion:
    Pr(θ = no intrusion | X = no alarm)
  - False Negative: when the system sends no alarm, the probability that an intrusion occurred:
    Pr(θ = intrusion | X = no alarm)
- Question: which are the two probabilities you'd like to maximise?
Apply Bayes Theorem in this setting
$$\Pr(\theta | X) = \frac{\Pr(X | \theta) \Pr(\theta)}{\sum_\theta \Pr(X | \theta) \Pr(\theta)} = \frac{\Pr(X | \theta) \Pr(\theta)}{\Pr(X | \theta = \text{intrusion}) \Pr(\theta = \text{intrusion}) + \Pr(X | \theta = \text{no intrusion}) \Pr(\theta = \text{no intrusion})}$$
Apply Bayes Theorem in this setting
True Positive rate: when the system sends an alarm, what is the probability that an intrusion occurred?

$$\begin{aligned}
&\Pr(\theta = \text{intrusion} | X = \text{alarm}) \\
&= \frac{\Pr(X = \text{alarm} | \theta = \text{intrusion}) \Pr(\theta = \text{intrusion})}{\Pr(X = \text{alarm} | \theta = \text{intrusion}) \Pr(\theta = \text{intrusion}) + \Pr(X = \text{alarm} | \theta = \text{no intrusion}) \Pr(\theta = \text{no intrusion})} \\
&= \frac{0.87 \times 0.01}{0.87 \times 0.01 + 0.06 \times 0.99} = 0.1278
\end{aligned}$$

False Positive rate: when the system sends an alarm, what is the probability that there is no intrusion?

$$\begin{aligned}
&\Pr(\theta = \text{no intrusion} | X = \text{alarm}) \\
&= \frac{\Pr(X = \text{alarm} | \theta = \text{no intrusion}) \Pr(\theta = \text{no intrusion})}{\Pr(X = \text{alarm} | \theta = \text{intrusion}) \Pr(\theta = \text{intrusion}) + \Pr(X = \text{alarm} | \theta = \text{no intrusion}) \Pr(\theta = \text{no intrusion})} \\
&= \frac{0.06 \times 0.99}{0.87 \times 0.01 + 0.06 \times 0.99} = 0.8722
\end{aligned}$$
Apply Bayes Theorem in this setting
False Negative: when the system sends no alarm, what is the probability that an intrusion occurred?

$$\begin{aligned}
&\Pr(\theta = \text{intrusion} | X = \text{no alarm}) \\
&= \frac{\Pr(X = \text{no alarm} | \theta = \text{intrusion}) \Pr(\theta = \text{intrusion})}{\Pr(X = \text{no alarm} | \theta = \text{intrusion}) \Pr(\theta = \text{intrusion}) + \Pr(X = \text{no alarm} | \theta = \text{no intrusion}) \Pr(\theta = \text{no intrusion})} \\
&= \frac{0.13 \times 0.01}{0.13 \times 0.01 + 0.94 \times 0.99} = 0.0014
\end{aligned}$$

True Negative: when the system sends no alarm, what is the probability that there is no intrusion?

$$\begin{aligned}
&\Pr(\theta = \text{no intrusion} | X = \text{no alarm}) \\
&= \frac{\Pr(X = \text{no alarm} | \theta = \text{no intrusion}) \Pr(\theta = \text{no intrusion})}{\Pr(X = \text{no alarm} | \theta = \text{intrusion}) \Pr(\theta = \text{intrusion}) + \Pr(X = \text{no alarm} | \theta = \text{no intrusion}) \Pr(\theta = \text{no intrusion})} \\
&= \frac{0.94 \times 0.99}{0.13 \times 0.01 + 0.94 \times 0.99} = 0.9986
\end{aligned}$$
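These four numbers are easy to reproduce; a minimal sketch using the prior and likelihood values from the slides:

# Reproduce the four posterior probabilities (a sketch; values from the slides)
p_intrusion = 0.01
p_alarm = {True: 0.87, False: 0.06}   # Pr(alarm | intrusion?), keyed by intrusion

def posterior(intrusion, alarm):
    """Pr(theta = intrusion? | X = alarm?) by Bayes theorem."""
    def joint(i):
        prior = p_intrusion if i else 1 - p_intrusion
        lik = p_alarm[i] if alarm else 1 - p_alarm[i]
        return lik * prior
    return joint(intrusion) / (joint(True) + joint(False))

print(posterior(True, True))     # True Positive  ~0.1278
print(posterior(False, True))    # False Positive ~0.8722
print(posterior(True, False))    # False Negative ~0.0014
print(posterior(False, False))   # True Negative  ~0.9986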
A statistical way to think about posterior inference

Posterior inference finds the best q(θ) ∈ Q to approximate p(θ|X), in the sense of:

$$\begin{aligned}
&\inf_{q(\theta) \in Q} \left\{ \mathrm{KL}(q(\theta) \,\|\, p(\theta)) - E_{\theta \sim q(\theta)}[\ln p(X | \theta)] \right\} \\
&= \inf_{q(\theta) \in Q} \left\{ \int_\theta \ln \frac{q(\theta)}{p(\theta)} \, q(\theta) \, d\theta - \int_\theta \ln p(X | \theta) \, q(\theta) \, d\theta \right\} \\
&= \inf_{q(\theta) \in Q} \int_\theta \left[ \ln q(\theta) - (\ln p(\theta) + \ln p(X | \theta)) \right] q(\theta) \, d\theta \\
&= \inf_{q(\theta) \in Q} \int_\theta \ln \frac{q(\theta)}{p(\theta) \, p(X | \theta)} \, q(\theta) \, d\theta \\
&= \inf_{q(\theta) \in Q} \int_\theta \ln \frac{q(\theta)}{p(\theta | X) \, p(X)} \, q(\theta) \, d\theta \qquad \text{since } p(\theta) p(X | \theta) = p(\theta | X) p(X) \\
&= \inf_{q(\theta) \in Q} \left\{ \mathrm{KL}(q(\theta) \,\|\, p(\theta | X)) \right\} - \ln p(X)
\end{aligned}$$

Since ln p(X) does not depend on q, minimizing this objective is exactly minimizing KL(q(θ) ‖ p(θ|X)); a numerical check follows below.
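On a discrete grid of θ values this equivalence is easy to check numerically; a sketch (the grid, prior, and likelihood below are illustrative assumptions):

import numpy as np

# Sketch: on a discrete grid, the q that minimizes
# KL(q || prior) - E_q[ln p(X|theta)] is exactly the posterior.
theta = np.linspace(0.01, 0.99, 99)                  # e.g. a coin's bias
prior = np.full(theta.size, 1 / theta.size)          # uniform prior
loglik = 7 * np.log(theta) + 3 * np.log(1 - theta)   # 7 heads, 3 tails

post = prior * np.exp(loglik)                        # Bayes theorem
post /= post.sum()

def objective(q):
    return np.sum(q * (np.log(q) - np.log(prior))) - np.sum(q * loglik)

q_uniform = prior.copy()
print(objective(post) < objective(q_uniform))        # True: the posterior wins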