AMATH 423/523 Mathematical Analysis in Biology and Medicine Winter, 2023
2.3 Maximum entropy principle
2.3.1 Different entropy functions for different random variables
For a real scientific problem with a large state space $S$, obtaining the full $\nu$ from data is not really feasible; it is only a gedankenexperiment. More realistically, one measures the mean value of an observable that returns a value $g_k$ when the system is in state $k \in S$:
$$g: S \to \mathbb{R}, \quad \text{with values } g_1, g_2, \cdots, g_n.$$
Such a function is called a random variable in the mathematical theory of probability. The empirical mean value is related to the counting frequency:
$$\overline{g}(K) = \frac{k_1 g_1 + k_2 g_2 + \cdots + k_n g_n}{K} = \sum_{i=1}^{n} \nu_i g_i.$$
In the limit of $K \to \infty$, since all the $\nu_i \to p_i$, we have
$$\lim_{K\to\infty} \overline{g}(K) = \sum_{i=1}^{n} g_i p_i = \mathbb{E}[g]. \tag{8}$$
Again, just as we asked the statistical question "what is the probability of observing $\nu$", one can ask the statistical question "what is the probability of observing $\overline{g}(K)$":
$$P\Big\{\overline{g}(K) \in (x, x+dx]\Big\} = \;? \tag{9}$$
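Before turning to the large-deviation question in Eq. (9), the law-of-large-numbers statement in Eq. (8) is easy to see in a minimal simulation. The three-state distribution $p$ and observable $g$ below are illustrative assumptions, not from the text:

```python
import numpy as np

# A minimal sketch of Eq. (8): the empirical mean of an observable g
# converges to E[g] as the sample size K grows. The three-state
# distribution p and observable g are illustrative assumptions.
rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])   # state probabilities p_1, p_2, p_3
g = np.array([1.0, 2.0, 5.0])   # observable values g_1, g_2, g_3

def empirical_mean(K):
    """Draw K iid states and return (k1*g1 + ... + kn*gn)/K."""
    states = rng.choice(len(p), size=K, p=p)
    return g[states].mean()

exact = float(p @ g)             # E[g] = sum_i g_i p_i = 2.1
for K in (100, 10_000, 1_000_000):
    print(K, empirical_mean(K))  # approaches 2.1 as K grows
```

The fluctuations of the empirical mean around $\mathbb{E}[g]$ shrink like $K^{-1/2}$; Eq. (9) asks how fast the probability of a fixed deviation vanishes.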
2.3.2 Contraction principle
First, let us learn a very important mathematical fact. Consider two positive numbers $a, b > 0$, and let $N \to \infty$. Then it is obvious that
$$\lim_{N\to\infty} \Big( e^{-aN} + e^{-bN} \Big) = 0.$$
But more interestingly,
$$-\lim_{N\to\infty} \frac{1}{N} \ln\Big( e^{-aN} + e^{-bN} \Big) = \min\{a, b\}. \tag{10}$$
To prove this, without loss of generality let us assume that $a < b$, i.e., $(b - a) > 0$. We see that
$$\ln\Big( e^{-aN} + e^{-bN} \Big) = \ln e^{-aN} + \ln\Big( 1 + e^{-(b-a)N} \Big).$$
Therefore, in the limit of $N \to \infty$ the second term goes to zero, and we have
$$-\lim_{N\to\infty} \frac{1}{N} \ln\Big( e^{-aN} + e^{-bN} \Big) = a.$$
Prof. Hong Qian 12 Thursday 12th January, 2023, 14:01
We note that the limit in Eq. (10) is of the indeterminate form $\frac{\infty}{\infty}$. So by l'Hospital's rule we can also have
$$-\lim_{N\to\infty} \frac{\ln\big( e^{-aN} + e^{-bN} \big)}{N} = \lim_{N\to\infty} \frac{a e^{-aN} + b e^{-bN}}{e^{-aN} + e^{-bN}} = \min\{a, b\}.$$
The result in (10) is actually valid for any $a$ and $b$, positive or negative. Even more:
$$-\lim_{N\to\infty} \frac{1}{N} \ln \int_{x_0}^{x_1} e^{-a(x)N}\, dx = \inf_{x\in[x_0,x_1]} a(x).$$
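The fact in Eq. (10), including its validity for negative exponents, is easy to check numerically. A small sketch, using `numpy.logaddexp` to evaluate $\ln(e^u + e^v)$ without underflow:

```python
import numpy as np

# Numerical check of Eq. (10): -(1/N) ln(e^{-aN} + e^{-bN}) -> min{a, b}.
# np.logaddexp(u, v) computes ln(e^u + e^v) stably, even when e^u underflows.
def rate(a, b, N):
    return -np.logaddexp(-a * N, -b * N) / N

for N in (10, 100, 1000):
    print(N, rate(2.0, 3.0, N))      # approaches min{2, 3} = 2

# The result also holds when an exponent is negative:
print(rate(-1.0, 4.0, 1000))         # approaches min{-1, 4} = -1
```

The larger exponent's contribution decays like $e^{-(b-a)N}$ relative to the smaller one's, so the slower-decaying term alone sets the rate.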
We now turn to the answer to the question in Eq. (9). As $K \to \infty$, in the limit of "big data", we have
$$P\Big\{\overline{g}(K) \in (x, x+dx]\Big\} = \int_{\nu\cdot g = x} e^{-K H(\nu\|p)}\, d\nu = e^{-K\varphi(x)}, \tag{11a}$$
$$\text{where} \quad \varphi(x) = \inf_{\nu}\Big\{ H(\nu\|p) \;\Big|\; \nu\cdot g = x \Big\}. \tag{11b}$$
This mathematical result can be understood very intuitively. With a given value of $x$ being "observed" as the empirical mean value of the random variable $g = (g_1, g_2, \cdots, g_n)$:
$$\sum_{i=1}^{n} \nu_i g_i = x, \tag{12}$$
not every $\nu$ in the probability simplex is compatible with the value $x$. In fact, among all the $\nu$ that are compatible with Eq. (12), the $\nu$ with the smallest $H(\nu\|p)$ has the largest probability: this is the $\nu$ in the sample data set that "had produced" the observed $x$, since after a measurement there is no probability, only missing information. See Figure 1 for an illustration.
The relation between $\varphi(x)$ and $H(\nu\|p)$ is called the contraction principle in mathematics, and the maximum entropy principle in physics and engineering.
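A two-state sketch makes the contraction in Eq. (11b) concrete: with $n = 2$ and $g = (0, 1)$, the constraint $\nu\cdot g = x$ pins $\nu$ down uniquely, so $\varphi(x)$ is simply $H(\nu(x)\|p)$. The reference distribution $p$ below is an illustrative assumption:

```python
import numpy as np

# Two-state illustration of Eq. (11b): with g = (0, 1), the constraint
# nu . g = x forces nu = (1 - x, x), so phi(x) = H(nu(x) || p) directly.
p = np.array([0.6, 0.4])   # assumed reference distribution
g = np.array([0.0, 1.0])   # observable; nu . g equals nu_2

def relative_entropy(nu, p):
    """H(nu || p) = sum_i nu_i ln(nu_i / p_i), with the convention 0 ln 0 = 0."""
    nz = nu > 0
    return float(np.sum(nu[nz] * np.log(nu[nz] / p[nz])))

def phi(x):
    nu = np.array([1.0 - x, x])     # the unique feasible nu
    return relative_entropy(nu, p)

print(phi(0.4))   # 0.0: x = E[g] is the typical value, no "surprise"
print(phi(0.9))   # > 0: P{ mean near 0.9 } ~ e^{-K phi(0.9)}, exponentially small
```

For $n > 2$ the constraint leaves a whole face of the simplex feasible, and the infimum over that face is what the contraction principle computes.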
3 Constrained optimization and duality
3.1 Lagrange duality
3.1.1 Lagrange multiplier and saddle point
We shall use an example to illustrate this. Consider $f(x, y) = a_1 x^2 + a_2 y^2$, where $a_1, a_2 > 0$, under the constraint $g(x, y) = b_0 + b_1 x + b_2 y = 0$. Substituting $y = -(b_0 + b_1 x)/b_2$,
$$\inf_{x,y}\Big\{ f(x, y) \;\Big|\; g(x, y) = 0 \Big\} = \inf_{x}\left\{ a_1 x^2 + \frac{a_2 (b_0 + b_1 x)^2}{b_2^2} \right\} = a_1 x^{*2} + \frac{a_2 (b_0 + b_1 x^*)^2}{b_2^2} = \frac{a_1 a_2 b_0^2}{a_1 b_2^2 + a_2 b_1^2},$$
Figure 1: The space of all possible empirical frequencies $\nu$ is a probability simplex, shown in orange. When an empirical mean value $\nu\cdot g = x$ is observed, only the $\nu$ along a red line are possible. The blue circles are the contour lines of the entropy function $\Phi(\nu)$, with center $\nu = p$ and corresponding $\Phi = 0$. The tangent point between a red line and a blue circle gives the $\nu^*(x)$ that has the minimum $\Phi$ value along the red line. At $\nu^*(x)$, $\nabla\Phi$ is in the direction of $g$. This is the geometric interpretation of the method of Lagrange multipliers.
with optimal
$$x^* = -\frac{a_2 b_0 b_1}{a_1 b_2^2 + a_2 b_1^2}, \quad \text{and} \quad y^* = -\frac{a_1 b_0 b_2}{a_1 b_2^2 + a_2 b_1^2}.$$
By the method of Lagrange multipliers, the Lagrangian function is
$$L(x, y, z) = a_1 x^2 + a_2 y^2 - z\big( b_0 + b_1 x + b_2 y \big),$$
and its Hessian matrix of curvatures is
$$\begin{pmatrix} 2a_1 & 0 & b_1 \\ 0 & 2a_2 & b_2 \\ b_1 & b_2 & 0 \end{pmatrix},$$
with determinant $-2(a_1 b_2^2 + a_2 b_1^2) < 0$ for any $b$'s that are not both zero.
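The closed-form minimizer above can be verified numerically by a brute-force scan along the constraint line; the coefficient values below are illustrative:

```python
import numpy as np

# Verify the constrained minimum of f = a1 x^2 + a2 y^2 subject to
# b0 + b1 x + b2 y = 0 by scanning along the constraint line.
# The coefficient values are illustrative assumptions.
a1, a2, b0, b1, b2 = 1.0, 2.0, 3.0, 1.0, 1.0

D = a1 * b2**2 + a2 * b1**2
x_star = -a2 * b0 * b1 / D          # closed-form optimal x*
y_star = -a1 * b0 * b2 / D          # closed-form optimal y*
f_star = a1 * a2 * b0**2 / D        # closed-form minimum value

x = np.linspace(-10.0, 10.0, 200_001)   # parametrize the line by x
y = -(b0 + b1 * x) / b2
f = a1 * x**2 + a2 * y**2

print(x_star, y_star, f_star)                        # -2.0 -1.0 6.0
print(abs(f.min() - f_star) < 1e-6)                  # grid scan agrees
print(abs(b0 + b1 * x_star + b2 * y_star) < 1e-12)   # constraint holds
```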
Therefore, according to the method of Lagrange multipliers, the constrained optimization problem in Eq. (11b) becomes
$$\begin{aligned}
\varphi(x) &= \inf_{\nu}\Big\{ H(\nu\|p) \;\Big|\; \nu\cdot g = x \Big\} \\
&= \sup_{y} \inf_{\nu}\Big\{ H(\nu\|p) - y\big( \nu\cdot g - x \big) \Big\} \\
&= \sup_{y}\Big\{ \inf_{\nu}\big\{ H(\nu\|p) - y\,\nu\cdot g \big\} + xy \Big\} \\
&= \sup_{y}\Big\{ xy - \psi(y) \Big\},
\end{aligned} \tag{13}$$
in which we have introduced a new function:
$$\psi(y) = -\inf_{\nu}\Big\{ H(\nu\|p) - y\,\nu\cdot g \Big\} = \sup_{\nu}\Big\{ y\,\nu\cdot g - H(\nu\|p) \Big\}. \tag{14}$$
Then, following the theory of Legendre-Fenchel duality, we have
$$\psi(y) = \sup_{x}\Big\{ xy - \varphi(x) \Big\}. \tag{15}$$
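The duality in Eqs. (13)-(15) can be checked numerically, reusing the two-state system ($p$ and $g$ are illustrative assumptions). For this example, $\psi(y)$ evaluates in closed form to $\ln \sum_j p_j e^{y g_j}$ (Eq. (14), via Eq. (17) below with $\mu = y g$), and a grid supremum over $y$ recovers $\varphi(x)$:

```python
import numpy as np

# Check phi(x) = sup_y { x y - psi(y) } (Eq. (13)) against the direct
# computation of phi for a two-state system. Here psi(y) = ln sum_j p_j e^{y g_j},
# the closed form of Eq. (14) (cf. Eq. (17) with mu = y g).
p = np.array([0.6, 0.4])   # assumed reference distribution
g = np.array([0.0, 1.0])   # observable; nu . g equals nu_2

ys = np.linspace(-30.0, 30.0, 60_001)
psi = np.log(p[0] * np.exp(ys * g[0]) + p[1] * np.exp(ys * g[1]))

def phi_dual(x):
    """sup over a grid of y of x*y - psi(y)."""
    return float(np.max(x * ys - psi))

def phi_direct(x):
    """H(nu(x) || p) for the unique feasible nu = (1 - x, x)."""
    nu = np.array([1.0 - x, x])
    nz = nu > 0
    return float(np.sum(nu[nz] * np.log(nu[nz] / p[nz])))

for x in (0.2, 0.4, 0.7):
    print(x, phi_dual(x), phi_direct(x))   # the two columns agree
```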
3.1.2 Legendre-Fenchel transform (LFT) and duality
The function $\psi(y)$ in (15) is called the Legendre-Fenchel transform (LFT) of $\varphi(x)$. Then the function $\varphi(x)$ in (13) is the LFT of $\psi(y)$. The pair $\varphi(x)$ and $\psi(y)$ is known as a Legendre-Fenchel dual pair. It naturally arises in a constrained optimization problem, leading to a low-dimensional structure, the green curves in Fig. 1, that is embedded in the higher-dimensional space of $\nu$. The independent variables $x$ and $y$ are called conjugate variables to each other.
3.1.3 Conjugate variables and Lagrange-Gibbs equation
Formally, the Lagrange function is defined as
$$L[\nu, y] = H(\nu\|p) - y\big( \nu\cdot g - x \big).$$
Carrying out the partial derivatives with respect to all of $\nu_1, \nu_2, \cdots, \nu_n$ can be expressed using the notation of differentiation $d\nu$:
$$d_\nu L[\nu, y] = d_\nu H(\nu\|p) - y\, g\cdot d\nu = 0. \tag{16}$$
Very interestingly, a relation called the Gibbs equation in physics has exactly this form. In the latter case, $H$ is called the Gibbs entropy, $y^{-1}$ is the temperature, and $g\cdot d\nu$ is the mechanical work. The meaning of the equation is related to the First Law of Thermodynamics, i.e., energy conservation.
3.2 Linear constraints and full LFT
If function ψ(y), the LFT of the entropy function for the empirical mean value g, φ(x), plays a
key role in the theory, one naturally asks what is the LFT of the entropy function H(ν∥p) for
the empirical counting frequency ν?
Let us compute the LFT of $H(\nu\|p)$:
$$\sup_{\nu}\Big\{ \nu\cdot\mu - H(\nu\|p) \Big\} = \sup_{\nu}\left\{ -\sum_{i=1}^{n} \nu_i \ln\left( \frac{\nu_i}{p_i e^{\mu_i}} \sum_{j=1}^{n} p_j e^{\mu_j} \right) \right\} + \ln\left( \sum_{j=1}^{n} p_j e^{\mu_j} \right) = \ln \sum_{j=1}^{n} p_j e^{\mu_j}, \tag{17}$$
with the optimal $\nu^*$, at which the first supremum in (17) vanishes:
$$\nu_i^* = \frac{p_i e^{\mu_i}}{\sum_{j=1}^{n} p_j e^{\mu_j}}. \tag{18}$$
Very interestingly again, a relation called the Boltzmann relation in physics has exactly this form. In the latter case, $-\mu_i$ is the mechanical energy of state $i$ in units of $k_B T$, where $T$ is the temperature and $k_B$ is a constant named after Boltzmann. The $p_i$ are assumed to be equal, which is known as the principle of equal a priori probability.
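Eqs. (17) and (18) can be checked directly: evaluating $\nu\cdot\mu - H(\nu\|p)$ at the Boltzmann-form $\nu^*$ gives exactly $\ln \sum_j p_j e^{\mu_j}$, and any other $\nu$ on the simplex gives less. The values of $p$ and $\mu$ below are illustrative assumptions:

```python
import numpy as np

# Check Eqs. (17)-(18): the LFT of H(nu||p) equals ln(sum_j p_j e^{mu_j}),
# attained at the Boltzmann distribution nu*_i = p_i e^{mu_i} / Z.
rng = np.random.default_rng(1)
p = np.array([0.2, 0.3, 0.5])     # assumed reference distribution
mu = np.array([1.0, -0.5, 0.3])   # illustrative conjugate variables mu_i

def H(nu, p):
    """Relative entropy H(nu || p), with the convention 0 ln 0 = 0."""
    nz = nu > 0
    return float(np.sum(nu[nz] * np.log(nu[nz] / p[nz])))

Z = float(np.sum(p * np.exp(mu)))
nu_star = p * np.exp(mu) / Z       # Eq. (18)

value_at_star = float(nu_star @ mu - H(nu_star, p))
print(value_at_star)               # equals ln Z, per Eq. (17)
print(np.log(Z))

# Any other nu in the simplex gives a smaller value:
for _ in range(5):
    nu = rng.dirichlet(np.ones(3))
    assert nu @ mu - H(nu, p) <= np.log(Z) + 1e-12
```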