Chapter 5. Vector Calculus

Equivalent to 2 slots. Lecturers can use extra materials for other slots in the chapter.
1 5.1 Differentiation of Univariate Functions
2 5.2 Partial Differentiation and Gradients
3 5.3 Gradients of Vector-Valued Functions
4 5.6 Backpropagation and Automatic Differentiation
5 5.7 Higher-Order Derivatives
6 5.8 Linearization and Multivariate Taylor Series
5.1 Differentiation of Univariate Functions

Definition
Let D be a domain of R, and let f : D → R be a function.
The Taylor polynomial of degree n of a function f at x0 ∈ D is defined as
$$T_n(x) := \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!}\,(x - x_0)^k.$$
For a smooth function f ∈ C^∞(D), the Taylor series of f at x0 is defined as
$$T_\infty(x) := \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!}\,(x - x_0)^k.$$
For x0 = 0, we obtain the Maclaurin series.

f is called analytic if T_∞(x) = f(x).


Note.

1. The Taylor polynomial of degree n is an approximation of a function. For n = 1, we obtain the linear approximation.
2. The Taylor polynomial is similar to f in a neighborhood around x0.
3. For k ≤ n, the Taylor polynomial of degree n is an exact representation of a polynomial f of degree k.

[Figure: the function f(x) = cos(x) approximated by Taylor polynomials around x0 = 1.]
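To make the definition concrete, here is a minimal sketch (assuming sympy is available; the helper name taylor_poly is our own) that builds T_n directly from the formula for f(x) = cos(x) around x0 = 1, the setting of the figure:

```python
import sympy as sp

x = sp.symbols('x')
f = sp.cos(x)
x0 = 1

def taylor_poly(f, x, x0, n):
    # T_n(x) = sum_{k=0}^{n} f^(k)(x0)/k! * (x - x0)^k
    return sum(sp.diff(f, x, k).subs(x, x0) / sp.factorial(k) * (x - x0)**k
               for k in range(n + 1))

T6 = taylor_poly(f, x, x0, 6)
# Near x0 the polynomial tracks cos closely; farther away the error grows.
for xv in (1.1, 2.0, 4.0):
    print(xv, float(f.subs(x, xv)), float(T6.subs(x, xv)))
```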


Example
Consider the function f(x) = x^3 + x − 3 and find the Taylor polynomial T4 at x0 = 1.
Answer. We have
$$f'(x) = 3x^2 + 1, \quad f''(x) = 6x, \quad f'''(x) = 6, \quad f^{(4)}(x) = 0.$$
Evaluating them at x0 = 1, we get
$$f(1) = -1, \quad f'(1) = 4, \quad f''(1) = 6, \quad f'''(1) = 6, \quad f^{(4)}(1) = 0.$$
Thus, the Taylor polynomial T4 of f at x0 = 1 is
$$T_4(x) = -1 + 4(x-1) + \frac{6}{2!}(x-1)^2 + \frac{6}{3!}(x-1)^3 = -1 + 4(x-1) + 3(x-1)^2 + (x-1)^3 = f(x).$$
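Note 3 of the previous slide is visible here: the degree-4 polynomial reproduces the cubic exactly. A short sympy check (an illustrative addition, assuming sympy):

```python
import sympy as sp

x = sp.symbols('x')
f = x**3 + x - 3
T4 = sum(sp.diff(f, x, k).subs(x, 1) / sp.factorial(k) * (x - 1)**k
         for k in range(5))
print(sp.expand(T4))           # x**3 + x - 3
print(sp.simplify(T4 - f))     # 0: T4 is an exact representation of f
```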


Maclaurin series of some basic functions

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{k=0}^{\infty} \frac{x^k}{k!}$$
$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!}$$
$$\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!}$$
$$\frac{1}{1-x} = 1 + x + x^2 + \cdots = \sum_{k=0}^{\infty} x^k \quad (|x| < 1).$$
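As a numerical illustration (our own addition, not from the slides), even a modest truncation of the Maclaurin series of e^x matches the library value closely:

```python
import math

def exp_maclaurin(x, n_terms=15):
    # Partial sum of the Maclaurin series of e^x: sum_{k=0}^{n_terms-1} x^k / k!
    return sum(x**k / math.factorial(k) for k in range(n_terms))

print(exp_maclaurin(1.0), math.exp(1.0))   # both ~2.718281828...
```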


5.2 Partial Differentiation and Gradients

Definition
A function f : R^n → R is a rule that assigns to each point x ∈ R^n of n variables x1, ..., xn a real value f(x) = f(x1, ..., xn).

Definition
For a function f : R^n → R, the partial derivatives of f are defined by
$$\frac{\partial f}{\partial x_1} := \lim_{h \to 0} \frac{f(x_1 + h, x_2, \ldots, x_n) - f(x_1, x_2, \ldots, x_n)}{h},$$
$$\vdots$$
$$\frac{\partial f}{\partial x_n} := \lim_{h \to 0} \frac{f(x_1, x_2, \ldots, x_n + h) - f(x_1, x_2, \ldots, x_n)}{h}.$$

Note. When finding the partial derivative ∂f/∂x_i, we let only x_i vary and keep the other variables constant.
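The limit definition suggests a numerical approximation: perturb one coordinate and hold the others constant. A minimal sketch in plain Python (the helper name and step size h are our own choices; the test function anticipates the next example):

```python
def partial(f, x, i, h=1e-6):
    # Central-difference estimate of df/dx_i at the point x (a tuple of floats):
    # only the i-th coordinate varies; the others are kept constant.
    xp = list(x); xp[i] += h
    xm = list(x); xm[i] -= h
    return (f(*xp) - f(*xm)) / (2 * h)

f = lambda x, y: 1 / (1 + x**2 + 3 * y**4)
print(partial(f, (1.0, 1.0), 0))   # ~ -2x/(1+x^2+3y^4)^2  = -0.08
print(partial(f, (1.0, 1.0), 1))   # ~ -12y^3/(1+x^2+3y^4)^2 = -0.48
```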
Example
Find the partial derivatives of the function f(x, y) = 1/(1 + x^2 + 3y^4).
Answer.
$$\frac{\partial f(x, y)}{\partial x} = -\frac{1}{(1 + x^2 + 3y^4)^2}\,\frac{\partial}{\partial x}\left(1 + x^2 + 3y^4\right) = -\frac{2x}{(1 + x^2 + 3y^4)^2},$$
$$\frac{\partial f(x, y)}{\partial y} = -\frac{1}{(1 + x^2 + 3y^4)^2}\,\frac{\partial}{\partial y}\left(1 + x^2 + 3y^4\right) = -\frac{12y^3}{(1 + x^2 + 3y^4)^2}.$$


Definition
For a function f : R^n → R of n variables x1, ..., xn, the gradient of f is defined as
$$\nabla_x f = \operatorname{grad} f = \frac{df}{dx} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix} \in \mathbb{R}^{1 \times n}.$$

Example
Find the gradient of the function f(x1, x2, x3) = x1^2 x2^3 − 4 x2^2 x3.
Answer.
$$\nabla_x f = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \frac{\partial f}{\partial x_3} \end{bmatrix} = \begin{bmatrix} 2x_1 x_2^3 & 3x_1^2 x_2^2 - 8x_2 x_3 & -4x_2^2 \end{bmatrix} \in \mathbb{R}^{1 \times 3}.$$
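A quick symbolic check of this example (a sketch assuming sympy; taking the Jacobian of the 1-vector [f] yields exactly the 1 × n row-vector convention used here):

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1**2 * x2**3 - 4 * x2**2 * x3
grad = sp.Matrix([f]).jacobian([x1, x2, x3])   # 1x3 row vector
print(grad)   # Matrix([[2*x1*x2**3, 3*x1**2*x2**2 - 8*x2*x3, -4*x2**2]])
```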


Basic rules of Partial Differentiation

For functions f, g : R^n → R of variables x ∈ R^n:

Sum rule: ∇_x[f + g] = ∇_x f + ∇_x g.
Product rule: ∇_x[fg] = g(x)∇_x f + f(x)∇_x g.

Consider a function f : R^n → R of n variables x = (x1, ..., xn), where the x_i(t) are themselves functions of t. Then we have
$$\text{Chain rule:} \quad f'(t) = \frac{df}{dt} = \frac{\partial f}{\partial x_1}\frac{dx_1}{dt} + \cdots + \frac{\partial f}{\partial x_n}\frac{dx_n}{dt} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\frac{dx_i}{dt}.$$


Example
Consider f(x1, x2) = x1^2 + x1 x2, where x1 = sin t and x2 = cos t. Find the derivative of f with respect to t.
Answer: We have
$$\frac{\partial f}{\partial x_1} = 2x_1 + x_2, \qquad \frac{\partial f}{\partial x_2} = x_1,$$
$$\frac{dx_1}{dt} = \cos t, \qquad \frac{dx_2}{dt} = -\sin t.$$
Hence
$$\frac{df}{dt} = (2x_1 + x_2)\cos t + x_1(-\sin t) = (2\sin t + \cos t)\cos t - \sin^2 t = 2\sin t\cos t + \cos^2 t - \sin^2 t = \sin(2t) + \cos(2t).$$
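The result can be verified by substituting first and differentiating directly; a minimal sympy sketch:

```python
import sympy as sp

t = sp.symbols('t')
x1, x2 = sp.sin(t), sp.cos(t)
f = x1**2 + x1 * x2                    # substitute, then differentiate
print(sp.simplify(sp.diff(f, t) - (sp.sin(2*t) + sp.cos(2*t))))   # 0
```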


5.3 Gradients of Vector-Valued Functions

Definition
A vector-valued function of n variables x = (x1, ..., xn), f : R^n → R^m, is given as
$$f(x) = \begin{bmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{bmatrix} = \begin{bmatrix} f_1(x_1, \ldots, x_n) \\ \vdots \\ f_m(x_1, \ldots, x_n) \end{bmatrix},$$
where each f_i : R^n → R is a function of x.

Definition
The partial derivative of a vector-valued function f : R^n → R^m with respect to x_i, i = 1, ..., n, is given as the vector
$$\frac{\partial f}{\partial x_i} = \begin{bmatrix} \frac{\partial f_1}{\partial x_i} \\ \vdots \\ \frac{\partial f_m}{\partial x_i} \end{bmatrix} \in \mathbb{R}^m.$$


Example
Consider a linear transformation (operator) f : R^n → R^m,
$$f(x) = Ax, \quad \text{where } A = [a_{ij}] \in \mathbb{R}^{m \times n}.$$
It is easy to see that the partial derivatives of f are
$$\frac{\partial f}{\partial x_j} = A_j = \begin{bmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{bmatrix} \in \mathbb{R}^m, \quad j = 1, \ldots, n,$$
i.e., the j-th column of A.


Example
Consider a vector-valued function
$$f(x, y) = \begin{bmatrix} xy^2 & y^3 & x^2 - y^2 \end{bmatrix}^T.$$
Then
$$\frac{\partial f}{\partial x} = \begin{bmatrix} y^2 \\ 0 \\ 2x \end{bmatrix}, \qquad \frac{\partial f}{\partial y} = \begin{bmatrix} 2xy \\ 3y^2 \\ -2y \end{bmatrix}.$$


Definition
The collection of all first-order partial derivatives of a vector-valued function f : R^n → R^m is called the Jacobian.
The Jacobian J is an m × n matrix, which we define and arrange as follows:
$$J = \nabla_x f = \frac{df}{dx} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}.$$


Example
If f : R^n → R^m is a linear operator given by
$$f(x) = Ax,$$
where A is a matrix in R^{m×n}, then it is clear that the Jacobian of f is
$$\nabla_x f = A.$$

Example
Consider a vector-valued function f : R^2 → R^3,
$$f(x_1, x_2) = \begin{bmatrix} e^{-x_1 + x_2} \\ x_1 x_2^2 \\ \sin(x_1) \end{bmatrix}.$$
The Jacobian of f is
$$J = \nabla_x f = \begin{bmatrix} -e^{-x_1+x_2} & e^{-x_1+x_2} \\ x_2^2 & 2x_1 x_2 \\ \cos x_1 & 0 \end{bmatrix}.$$
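The same Jacobian can be computed mechanically; a short sketch assuming sympy:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
F = sp.Matrix([sp.exp(-x1 + x2), x1 * x2**2, sp.sin(x1)])
print(F.jacobian([x1, x2]))
# Matrix([[-exp(-x1 + x2), exp(-x1 + x2)],
#         [x2**2,          2*x1*x2],
#         [cos(x1),        0]])
```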


Chain rule

Consider a vector-valued function f : R^n → R^m given by
$$f(x) = f(x_1, \ldots, x_n) = \begin{bmatrix} f_1(x) & \cdots & f_m(x) \end{bmatrix}^T,$$
where the x_i = x_i(t1, ..., tl) are themselves functions of the l variables t = (t1, ..., tl). That is, x : R^l → R^n,
$$x(t) = x(t_1, \ldots, t_l) = \begin{bmatrix} x_1(t) & \cdots & x_n(t) \end{bmatrix}^T.$$
Then f ∘ x : R^l → R^m is given by f(t) = f(x(t)), and
$$\frac{\partial f_j}{\partial t_k} = \sum_{i=1}^{n} \frac{\partial f_j}{\partial x_i}\frac{\partial x_i}{\partial t_k}, \quad \text{for all } j = 1, \ldots, m \text{ and } k = 1, \ldots, l,$$
$$\nabla_t f = \nabla_x f \, \nabla_t x.$$


Example
Consider a function f(x1, x2) = x1^2 + 2x1 x2 with x1(s, t) = s cos t and x2(s, t) = s sin t. Find the partial derivatives of f with respect to s and t.
Answer: Applying the chain rule and then substituting x1 = s cos t, x2 = s sin t:
$$\frac{\partial f}{\partial s} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial s} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial s} = (2x_1 + 2x_2)\cos t + 2x_1 \sin t = 2s\cos^2 t + 4s\sin t\cos t = 2s\cos^2 t + 2s\sin(2t),$$
$$\frac{\partial f}{\partial t} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial t} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial t} = (2x_1 + 2x_2)(-s\sin t) + 2x_1(s\cos t) = -s^2\sin(2t) + 2s^2\cos(2t).$$
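As before, direct substitution confirms both partial derivatives; a minimal sympy check:

```python
import sympy as sp

s, t = sp.symbols('s t')
x1, x2 = s * sp.cos(t), s * sp.sin(t)
f = x1**2 + 2 * x1 * x2
print(sp.simplify(sp.diff(f, s) - (2*s*sp.cos(t)**2 + 2*s*sp.sin(2*t))))      # 0
print(sp.simplify(sp.diff(f, t) - (-s**2*sp.sin(2*t) + 2*s**2*sp.cos(2*t))))  # 0
```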


5.6.1 Gradients in a Deep Network

In deep learning, the function value y is computed as a many-level function composition
$$y = (f_K \circ f_{K-1} \circ \cdots \circ f_1)(x) = f_K(f_{K-1}(\cdots(f_1(x))\cdots)),$$
where x are the inputs (e.g., images), y are the observations (e.g., class labels), and every function f_i, i = 1, ..., K, possesses its own parameters.


Given a neural network with multiple layers:

In the i-th layer:
$$f_i(x_{i-1}) = \sigma(A_{i-1} x_{i-1} + b_{i-1}),$$
where x_{i−1} is the output of layer i − 1 and σ is an activation function (sigmoid, ReLU, tanh, ...).
Training these models requires us to compute the gradient of a loss function L with respect to all model parameters θ_j = {A_j, b_j}, j = 0, ..., K − 1.


Suppose we have inputs x, observations y, and a network structure
$$f_0 := x,$$
$$f_i := \sigma_i(A_{i-1} f_{i-1} + b_{i-1}), \quad i = 1, \ldots, K.$$
We need to find θ = {θ_j} = {A_j, b_j}, j = 0, ..., K − 1, which minimizes the loss function
$$L(\theta) = \|y - f_K(\theta, x)\|^2.$$


The gradients of L with respect to θ:
$$\frac{\partial L}{\partial \theta_{K-1}} = \frac{\partial L}{\partial f_K}\frac{\partial f_K}{\partial \theta_{K-1}},$$
$$\frac{\partial L}{\partial \theta_{K-2}} = \frac{\partial L}{\partial f_K}\frac{\partial f_K}{\partial f_{K-1}}\frac{\partial f_{K-1}}{\partial \theta_{K-2}}, \quad \cdots$$
$$\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial f_K}\frac{\partial f_K}{\partial f_{K-1}} \cdots \frac{\partial f_{i+2}}{\partial f_{i+1}}\frac{\partial f_{i+1}}{\partial \theta_i}.$$
Most of the computation of ∂L/∂θ_{i+1} can be reused to compute ∂L/∂θ_i. Gradients are passed backward through the network.
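To illustrate this reuse, here is a minimal numpy sketch (our own illustrative addition: tanh activations σ_i, the squared-error loss above, and random layer sizes and data). The running delta carries the shared factor ∂L/∂f_{i+1} backward from layer to layer:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 4, 4, 2]                           # input, two hidden, output widths
A = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
b = [rng.normal(size=(m, 1)) for m in sizes[1:]]
x = rng.normal(size=(sizes[0], 1))
y = rng.normal(size=(sizes[-1], 1))

# Forward pass: store every f_i, since the backward pass needs them.
f = [x]
for Ai, bi in zip(A, b):
    f.append(np.tanh(Ai @ f[-1] + bi))

# Backward pass: delta = dL/df_i is reused at every layer.
delta = 2 * (f[-1] - y)                        # dL/df_K for L = ||y - f_K||^2
grads = []
for i in reversed(range(len(A))):
    dz = delta * (1 - f[i + 1] ** 2)           # through tanh': 1 - tanh(z)^2
    grads.append((dz @ f[i].T, dz))            # (dL/dA_i, dL/db_i)
    delta = A[i].T @ dz                        # pass dL/df_i to the layer below
grads.reverse()
print([g[0].shape for g in grads])             # [(4, 3), (4, 4), (2, 4)]
```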


5.6.2 Automatic Differentiation

Backpropagation is a special case of a general technique in numerical analysis called automatic differentiation.
Automatic differentiation refers to a set of techniques to numerically evaluate the exact gradient of a function by working with intermediate variables and applying the chain rule.


Given a simple graph representing the data flow from inputs x to outputs y:
$$x \longrightarrow a \longrightarrow b \longrightarrow y.$$
To compute the derivative dy/dx:
$$\text{Forward mode:} \quad \frac{dy}{dx} = \frac{dy}{db}\left(\frac{db}{da}\frac{da}{dx}\right), \qquad \text{Reverse mode:} \quad \frac{dy}{dx} = \left(\frac{dy}{db}\frac{db}{da}\right)\frac{da}{dx}.$$
In forward mode, the gradients flow with the data, from left to right through the graph. In reverse mode, the gradients are propagated backward through the graph, i.e., in reverse to the data flow.

Definition
Reverse mode automatic differentiation is called backpropagation.


Example

Consider the function
$$f(x) = \sqrt{x^2 + \exp(x^2)} + \cos\!\left(x^2 + \exp(x^2)\right).$$
Use intermediate variables:
$$a = x^2, \quad b = \exp(a), \quad c = a + b, \quad d = \sqrt{c}, \quad e = \cos c, \quad f = d + e.$$
We get
$$\frac{\partial a}{\partial x} = 2x, \quad \frac{\partial b}{\partial a} = \exp(a), \quad \frac{\partial c}{\partial a} = \frac{\partial c}{\partial b} = 1, \quad \frac{\partial d}{\partial c} = \frac{1}{2\sqrt{c}}, \quad \frac{\partial e}{\partial c} = -\sin c, \quad \frac{\partial f}{\partial d} = \frac{\partial f}{\partial e} = 1.$$


We can compute ∂f/∂x using the backpropagation method:
$$\frac{\partial f}{\partial c} = \frac{\partial f}{\partial d}\frac{\partial d}{\partial c} + \frac{\partial f}{\partial e}\frac{\partial e}{\partial c} = 1 \cdot \frac{1}{2\sqrt{c}} + 1 \cdot (-\sin c),$$
$$\frac{\partial f}{\partial b} = \frac{\partial f}{\partial c}\frac{\partial c}{\partial b} = \frac{\partial f}{\partial c},$$
$$\frac{\partial f}{\partial a} = \frac{\partial f}{\partial b}\frac{\partial b}{\partial a} + \frac{\partial f}{\partial c}\frac{\partial c}{\partial a} = \frac{\partial f}{\partial b}\exp(a) + \frac{\partial f}{\partial c},$$
$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial a}\frac{\partial a}{\partial x} = \frac{\partial f}{\partial a} \cdot 2x.$$
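The same computation written as code; a plain-Python sketch of one forward pass followed by the reverse (backpropagation) pass, with a finite-difference sanity check added at the end:

```python
import math

def f_and_grad(x):
    # Forward pass through the intermediate variables.
    a = x * x
    b = math.exp(a)
    c = a + b
    d = math.sqrt(c)
    e = math.cos(c)
    f = d + e
    # Reverse pass: accumulate df/d(variable) from the output back to x.
    df_dd, df_de = 1.0, 1.0
    df_dc = df_dd * (1 / (2 * math.sqrt(c))) + df_de * (-math.sin(c))
    df_db = df_dc * 1.0
    df_da = df_db * math.exp(a) + df_dc * 1.0
    df_dx = df_da * 2 * x
    return f, df_dx

val, grad = f_and_grad(0.5)
h = 1e-6
print(grad, (f_and_grad(0.5 + h)[0] - f_and_grad(0.5 - h)[0]) / (2 * h))
```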


Let x1, ..., xd be the input variables to the function, x_{d+1}, ..., x_{D−1} be the intermediate variables, and x_D the output variable. Then
$$\text{for } i = d+1, \ldots, D: \quad x_i = g_i(x_{\text{Pa}(x_i)}),$$
where the g_i(·) are elementary functions and x_{Pa(x_i)} are the parent nodes of the variable x_i in the graph. Let f = x_D. By the chain rule,
$$\frac{\partial f}{\partial x_i} = \sum_{x_j :\, x_i \in \text{Pa}(x_j)} \frac{\partial f}{\partial x_j}\frac{\partial x_j}{\partial x_i} = \sum_{x_j :\, x_i \in \text{Pa}(x_j)} \frac{\partial f}{\partial x_j}\frac{\partial g_j}{\partial x_i},$$
which is the backpropagation of the gradient through the computation graph, where Pa(x_j) is the set of parent nodes of x_j.


5.7 Higher-Order Derivatives

Consider a function f : R^2 → R of two variables x, y. We use the following notation for higher-order partial derivatives:
$$\frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right), \qquad \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right),$$
$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right), \qquad \frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right), \quad \cdots$$
If f(x, y) is a twice (continuously) differentiable function, then
$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}.$$

Definition
The Hessian matrix of f is
$$H = \nabla^2_{x,y} f(x, y) = \begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix}.$$
Example
Let f(x, y) = e^{−x^2 + 2y}. Find the Hessian matrix of f at (0, 0).

Answer: We have
$$\frac{\partial f}{\partial x} = -2x e^{-x^2+2y}, \qquad \frac{\partial f}{\partial y} = 2e^{-x^2+2y},$$
$$\frac{\partial^2 f}{\partial x^2} = (-2 + 4x^2)e^{-x^2+2y}, \qquad \frac{\partial^2 f}{\partial y^2} = 4e^{-x^2+2y},$$
$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x} = -4x e^{-x^2+2y}.$$
Hence
$$\nabla^2_{(x,y)} f(0, 0) = \begin{bmatrix} -2 & 0 \\ 0 & 4 \end{bmatrix}.$$
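A symbolic check of this Hessian, as a sketch assuming sympy is available (sympy's hessian helper matches the definition above):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.exp(-x**2 + 2*y)
H = sp.hessian(f, (x, y))
print(H.subs({x: 0, y: 0}))   # Matrix([[-2, 0], [0, 4]])
```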


5.8 Linearization and Multivariate Taylor Series

Theorem
Let f : R^n → R be a function that has continuous partial derivatives up to order 2. Then
$$f(x) = f(a) + \nabla_x f(a) \cdot (x - a) + R_1(x, a),$$
where the error term R1(x, a) goes to zero at least as fast as a constant times ‖x − a‖² as x → a.

Definition
The first-order Taylor polynomial of f at a is
$$T_1(x) = f(a) + \nabla_x f(a) \cdot (x - a) = f(a) + \frac{\partial f}{\partial x_1}(a)(x_1 - a_1) + \cdots + \frac{\partial f}{\partial x_n}(a)(x_n - a_n).$$


Example
Find the first-order Taylor polynomial of f(x, y) = x^2 + 2xy^3 at (1, 2).

Answer: We have f(1, 2) = 17 and
$$\frac{\partial f}{\partial x} = 2x + 2y^3, \quad \frac{\partial f}{\partial y} = 6xy^2 \;\Rightarrow\; \frac{\partial f}{\partial x}(1, 2) = 18, \quad \frac{\partial f}{\partial y}(1, 2) = 24.$$
Hence the first-order Taylor polynomial of f at (1, 2) is
$$T_1(x, y) = 17 + 18(x - 1) + 24(y - 2).$$


Theorem
Let f : R^n → R be a function that has continuous partial derivatives up to order 3. Then we can write
$$f(x) = f(a) + \nabla_x f(a) \cdot (x - a) + \frac{1}{2}(x - a)^T \nabla^2_x f(a)(x - a) + R_2$$
$$= f(a) + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(a)(x_i - a_i) + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\partial^2 f}{\partial x_i \partial x_j}(a)(x_i - a_i)(x_j - a_j) + R_2,$$
where the error term R2 = R2(x, a) goes to zero at least as fast as a constant times ‖x − a‖³ as x → a.

Definition
The second-order Taylor polynomial of f at a is
$$T_2(x) = f(a) + \nabla_x f(a) \cdot (x - a) + \frac{1}{2}(x - a)^T \nabla^2_x f(a)(x - a).$$


Example
Find the second-order Taylor polynomial of f(x, y) = e^{x + y^2} about (x, y) = (0, 0).

Answer: We can compute
$$\frac{\partial f}{\partial x} = e^{x+y^2}, \quad \frac{\partial f}{\partial y} = 2y e^{x+y^2}, \quad \frac{\partial^2 f}{\partial x^2} = e^{x+y^2},$$
$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x} = 2y e^{x+y^2}, \quad \frac{\partial^2 f}{\partial y^2} = (2 + 4y^2)e^{x+y^2}.$$
Hence
$$\nabla_{(x,y)} f(0, 0) = \begin{bmatrix} 1 & 0 \end{bmatrix}, \qquad \nabla^2_{(x,y)} f(0, 0) = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}.$$
The second-order Taylor polynomial is
$$f(0,0) + \nabla_{(x,y)} f(0,0) \begin{bmatrix} x - 0 \\ y - 0 \end{bmatrix} + \frac{1}{2}\begin{bmatrix} x - 0 & y - 0 \end{bmatrix} \nabla^2_{(x,y)} f(0,0) \begin{bmatrix} x - 0 \\ y - 0 \end{bmatrix} = 1 + x + \frac{1}{2}x^2 + y^2.$$
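Assembling the second-order Taylor polynomial symbolically reproduces this result; a minimal sketch assuming sympy:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.exp(x + y**2)
a = {x: 0, y: 0}
g = sp.Matrix([f]).jacobian([x, y]).subs(a)    # gradient at (0, 0): [1, 0]
H = sp.hessian(f, (x, y)).subs(a)              # Hessian at (0, 0)
v = sp.Matrix([x, y])                          # x - a, with a = (0, 0)
T2 = f.subs(a) + (g * v)[0] + (v.T * H * v)[0] / 2
print(sp.expand(T2))                           # x**2/2 + y**2 + x + 1
```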
Summary

We have studied:

how to differentiate a univariate function;
gradients of multivariable functions and vector-valued functions;
backpropagation;
higher-order derivatives;
linearization and Taylor series.

Exercises for practice: 5.1-5.7 (pages 170, 171).