Artificial Intelligence II (CS4442 & CS9542)
A Brief Review of Mathematics for Machine Learning
Boyu Wang
Department of Computer Science
University of Western Ontario
Outline
If you have NOT taken a linear algebra course (e.g., MATH 1600B), this course (at
least the first half) could be extremely difficult for you!
I Linear Algebra
I Probability
I Vector Calculus & Optimization
Why worry about the math?
I There are lots of easy-to-use machine learning packages out
there.
I However, to get really useful results, you need good
mathematical intuitions about certain machine learning
principles, as well as the inner workings of the individual
algorithms.
- Choose the right algorithm(s) for the problem
- Make good choices on parameter settings, validation
strategies
- Troubleshoot poor / ambiguous results
- Do a better job of coding algorithms
- Apply for a PhD at a top-tier university
Linear Algebra
Notation Reference
Table of Notations

Notation            Meaning
R                   set of real numbers (one-dimensional space)
R^n                 set of n-tuples of real numbers (n-dimensional space)
R^{m×n}             set of m × n real matrices
a                   a scalar or a vector (i.e., a ∈ R or a ∈ R^n)
A                   a matrix (i.e., A ∈ R^{m×n})
I                   identity matrix
A^{-1}              inverse of a square matrix A: AA^{-1} = A^{-1}A = I
a^T, A^T            transpose of a vector/matrix
⟨a, b⟩              dot product of vectors: ⟨a, b⟩ = a^T b = Σ_{i=1}^n ai bi
||a||_2, ||a||_1    ℓ2-norm, ℓ1-norm of a
|A|                 determinant of a square matrix A
tr(A)               trace of a square matrix A
Linear algebra applications
I Operations on or between vectors and matrices.
I Dimensionality reduction.
I Linear regression.
I Many others.
Why vectors and matrices?
Slide credit: Jeff Howbert
Vectors
I Definition: an n-tuple of values (usually real numbers).
I Can be written in column form or row form (column form is
conventional). Vector elements referenced by subscript.
a = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = [a_1, \ldots, a_n]^T
I Can think of a vector as: a point in space or a directed line
segment with a magnitude and direction.
Vector arithmetic
I Addition of two vectors
- add corresponding elements: a + b = [a1 + b1, . . . , an + bn]^T
- result is a vector
I Scalar multiplication of a vector
- multiply each element by the scalar: λa = [λa1, . . . , λan]^T
- result is a vector
I Inner/dot product of two vectors
- multiply corresponding elements, then add the products:
  ⟨a, b⟩ = a · b = a^T b = b^T a = Σ_{i=1}^n ai bi
- result is a scalar
I ℓ2-norm of a vector
- ||a|| = √⟨a, a⟩ = √(a^T a) = √(a1² + · · · + an²)
- a^T b = ||a|| ||b|| cos(θ)
- Euclidean distance between two vectors: ||a − b||
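A minimal NumPy sketch of these vector operations (the example vectors are arbitrary):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# addition and scalar multiplication (element-wise)
print(a + b)            # [5. 7. 9.]
print(2.0 * a)          # [2. 4. 6.]

# inner (dot) product: sum of element-wise products -> a scalar
print(a @ b)            # 32.0

# l2-norm and the cosine relation a^T b = ||a|| ||b|| cos(theta)
norm_a = np.sqrt(a @ a)                             # same as np.linalg.norm(a)
cos_theta = (a @ b) / (norm_a * np.linalg.norm(b))

# Euclidean distance between two vectors
print(np.linalg.norm(a - b))
```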
Matrices
A vector can be regarded as a special case of a matrix in which one of the
matrix dimensions equals 1.
I Matrix transpose (denoted ^T): swap columns and rows; an m × n
matrix becomes an n × m matrix
I Addition of two matrices
I Scalar multiplication of a matrix
Matrix multiplication
If A is an m × n matrix (i.e., A ∈ R^{m×n}) and B is an n × p matrix (i.e.,
B ∈ R^{n×p}),

A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \quad
B = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1p} \\ b_{21} & b_{22} & \cdots & b_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{np} \end{bmatrix}

then the matrix product C = AB (denoted without multiplication dots) is
defined to be the m × p matrix

C = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1p} \\ c_{21} & c_{22} & \cdots & c_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ c_{m1} & c_{m2} & \cdots & c_{mp} \end{bmatrix},

where c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}: c_{ij} is given by the inner product between the
i-th row of A and the j-th column of B.
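A small NumPy check of this definition (a sketch with arbitrary 2 × 3 and 3 × 2 matrices):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])      # A in R^{2x3}
B = np.array([[1., 0.],
              [0., 1.],
              [2., 3.]])          # B in R^{3x2}

C = A @ B                         # C = AB is 2x2
# entry c_ij is the inner product of row i of A and column j of B
c_01 = A[0, :] @ B[:, 1]
print(C)
print(np.isclose(C[0, 1], c_01))  # True
```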
Matrix multiplication
Properties:
I A(BC) = (AB)C
I AB ≠ BA (in general)
I (AB)^T = B^T A^T
I RULE: In any chain of matrix multiplications, the column
dimension of one matrix in the chain must match the row
dimension of the following matrix in the chain.
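These properties are easy to verify numerically (a sketch with random square matrices, so that both AB and BA are defined):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))

print(np.allclose(A @ (B @ C), (A @ B) @ C))   # associativity: True
print(np.allclose(A @ B, B @ A))               # generally False
print(np.allclose((A @ B).T, B.T @ A.T))       # True
```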
Square matrices and symmetric matrices
A is a square matrix if it has the same number of rows and columns
(i.e., A ∈ R^{n×n}). If A = A^T, then A is also a symmetric matrix.
I Special kinds
- diagonal matrix
- identity matrix
- positive-definite matrix: x^T Ax > 0 for any x ≠ 0
- invertible (or non-singular) matrix and its inverse: A is invertible if
there exists a matrix B such that AB = BA = I. If B exists, it is
unique and is called the inverse matrix of A, denoted A^{-1}.
- orthogonal matrix: A is an orthogonal matrix if A^T = A^{-1}, which
entails AA^T = A^T A = I
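A short NumPy sketch of these special cases (the example matrices are arbitrary):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])                  # symmetric: A == A.T

# positive-definite check: equivalently, all eigenvalues of symmetric A are positive
print(np.all(np.linalg.eigvalsh(A) > 0))  # True

# inverse: A A^{-1} = I
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))  # True

# orthogonal matrix: Q^T = Q^{-1}, i.e. Q Q^T = I (e.g. a 2-D rotation)
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(Q @ Q.T, np.eye(2)))    # True
```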
Eigenvalues and eigenvectors
Let A be an n × n square matrix. If we can find a scalar λ and a unit
vector v such that

Av = λv,

then λ is an eigenvalue of A, and v is its corresponding eigenvector.
I A = V Λ V^{-1} is the eigendecomposition of A (when A has n linearly
independent eigenvectors), where V is the n × n matrix whose i-th
column is the i-th eigenvector of A, and Λ is the diagonal matrix whose
diagonal elements are the corresponding eigenvalues.
I If A is positive-definite ⇒ all the eigenvalues are positive
I If A is symmetric ⇒ V is an orthogonal matrix: V^{-1} = V^T
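A NumPy sketch of these facts (the matrix below is an arbitrary symmetric example):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])                  # symmetric

eigvals, V = np.linalg.eig(A)             # columns of V are eigenvectors
Lam = np.diag(eigvals)

# A = V Lambda V^{-1}
print(np.allclose(A, V @ Lam @ np.linalg.inv(V)))      # True
# symmetric A  =>  V can be taken orthogonal: V^{-1} = V^T
print(np.allclose(V.T @ V, np.eye(2)))                  # True
# Av = lambda v for the first eigenpair
print(np.allclose(A @ V[:, 0], eigvals[0] * V[:, 0]))   # True
```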
Probability
Why probability
To characterize the uncertainties of the world!
I Uncertain data
- Missing data
- Noisy data
I Uncertain knowledge
- Incomplete enumeration of conditions or effects
- Incomplete knowledge of causality in the domain
- Stochastic effects
Probability theory provides powerful tools for modeling and dealing
with uncertainty.
Interpretations of probability
The probability that a coin will land heads is 0.5
I Frequentist interpretation: to represent long run frequencies of
events.
- If we flip the coin many times, we expect it to land heads about
half the time.
I Bayesian interpretation: to quantify our uncertainty about
something – related to information rather than repeated trials.
- We believe the coin is equally likely to land heads or tails on the
next toss.
Random variables
The expression p(A) denotes the probability that the event A is true.
I 0 ≤ p(A) ≤ 1
I p(¬A) denotes the probability that the event A is false:
p(¬A) = 1 − p(A).
I The probability of a disjunction is given by:
p(A ∨ B) = p(A) + p(B) − p(A ∧ B)
Fundamental concepts
I A union of two events – the probability of A or B:
p(A ∨ B) = p(A) + p(B) − p(A ∧ B). If A and B are mutually
exclusive, p(A ∨ B) = p(A) + p(B)
I Joint probability – the probability of the joint event A and B:
p(A, B) = p(A ∧ B) = p(A|B)p(B), where p(A|B) is the
conditional probability of event A given B:

p(A|B) = p(A, B) / p(B)

If A and B are independent of each other, we have
p(A|B) = p(A).
I Chain rule:
p(X1:N) = p(X1) p(X2|X1) p(X3|X1, X2) · · · p(XN|X1:N−1)
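A tiny numerical sketch of the conditional-probability and chain-rule identities (the joint probabilities below are made-up values):

```python
# joint distribution over two binary events A and B (assumed values, summing to 1)
p_joint = {(True, True): 0.20, (True, False): 0.15,
           (False, True): 0.30, (False, False): 0.35}

p_B = p_joint[(True, True)] + p_joint[(False, True)]   # marginal p(B)
p_A_given_B = p_joint[(True, True)] / p_B               # p(A|B) = p(A,B) / p(B)
p_AB = p_A_given_B * p_B                                # chain rule: p(A,B) = p(A|B) p(B)

print(p_A_given_B)                                      # 0.4
print(abs(p_AB - p_joint[(True, True)]) < 1e-12)        # True
```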
Bayes Rule

p(A|B) = p(B|A) × p(A) / p(B)

p(hypothesis|evidence) = p(evidence|hypothesis) × p(hypothesis) / p(evidence)

posterior = likelihood × prior / marginal distribution

I The most important formula in probabilistic machine learning
I Allows us to reason from evidence to hypotheses
- Example: p(headache) = 1/10, p(flu) = 1/40,
p(headache|flu) = 1/2. Given the evidence of headache,
what is p(flu|headache)?
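Plugging the given numbers into Bayes' rule as a quick check (only the arithmetic is added here):

```python
p_headache = 1 / 10           # p(evidence)
p_flu = 1 / 40                # p(hypothesis), the prior
p_headache_given_flu = 1 / 2  # p(evidence | hypothesis), the likelihood

# Bayes' rule: p(flu | headache) = p(headache | flu) p(flu) / p(headache)
p_flu_given_headache = p_headache_given_flu * p_flu / p_headache
print(p_flu_given_headache)   # 0.125, i.e. 1/8
```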
Probability distributions
I Discrete distributions
- binomial and Bernoulli distributions
- multinomial and multinoulli distributions
- Poisson distribution
I Continuous distributions
- Gaussian distribution
- Laplace distribution
- gamma distribution
Gaussian distribution
1-dimensional Gaussian distribution
N(\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(x-\mu)^2}
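A small sketch that evaluates this density with NumPy and compares it against scipy.stats.norm (SciPy assumed available; µ and σ are arbitrary):

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 1.0, 2.0
x = np.linspace(-5.0, 7.0, 5)

# density formula from the slide
pdf_manual = 1.0 / np.sqrt(2 * np.pi * sigma**2) * np.exp(-(x - mu)**2 / (2 * sigma**2))
pdf_scipy = norm.pdf(x, loc=mu, scale=sigma)

print(np.allclose(pdf_manual, pdf_scipy))   # True
```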
Gaussian distribution
n-dimensional (multivariate) Gaussian distribution
N(\mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}
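The analogous sketch for the multivariate density (µ and Σ below are arbitrary, with Σ symmetric positive-definite):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
x = np.array([0.5, 0.5])

d = x - mu
n = len(mu)
# density formula from the slide
pdf_manual = (np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d)
              / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))))

print(np.isclose(pdf_manual, multivariate_normal(mean=mu, cov=Sigma).pdf(x)))  # True
```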
Vector Calculus & Optimization
Fundamental concepts
I derivative: the sensitivity to change of the function value (output
value) with respect to a change in its argument (input value):

f'(x) = \lim_{a \to 0} \frac{f(x + a) - f(x)}{a}

second derivative – f''(x): the derivative of f'(x)
I convex function:
∀x1, x2, ∀t ∈ [0, 1]: f(tx1 + (1 − t)x2) ≤ tf(x1) + (1 − t)f(x2)
- f''(x) ≥ 0 for all x ⇔ f(x) is a convex function
- If f(x) is a convex function, f'(x0) = 0 ⇒ x0 is the global
minimum point (i.e., f(x0) ≤ f(x) for all x)
I chain rule: Let F(x) = f(g(x)); then F'(x) = f'(g(x)) g'(x)
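A small numerical sketch of the derivative definition and the chain rule (the functions and step size below are arbitrary choices):

```python
import numpy as np

f = lambda x: np.sin(x)          # outer function
g = lambda x: x**2               # inner function
F = lambda x: f(g(x))            # F(x) = f(g(x))

def derivative(h, x, eps=1e-6):
    """Finite-difference approximation of h'(x)."""
    return (h(x + eps) - h(x)) / eps

x0 = 0.7
# chain rule: F'(x) = f'(g(x)) g'(x) = cos(x^2) * 2x
print(derivative(F, x0))
print(np.cos(x0**2) * 2 * x0)    # nearly the same value
```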
Vector Calculus
I gradient: the derivative of a multi-variable function f with respect
to x = [x1, . . . , xn]^T:

\nabla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}

where ∂f/∂xi is the partial derivative of f with respect to xi.
I f(x) = x1^2 + 3x2 + x2x3:

\frac{\partial f}{\partial x_1} = 2x_1, \quad \frac{\partial f}{\partial x_2} = 3 + x_3, \quad \frac{\partial f}{\partial x_3} = x_2

⇒ ∇f(x) = [2x1, 3 + x3, x2]^T
I Let f(x) = a^T x, f(x) = x^T Ax; what is ∇f(x)?
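A finite-difference check of the gradient worked out above (a minimal sketch; the evaluation point is arbitrary):

```python
import numpy as np

f = lambda x: x[0]**2 + 3 * x[1] + x[1] * x[2]
grad_analytic = lambda x: np.array([2 * x[0], 3 + x[2], x[1]])

def numerical_gradient(f, x, eps=1e-6):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x)) / eps   # partial derivative w.r.t. x_i
    return g

x0 = np.array([1.0, -2.0, 0.5])
print(grad_analytic(x0))                 # [ 2.   3.5 -2. ]
print(numerical_gradient(f, x0))         # approximately the same
```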
I Hessian matrix: the second derivative of a multi-variable function f with
respect to x = [x1, . . . , xn]^T:

H = \begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}

where \frac{\partial^2 f}{\partial x_i \partial x_j} is the mixed partial derivative of f. The order of
differentiation does not matter (Schwarz's theorem).
I f(x) = x1^2 + 3x2 + x2x3:

\frac{\partial f}{\partial x_1} = 2x_1, \quad \frac{\partial f}{\partial x_2} = 3 + x_3, \quad \frac{\partial f}{\partial x_3} = x_2

\frac{\partial^2 f}{\partial x_1^2} = 2, \quad \frac{\partial^2 f}{\partial x_2^2} = \frac{\partial^2 f}{\partial x_3^2} = 0, \quad \frac{\partial^2 f}{\partial x_1 \partial x_2} = \frac{\partial^2 f}{\partial x_1 \partial x_3} = 0, \quad \frac{\partial^2 f}{\partial x_2 \partial x_3} = 1

⇒ H = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
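A finite-difference sketch of the Hessian for the same example (the step size and the second-difference formula are illustrative choices; NumPy assumed):

```python
import numpy as np

f = lambda x: x[0]**2 + 3 * x[1] + x[1] * x[2]

def numerical_hessian(f, x, eps=1e-4):
    """Approximate H_ij = d^2 f / (dx_i dx_j) with second differences."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.zeros(n), np.zeros(n)
            ei[i], ej[j] = eps, eps
            H[i, j] = (f(x + ei + ej) - f(x + ei) - f(x + ej) + f(x)) / eps**2
    return H

x0 = np.array([1.0, -2.0, 0.5])
print(np.round(numerical_hessian(f, x0), 3))
# [[2. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```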
Convex multi-variable function
∀x1, x2, ∀t ∈ [0, 1]: f(tx1 + (1 − t)x2) ≤ tf(x1) + (1 − t)f(x2)
I H is positive semidefinite (for all x) ⇔ f(x) is a convex function
(positive-definite ⇒ strictly convex)
I If f(x) is a convex function, ∇f(x0) = 0 ⇒ x0 is the global
minimum point (i.e., f(x0) ≤ f(x) for all x)
I f(x) = x1^2 + 3x2 + x2x3:

H = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}

The eigenvalues of H are −1, 1, 2 ⇒ f(x) is not convex
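The same convexity check can be done numerically (a sketch; NumPy's eigvalsh is used since H is symmetric):

```python
import numpy as np

H = np.array([[2., 0., 0.],
              [0., 0., 1.],
              [0., 1., 0.]])

eigvals = np.linalg.eigvalsh(H)   # eigenvalues of the symmetric H, ascending
print(eigvals)                    # [-1.  1.  2.]
print(np.all(eigvals >= 0))       # False -> H is not positive semidefinite, f is not convex
```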
Function minimization
I Most machine learning problems can be formulated as a
function minimization problem
I Solve the equation:

∇f(x) = 0    (1)

Done!
I Really?
- If f is not convex, we may obtain a suboptimal solution.
- We don't always have an analytical solution for (1).
Example: f(x) = x1^2 + e^{x2} + sin(x3 + log(x1)).
I Gradient descent!
Gradient descent
An optimization algorithm used to minimize a function by iteratively
moving in the direction of steepest descent as defined by the negative
of the gradient.
https://www.kodefork.com/learn/machine-learning/linear-regression-with-one-variable/
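A minimal gradient descent sketch on a simple convex quadratic (the objective, learning rate, and stopping rule here are illustrative assumptions, not from the course):

```python
import numpy as np

f = lambda x: (x[0] - 3.0)**2 + (x[1] + 1.0)**2       # convex, minimum at (3, -1)
grad = lambda x: np.array([2 * (x[0] - 3.0), 2 * (x[1] + 1.0)])

x = np.zeros(2)        # starting point
eta = 0.1              # learning rate (step size)

for _ in range(100):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:   # stop when the gradient is (almost) zero
        break
    x = x - eta * g                # move in the direction of steepest descent

print(x)               # approximately [ 3. -1.]
```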