CS.
479/679, Fall 2015, Representation Learning Lectu re 2 - 09/01/2015
Review: L in ear Algeb ra
Lec ture r: Raman Arora Scribe: Sheng Zhang
1 Review: Basic Analysis
1.1 Set
D efinition. A set is a collection of d istin ct ob jects. Th e ob jects th at b elon g to th e set are called th e
elements of set.
1.1.1 Numbers
Bou rkaki: b u ild all of math ematics from set th eory.
(1) N = {0, 1, 2, ....} : Natu ral nu mb ers
(2) Z = {0, 1, −1, 2, −2 , ....} : I ntegers
(3) Q = { ab | a, b ∈ Z, b = 0} : Ration als
(4) R = {dr · · · d1 d0 .d−1 · · · | di ∈ {0 , 1, ..., 9}}: Reals
√
Exercise: Why is 2 irration al?
√
Proof. Supp ose 2 is rational. Th at mean s it can b e written as th e ratio of two integers a and b
√ a
2 = (1)
b
wh ere we may assu me th at a and b have no common factors . S qu arin g in equ ation (1) on b oth sid es
gives
a2
2 = 2 (2)
b
wh ich imp lies
a2 = 2b 2 (3)
Thu s a is even . Th e on ly way th is can b e tru e is th at a itself is even . Bu t th en a2 is actu ally d ivisib le by 4.
2
Hen ce b2 an d th erefore b mu st b e even . √
S o a and b are b oth even wh ich is a contrad iction to th e assu mp tion
th at th ey h ave n o common factors. S o 2 is irration al.
1.2 Function
D efinition. Let S and T b e sets. A function or map f : S → T is given by associatin g each element s ∈ S
to an element f (s ) ∈ T . I n ord er to ch eck th at a fu n ction is well defined, on e mu st ch eck th at
(1) If s ∈ S th en f (s ) ∈ T .
(2) If s1 = s2 th en f (s 1 ) = f (s 2 ).
A fu n ction f : S → T map s every el ement of S to an unique el ement of T .
(Th e examp les in th e lectu re2 P.11/42: (b ) an d (c) are n ot a fu n ction .)
D efinition. A sequence is a fu n ction f : N → X . Write xn = f (n ) for all n ∈ N.
1
Figure 1 : Th e examp les in th e lectu re2
2 Topology
D efinition. Given a set X , a topology T on X is a collection of su b sets of X su ch th at
(1) ∅ ∈ T , X ∈ T
(2) arb itrary u n ion of elements of T is in T
(3) fi n ite intersection of elements of T is in T
D efinition. Topological space is a set X togeth er with a collection of su b sets of X satisfyin g th e ab ove
axioms.
D efinition. Th e elements of T are referred to as open sets .
D efinition. Th e comp lement set of a op en set is referred to as closed sets .
Examples:
X = {1 , 2, 3, 4} an d collection τ = {{∅}, {1, 2, 3, 4}} of on ly th e two su b sets of X requ ired by th e axioms
form a top ology.
X = {1, 2, 3, 4} an d collection τ = {{∅}, {2}, {1, 2} , {2 , 3}, {1, 2, 3}, {1, 2, 3, 4}} of six su b sets of X form
an oth er top ology.
Th in k of top ology as d efi n in g n eighb orh ood s N(x ) of x ∈ X [1]:
(1) If N is a n eighb orh ood of x (i.e., N ∈ N(x) ), th en x ∈ N . I n oth er word s, each p oint b elon gs to every
on e of its n eighb orh ood s.
(2) If N is a su b set of X an d contain s a n eighb orh ood of x, th en N is a n eighb orh ood of x, i.e., every
su p erset of a n eighb orh ood of a p oint x in X is again a n eighb orh ood of x.
(3) Th e intersection of two n eighb orh ood s of x is a n eighb orh ood of x.
(4) Any n eighb orh ood N of x contain s a n eighb orh ood M of x su ch th at N is a n eighb orh ood of each p oint
of M .
D efinition. Let S b e a su b set of a top ological sp ace X . A p oint x in X is a limit point of S i f every
n eighb orh ood of x contain s at least on e p oint of S d ifferent from x itself.
2
2.1 Metric Spaces
D efinition. Metric Spaces : A metric sp ace is a set X with a d istan ce fu n ction d : X × X → R su ch th at
(1) non- negativity: ∀x , y ∈ X, d(x , y) ≥ 0
(2) definiteness: ∀x , y ∈ X, d(x , y) = 0 ⇔ x = y
(3) symme try: ∀x , y ∈ X, d(x , y) = d(y , x)
(4) trian g l e in eq ual ity: ∀x , y ∈ X, x + y ≤ x + y
D efinition. Open Ball: Let M = (X , d) b e a metric sp ace, x ∈ X , and ∈ R>0 , th e op en -b all of x in M
is d efi n ed as: B (x ) = {y ∈ X | d(x , y) ≤ }.
2.2 Continuous Functions
D efinition. Continous Functions: Let X and Y b e top ological sp aces. A fu n ction f : X → Y is continu ou s
if it satisfi es th e con d ition th at if B is an op en su b set of Y th en f −1 (B
) is an op en su b set of X . (some typ o
existin g in th e slice)
2.3 Compactness
D efinition. Compactness: A top ological sp ace is comp act if every op en cover h as a fi n ite su b co ver.
Exp licitly, th is mean s th at for every arb itrary collection {Uα }α ∈I of op en su b sets of X su ch th at X = Uα ,
U . (some typ o existin g in th e slice)
α ∈I
th ere is a fi n ite su b set of J of A su ch th at X = i
i∈ J
For metric sp aces, closed an d b ou n d ed n ess imp lies comp actn ess.
3 Vector Spaces
A vector space over reals is a n on -emp ty set X with p rop erties b elow, ad d ition :
(1) closu re: Performan ce of ad d ition on memb ers of th e set always p ro d u ces a memb er of th e same set.
(2) commu tative law: Ch an gin g th e ord er of th e op eran d s d oes n ot ch an ge th e resu lt.
(3) associative law: With in an exp ression contain in g two or more occu rren ces in a row of th e same associa-
tive op erator, th e ord er in wh ich th e op eration s are p erformed d oes n ot matter as lon g as th e sequ en ce
of th e op eran d s is n ot ch an ged .
(4) ad d itive id entity: Ad d a nu mb er by 0, th e resu lt is th at origin al nu mb er.
(5) ad d itive inverse: Wh en a nu mb er is ad d ed by its op p osite, th e resu lt is 0.
S calar mu ltip lication :
(1) closu re: Performan ce of mu ltip lication on memb ers of th e set always p rod u ces a memb er of th e same
set.
(2) associative law: With in an exp ression contain in g two or more occu rren ces in a row of th e same associa-
tive op erator, th e ord er in wh ich th e op eration s are p erformed d oes n ot matter as lon g as th e sequ en ce
of th e op eran d s is n ot ch an ged
(3) d istrib u tive law I : Given a set S , two b in ary op erators: ad d ition + an d mu ltip lication × on S , and any
el ements x, y, and z of S , x × (y + z) = x × y + x × z.
3
(4) d istrib u tive law I I : Given a set S , two b in ary op erators: ad d ition + an d mu ltip lication × on S , and
any elements x, y, and z of S , (x + y ) × z = x × z + y × z.
(5) mu ltip licative id entity: Mu ltip ly a nu mb er by 1, th e resu lt, or p rod u ct, is th at origin al nu mb er.
3.1 Normed Vector Spaces
D efinition. Normed Vector Spaces: Let X b e a vector sp ace over reals. A n orm on X is map p in g
· : X → R th at satisfi es
(1) p ositive d efi n iten ess: ∀x ∈ X, x ≥ 0 wh ere th e equ ality h old s on ly wh en x = 0
(2) h omogen eity: ∀a ∈ R, x ∈ X, ax = |a |x
(3) trian gle in equ ality: ∀x , y ∈ X, x + y ≤ x + y
D efinition. p -norm: For p ≥ 1, th e p n orm on X = Rd is d efi n ed as
x p = (
d |x |p ) , ∀x ∈ Rd
1
p
i
i= 1
3.2 Equivalence of Norms
Two n orms · and · are said to b e equ ivalent iff th ere exists α, β > 0 su ch th at for all x ∈ X ,
αx ≤ x ≤ βx .
Gen eral in equ alities relatin g 1 , 2 , and∞ n orms:
√
x 2 ≤ x 1 ≤ dx 2
Proof.
For th e left p art,
x 2 ≤ x 1 ⇔ x 22 ≤ x 21 ⇔
d x2 ≤ ( d |x |) 2
i
i= 1 i i = 1
d d |x x | ⇔ |x x | ≥ 0
⇔ x2i ≤ x2i + 2 i j i j
i= 1 i= 1 1≤ i <j ≤d 1≤ i <j ≤d
Obviou sly,
|x x | ≥ 0. Th at mean s x ≤ x .
i j 2 1
i <j ≤d
1≤
For th e right p art,
√ d d
x 1 ≤ dx 2 ⇔ x 21 ≤ dx 22 ⇔ ( |x i |) 2 ≤ d( x2i )
1
i= i= 1
d
⇔ x2i + 2
|x ||x | ≤ d( d x2 )
i j
i = 1 1 ≤i <j ≤d 1 i
i=
d
⇔ 0 ≤ (d − 1)( |x i |2 ) − 2 |x i ||x j |
i= 1 1 ≤i <j ≤d
⇔ 0 ≤ (| x i | − |x j |) 2
1≤ i <j ≤d
Obviou sly,
(| x | − |x |) 2 ≥ 0. Th at mean s x ≤ √dx .
i j 1 2
1≤ i <j ≤d
4
√
x ∞ ≤ x 2 ≤ dx ∞
Proof.
For th e left p art,
x2 ≥ ( max {|x| i })2 = x∞
1 ≤ i≤ d
For right part,
xi ≤ max |x | i ⇒ ( x2 )2 =
d x2 ≤ d ( max |x | )2 = d( x )2
1≤ i≤ d
i= 1
i
i=1 d i
1 ≤ i ≤
∞
x ∞ ≤ x 1 ≤ dx ∞
Proof.
Th e left p art is obviou sly tru e.
For th e right p art, for any x, y ∈ Rd , we d efi n e x ∗ y = (x i yi )i = 1,...,d ∈ Rd . Usin g Ho¨ld er’s in equ ality, wh ich
asserts th at for 1 ≤ p ≤ q satisfyin g 1q + 1r = p1 ,
x ∗ yp ≤ x q yr
By takin g y = (1, ..., 1), we h ave
1 1 1
x p ≤ d r x q ⇔ x p ≤ d p − q x q
Let p = 1, q = ∞, we h ave x 1 ≤ dx ∞ .
3.3 Inner Product
D efinition. Inner Product: An in n er-p rod u ct on X is a map ·, · : X × X → F, su ch th at
(1) con ju gate symmetry: x , y = y , x .
(2) h omogen eity: ∀x , y, z ∈ X , and ∀a , b ∈ Ra x + by , z = ax , z + by , z.
(3) p ositive d efi n iten ess: x , x ≥ 0 wh ere th e equ ality h old s on ly wh en x = 0.
In n er p rod u ct in d u ces a n orm on X : x := x , x 1 / 2 .
x, y are orthogonal if x , y = 0 .
x, y are orthonormal if th ey are orth ogon al an d x = y = 1.
D efinition. Cauchy Sequence: Given a metric sp ace (X, d), a sequ en ce is Cau chy, if for every p ositive
real nu mb er > 0 th ere is a p ositive integer N su ch th at for all p ositive integers m, n > N , th e d istan ce
d(x m
, xn ) < .
D efinition. Hilbert Space: A vector sp ace with an in n er p rod u ct is called an in n er-p rod u ct sp ace. A
Hilb ert sp ace is a real or comp lex in n er p rod u ct sp ace th at is also a complete metric sp ace with resp ect
to th e d istan ce fu n ction in d u ced by th e in n er p rod u ct (A comp lete metric sp ace is a metric sp ace in wh ich
every Cau chy sequ en ce is convergent.)
5
4 Subspace
D efinition. Vector Subpace: A su b sp ace of X is a su b set W ⊂ X th at forms a vector sp ace.
D efinition. Span: Given a su b set U ⊆ X , th e sp an of U is d efi n ed to b e th e set of all fi n ite lin ear
comb in ation s of elements of U :
span(U
) = {
n c x |n ∈ N, x ∈ X, c ∈ R}
k k k k
k =1
4.1 Basis
A su b set U ⊆ X , is lin early in d ep en d ent if wh en ever x1 , ..., xn is a fi n ite collection of vectors form U and
c1 , ..., cn ∈ R,
n c x = 0 implies c = 0, for all k
k k k
k =1
U ⊆ W is a b asis if W =span(U
), ui is lin early in d ep en d ent.
4.2 Orthogonal Basis
I s ran ge(U ) ⊥ null(U
) ? Yes
). Accord in g to th e d efi n ition , ∃y ∈ Rk
Proof. With ou t loss of gen erality, su p p ose x1 ∈ ran ge(U ), x2 ∈ null(U
s.t. x1 = U y an d we h ave U T x2 = 0, th en
xT1 x2 = (y T U T )x 2 = y T (U
T x2 ) = 0
wh ich mean s x1 ⊥ x2 for ∀x ∈ ran ge(U ) and ∀x 2 ∈ null(U
) i.e. ran ge(U
) ⊥ null(U
).
5 Matrices
Rank of a matrix X ∈ Rk×d is th e size of th e largest set of lin early in d ep en d ent colu mn s or rows.
Prop erties:
(1) Th e ran k of an m × n matrix is a n on n egative integer an d can n ot b e greater th an eith er m or n.
(2) I f A is a squ are matrix (i.e., m = n), A is invertib le if an d on ly if A h as ran k n (i.e., A h as fu ll ran k).
(3) Assu me A is any m-by-n matrix an d B is any n -by-k matrix, th en
ran k(A
B ) ≤ min (ran k(A), ran k(B
))
(4) If B h as fu ll row ran k, th en
ran k(A
B ) = ran k(A
)
Trace of a squ are matrix M ∈ Rd×d is th e su m of th e d iagon al elements: Mi i .
d
1
i=
Prop erties:
(1) tr(A + B) = tr(A ) + tr(B
)
(2) tr(c A) = ctr(A )
(3) tr(A ) = tr(A T )
(4) tr(A B C ) = tr(C AB ) = tr(B C A)
6
5.1 Matrix Norms
Operator Norm :
X p = sup Xwwp p
x=
0
5.2 Singular value decomposition1
S VD of a m × n real matrix is given by:
X = UΣV
wh ere U ∈
Rm ×m
an d V ∈ R n × n are orth ogon al matrices, an d Σ is an m × n rectan gu lar d iagon al matrix
with n on -n egative real sin gu lar valu es on th e d iagon al. Th e ith colu mn s in U or V are called th e left-sin gu lar
an d right sin gu lar vectors for th e corresp on d in g sin gu lar valu e σi (i th element in diag (Σ)).
I f th ere exist two set of vector u, v (wh ich are lin early in d ep en d ent) an d sin gu lar valu e σ wh ich satisfi es
X v = σ u , X T u = σ v
th en σ is a d egen erate sin gu lar valu e. I f X h as d egen erate sin gu lar valu es, th en its sin gu lar valu e d ecomp o-
sition is n ot u n iqu e; oth erwise if all sin gu lar valu es of X are n on -d egen erate an d n on -zero, th en its sin gu lar
valu e d ecomp osition is u n iqu e u p to mu ltip lication of a p air of corresp on d in g colu mn s of U an d V by (-1).
S in ce U an d V are orth ogon al matrices, we h ave
X X T = U ΣV T V ΣU T = U Σ2 U T , X T X = V Σ2 V T
S o given a S VD d ecomp osition of X, we cou ld ob tain th e eigenvalu e d ecomp osition s of matrix X X T and
X T X . Th e colu mn s of U are eigenvectors of X X T ; th e colu mn s of V are eigenvectors of X T X .
Wh en X is a p ositive semi-d efi n ite matrix, th e eigenvalu e d ecomp osition is also a sin gu lar valu e d ecomp osi-
tion.
With sin gu lar valu e d ecomp osition , we can recon stru ct X as follows:
r
X = uk σk vkT
k =1
wh ere r is th e ran k of X, an d uk , vk are th e k th colu mn of U an d V.
References
[1] Wikip ed ia contrib u tors.“Top ological sp ace.” Wikip ed ia, Th e Free En cyclop ed ia. Wikip ed ia, Th e Free
En cyclop ed ia, 4 S ep . 2015. Web . 10 S ep . 2015.
[2] Ben gio, Yoshu a an d Cou rville, Aaron an d Vin cent, Pascal, “Rep resentation learn in g: A review an d n ew
p ersp ectives”,C
amb ridg e Un iv e rsity Pre ss, 2006.
1 p artially scrib ed by Yuting Xu