Mathematical methods for Image Processing
François Malgouyres
Institut de Mathématiques de Toulouse, France
Invited by Jidesh P., NITK Surathkal
Funded by the Global Initiative on Academic Network
Oct. 23–27
Plan
1 Sparsity prior: ℓ0 case
2 Sparsity prior: ℓ1 case
Dimensionality reduction
[Diagram: a data point $u$ in $\mathbb{R}^N$, $N$ large.]
Dimensionality reduction for approximation
[Diagram: $u \in \mathbb{R}^N$ ($N$ large) lies at a small distance from $V_K$, a vector space (v.s.) of dimension $K \ll N$.]
Dimensionality reduction for denoising
[Diagram: in $\mathbb{R}^N$ ($N$ large), the noisy data $u+b$ lies at distance $\approx \sqrt{N}\sigma$ from $u$; $u$ lies at a small distance from $V_K$, a v.s. of dimension $K \ll N$, and projecting $u+b$ onto $V_K$ leaves a residual noise of size $\approx \sqrt{K}\sigma$.]
Dimensionality reduction
Applications:
Approximation
Denoising, compressed sensing/inverse problems (later in the talk)
Feature extraction
Questions:
Can we do it for images?
How can we construct $V_K$ in a stable and numerically realistic manner?
Images are smooth
Let $B$ be an orthonormal basis (e.g. Fourier, wavelets, ...).
We assume $B$ is such that, for any image $u$, the coefficient vector $x$ with $u = Bx$ satisfies: $\|x\|_p$ is small for small $p$.

[Figure: unit balls $\{x \in \mathbb{R}^2 : \|x\|_p \le 1\}$ for $p = 0.2$, $p = 1$ and $p = 10$.]

Asymptotically: this corresponds to an assumption of Sobolev smoothness for Fourier, and of Besov smoothness for wavelets.
Images are smooth
We denote by $S$ the set of indices of the $K$ largest entries of $x$, and define
\[
\tilde{x}_k = \begin{cases} x_k, & \text{if } k \in S, \\ 0, & \text{otherwise.} \end{cases}
\]
Then $B\tilde{x} \sim u$ and $B\tilde{x} \in V_K = \mathrm{Vect}\{B_k,\ k \in S\}$.
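To make the construction concrete, here is a minimal Python/NumPy sketch (not part of the original slides) of this best $K$-term approximation; a random orthogonal matrix stands in for the basis $B$ (Fourier, wavelets, ...).

```python
import numpy as np

def best_k_term(x, K):
    """Keep the K largest-magnitude entries of x, zero out the rest."""
    S = np.argsort(np.abs(x))[-K:]   # indices of the K largest |x_k|
    x_tilde = np.zeros_like(x)
    x_tilde[S] = x[S]
    return x_tilde, S

# Toy usage: a random orthogonal matrix Q stands in for the basis B.
rng = np.random.default_rng(0)
N, K = 256, 16
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
x = rng.standard_normal(N) * (rng.random(N) < 0.1)   # compressible coefficients
u = Q @ x
x_tilde, S = best_k_term(x, K)
# Since Q is orthonormal, ||u - Q x_tilde||_2 equals ||x - x_tilde||_2.
print(np.linalg.norm(u - Q @ x_tilde), np.linalg.norm(x - x_tilde))
```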
Non-linear approximation
Theorem
For $0 < p \le 2$, we have
\[
\|u - B\tilde{x}\|_2 = \|x - \tilde{x}\|_2 \le \frac{\|x\|_p}{K^r},
\qquad \text{for } r = \frac{1}{p} - \frac{1}{2}.
\]
Examples:
If $p = 0.01$, $r = 99.5$
If $p = 0.1$, $r = 9.5$
If $p = 1$, $r = 0.5$
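A quick numerical sanity check of the bound (illustrative only, with an assumed fast-decaying coefficient vector; the theorem guarantees the inequality):

```python
import numpy as np

rng = np.random.default_rng(0)
p, K, P = 0.5, 10, 1000
x = rng.standard_normal(P) * rng.random(P) ** 4        # fast-decaying coefficients
r = 1.0 / p - 0.5
x_sorted = np.sort(np.abs(x))[::-1]
tail = np.linalg.norm(x_sorted[K:])                    # ||x - x_tilde||_2
bound = np.sum(np.abs(x) ** p) ** (1.0 / p) / K ** r   # ||x||_p / K^r
print(tail <= bound)                                   # True
```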
Dimensionality reduction of smooth images
[Diagram: the noisy data $u+b$ lies at distance $\approx \sqrt{N}\sigma$ from $u$ in $\mathbb{R}^N$; $u$ lies at distance at most $\|x\|_p / K^r$ from $V_K = \mathrm{Vect}\{B_k,\ k \in S\}$, and the projection onto $V_K$ carries a residual noise of size $\approx \sqrt{K}\sigma$.]
Dimensionality reduction: Manipulating vector spaces
Implicit representation (often called Analysis, or co-sparsity prior): we have an analysis linear operator
\[ A : \mathbb{R}^N \longrightarrow \mathbb{R}^P, \]
where typically $P \gg N$, and we build a set of equations $S \subset \{1, \ldots, P\}$ such that
\[ V_K = \{ w \in \mathbb{R}^N \mid \forall i \in S,\ (Aw)_i = 0 \}. \]
(Ex: $A$ computes finite differences, as in the Total Variation prior.)
Explicit representation (often called Synthesis, or sparsity prior): we have a synthesis linear operator
\[ D : \mathbb{R}^P \longrightarrow \mathbb{R}^N, \]
where typically $P \gg N$, and we build a support $S \subset \{1, \ldots, P\}$ such that
\[ V_K = \{ Dx \mid x \in \mathbb{R}^P \text{ and } \mathrm{Supp}(x) \subset S \}. \]
(Ex: $D$ is a wavelet transform.)
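A minimal sketch contrasting the two representations on a 1-D signal (an illustration, not part of the slides): the analysis operator $A$ is the finite-difference matrix of the Total Variation prior, while the dictionary $D$ is a random stand-in for a wavelet transform; the supports are arbitrary example choices.

```python
import numpy as np

N = 8
# Analysis (co-sparsity) view: A computes first-order finite differences,
# A : R^N -> R^{N-1} here (in practice P >> N, e.g. differences along
# several directions of an image).
A = np.diag(np.ones(N - 1), k=1)[:-1] - np.eye(N)[:-1]   # (Aw)_i = w_{i+1} - w_i
S_analysis = [0, 1, 2]        # equations set to zero: w constant on {0, 1, 2, 3}
# V_K = { w in R^N : (A w)_i = 0 for all i in S_analysis }

# Synthesis (sparsity) view: D is a redundant dictionary; a random matrix
# stands in for a wavelet transform.
rng = np.random.default_rng(0)
P = 4 * N                     # redundant: P >> N
D = rng.standard_normal((N, P))
S_synthesis = [3, 17, 25]     # support: V_K = { D x : Supp(x) in S_synthesis }
w = D[:, S_synthesis] @ rng.standard_normal(len(S_synthesis))  # a point of V_K
print(A.shape, D.shape, w.shape)
```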
Redundancy in the Synthesis prior
For $P > N$ and $\|x\|_0 = |\{i : x_i \neq 0\}|$:
\[
(P_0): \quad \min_{x \in \mathbb{R}^P} \|x\|_0 \ \text{ s.t. } \|Dx - u\|_2 \le \tau
\qquad \text{or} \qquad
(P_0'): \quad \min_{x \in \mathbb{R}^P} \|Dx - u\|_2 \ \text{ s.t. } \|x\|_0 \le K.
\]
[Diagram: a wide matrix $D$ applied to a sparse $x$, with $Dx \sim u$.]
Redundancy: the picture in $\mathbb{R}^N$
Drawing for $N = 2$, $K = 1$:
\[
\min_{x \in \mathbb{R}^P} \|x\|_0 \quad \text{s.t. } \|Dx - u\| \le \tau
\]
[Diagram: $u \in \mathbb{R}^2$ and the directions of the dictionary columns $D_1, \ldots, D_5$.]
Number of v.s. of dimension $K$:
\[
C_P^K = \frac{P(P-1)\cdots(P-K+1)}{K!}
\]
Redundancy: the picture in $\mathbb{R}^P$
Drawing for $P = 3$, $K = 1$, $\tau = 0$.
Hyp.: there exists $x_0$ s.t. $\|x_0\|_0 \le K$ and $u = Dx_0$.
\[
x^* \in \mathrm{Argmin}_{x \in \mathbb{R}^P} \|x\|_0 \quad \text{s.t. } \|Dx - u\| = 0
\]
[Diagram: coordinate axes $x_1, x_2, x_3$; the affine set $x_0 + \mathrm{Ker}(D)$ passes through $x_0$, which sits on a coordinate axis.]
\[
x^* \in (x_0 + \mathrm{Ker}(D)) \cap \bigcup_{|S| \le K} V_S
\]
But when $\dim(\mathrm{Ker}(D)) = P - N$ and $K \ll N$, the intersection of $x_0 + \mathrm{Ker}(D)$ with any finite collection of vector spaces of dimension $K$ is unlikely to contain anything other than $x_0$ in $\mathbb{R}^P$.
Compressed sensing, noiseless ℓ0 case
Theorem
If
there exists $x_0$ such that $\|x_0\|_0 \le K$ and $u = Dx_0$,
$D$ is such that, for any $x \neq 0$ with $\|x\|_0 \le 2K$, $Dx \neq 0$,
then $x_0$ is the unique solution of $(P_0)$, when $\tau = 0$.
Proof: Otherwise, there exists $x \neq x_0$ with $\|x\|_0 \le \|x_0\|_0 \le K$ and such that $Dx = Dx_0$.
Therefore $x - x_0 \neq 0$, $\|x - x_0\|_0 \le 2K$ and $D(x - x_0) = 0$. Contradiction.
Thanks to the sparsity hypothesis, we can invert $D$!
(But as such the inversion might be unstable, and $(P_0)$ is NP-hard.)
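The theorem can be checked by brute force on tiny instances; the sketch below (illustrative only, with cost exponential in $K$, which is precisely the NP-hardness discussed later) enumerates all supports of size at most $K$. A generic Gaussian $D$ has every set of $2K$ columns linearly independent with probability 1, so the hypothesis holds.

```python
import numpy as np
from itertools import combinations

def sparsest_solution(D, u, K, tol=1e-9):
    """Brute-force (P0) with tau = 0: try all supports of size <= K and
    return the sparsest x with D x = u, or None. Exponential cost in K."""
    N, P = D.shape
    if np.linalg.norm(u) < tol:                   # the empty support works
        return np.zeros(P)
    for k in range(1, K + 1):
        for S in combinations(range(P), k):
            DS = D[:, list(S)]
            xs, *_ = np.linalg.lstsq(DS, u, rcond=None)
            if np.linalg.norm(DS @ xs - u) < tol: # exact solution on S
                x = np.zeros(P)
                x[list(S)] = xs
                return x
    return None

rng = np.random.default_rng(0)
D = rng.standard_normal((6, 12))      # generic: any 4 columns are independent
x0 = np.zeros(12); x0[[2, 9]] = [1.0, -1.0]
x_rec = sparsest_solution(D, D @ x0, K=2)
print(np.allclose(x_rec, x0))         # True: x0 is the unique sparsest solution
```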
Compressed sensing, noisy ℓ0 case
We assume the columns of $D$ have norm 1.
Definition (Restricted Isometry Property)
We say that $D$ satisfies the "$K$-Restricted Isometry Property" ($K$-RIP), where $K \in \{1, \ldots, N\}$, if there exists $\delta_K \in [0, 1)$ such that
\[
(1 - \delta_K)\|x\|^2 \le \|D_K x\|^2 \le (1 + \delta_K)\|x\|^2, \qquad \forall x \in \mathbb{R}^K,
\]
whatever the $N \times K$ matrix $D_K$ extracted from $D$ (i.e. whatever the choice of $K$ columns).
We can equivalently say
\[
(1 - \delta_K)\|x\|^2 \le \|Dx\|^2 \le (1 + \delta_K)\|x\|^2, \qquad \forall x \in \mathbb{R}^P \text{ s.t. } \|x\|_0 \le K.
\]
(Geometrically, in the parameter space, the RIP forces the singular vectors of $D$ corresponding to small singular values to be almost orthogonal to all $K$-sparse vectors.)
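Computing $\delta_K$ exactly means maximizing over all $C_P^K$ column subsets, which is intractable. The sketch below (an illustration under assumptions, not part of the slides) only samples subsets, so it produces a Monte Carlo lower bound on $\delta_K$; it uses the fact that the two RIP inequalities hold iff all squared singular values of every extracted $D_K$ lie in $[1-\delta_K, 1+\delta_K]$.

```python
import numpy as np

def rip_delta_lower_bound(D, K, trials=2000, seed=0):
    """Monte Carlo *lower bound* on the K-RIP constant delta_K of D.

    The true delta_K maximizes over all C(P, K) column subsets; sampling
    random supports only bounds it from below."""
    rng = np.random.default_rng(seed)
    N, P = D.shape
    delta = 0.0
    for _ in range(trials):
        S = rng.choice(P, size=K, replace=False)
        s = np.linalg.svd(D[:, S], compute_uv=False)   # singular values of D_K
        delta = max(delta, abs(s[0] ** 2 - 1.0), abs(s[-1] ** 2 - 1.0))
    return delta

# i.i.d. Gaussian columns, normalized to norm 1 as the slide assumes:
# a classical RIP-friendly example.
rng = np.random.default_rng(1)
N, P, K = 64, 256, 5
D = rng.standard_normal((N, P))
D /= np.linalg.norm(D, axis=0)
print(rip_delta_lower_bound(D, K))
```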
Compressed sensing, noisy ℓ0 case
Recall
\[
(P_0): \quad x^* \in \mathrm{Argmin}_x \|x\|_0 \quad \text{s.t. } \|Dx - u\| \le \tau.
\]
Theorem
Assume there exist $K \le N$ and $x_0$ such that $\|x_0\|_0 \le K$, and consider the data
\[ u = Dx_0 + b. \]
If $D$ satisfies the $2K$-RIP with constant $\delta_{2K}$, then
\[
\|x_0 - x^*\| \le \frac{2}{\sqrt{1 - \delta_{2K}}}\, \tau,
\]
where $x^*$ is any solution of $(P_0)$ and $\|b\| \le \tau$.
($b$ contains some approximation error and some noise.)
Compressed sensing, noisy ℓ0 case, Thm proof
Notice first that since $\|b\| \le \tau$, $x_0$ satisfies the constraint $\|Dx_0 - u\| \le \tau$. Therefore
\[ \|x^*\|_0 \le \|x_0\|_0 \le K \]
and
\[ \|x_0 - x^*\|_0 \le 2K. \]
Therefore
\[
(1 - \delta_{2K})\|x_0 - x^*\|^2 \le \|D(x_0 - x^*)\|^2
= \|(Dx_0 - u) - (Dx^* - u)\|^2
\le (\|Dx_0 - u\| + \|Dx^* - u\|)^2
\le (2\tau)^2.
\]
Remark: the upper RIP inequality ($\|Dx\|^2 \le (1 + \delta_K)\|x\|^2$) is not used.
ℓ0 minimization: an NP-hard problem
Theorem
All the variants
\[
x^* \in \mathrm{Argmin}_x \|x\|_0 \ \text{ s.t. } \|Dx - u\| \le \tau,
\]
\[
x^* \in \mathrm{Argmin}_x \|Dx - u\|^2 \ \text{ s.t. } \|x\|_0 \le K,
\]
\[
x^* \in \mathrm{Argmin}_x \|x\|_0 + \lambda\|Dx - u\|^2
\]
are NP-hard problems.
(See Davis-Mallat-Avellaneda, "Adaptive greedy approximations", Constructive Approximation, 13:57–98, 1997.)
ℓ0 minimization: the proximal gradient algorithm
Theorem
For any $x \in \mathbb{R}^P$ and any $t \ge 0$,
\[
\mathrm{prox}_{t\|.\|_0}(x)_i = \begin{cases} x_i, & \text{if } |x_i| \ge \sqrt{2t}, \\ 0, & \text{otherwise.} \end{cases}
\]
It is a hard thresholding.
Proof: By definition,
\[
\mathrm{prox}_{t\|.\|_0}(x) = \mathrm{Argmin}_{x' \in \mathbb{R}^P} \frac{1}{2}\|x - x'\|_2^2 + t\|x'\|_0.
\]
This problem is separable and we can prove that, for all $i = 1, \ldots, P$,
\[
\mathrm{prox}_{t\|.\|_0}(x)_i = \mathrm{Argmin}_{y \in \mathbb{R}} \frac{1}{2}(x_i - y)^2 + t\, 1_{y \neq 0}(y).
\]
We have
\[
\frac{1}{2}(x_i - y)^2 + t\, 1_{y \neq 0}(y) = \begin{cases} t, & \text{if } y = x_i \neq 0, \\ \frac{1}{2}x_i^2, & \text{if } y = 0, \end{cases}
\]
and any other choice of $y$ is dominated by one of these two. Choosing the smallest objective value leads to the result.
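A direct transcription of the hard-thresholding prox (Python/NumPy sketch; the threshold $\sqrt{2t}$ follows the convention $\mathrm{prox}_{t\|.\|_0}(x) = \mathrm{Argmin}_{x'} \frac{1}{2}\|x - x'\|^2 + t\|x'\|_0$ used above):

```python
import numpy as np

def prox_l0(x, t):
    """Hard thresholding: minimizer of (1/2)||x - x'||_2^2 + t ||x'||_0.
    Entries with |x_i| >= sqrt(2 t) are kept, the others are set to 0."""
    out = np.array(x, dtype=float)
    out[np.abs(out) < np.sqrt(2.0 * t)] = 0.0
    return out

print(prox_l0([3.0, 0.5, -2.0, 0.1], t=1.0))   # threshold sqrt(2): [3, 0, -2, 0]
```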
ℓ0 minimization: the proximal gradient algorithm
Theorem
For any starting point and any stepsize $0 < t < \frac{1}{L}$, where $L$ is the Lipschitz constant of the gradient
\[ x \longmapsto 2\lambda D^*(Dx - u) \]
(i.e. $L = 2\lambda\sigma_{\max}(D)^2$, where $\sigma_{\max}(D)$ is the largest singular value of $D$), the proximal gradient algorithm (also called "Iterative Hard Thresholding")
\[
x^{k+1} = \mathrm{prox}_{t\|.\|_0}\big(x^k - t\, 2\lambda D^*(Dx^k - u)\big)
\]
converges to a critical point of
\[ \|x\|_0 + \lambda\|Dx - u\|^2. \]
It is a straightforward consequence of Bolte-Sabach-Teboulle, "Proximal alternating linearized minimization for nonconvex and nonsmooth problems", Math. Program., Ser. A, 2014.
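A minimal sketch of the Iterative Hard Thresholding loop above (the value of $\lambda$, the iteration count, the zero initialization, and the synthetic data are arbitrary illustrative choices):

```python
import numpy as np

def iht(D, u, lam, n_iter=500):
    """Iterative Hard Thresholding for  ||x||_0 + lam ||D x - u||_2^2.

    Gradient step on lam ||Dx - u||^2 with stepsize t < 1/L, then hard
    thresholding at sqrt(2 t). Only convergence to a critical point is
    guaranteed."""
    L = 2.0 * lam * np.linalg.norm(D, ord=2) ** 2   # Lipschitz constant of the gradient
    t = 0.9 / L                                     # stepsize t < 1/L
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        x = x - t * 2.0 * lam * (D.T @ (D @ x - u))   # gradient step
        x[np.abs(x) < np.sqrt(2.0 * t)] = 0.0         # prox step: hard thresholding
    return x

# Illustrative usage on synthetic data.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256)); D /= np.linalg.norm(D, axis=0)
x0 = np.zeros(256); x0[[3, 17, 25]] = [1.0, -2.0, 1.5]
u = D @ x0 + 0.01 * rng.standard_normal(64)
x_hat = iht(D, u, lam=20.0)     # ideally supported near {3, 17, 25}
```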
Application of compressed sensing: Inpainting
Top: image with missing pixels; bottom: restored and ideal images
Application of compressed sensing: Zoom
Original; zoomed (×4) by two methods
Application of compressed sensing: Morphological component analysis
Figure: Initial image, wavelet part, curvelet part (Courtesy of J.-L. Starck)
Plan
1 Sparsity prior: ℓ0 case
2 Sparsity prior: ℓ1 case
ℓ1 minimization
We replace the nonconvex, discontinuous, NP-hard problem by a convex, non-differentiable one:
\[
(P_1): \quad x^* \in \mathrm{Argmin}_x \|x\|_1 \quad \text{s.t. } \|Dx - u\| \le \tau,
\]
where $\tau \ge 0$ and the $\ell^1$ norm is defined by
\[
\|x\|_1 = \sum_{i=1}^P |x_i|.
\]
Or a variant using a Lagrangian:
\[
x^* \in \mathrm{Argmin}_x \|x\|_1 + \lambda\|Dx - u\|^2,
\]
for $\lambda \ge 0$.
(They describe the same solution path when $\lambda$ and $\tau$ vary.)
ℓ1 minimization: Proximal gradient algorithm
The objective function
\[ E(x) = \lambda\|Dx - u\|^2 + \|x\|_1 \]
has the form $F + R$, where
$F : W \longrightarrow \mathbb{R}$ is convex and differentiable, with an $L$-Lipschitz gradient ($L > 0$): $\forall w, w' \in W$, $\|\nabla F(w') - \nabla F(w)\| \le L\|w' - w\|$;
$R$ is lower semi-continuous, proper, convex and coercive.
Although $F$ might not be coercive (which turns out not to be an issue), these properties guarantee that the iterates of the proximal gradient algorithm converge (see the lecture on "non-smooth optimization").
ℓ1 minimization: Proximal gradient algorithm
Theorem (Iterative Soft Thresholding convergence)
The iterates
\[
x^{k+1} = \mathrm{prox}_{t\|.\|_1}\big(x^k - t\, 2\lambda D^*(Dx^k - u)\big)
\]
converge, for any $t < \frac{1}{L}$, where $L$ is the Lipschitz constant of the gradient
\[ x \longmapsto 2\lambda D^*(Dx - u) \]
(i.e. $L = 2\lambda\sigma_{\max}(D)^2$, where $\sigma_{\max}(D)$ is the largest singular value of $D$).
Moreover, $(E(x^k))_{k \in \mathbb{N}}$ is non-increasing and, for any minimizer $x^*$ of $E$,
\[
E(x^k) - E(x^*) \le \frac{L}{2k}\|x^0 - x^*\|^2.
\]
We recall that
\[
\mathrm{prox}_{t\|.\|_1}(x')_i = \begin{cases} x'_i - t, & \text{if } x'_i > t, \\ 0, & \text{if } -t \le x'_i \le t, \\ x'_i + t, & \text{if } x'_i < -t. \end{cases}
\]
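And the corresponding sketch of Iterative Soft Thresholding, with the soft-thresholding prox written inline (the synthetic usage at the end is illustrative only):

```python
import numpy as np

def ista(D, u, lam, n_iter=500):
    """Iterative Soft Thresholding for  lam ||D x - u||_2^2 + ||x||_1.

    Gradient step with stepsize t < 1/L, then the soft-thresholding
    prox of t ||.||_1 (threshold t)."""
    L = 2.0 * lam * np.linalg.norm(D, ord=2) ** 2   # Lipschitz constant of the gradient
    t = 0.9 / L
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        y = x - t * 2.0 * lam * (D.T @ (D @ x - u))      # gradient step
        x = np.sign(y) * np.maximum(np.abs(y) - t, 0.0)  # soft thresholding at t
    return x

# Illustrative usage on synthetic data.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256)); D /= np.linalg.norm(D, axis=0)
x0 = np.zeros(256); x0[[3, 17, 25]] = [1.0, -2.0, 1.5]
u = D @ x0 + 0.01 * rng.standard_normal(64)
x_hat = ista(D, u, lam=50.0)
```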
ℓ1 minimization
We recall
\[
(P_1): \quad x^* \in \mathrm{Argmin}_x \|x\|_1 \quad \text{s.t. } \|Dx - u\| \le \tau.
\]
Theorem (Compressed sensing: ℓ1 case)
Let $K \le N$ and $x_0$ be such that $\|x_0\|_0 \le K$. We consider the datum
\[ u = Dx_0 + b. \]
If $D$ satisfies the $4K$-RIP and the constants $\delta_{3K}$ and $\delta_{4K}$ are such that $\delta_{3K} + 3\delta_{4K} < 2$, then
\[ \|x_0 - x^*\| \le C_K\, \tau, \]
where $x^*$ is any solution of $(P_1)$, $\tau$ is any value such that $\|b\| \le \tau$, and
\[
C_K = \frac{4}{\sqrt{3 - 3\delta_{4K}} - \sqrt{1 + \delta_{3K}}}.
\]
($b$ contains some approximation error and some noise.)
Compressed sensing ℓ1 : Geometric intuition
Lemma
We have
\[ \|D(x^* - x_0)\| \le 2\tau \quad \text{and} \quad \|x^*\|_1 \le \|x_0\|_1. \]
Proof:
\[
\|D(x^* - x_0)\| = \|(Dx^* - u) - (Dx_0 - u)\| \le \|Dx^* - u\| + \|Dx_0 - u\| \le 2\tau.
\]
The second inequality is a straightforward consequence of $\|b\| \le \tau$: since $\|Dx_0 - u\| = \|b\| \le \tau$, $x_0$ is feasible for $(P_1)$, so the minimizer $x^*$ satisfies $\|x^*\|_1 \le \|x_0\|_1$.
The rest of the proof quantifies the fact that the $\ell^1$ ball is narrow. (See Candès-Romberg-Tao, "Stable signal recovery from incomplete and inaccurate measurements", Comm. Pure Appl. Math., 2006.)