Fundamentals of Neural Network Image Restoration
g(x, y) = \sum_{\alpha=1}^{N} \sum_{\beta=1}^{M} f(\alpha, \beta)\, h(x, y; \alpha, \beta) + n(x, y)    (2.1)
where f (x, y) and g(x, y) are the original and degraded images, respectively, and
n(x, y) is the additive noise component of the degraded image. If h(x, y; α, β) is
a linear function then (2.1) may be restated by lexicographically ordering g(x, y),
f (x, y) and n(x, y) into column vectors of size NM. To lexicographically order
an image, we simply scan each pixel in the image row by row and stack them
one after another to form a single column vector. Alternately, we may scan the
image column by column to form the vector. For example, assume the image
f (x, y) looks like:
f(x, y) = \begin{bmatrix} 11 & 12 & 13 & 14 \\ 21 & 22 & 23 & 24 \\ 31 & 32 & 33 & 34 \\ 41 & 42 & 43 & 44 \end{bmatrix}
After lexicographic ordering the following column vector results:
f = [11 12 13 14 21 22 23 24 31 32 33 34 41 42 43 44]T
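As a concrete illustration, this row-by-row scan is simply a row-major flatten of the image array. A minimal NumPy sketch (the array values are the 4 x 4 example above; the variable names are illustrative):

import numpy as np

# The 4 x 4 example image from the text.
f_xy = np.array([[11, 12, 13, 14],
                 [21, 22, 23, 24],
                 [31, 32, 33, 34],
                 [41, 42, 43, 44]])

# Row-by-row scan: stack each row one after another into a column vector.
f = f_xy.flatten(order='C').reshape(-1, 1)    # shape (16, 1)

# The alternative column-by-column ordering mentioned in the text.
f_col = f_xy.flatten(order='F').reshape(-1, 1)

print(f.T)    # [[11 12 13 14 21 22 23 24 31 32 33 34 41 42 43 44]]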
If we are consistent and order g(x, y), f(x, y) and n(x, y) in the same
way, we may restate (2.1) as a matrix operation [4, 21]:
g = Hf + n (2.2)
where g and f are the lexicographically organized degraded and original image
vectors, n is the additive noise component and H is a matrix operator whose
elements are an arrangement of the elements of h(x, y; α, β) such that the matrix
multiplication of f with H performs the same operation as convolving f (x, y)
with h(x, y; α, β). In general, H may take any form. However, if h(x, y; α, β) is spatially invariant with P ≪ min(N, M), where P is the size of the support of the PSF, then h(x, y; α, β) becomes h(x − α, y − β) in (2.1) and H takes the form of a block-Toeplitz matrix.
A Toeplitz matrix [2] is a matrix where every element lying on the same
diagonal line has the same value. Here is an example of a Toeplitz matrix:
\begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 1 & 2 & 3 & 4 \\ 3 & 2 & 1 & 2 & 3 \\ 4 & 3 & 2 & 1 & 2 \\ 5 & 4 & 3 & 2 & 1 \end{bmatrix}
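Because a Toeplitz matrix is fully determined by its first row and first column, the example above can be generated directly. A minimal sketch using SciPy (assuming scipy is available; c and r are just the first column and first row of the example):

import numpy as np
from scipy.linalg import toeplitz

# Element (i, j) of a Toeplitz matrix depends only on i - j, so every
# diagonal is constant; the first column and first row define it fully.
c = np.array([1, 2, 3, 4, 5])    # first column
r = np.array([1, 2, 3, 4, 5])    # first row
T = toeplitz(c, r)
print(T)
# [[1 2 3 4 5]
#  [2 1 2 3 4]
#  [3 2 1 2 3]
#  [4 3 2 1 2]
#  [5 4 3 2 1]]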
Expanding the constrained least square error cost function (2.3) in terms of the lexicographically ordered vectors gives:

E = \frac{1}{2} \sum_{p=1}^{L} \Big( g_p - \sum_{i=1}^{L} h_{pi} \hat{f}_i \Big)^2 + \frac{\lambda}{2} \sum_{p=1}^{L} \Big( \sum_{i=1}^{L} d_{pi} \hat{f}_i \Big)^2

= \frac{1}{2} \sum_{p=1}^{L} \Big( g_p - \sum_{i=1}^{L} h_{pi} \hat{f}_i \Big) \Big( g_p - \sum_{j=1}^{L} h_{pj} \hat{f}_j \Big) + \frac{\lambda}{2} \sum_{p=1}^{L} \Big( \sum_{i=1}^{L} d_{pi} \hat{f}_i \Big) \Big( \sum_{j=1}^{L} d_{pj} \hat{f}_j \Big)

= \frac{1}{2} \sum_{p=1}^{L} \Big( (g_p)^2 - 2 g_p \sum_{i=1}^{L} h_{pi} \hat{f}_i + \sum_{i=1}^{L} h_{pi} \hat{f}_i \sum_{j=1}^{L} h_{pj} \hat{f}_j \Big) + \frac{\lambda}{2} \sum_{p=1}^{L} \Big( \sum_{i=1}^{L} d_{pi} \hat{f}_i \Big) \Big( \sum_{j=1}^{L} d_{pj} \hat{f}_j \Big)

= \frac{1}{2} \sum_{p=1}^{L} (g_p)^2 - \sum_{p=1}^{L} \sum_{i=1}^{L} g_p h_{pi} \hat{f}_i + \frac{1}{2} \sum_{p=1}^{L} \sum_{i=1}^{L} \sum_{j=1}^{L} h_{pi} \hat{f}_i h_{pj} \hat{f}_j + \frac{\lambda}{2} \sum_{p=1}^{L} \sum_{i=1}^{L} \sum_{j=1}^{L} d_{pi} \hat{f}_i d_{pj} \hat{f}_j

= \frac{1}{2} \sum_{p=1}^{L} \sum_{i=1}^{L} \sum_{j=1}^{L} h_{pj} \hat{f}_j h_{pi} \hat{f}_i + \frac{\lambda}{2} \sum_{p=1}^{L} \sum_{i=1}^{L} \sum_{j=1}^{L} d_{pj} \hat{f}_j d_{pi} \hat{f}_i - \sum_{p=1}^{L} \sum_{i=1}^{L} g_p h_{pi} \hat{f}_i + \frac{1}{2} \sum_{p=1}^{L} (g_p)^2

Hence

E = \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} \Big( \sum_{p=1}^{L} h_{pj} h_{pi} + \lambda \sum_{p=1}^{L} d_{pj} d_{pi} \Big) \hat{f}_i \hat{f}_j - \sum_{p=1}^{L} \sum_{i=1}^{L} g_p h_{pi} \hat{f}_i + \frac{1}{2} \sum_{p=1}^{L} (g_p)^2    (2.5)
Expanding (2.4) we get:
E = -\frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} w_{ij} \hat{f}_i \hat{f}_j - \sum_{i=1}^{L} b_i \hat{f}_i + c    (2.6)
By equating the terms in equations (2.5) and (2.6) we find that the neural
network model can be matched to the constrained least square error cost function
by ignoring the constant, c, and setting:
w_{ij} = -\sum_{p=1}^{L} h_{pi} h_{pj} - \lambda \sum_{p=1}^{L} d_{pi} d_{pj}    (2.7)

b_i = \sum_{p=1}^{L} g_p h_{pi}    (2.8)
where wij is the interconnection strength between pixels i and j, and bi is the
bias input to neuron (pixel) i. In addition, hij is the (i, j)th element of matrix
H from equation (2.3) and dij is the (i, j)th element of matrix D.
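In matrix form, (2.7) and (2.8) amount to W = −(HᵀH + λDᵀD) and b = Hᵀg. A minimal NumPy sketch of this matching step, assuming H, D and g are already in lexicographic form (the helper name network_parameters is illustrative; D is commonly a high-pass constraint operator such as a discrete Laplacian):

import numpy as np

def network_parameters(H, D, g, lam):
    """Map the constrained least squares cost onto the network energy:
    returns the weight matrix W (2.7) and the bias vector b (2.8).

    H   : (L, L) lexicographic blur matrix
    D   : (L, L) lexicographic constraint matrix
    g   : (L,)   lexicographically ordered degraded image
    lam : regularization parameter lambda
    """
    W = -(H.T @ H) - lam * (D.T @ D)    # w_ij = -sum_p h_pi h_pj - lambda sum_p d_pi d_pj
    b = H.T @ g                         # b_i  = sum_p g_p h_pi
    return W, b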
Now let us look at some neural networks in the literature that have been used to solve this problem. In an early formulation, each pixel i is represented by S + 1 neurons, where S is the maximum gray level, and the value of the pixel is given by the sum of the states of its neurons:
\hat{f}_i = \sum_{k=0}^{S} v_{i,k}    (2.9)
where vi,k is the state of the kth neuron of the ith pixel. Each neuron is visited
sequentially and has its input calculated according to:
u_i = b_i + \sum_{j=1}^{L} w_{ij} \hat{f}_j    (2.10)
where u_i is the input to neuron i, and f̂_j is the current value of pixel j. Based
on ui , the neuron’s state is updated according to the following rule:
\Delta \hat{f}_i = G(u_i)

where

G(u) = \begin{cases} 1, & u > 0 \\ 0, & u = 0 \\ -1, & u < 0 \end{cases}    (2.11)
The change in energy resulting from a change in neuron state of ∆fˆi is given
by:
\Delta E = -\frac{1}{2} w_{ii} (\Delta \hat{f}_i)^2 - u_i \Delta \hat{f}_i    (2.12)
If ∆E < 0, then the neuron's state is updated. This procedure is summarized in Algorithm 2.1:
Algorithm 2.1.
repeat
{
    For i = 1, . . . , L do
    {
        For k = 0, . . . , S do
        {
            u_i = b_i + \sum_{j=1}^{L} w_{ij} \hat{f}_j
            \Delta \hat{f}_i = G(u_i)
            where G(u) = \begin{cases} 1, & u > 0 \\ 0, & u = 0 \\ -1, & u < 0 \end{cases}
            \Delta E = -\frac{1}{2} w_{ii} (\Delta \hat{f}_i)^2 - u_i \Delta \hat{f}_i
            If \Delta E < 0, then v_{i,k} = v_{i,k} + \Delta \hat{f}_i
            \hat{f}_i = \sum_{k=0}^{S} v_{i,k}
        }
    }
    t = t + 1
}
until \hat{f}_i(t) = \hat{f}_i(t-1) \ \forall i = 1, . . . , L
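A minimal Python sketch of Algorithm 2.1, following the printed pseudocode; W, b and the maximum gray level S are assumed given (for example from the network_parameters helper sketched earlier), and f_init would typically be the degraded image:

import numpy as np

def algorithm_2_1(W, b, S, f_init, max_iter=100):
    """Sketch of Algorithm 2.1: each pixel i is represented by S + 1
    neurons v[i, k], and the pixel value is their sum (equation (2.9))."""
    L = b.size
    f_hat = f_init.astype(float).copy()
    v = np.zeros((L, S + 1))
    v[:, 0] = f_hat                      # place the initial value in one neuron

    for _ in range(max_iter):
        f_prev = f_hat.copy()
        for i in range(L):
            for k in range(S + 1):
                u_i = b[i] + W[i, :] @ f_hat                   # equation (2.10)
                delta = np.sign(u_i)                           # G(u_i), equation (2.11)
                dE = -0.5 * W[i, i] * delta**2 - u_i * delta   # equation (2.12)
                if dE < 0:                                     # accept only energy reductions
                    v[i, k] += delta
                    f_hat[i] = v[i, :].sum()                   # equation (2.9)
        if np.array_equal(f_hat, f_prev):                      # no pixel changed this iteration
            break
    return f_hat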
In the paper by Paik and Katsaggelos, Algorithm 2.1 was enhanced to re-
move the step where the energy reduction is checked following the calculation of
∆fˆi [105]. Paik and Katsaggelos presented an algorithm which made use of a
more complicated neuron. In their model, each pixel was represented by a single
neuron which takes discrete values between 0 and S, and is capable of updating
its value by ±1, or keeping the same value during a single step. A new method
for calculating ∆fˆi was also presented:
\Delta \hat{f}_i = \acute{G}_i(u_i)

where

\acute{G}_i(u) = \begin{cases} -1, & u < -\theta_i \\ 0, & -\theta_i \le u \le \theta_i \\ 1, & u > \theta_i \end{cases}    (2.13)

where \theta_i = -\frac{1}{2} w_{ii} > 0.
Algorithm 2.2.
repeat
{
    For i = 1, . . . , L do
    {
        u_i = b_i + \sum_{j=1}^{L} w_{ij} \hat{f}_j
        \Delta \hat{f}_i = \acute{G}_i(u_i)
        where \acute{G}_i(u) = \begin{cases} -1, & u < -\theta_i \\ 0, & -\theta_i \le u \le \theta_i \\ 1, & u > \theta_i \end{cases}
        where \theta_i = -\frac{1}{2} w_{ii} > 0
        \hat{f}_i(t+1) = K(\hat{f}_i(t) + \Delta \hat{f}_i)
        where K(u) = \begin{cases} 0, & u < 0 \\ u, & 0 \le u \le S \\ S, & u \ge S \end{cases}
    }
    t = t + 1
}
until \hat{f}_i(t) = \hat{f}_i(t-1) \ \forall i = 1, . . . , L
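The corresponding update for Algorithm 2.2 uses a single neuron per pixel; a minimal sketch of one sequential pass over the image, showing the thresholded step Ǵ_i and the clipping function K (W, b and S as before; the function name is illustrative):

import numpy as np

def algorithm_2_2_pass(W, b, S, f_hat):
    """One sequential pass of Algorithm 2.2 over all pixels."""
    L = b.size
    for i in range(L):
        u_i = b[i] + W[i, :] @ f_hat
        theta_i = -0.5 * W[i, i]                 # theta_i = -w_ii / 2 > 0
        # G'_i(u): +1, 0 or -1 depending on u relative to the threshold.
        if u_i > theta_i:
            delta = 1.0
        elif u_i < -theta_i:
            delta = -1.0
        else:
            delta = 0.0
        # K(u): clip the new state to the valid gray-level range [0, S].
        f_hat[i] = np.clip(f_hat[i] + delta, 0, S)
    return f_hat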
Algorithm 2.2 makes no explicit check that the energy has decreased during each iteration; instead, in [105] it was proven that Algorithm 2.2 results in a decrease of the energy function at each iteration. Note that in Algorithm 2.2, each pixel
only changes its value by ±1 during an iteration. In Algorithm 2.1, the pixel’s
value would change by any amount between 0 and S during an iteration since
each pixel was represented by S + 1 neurons. Although Algorithm 2.2 is much
more efficient in terms of the number of neurons used, it may take many more
iterations than Algorithm 2.1 to converge to a solution (although the total time taken may still be less than that of Algorithm 2.1). If we consider that the value of each pixel
represents a dimension of the L dimensional energy function to be minimized,
then we can see that Algorithms 2.1 and 2.2 have slightly different approaches
to finding a local minimum. In Algorithm 2.1, the energy function is minimized
along each dimension in turn. The image can be considered to represent a single
point in the solution space. In Algorithm 2.1, this point moves to the function
minimum along each of the L axes of the problem until it eventually reaches
a local minimum of the energy function. In Algorithm 2.2, for each pixel, the
point takes a unit step in a direction that reduces the network energy along that
dimension. If the weight matrix is negative definite (−W is positive definite), however, then regardless of how these algorithms work, the end results must be similar (provided each algorithm terminates at a minimum). The reason is that when the weight matrix is negative definite, the energy function has only one minimum, the global minimum. In this case the matrix W is invertible and
taking (2.4) we see that:
\hat{f} = -W^{-1} b    (2.15)
(assuming that W−1 exists).
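For small problems where W is negative definite, (2.15) can be evaluated directly and used as a reference against which the iterative algorithms are checked; a minimal sketch (using a linear solve rather than forming the inverse explicitly):

import numpy as np

def direct_solution(W, b):
    """Direct minimizer f_hat = -W^{-1} b of (2.15), valid when W is
    negative definite and therefore invertible."""
    return np.linalg.solve(W, -b)    # solve W f = -b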
This f̂ is the only minimum and the only stationary point of this cost func-
tion, so we can state that if W is negative definite and Algorithm 2.1 and Algo-
rithm 2.2 both terminate at a local minimum, the resultant image must be close
to f̂ for both algorithms. Algorithm 2.1 approaches the minimum in a zigzag
fashion, whereas Algorithm 2.2 approaches the minimum with a smooth curve.
If W is not negative definite, then local minima may exist and Algorithms 2.1
and 2.2 may not produce the same results. If Algorithm 2.2 is altered so that
instead of changing each neuron’s value by ±1 before going to the next neuron,
the current neuron is iterated until the input to that neuron is zero, then Algo-
rithms 2.1 and 2.2 will produce identical results. Each algorithm will terminate
in the same local minimum.
Theorem 2.1: For each neuron i in the network during each iteration, there
exists a state change ∆fˆi∗ such that the energy contribution of neuron i is mini-
mized.
Proof:
\Delta E = -\frac{1}{2} w_{ii} (\Delta \hat{f}_i)^2 - u_i \Delta \hat{f}_i    (2.16)

Differentiating \Delta E with respect to \Delta \hat{f}_i gives us:

\frac{\delta \Delta E}{\delta \Delta \hat{f}_i} = -w_{ii} \Delta \hat{f}_i - u_i

The value of \Delta \hat{f}_i which minimizes (2.16) is found by setting this derivative to zero:

0 = -w_{ii} \Delta \hat{f}_i^* - u_i

Therefore,

\Delta \hat{f}_i^* = \frac{-u_i}{w_{ii}}    (2.17)
QED.
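Substituting this optimal step back into (2.16) shows that it can never increase the energy, since w_ii < 0 by (2.7):

\Delta E = -\frac{1}{2} w_{ii} \left( \frac{-u_i}{w_{ii}} \right)^2 - u_i \left( \frac{-u_i}{w_{ii}} \right) = -\frac{u_i^2}{2 w_{ii}} + \frac{u_i^2}{w_{ii}} = \frac{u_i^2}{2 w_{ii}} \le 0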
Algorithm 2.3.
repeat
{
    For i = 1, . . . , L do
    {
        u_i = b_i + \sum_{j=1}^{L} w_{ij} \hat{f}_j
        \Delta \hat{f}_i = G(u_i)
        where G(u) = \begin{cases} -1, & u < 0 \\ 0, & u = 0 \\ 1, & u > 0 \end{cases}
        \Delta E_{ss} = -\frac{1}{2} w_{ii} (\Delta \hat{f}_i)^2 - u_i \Delta \hat{f}_i    (2.18)
        If \Delta E_{ss} < 0 then \Delta \hat{f}_i^* = \frac{-u_i}{w_{ii}}
        \hat{f}_i(t+1) = K(\hat{f}_i(t) + \Delta \hat{f}_i^*)
        where K(u) = \begin{cases} 0, & u < 0 \\ u, & 0 \le u \le S \\ S, & u \ge S \end{cases}
    }
    t = t + 1
}
until \hat{f}_i(t) = \hat{f}_i(t-1) \ \forall i = 1, . . . , L
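A minimal Python sketch of Algorithm 2.3, again following the printed pseudocode (W, b, S and f_init as in the earlier sketches); the key difference from Algorithm 2.2 is that, when a unit step would reduce the energy, the full optimal step ∆f̂_i* = −u_i/w_ii of (2.17) is taken:

import numpy as np

def algorithm_2_3(W, b, S, f_init, max_iter=100):
    """Sketch of Algorithm 2.3: single-step energy minimization per neuron."""
    L = b.size
    f_hat = f_init.astype(float).copy()
    for _ in range(max_iter):
        f_prev = f_hat.copy()
        for i in range(L):
            u_i = b[i] + W[i, :] @ f_hat
            delta = np.sign(u_i)                                 # G(u_i)
            dE_ss = -0.5 * W[i, i] * delta**2 - u_i * delta      # equation (2.18)
            if dE_ss < 0:                                        # a unit step would reduce E
                delta_star = -u_i / W[i, i]                      # optimal step, equation (2.17)
                f_hat[i] = np.clip(f_hat[i] + delta_star, 0, S)  # clipping function K
        if np.allclose(f_hat, f_prev):                           # no significant change
            break
    return f_hat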
Each neuron is visited sequentially and has its input calculated. Using the
input value, the state change needed to minimize the neuron’s energy contribu-
tion to the network is calculated. Note that since ∆f̂_i ∈ {−1, 0, 1}, and both ∆f̂_i and ∆f̂_i^* have the same sign as u_i, step (2.18) is equivalent to checking that at least a unit step can be taken which will reduce the energy of the network. If ∆E_ss < 0, then the optimal state change ∆f̂_i^* = −u_i/w_ii is applied, subject to the clipping function K; otherwise the neuron's state is left unchanged.
2.5 Analysis
In the paper by Paik and Katsaggelos, it was shown that Algorithm 2.2 would
converge to a fixed point after a finite number of iterations and that the fixed
point would be a local minimum of E in (2.3) in the case of a sequential algo-
rithm [105]. Here we will show that Algorithm 2.3 will also converge to a fixed
point which is a local minimum of E in (2.3).
Algorithm 2.2 makes no explicit check that the energy has decreased during each iteration; instead, it was proven in [105] that Algorithm 2.2 results in a decrease of the energy function at each iteration. Algorithm 2.3, however, changes the current neuron's state if and only if an energy reduction will occur and |∆f̂_i| = 1. For this reason Algorithm 2.3 can only reduce the energy function and never increase it. The gradient of the energy function is given by:

\frac{\delta E}{\delta \hat{f}} = -W \hat{f} - b = -u    (2.20)
where u is a vector whose ith element contains the current input to neuron i.
Note that during any iteration, u will always point in a direction that reduces
the energy function. If f̂ is not at a minimum of the energy function, then for at least one neuron a change in state must be possible which would reduce the energy function. For this neuron, u_i ≠ 0.
The algorithm will then compute the change in state for this neuron to move
closer to the solution. If |∆f̂_i^*| > 1/2 the neuron's state will be changed. In this
case we assume that no boundary conditions have been activated to stop neuron
i from changing value. Due to the discrete nature of the neuron states we see
that the step size taken by the network is never less than 1.
To restate the facts obtained so far:
• During each iteration Algorithm 2.3 will reduce the energy of the network.
• A reduction in the energy of the network implies that the network has
moved closer to a local minimum of the energy function.
• There is a lower bound to the step size taken by the network and a finite
range of neuron states. Since the network is restricted to changing state
only when an energy reduction is possible, the network cannot iterate for-
ever.
From these observations we can conclude that the network reaches a local
minimum in a finite number of iterations, and that the solution given by Al-
gorithm 2.3 will be close to the solution given by Algorithm 2.1 for the same
problem. The reason Algorithms 2.1 and 2.3 must approach the same local
minimum is the fact that they operate on the pixel in an identical manner. In
Algorithm 2.1 each of the S + 1 neurons associated with pixel i is adjusted to
reduce its contribution to the energy function. The sum of the contributions
of the S + 1 neurons associated with pixel i in Algorithm 2.1 equals the final
grayscale value of that pixel. Hence during any iteration of Algorithm 2.1
the current pixel can change to any allowable value. There are S + 1 possible
output values of pixel i and only one of these values results when the algorithm
minimizes the contribution of that pixel. Hence whether the pixel is represented
by S + 1 neurons or just a single neuron, the output grayscale value that occurs
when the energy contribution of that pixel is minimized during the current itera-
tion remains the same. Algorithms 2.1 and 2.3 both minimize the current pixel’s
energy contribution; hence they must both produce the same results. In practice
the authors have found that all three algorithms generally produce identical re-
sults, which suggests that for reasonable values of the parameter λ, only a single
global minimum is present.
Algorithm 2.3 terminates when no neuron can take a unit step that reduces the energy of the network, that is, when

|u_i| < -\frac{1}{2} w_{ii}, \quad \forall i = 1, \ldots, L    (2.21)
In [105], Paik and Katsaggelos noticed this feature as well, since the same
termination conditions apply to Algorithm 2.2. The self-connection weight, wii ,
controls how close to the ideal solution the algorithm will approach before ter-
minating. Since increasing the value of λ increases the magnitude of w_ii, we would also expect the algorithm to terminate sooner, yet be less accurate, for larger values of λ. This is found to occur in practice. When λ is
increased, the number of iterations before termination drops rapidly.
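The termination condition (2.21) is easy to monitor directly; a small illustrative check (W, b and the current estimate f_hat as in the sketches above):

import numpy as np

def has_terminated(W, b, f_hat):
    """True when no neuron can take a unit step that reduces the energy,
    i.e. condition (2.21) holds for every pixel."""
    u = b + W @ f_hat                       # current neuron inputs
    return bool(np.all(np.abs(u) < -0.5 * np.diag(W)))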
2.7.1 Setup
In both these examples, the images were blurred using a Gaussian PSF with the
impulse response:
h(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left( -\left( \frac{x^2}{2\sigma_x^2} + \frac{y^2}{2\sigma_y^2} \right) \right)    (2.22)
where σx and σy are the standard deviations of the PSF in the x and y directions,
respectively.
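A minimal sketch of generating this PSF as a discrete convolution kernel; the normalization to unit sum is an added assumption (the text gives only the continuous form), and the 5 x 5, standard deviation 2.0 blur used in the next subsection would correspond to gaussian_psf(5, 2.0, 2.0):

import numpy as np

def gaussian_psf(size, sigma_x, sigma_y):
    """Discrete Gaussian PSF of equation (2.22) on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    h = (1.0 / (2.0 * np.pi * sigma_x * sigma_y)) * np.exp(
        -(x**2 / (2.0 * sigma_x**2) + y**2 / (2.0 * sigma_y**2)))
    return h / h.sum()    # normalize so the blur preserves mean intensity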
2.7.2 Efficiency
The time taken to restore an image was compared among Algorithms 2.1, 2.2
and 2.3. A degraded image was created by blurring a 256 by 256 image with a
Gaussian blur of size 5 by 5 and standard deviation 2.0. Noise of variance 4.22
was added to the blurred image. Each algorithm was run until at least 85% of the
pixels in the image no longer changed value or many iterations had passed with
the same number of pixels changing value during each iteration. Algorithm 2.1
was stopped after the sixth iteration when no further improvement was possible,
and took 6067 seconds to run on a SUN Ultra SPARC 1. Algorithm 2.2 was
stopped after the 30th iteration with 89% of pixels having converged to their
stable states and took 126 seconds to run. Algorithm 2.3 was stopped after the
18th iteration with 90% of pixels stable and took 78 seconds to run. Algorithm
2.3 is much faster than Algorithms 2.1 and 2.2, even though Algorithms 2.1 and 2.3 approach the same local minimum and hence give the same results.
The computation time of Algorithm 2.3 can be expected to increase linearly with
the number of pixels in the image, as can the computation times of Algorithms 2.1
and 2.2. The single-step neuron energy minimization technique of Algorithm 2.3 accounts for its superior speed, and this trend holds for images of any size. Various types