EXACT AND APPROXIMATE SPARSE SOLUTIONS OF
UNDERDETERMINED LINEAR EQUATIONS
SADEGH JOKAR AND MARC E. PFETSCH
Abstract. In this paper, we empirically investigate the NP-hard problem of finding sparsest solutions to linear equation systems, i.e., solutions
with as few nonzeros as possible. This problem has recently received
considerable interest in the sparse approximation and signal processing literature. We use a branch-and-cut approach via the maximum feasible
subsystem problem to compute optimal solutions for small instances
and investigate the uniqueness of the optimal solutions. We furthermore
discuss five (modifications of) heuristics for this problem that appear in
different parts of the literature. For small instances, the exact optimal
solutions allow us to evaluate the quality of the heuristics, while for
larger instances we compare their relative performance. One outcome
is that the so-called basis pursuit heuristic performs worse, compared
to the other methods. Among the best heuristics are a method due to
Mangasarian and a bilinear approach.
1. Introduction
The area of sparse approximations has seen tremendous research progress
in recent years. At the core of many of these activities are results that
ensure the efficient recovery of sparse solutions, i.e., solutions having few
nonzeros, to linear systems Ax = b with A ∈ ℝ^{m×n} and b ∈ ℝ^m. The basic idea is to let A be a composition of several bases of ℝ^m, and hence
n > m. This implies that b has several representations, which often guarantees sparse solutions to the equation system: Under appropriate assumptions
on A and if b is known to have a sufficiently sparse representation, Donoho
and Huo [21] showed that the sparsest representation is unique. Moreover,
a striking result states that, under slightly stronger assumptions, one can
efficiently find this representation by solving a linear program (LP), which
models the minimization of the ℓ1 -norm of x subject to Ax = b. This fact
was first investigated empirically by Chen, Donoho, and Saunders [13] and
then theoretically studied by Donoho and Huo [21]. This approach is commonly called basis pursuit. Many strengthenings of this type of result have
been obtained, see Section 1.3 for more details.
A generalization of the sparse representation problem is to allow Ax to deviate from b with respect to some ℓp-norm, i.e., to look for the sparsest
solution x satisfying ‖Ax − b‖p ≤ δ for a given δ > 0, where p ∈ [1, ∞]. The
literature investigates conditions on A and b such that a solution minimizing the ℓ1-norm under such constraints is near a sparsest solution. Donoho,
Date: March 2007.
Supported by the DFG Research Center Matheon “Mathematics for key technologies”
in Berlin.
Elad, and Temlyakov [20], Tropp [38], Donoho [16], Fuchs [25], and Candès,
Romberg, and Tao [9] deal with the case p = 2. Donoho and Elad [19] and
Fuchs [26] investigate p = ∞.
Another type of algorithm has its origin in the so-called matching pursuit
approach, see Mallat and Zhang [28]. Similar results to the above mentioned
can be proved for orthogonal matching pursuit, a variant of matching pursuit,
developed by Pati, Rezaiifar, and Krishnaprasad [31], and Davis, Mallat, and
Zhang [14]. For the results see Tropp [37], Donoho, Elad, and Temlyakov [20],
and Tropp and Gilbert [36].
The sparse representation problem is closely connected to the maximum
feasible subsystem problem (Max FS), in which one searches for a largest
feasible subsystem of an infeasible linear inequality system; see Section 2 for
a more detailed description and literature. In this context several heuristics
have been proposed, e.g., a method using a bilinear formulation by Bennett
and Bredensteiner [6] and the work of Mangasarian [29].
Unfortunately, the combinatorial optimization problem of finding a sparsest solution to an equation system is computationally hard, i.e., NP-hard
(the same holds for Max FS). The above mentioned results investigate conditions under which sparse solutions can nevertheless be found efficiently and
the analyses have become increasingly more exact. In practical applications,
however, one needs sparse solutions also when the conditions investigated
in the literature do not hold. One approach to this problem is to develop
an exact combinatorial optimization method that can compute an optimal
sparsest solution for practical problem sizes. The theoretical hardness results do not exclude such methods, since computational complexity captures
worst-case and asymptotical behavior only. It seems, however, that the literature on sparse representations generally believes such an approach to be
impossible. A second approach is to settle for algorithms that efficiently
compute good approximate solutions to the above problem, i.e., algorithms
that find solutions with not many more nonzeros than the optimal solutions.
In this paper, we empirically investigate an exact branch-and-cut algorithm that was originally developed for Max FS. We furthermore investigate
the practical behavior of five approximate methods: Besides the above mentioned basis pursuit and orthogonal matching pursuit, we propose a variant
of orthogonal matching pursuit that improves upon its results, but is computationally more expensive. We also use the nonlinear optimization method
due to Mangasarian, mentioned above. Finally, we adapt the bilinear optimization approach of Bennett and Bredensteiner to the case of finding sparse
solutions. We consider sparse solutions satisfying the given equation system
Ax = b and also solutions that satisfy kAx − bk∞ ≤ δ for a given δ > 0.
The availability of the exact approach enables us to estimate the quality
of heuristics via the optimum or the lower bound that is produced by the
branch-and-cut approach. The exact algorithm can also be used to check
whether the sparsest solution is unique, a key feature of the investigated
problem.
The findings of this paper can be summarized as follows. We confirm
the common pessimism about the optimal solvability of the exact sparse
representation problem. It is – both theoretically and empirically – a very
hard problem: the exact branch-and-cut algorithm can only solve very small
instances to optimality, or instances in which the optimal solution has few
nonzeros. On the positive side, the heuristics perform quite well, in particular, Mangasarian’s approach. It is also a good idea to greedily decrease the
support of a solution produced by a heuristic, if possible. Basis pursuit turns
out to be the worst of the studied heuristics. If one wants to improve upon
basis pursuit or orthogonal matching pursuit, however, one has to pay the
price of solving several LPs.
The structure of this article is as follows. After fixing some notation and
introducing the basic problems of this paper, in Section 1.3 we review results
from the literature. This includes results on the computational complexity
of the problems and examples of the representation theorems that can be
proved in our context. In Section 2, we discuss our exact approach via the
maximum feasible subsystem problem. Section 3 contains the introduction of
the five heuristics mentioned above. In Section 4, we present computational
experiments with the exact and approximate algorithms. We close with some
final remarks.
1.1. Notation
We depict vectors in bold font, and for two vectors x and y, we write x^T y
for their inner product. We denote by 1 the vector of all ones and by 0 the
zero vector of appropriate dimension. For a positive integer n, we define
[n] := {1, . . . , n}. For x ∈ ℝ^n, we denote by supp(x) = {j ∈ [n] : x_j ≠ 0}
the support of x and define the ℓ0 quasi-norm to be the size of the support,
i.e., ‖x‖₀ := |supp(x)|. We will also use the ℓ1-norm, the Euclidean ℓ2-norm,
and the max/ℓ∞-norm:

‖x‖₁ = ∑_{j=1}^n |x_j|,   ‖x‖₂ = ( ∑_{j=1}^n x_j² )^{1/2},   and   ‖x‖∞ = max_{j∈[n]} |x_j|,

respectively.
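For concreteness, these quantities can be computed directly; the following is a small numpy sketch (not part of the authors' implementation):

```python
import numpy as np

x = np.array([3.0, 0.0, -4.0])
l0 = np.count_nonzero(x)      # ||x||_0: size of the support {0, 2}
l1 = np.sum(np.abs(x))        # ||x||_1 = 3 + 4 = 7
l2 = np.sqrt(np.sum(x ** 2))  # ||x||_2 = sqrt(9 + 16) = 5
linf = np.max(np.abs(x))      # ||x||_inf = 4
```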
1.2. Basic Problems
The basic problem of this paper is to find the sparsest solution to Ax = b,
where A ∈ ℝ^{m×n} has full rank and b ∈ ℝ^m. We therefore want to solve:

(P₀)   min {‖x‖₀ : Ax = b}.

We assume that m < n, and hence the system Ax = b is underdetermined. We also consider the variant of this problem in which we allow deviations from equality, i.e., we allow solutions x that satisfy ‖Ax − b‖∞ ≤ δ
for some δ ≥ 0. This problem can be posed as follows.

(P₀^δ)   min {‖x‖₀ : b − δ·1 ≤ Ax ≤ b + δ·1}.

Clearly, problem (P₀) is equivalent to (P₀^0), but we will often distinguish
the two, since specialized algorithms for (P₀) might be more efficient. For
convenience, we define the sets of feasible points

Q := {x ∈ ℝ^n : Ax = b}   and   Q^δ := {x ∈ ℝ^n : b − δ·1 ≤ Ax ≤ b + δ·1}.
We will always assume that Q^δ ≠ ∅ (for δ = 0 this is implied by the full
rank of A). The above problems only become interesting if 0 ∉ Q^δ. For
Q = Q^0 this is equivalent to the assumption that b ≠ 0. For Q^δ we need
δ < max{|b₁|, . . . , |b_m|} = ‖b‖∞.
Remark. We take the ℓ∞-norm as a measure for deviations from equality in
this paper, since it is the most suitable for linear programming based approaches. With some effort one could include deviations w.r.t. the ℓ1-norm,
but using the ℓ2-norm would turn (P₀^δ) into a nonlinear mixed-integer problem, which would make our exact solution approach even more difficult. The
ℓ2-norm is usually used in the literature, see, for instance, Donoho, Elad,
and Temlyakov [20], and Tropp [38].
Replacing the quasi-norm ‖·‖₀ by ‖·‖₁ in (P₀) and (P₀^δ) yields the following
two problems.

(P₁)    min {‖x‖₁ : x ∈ Q},
(P₁^δ)  min {‖x‖₁ : x ∈ Q^δ}.
These problems are efficiently solvable by linear programming methods, see
Section 3.1. The problems (P1 ) and (P1δ ) will be used several times later on.
1.3. Results from the Literature
In this section, we collect results from the literature concerning sparse solutions of Ax = b. We first deal with the computational complexity properties
of the problem. These seem hardly to be known in the sparse representation
literature.
The decision version of (P₀) is one of the classical NP-complete problems,
see Garey and Johnson [27]; i.e., if P ≠ NP, there exists no polynomial
time algorithm that computes the optimal solution of (P₀) for every instance. Since (P₀^δ) contains (P₀) as a special case, (P₀^δ) is NP-hard as well;
Natarajan [30] presents an NP-hardness proof for δ > 0.
Moreover, both problems are hard to approximate: if a very likely complexity
assumption is true (see below), there is no polynomial time algorithm that
computes a solution whose support is at most 2^{log^{1−ρ} n} times as large as
the support of an optimal solution, for any ρ > 0; see Amaldi and Kann [3]
for (P₀) and Amaldi [2] for (P₀^δ).
The precise formulation is as follows. Let DTIME(s^{polylog s}) be the class
of problems whose instances of size s can be solved in deterministic time
O(s^{polylog s}), where polylog s refers to any polynomial in log s. The assumption NP ⊄ DTIME(s^{polylog s}) is stronger than P ≠ NP, but it is
also believed to be extremely likely; it amounts to the claim that not all
problems in NP can be solved in quasi-polynomial time. Then, assuming
NP ⊄ DTIME(s^{polylog s}), (P₀) and (P₀^δ) cannot be approximated in polynomial time within a factor of 2^{log^{1−ρ} n}, for any ρ > 0 (n is the number of
columns of A). This shows that (P₀) and (P₀^δ) are indeed theoretically hard
optimization problems.
Having the above negative results in mind, it is quite surprising that strong
statements about efficient solutions to the sparse representation problem exist. We give a flavor of these results, for which we assume that A = [B₁ B₂],
where B₁ = [p₁, . . . , p_m] and B₂ = [q₁, . . . , q_m] are two orthonormal bases
of ℝ^m (and hence A ∈ ℝ^{m×2m}). The following number, called the mutual
incoherence, is central to the results:

M := M(B₁, B₂) := max{|p_i^T q_j| : 1 ≤ i, j ≤ m}.

Note that M ≤ 1, because B₁ and B₂ are normalized, and M ≥ 1/√m (see,
e.g., Donoho and Huo [21]). It turns out that the smaller the mutual incoherence, the nicer the properties the matrix has with respect to sparse
representation problems. Matrices with minimum mutual incoherence are
also called Grassmannian frames, see Strohmer and Heath [35].
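The mutual incoherence of a pair of orthonormal bases can be computed directly; a small numpy sketch (the function name is ours). For the identity basis and a normalized Hadamard basis, M attains the lower bound 1/√m:

```python
import numpy as np
from scipy.linalg import hadamard

def mutual_incoherence(B1, B2):
    """M(B1, B2) = max |p_i^T q_j| over all columns p_i of B1, q_j of B2."""
    return float(np.max(np.abs(B1.T @ B2)))
```

For example, mutual_incoherence(np.eye(4), hadamard(4) / 2.0) evaluates to 0.5 = 1/√4.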
Theorem 1 (Elad and Bruckstein [24]). Let A be as above, b ∈ ℝ^m, and
let x⋆ be an optimal solution of (P₀) with α := ‖x⋆‖₀ < 1/M. Then x⋆ is the
unique solution of Ax = b with ‖x‖₀ ≤ α.
If we slightly decrease the bound on the number of nonzeros, we can
recover the unique optimal solution by solving (P1 ).
Theorem 2 (Elad and Bruckstein [24]). Let A be as above, b ∈ ℝ^m, and
let x⋆ be the (unique) optimal solution of (P₀) with ‖x⋆‖₀ < (√2 − 1/2)/M.
Then the optimal solution of (P₁) coincides with x⋆.
This bound is an improvement of the bound ½(1 + 1/M) obtained by Donoho
and Huo [21]. The bound (√2 − 1/2)/M ≈ 0.9142/M is only slightly smaller
than the bound 1/M of Theorem 1.
The statement of Theorem 2 can be strengthened for random matrices.
For instance, we have the following.
Theorem 3 (Donoho [17]). Let A ∈ ℝ^{m×n}, with m < n, be a matrix whose
columns are unit vectors taken uniformly at random. Then there exists a
constant ρ(n) such that the following holds: Let x⋆ be the optimal solution
of (P₀) for the system Ax = b, and assume that ‖x⋆‖₀ < ρ(n)·n. Then the
optimal solution of (P₁) coincides with x⋆.
Hence, this result shows that for random matrices one can find optimal
solutions x⋆ of (P₀) with ‖x⋆‖₀ < α by linear programming, for an appropriate α ∈ O(n). In contrast, Theorem 2 only works for α = 1/M ∈ O(√n). Thus,
asymptotically Theorem 3 is stronger than Theorem 2. However, known
bounds on ρ(n) are very small, see also Candès and Tao [11].
For further extensions and strengthenings of these results see, for instance,
Donoho and Elad [18], Candès and Romberg [7], Candès, Romberg, and
Tao [8], and Donoho [15].
Remark. A complementary problem to (P₀) is as follows. Let B ∈ ℝ^{r×s}
with r > s. Suppose that b := Bx′ + e, where x′ is some given vector and e
is a vector of errors. The goal is to recover x′, given B and b. It turns out
that, if B has full rank, this problem is equivalent to (P₀) and, similarly
to (P₀), it can be solved by linear programming under certain assumptions
on B and if e is sparse. For more information see Candès et al. [10], and
Candès and Tao [11].
As mentioned in the introduction, there are several articles that deal with
the problem min {‖x‖₀ : ‖Ax − b‖₂ ≤ δ}. For the case with the ℓ∞-norm
instead of the ℓ2 -norm (i.e., (P0δ )) there are much fewer results. We cite the
following.
Theorem 4 (Donoho and Elad [19]). Let A = [B₁ B₂] be as above. For
x̃ ∈ ℝ^n and e ∈ ℝ^m with ‖e‖∞ ≤ ε, define b := Ax̃ + e. If for some γ ≥ 0
and δ ≥ ε we have

‖x̃‖₀ < (1 + M) / (2M + 2√m (ε + δ)/γ),

then the solution x⋆ to (P₁^δ) satisfies ‖x⋆ − x̃‖₁ ≤ γ.
In other words, if there exists a sparse solution x̃ to the disturbed system,
then the optimal solution to (P1δ ) is not far from x̃. For the ℓ2 -norm case
mentioned above, it can also be proved that under appropriate conditions
the support of x⋆ is contained in the support of x̃.
2. Integer Programming Approaches
The sparse representation problem (P0δ ) can be formulated as a mixed
integer linear program as follows:
min 1T z
b − δ · 1 ≤ Ax ≤ b + δ · 1
−Z · z ≤ x ≤ Z · z
z ∈ {0, 1}n .
If Z > 0 is large enough, then zj = 1, j ∈ [n], places no restrictions on xj ,
while zj = 0 forces xj = 0. This shows that the above model indeed finds
optimal solutions for (P0δ ).
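The mixed-integer model above can be written down with any MIP solver; the following is a minimal sketch using scipy.optimize.milp (the function name and the choice Z = 10³ are ours, not the authors'; the caveats about choosing Z are discussed below).

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def sparsest_solution_bigM(A, b, delta=0.0, Z=1e3):
    """Big-M model for (P0^delta): min 1^T z subject to
    b - delta*1 <= A x <= b + delta*1,  -Z*z <= x <= Z*z,  z binary."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])  # objective: sum of z
    I = np.eye(n)
    cons = [
        LinearConstraint(np.hstack([A, np.zeros((m, n))]), b - delta, b + delta),
        LinearConstraint(np.hstack([I, -Z * I]), -np.inf, 0.0),   # x <= Z z
        LinearConstraint(np.hstack([-I, -Z * I]), -np.inf, 0.0),  # -x <= Z z
    ]
    integrality = np.concatenate([np.zeros(n), np.ones(n)])  # only z integer
    bounds = Bounds(np.concatenate([np.full(n, -np.inf), np.zeros(n)]),
                    np.concatenate([np.full(n, np.inf), np.ones(n)]))
    res = milp(c, constraints=cons, integrality=integrality, bounds=bounds)
    x = res.x[:n]
    return x, int(round(res.fun))
```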
This approach, however, usually suffers from serious numerical difficulties. On the one hand, Z has to be large enough to guarantee the correct
behavior of the model. On the other hand, Z should be as small as possible,
since large values of Z can lead to wrong results because of numerical problems. Furthermore, the larger Z is, the worse the LP bounds in LP-based
integer programming solvers become; this has negative effects on the performance.
While for special data A and b, good bounds on ‖x‖∞ – and hence Z – may
be possible, a priori bounds are impractically large, see e.g. Schrijver [34].
The problems with the previous approach can be avoided by transformation to the so-called maximum feasible subsystem problem (Max FS). Here,
one is given an infeasible linear inequality system Dx ≤ d and wants to
find a feasible subsystem of maximal cardinality. See Amaldi, Pfetsch, and
Trotter [4] or Pfetsch [32] for more information.
For the case δ = 0, the transformation of an instance Ax = b of (P₀) to an
instance of Max FS works as follows, see Amaldi and Kann [3]. Since A has
full rank, we can use Gaussian elimination to obtain the equivalent system
u + Rv = r, where R ∈ ℝ^{m×(n−m)}, r ∈ ℝ^m, u ∈ ℝ^m, and v ∈ ℝ^{n−m}. We
now consider the system

u + Rv = r, u = 0, v = 0,

which is infeasible (recall that we assumed b ≠ 0 and hence r ≠ 0). Eliminating u, we obtain Rv = r, v = 0. The Max FS instance is then

(1)   Rv ≤ r, −Rv ≤ −r, v ≤ 0, −v ≤ 0.
This system has 2n inequalities and n − m variables. Since every
point v satisfies at least one of each pair of opposite inequalities, at most
one inequality of each pair has to be removed in order to obtain a feasible
subsystem. Moreover, if k inequalities are removed to yield a feasible system,
a solution v to the remaining system yields a solution (u, v) of u + Rv = r
having at most k nonzeros, where u = r − Rv. Conversely, a solution to
u + Rv = r with k nonzeros yields a feasible subsystem of (1) with k removed
inequalities.
If δ > 0, there seems to be no equally easy reduction to Max FS. We can,
however, proceed as follows. We consider the infeasible system

Ax ≤ b + δ·1, −Ax ≤ −b + δ·1, x ≤ 0, −x ≤ 0,

where we declare the first 2m inequalities as mandatory; hence only the
last 2n inequalities, which together force x = 0, can be removed to obtain a
feasible system. Similarly to above, a feasible subsystem including the first 2m
constraints and skipping k of the last 2n inequalities yields a solution x ∈ Q^δ
with at most k nonzeros. Conversely, a solution x ∈ Q^δ with k nonzeros
yields a feasible subsystem containing the first 2m constraints and skipping k
of the last 2n inequalities.
Computing optimal solutions to Max FS can be done by a branch-and-cut algorithm. The handling of mandatory inequalities is straightforward.
We refer to Pfetsch [33] for an extensive discussion of this approach.
In principle, a direct branch-and-cut algorithm for the sparse representation problem could avoid the doubling of variables arising in the transformation to Max FS. However, more research would be necessary to turn this
into a practical approach.
3. Heuristic Approaches
In this section we present five heuristic approaches to solve (P₀^δ): basis
pursuit, orthogonal matching pursuit and a greedy variant thereof, a nonlinear
optimization approach due to Mangasarian, and a bilinear formulation.
3.1. Basis Pursuit
The approach to replace the ℓ0 quasi-norm in (P0 ) and (P0δ ) by the more
tractable ℓ1 -norm to obtain problems (P1 ) and (P1δ ) is called basis pursuit (BP) and was pioneered by Chen, Donoho, and Saunders [13], see also
Chen [12].
There are several ways to rewrite these problems as linear programs. We
will only present the different formulations for (P1δ ) – the changes for (P1 )
are straightforward. A first easy representation is
min 1T y
b − δ · 1 ≤ Ax ≤ b + δ · 1
−y ≤ x ≤ y.
Note that here y is automatically nonnegative. This formulation has 2n
variables and 2(m + n) constraints. For many LP-solvers, e.g., simplex algorithm based solvers, the number of (ordinary) constraints is crucial for
speed, while bounds on variables are implicitly handled and provide little
additional computational cost. Therefore the following formulation, which
we use in our implementations, should be more efficient.
min 1^T(x^+ + x^−)
A(x^+ − x^−) + s = b
−δ·1 ≤ s ≤ δ·1
x^+, x^− ≥ 0.

This system has 2n + m variables, m equations, and 2(n + m) bounds on the
variables. It arises by splitting x into its positive part x^+ and its negative
part x^−, and introducing slack variables s. Note that in the optimum, for
every j ∈ [n], not both x^+_j and x^−_j are positive; hence, we have the correspondence x^+_j − x^−_j = x_j and x^+_j + x^−_j = |x_j|. Clearly, in a formulation
for (P₁), the slack variables s are not necessary.
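The split formulation can be solved with any LP solver; a minimal sketch using scipy.optimize.linprog (the authors use the CPLEX primal simplex; the function name is ours):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b, delta=0.0):
    """Solve (P1^delta) via the split formulation: variables are
    (x+, x-, s) with A(x+ - x-) + s = b and -delta <= s <= delta."""
    m, n = A.shape
    c = np.concatenate([np.ones(2 * n), np.zeros(m)])  # min 1^T(x+ + x-)
    A_eq = np.hstack([A, -A, np.eye(m)])
    bounds = [(0, None)] * (2 * n) + [(-delta, delta)] * m
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=bounds)
    return res.x[:n] - res.x[n:2 * n]  # recover x = x+ - x-
```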
Remark. A general problem with all linear programming based approaches
is that the matrix A is usually dense in applications. This has negative
effects on the performance of LP-solvers, since the speed of solving the equation systems in LP-solvers approaches its worst case cubic behavior, and
techniques for handling sparse data do not help here. See Section 4 for more
information and, for instance, Yarmish [39] for an implementation of a dense
LP-solver.
3.2. Orthogonal Matching Pursuit
The orthogonal matching pursuit (OMP) approach to heuristically solve
(P0 ) was developed by Pati, Rezaiifar, and Krishnaprasad [31] and Davis,
Mallat, and Zhang [14]. We will not discuss the earlier proposed matching
pursuit introduced by Mallat and Zhang [28].
The idea of OMP for finding a sparse solution of Ax = b is the following.
At the ith iteration, one is given an approximate point x^i with residual
r^i = b − Ax^i. We want to enlarge the support S^i of x^i by a single index and
therefore compute

j⋆ = argmax {|A_j^T r^i| : j ∈ [n] \ S^i},

where A_j is the jth column of A. Then one sets S^{i+1} = S^i ∪ {j⋆} and solves
the least squares problem

(2)   min { ‖b − ∑_{j∈S^{i+1}} λ_j A_j‖₂ : λ ∈ ℝ^n, λ_j = 0 for j ∉ S^{i+1} }.

The solution is taken as the new point x^{i+1} and one sets r^{i+1} = b − Ax^{i+1}.
We stop the process if ‖r^i‖∞ is small enough. The number of iterations gives
the number of nonzeros of the computed solution to Ax = b.
To get a solution for (P₀^δ), we perform OMP as just explained, but stop
as soon as ‖r^i‖∞ ≤ δ.
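The OMP iteration just described can be sketched in a few lines of numpy (function name ours; np.linalg.lstsq plays the role of the least squares step (2)):

```python
import numpy as np

def omp(A, b, delta=1e-6):
    """Orthogonal matching pursuit: grow the support one index at a
    time until the l_inf-norm of the residual drops to delta."""
    m, n = A.shape
    x = np.zeros(n)
    support = []
    r = b.copy()
    while np.max(np.abs(r)) > delta and len(support) < m:
        scores = np.abs(A.T @ r)   # correlation with the residual
        scores[support] = -np.inf  # never pick an index twice
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)  # step (2)
        x = np.zeros(n)
        x[support] = coef
        r = b - A @ x
    return x
```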
Remark. We use the ℓ∞-norm for the stopping criterion in order to get a fair
comparison with linear programming based approaches: In standard LP-solvers feasibility is checked w.r.t. the ℓ∞-norm, using a feasibility tolerance
of around ε = 10⁻⁶. We use this tolerance in all implementations.
There are examples in which OMP produces solutions arbitrarily far away
from the optimal solution of (P0 ), see Chen, Donoho, and Saunders [13].
Nevertheless, one can prove interesting results about the quality of OMP
solutions. For instance, Tropp and Gilbert [36] showed that if A consists
of independent random unit vectors and b has a sparse enough representation, then OMP computes this representation (which is unique) with high
probability.
Recently Donoho, Tsaig, Drori, and Starck [22] proposed stagewise orthogonal matching pursuit, a variant of OMP in which many coefficients can
enter the model at each step.
3.2.1. Generalizations of OMP. Orthogonal matching pursuit can be generalized as follows. We replace (2) by the following problem with S = S^{i+1}:

(3)   min 1^T(u + w)
      b − δ·1 − Ax ≤ u
      Ax − b − δ·1 ≤ w
      x_j = 0, j ∉ S
      u, w ≥ 0.

This LP computes a solution closest to Q^δ with respect to the ℓ1-norm that
uses only variables in S (note that in an optimal solution at most one of u_j
and w_j is positive). The process is terminated if the objective function value
of (3) is small enough.
Similarly, one can replace (2) by

(4)   min { ‖b − ∑_{j∈S^{i+1}} λ_j A_j‖∞ : λ ∈ ℝ^n, λ_j = 0 for j ∉ S^{i+1} }.

This leads to an LP that is similar to (3):

min z
b − δ·1 − Ax ≤ z·1
Ax − b − δ·1 ≤ z·1
x_j = 0, j ∉ S,

where S = S^{i+1} again denotes the current support set. The process is
terminated when the optimal solution value is small enough.
In computational experiments it turned out that the results of these two
methods do not significantly differ from the results produced by OMP. We
therefore do not investigate these two variants further.
Another generalization yields a new method that we call greedy orthogonal
matching pursuit (GOMP). We present it for the case δ > 0 only; the modifications for δ = 0 are straightforward. At the ith iteration, we are given
an approximate solution x^i with support S^i and compute the best extension
of S^i by solving (3) for each S = S^i ∪ {j}, j ∉ S^i. We choose j⋆ to be
an index j that produces the smallest value in (3). The new point x^{i+1} is
given by the solution of the corresponding LP. GOMP naturally increases
the computational effort with respect to OMP, since it has to solve many
LPs in each iteration. See Section 4 for more information.
Figure 1: Plot of f2 (x), f4 (x), f8 (x), and f16 (x) for n = 1.
3.3. Mangasarian’s Approach
We now present an adaptation of a general method due to Mangasarian [29]. The idea is to replace ‖x‖₀ by the approximation

f_α(x) = ∑_{j=1}^n (1 − e^{−α|x_j|}),

for some α > 0, see Figure 1. We clearly have f_α(x) ≤ ‖x‖₀ and, for every
x ∈ ℝ^n,

lim_{α→∞} f_α(x) = ‖x‖₀.
For a fixed α and δ, we consider the problem

(M_α^δ)   min {f_α(y) : (x, y) ∈ S(δ)},

where

S(δ) := {(x, y) ∈ ℝ^n × ℝ^n : b − δ·1 ≤ Ax ≤ b + δ·1, −y ≤ x ≤ y}.
Mangasarian [29] proved that a stationary point of (M_α^δ) can be reached
by a successive linearization algorithm in finite time. Furthermore, there
exists some α such that (M_α^δ) optimally solves (P₀^δ).
The successive linearization algorithm works as follows. Given a solution
(x^i, y^i) ∈ S(δ), we find a (vertex) solution (x^{i+1}, y^{i+1}) of

min { c^T y : (x, y) ∈ S(δ) },

where c_j = α e^{−α y_j^i}, j = 1, . . . , n. Here, we replace |x_j| by y_j and use that

d/dy_j (1 − e^{−α y_j}) = α e^{−α y_j}.

In the above linear program the constant part of the linearization is skipped,
because it is irrelevant. We terminate the process once ‖y^{i+1} − y^i‖∞ is small
enough.
Although it is guaranteed that there exists an α such that (M_α^δ) computes
an optimal solution to (P₀^δ), there is no efficient way to compute such an α
(as long as P ≠ NP). We therefore use the following simple approach. We
start with α = 0.1 and double it in each of 20 iterations, each time solving
(M_α^δ) with the successive linearization algorithm and storing the sparsest
solution found.
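A compact sketch of the successive linearization algorithm for a single fixed α, using scipy.optimize.linprog for the LPs over S(δ) (the function name and parameter defaults are ours):

```python
import numpy as np
from scipy.optimize import linprog

def mangasarian_sla(A, b, delta=1e-6, alpha=4.0, max_iter=50, tol=1e-8):
    """Successive linearization for (M_alpha^delta): repeatedly minimize
    c^T y over S(delta), where c_j = alpha * exp(-alpha * y_j^i)."""
    m, n = A.shape
    I = np.eye(n)
    # S(delta) in (x, y)-variables: A x <= b + delta, -A x <= -(b - delta),
    #                               x - y <= 0,       -x - y <= 0
    A_ub = np.block([[A, np.zeros((m, n))],
                     [-A, np.zeros((m, n))],
                     [I, -I],
                     [-I, -I]])
    b_ub = np.concatenate([b + delta, -(b - delta), np.zeros(2 * n)])
    bounds = [(None, None)] * n + [(0, None)] * n
    x, y = np.zeros(n), np.zeros(n)
    for _ in range(max_iter):
        c = np.concatenate([np.zeros(n), alpha * np.exp(-alpha * y)])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
        x, y_new = res.x[:n], res.x[n:]
        if np.max(np.abs(y_new - y)) < tol:  # stop: ||y^{i+1} - y^i||_inf small
            y = y_new
            break
        y = y_new
    return x
```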
3.4. A Bilinear Formulation
In this section, we introduce another formulation for the problem (P0δ ) and
a heuristic way to solve this model. This approach follows ideas of Bennett
and Bredensteiner [6].
Lemma 5. The minimization problem (P0δ ) is equivalent to the following
bilinear program (with equilibrium constraints):
min 1T z
b − δ · 1 ≤ Ax ≤ b + δ · 1
xj (1 − zj ) = 0, j ∈ [n]
0 ≤ z ≤ 1.
Proof. Suppose that x⋆ is an optimal solution of (P₀^δ) and (x̂, ẑ) is an optimal
solution of the above bilinear program. We want to show that 1^T ẑ = ‖x⋆‖₀.
Defining z⋆_j = 1 if x⋆_j ≠ 0 and z⋆_j = 0 if x⋆_j = 0, we have

‖x⋆‖₀ = ‖z⋆‖₀ = 1^T z⋆ ≥ 1^T ẑ,

since (x⋆, z⋆) is a feasible point for the above problem. Conversely, since
x̂_j ≠ 0 implies ẑ_j = 1, we have ‖x̂‖₀ ≤ 1^T ẑ. Since x̂ is a feasible solution
for (P₀^δ), it follows that

‖x⋆‖₀ ≤ ‖x̂‖₀ ≤ 1^T ẑ.

Combining the two chains of inequalities yields 1^T ẑ = ‖x⋆‖₀. □
Note that the set of feasible points of the bilinear formulation is not convex
anymore. Hence, there may exist many local optima.
A direct successive linearization method does not help for the above model,
since fixing x or z forces hard bounds for the remaining variables that cannot
be changed. We can, however, use a trick similar to the approach of Bennett
and Bredensteiner [6]. The idea is to introduce a bound β ∈ ℤ₊ on the
objective function and move the bilinear constraints to the objective. To
make the problem bounded, we note that x_j(1 − z_j) = 0 if and only if
|x_j|(1 − z_j) = 0. Hence, we again introduce variables y modeling the absolute
values of x (see Section 3.1). This leads to the following formulation:

(B_β^δ)   min y^T(1 − z)
          (x, y) ∈ S(δ)
          0 ≤ z ≤ 1
          1^T z ≤ β.
Given a fixed β ∈ ℤ₊, a natural successive linearization method for this
formulation is as follows. At the ith iteration we have points (x^i, y^i) ∈ S(δ)
and z^i ∈ [0, 1]^n. We then solve the linear program

(B_{β,1}^δ)   min { y^T(1 − z^i) : (x, y) ∈ S(δ) }

to obtain x^{i+1} and y^{i+1}. Then we solve

(B_{β,2}^δ)   min { (y^{i+1})^T(1 − z) : 0 ≤ z ≤ 1, 1^T z ≤ β }

to get z^{i+1}. The process is repeated as long as the objective function improves, i.e., (y^{i+1})^T(1 − z^{i+1}) < (y^i)^T(1 − z^i).
In fact, the second linearization problem (B_{β,2}^δ) has a simple closed form
solution, see also [6]; we skip the easy proof.

Lemma 6. Given the nonnegative vector y^{i+1} ∈ ℝ^n, let j₁, . . . , j_n be a
permutation of [n] such that y^{i+1}_{j₁} ≥ y^{i+1}_{j₂} ≥ · · · ≥ y^{i+1}_{j_n}. Then the vector z
defined by z_{j_k} = 1 for k = 1, . . . , β and z_{j_k} = 0 for k = β + 1, . . . , n is an
optimal solution to (B_{β,2}^δ).
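Lemma 6 amounts to setting z_j = 1 on the β largest entries of y^{i+1}; a short numpy sketch (function name ours):

```python
import numpy as np

def solve_z_subproblem(y, beta):
    """Closed-form solution of (B_{beta,2}^delta): z_j = 1 exactly on
    the beta largest entries of the nonnegative vector y."""
    z = np.zeros_like(y)
    z[np.argsort(-y)[:beta]] = 1.0  # indices of the beta largest entries
    return z
```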
Bennett and Mangasarian [5] prove that (B_β^δ) has a vertex solution and
then provide the following finiteness result for the successive linearization
algorithm.

Theorem 7 (Bennett and Mangasarian [5]). The successive linearization
algorithm terminates in a finite number of steps at a global solution or at a
point (x^{i+1}, y^{i+1}, z^i) that satisfies the minimum principle necessary optimality
condition

(1 − z^i)^T(y − y^{i+1}) + (y^{i+1})^T(z^i − z) ≥ 0

for all (x, y) feasible for (B_{β,1}^δ) and all z feasible for (B_{β,2}^δ).
On top of the above linearization process one adds a binary search routine
to determine the optimal β. We use the secant method described in [6].
3.5. Postprocessing
Having obtained a solution x⋆ by a heuristic method, we can try to iteratively reduce the support of given points x^i ∈ Q^δ, where x^0 := x⋆. For this
we let S = supp(x^i) and check for each k ∈ S whether

{x ∈ Q^δ : x_j = 0, j ∈ [n] \ S, x_k = 0} ≠ ∅.

If this is the case for some k, we let x^{i+1} be a solution of this linear
feasibility problem. This approach sometimes helps to further reduce the
support of the solution.
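A sketch of this postprocessing loop, using scipy.optimize.linprog as the feasibility oracle (function name and tolerance are ours):

```python
import numpy as np
from scipy.optimize import linprog

def reduce_support(A, b, x, delta=0.0, tol=1e-8):
    """Greedily try to zero out one support entry at a time, keeping a
    point in Q^delta whose support lies in the remaining index set."""
    n = A.shape[1]
    x = x.copy()
    improved = True
    while improved:
        improved = False
        support = [j for j in range(n) if abs(x[j]) > tol]
        for k in support:
            keep = [j for j in support if j != k]
            if not keep:
                continue
            # feasibility LP: is there x' in Q^delta with supp(x') in keep?
            res = linprog(np.zeros(len(keep)),
                          A_ub=np.vstack([A[:, keep], -A[:, keep]]),
                          b_ub=np.concatenate([b + delta, -(b - delta)]),
                          bounds=[(None, None)] * len(keep))
            if res.success:
                x = np.zeros(n)
                x[keep] = res.x
                improved = True
                break
    return x
```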
4. Computational Results
We conducted several computational experiments with the algorithms discussed in this paper. We use deterministic and random matrices and construct instances that are guaranteed to have solutions with small support.
The heuristics are implemented as described in Section 3 and the branch-and-cut algorithm as indicated in Section 2. All algorithms are implemented
in C/C++. The linear programs for the heuristics are solved via the primal simplex implementation of CPLEX 10.11. The implementation of the
branch-and-cut algorithm is described in detail in Pfetsch [33]; it is based on
the framework SCIP (Solving Constraint Integer Programs) version 0.90 by
Achterberg [1], using CPLEX 10.11 for solving the intermediate LPs. The
computations were performed on a 3.4 GHz Pentium 4 machine with 3 GB
of main memory and 2 MB cache, running Linux.
4.1. Experiments with Deterministic Matrices
In this section, we use deterministic matrices constructed as follows. We
take A = [I H] ∈ ℝ^{m×2m}, where H is the normalized Hadamard matrix
of size m, and I is the m × m identity matrix; this matrix was also used
in Donoho, Elad, and Temlyakov [20]. By construction of the Hadamard
matrix, m is a power of 2. The mutual incoherence is M = 1/√m.
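These test matrices can be generated with scipy's hadamard routine (a sketch; the function name is ours):

```python
import numpy as np
from scipy.linalg import hadamard

def jokar_pfetsch_test_matrix(m):
    """A = [I H] with H the normalized m x m Hadamard matrix
    (m must be a power of 2); mutual incoherence M = 1/sqrt(m)."""
    H = hadamard(m) / np.sqrt(m)  # entries +-1/sqrt(m), unit-norm columns
    return np.hstack([np.eye(m), H])
```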
4.1.1. The case m = 16. We first investigate the case in which the number
of rows of A is m = 16. Hence, the number of columns is n = 32, and the
number of variables in the Max FS formulation is 64.
In a first experiment, we study the case in which we allow no deviation,
i.e., δ = 0. For each N = 1, 2, . . . , 16, we generated three random instances
as follows. Take a random vector x0 with N nonzeros. The support entries
are chosen uniformly at random, and the values are drawn from a normal
distribution with zero mean and variance 1. Then take b = Ax0 . Hence, it
is guaranteed that Ax = b has a solution with N nonzeros, but it is possible
that there exist solutions with fewer nonzeros – see the results below. Note
that from N = 16 on, there trivially are solutions with at most 16 nonzeros.
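A minimal sketch of this instance generation (the function and variable names are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

def sparse_instance(A, N):
    """Right-hand side b = A x0 for a random x0 with exactly N nonzeros."""
    n = A.shape[1]
    x0 = np.zeros(n)
    support = rng.choice(n, size=N, replace=False)  # support uniformly at random
    x0[support] = rng.standard_normal(N)            # values drawn from N(0, 1)
    return x0, A @ x0

A = rng.standard_normal((16, 32))
x0, b = sparse_instance(A, 5)
```

By construction, Ax = b is guaranteed to have a solution with N nonzeros, although a sparser one may exist.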
Table 1 shows the results of the branch-and-cut algorithm and the heuristics. The columns have the following meaning: “nodes” gives the number of
nodes/subproblems in the branch-and-cut algorithm, “time” is the CPU time
of the branch-and-cut algorithm in seconds, “sols” lists the number of optimal
solutions, “opt” gives the number of nonzeros in an optimal solution, “BP”
shows the results of basis pursuit, “MG” of Mangasarian’s approach, “OMP”
of orthogonal matching pursuit, “GOMP” of the greedy OMP, and “BL” of
the bilinear approach. Postprocessing never improved the solutions. Heuristic solutions that are optimal are marked in bold face. For computing the
number of optimal solutions we set a time limit of 10 hours. Numbers appearing in italics in the corresponding column were obtained when the time
limit was reached. The last row contains the geometric means over each
column. The computation times of the heuristics are well below one second
and are therefore not listed in the table. We now discuss these results in
more detail.
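The geometric mean used in the last row of the tables is the exponential of the average logarithm; a one-line sketch:

```python
import numpy as np

def geometric_mean(values):
    """Geometric mean of positive values: exp of the mean log."""
    v = np.asarray(values, dtype=float)
    return float(np.exp(np.log(v).mean()))

print(geometric_mean([1, 4]))  # approximately 2.0
```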
The results show that the larger N gets, the more time the branch-and-cut algorithm needs. For N ≤ 13 all instances can be solved in a couple of seconds, while for N = 16 the execution needs almost six hours. In
fact, the solution time is correlated to the size of the optimal solution – if
it is at least 15, the instances get very hard to solve. Note that for N ≤ 12
the size of each optimal solution is N , while for larger N there often exists
a solution of size smaller than N .
We used a modification of the branch-and-cut algorithm to count the number of optimal solutions. Table 1 shows that for N ≤ 11 (with one exception for N = 9) the optimal solutions are unique, although Theorem 1 guarantees uniqueness only for N ≤ 1/M = √16 = 4. These results complement
the experiments of Elad [23], who investigated uniqueness for matrices of
size 8 × 20 by enumeration.
The heuristics perform very well for small N , usually finding the optimal
solution; an exception is the second instance for N = 6, where all heuristics
Table 1: Results of the branch-and-cut algorithm (columns 2–5) and heuristics (BP, MG,
OMP, GOMP, and BL) for Hadamard matrices with m = 16 and δ = 0. Column “opt”
gives the optimal solution value, and numbers in bold are heuristic solutions that are optimal.
The last row contains the geometric mean of the entries in each column.
N   nodes      time      sols   opt  BP  MG  OMP  GOMP  BL
1   1          0.01      1      1    1   1   1    1     1
1   1          0.00      1      1    1   1   1    1     1
1   1          0.00      1      1    1   1   1    1     1
2   1          0.00      1      2    2   2   2    2     2
2   2          0.00      1      2    2   2   2    2     2
2   1          0.01      1      2    2   2   2    2     2
3   1          0.01      1      3    3   3   3    3     3
3   2          0.01      1      3    3   3   3    3     3
3   2          0.01      1      3    3   3   3    3     3
4   4          0.01      1      4    4   4   4    4     4
4   1          0.00      1      4    4   4   4    4     4
4   2          0.01      1      4    4   4   4    4     4
5   1          0.01      1      5    5   5   5    5     5
5   1          0.01      1      5    5   5   5    5     5
5   4          0.03      1      5    5   5   5    5     5
6   1          0.02      1      6    8   6   6    6     6
6   17         0.15      1      6    14  11  9    9     11
6   1          0.02      1      6    6   6   6    6     6
7   1          0.04      1      7    7   7   7    7     7
7   4          0.08      1      7    7   7   7    7     7
7   15         0.09      1      7    7   7   7    7     7
8   13         0.18      1      8    8   8   9    8     8
8   14         0.20      1      8    11  8   8    10    8
8   53         0.34      1      8    10  9   8    8     9
9   167        0.56      1      9    9   9   9    12    9
9   123        0.48      8      9    12  9   9    9     9
9   227        1.09      1      9    13  12  11   11    12
10  915        4.58      1      10   14  11  12   12    11
10  633        3.40      1      10   15  13  13   13    14
10  296        1.42      1      10   13  12  10   13    12
11  2751       14.92     1      11   16  13  14   14    12
11  1539       10.37     1      11   16  16  16   12    16
11  719        3.50      1      11   14  11  15   14    11
12  8827       38.17     1      12   16  14  14   14    14
12  4772       28.90     8      12   16  16  16   15    15
12  2784       14.39     1      12   16  13  14   14    15
13  20505      63.26     48     13   14  14  16   14    13
13  3248       14.50     48     12   16  13  12   12    12
13  18007      51.11     288    13   15  15  16   16    15
14  178872     286.82    1296   14   16  15  15   14    14
14  190032     313.60    1296   14   16  16  16   16    16
14  171931     293.62    48     14   16  15  16   16    15
15  22553939   20881.82  54609  15   16  16  16   16    16
15  162437     320.93    1296   14   16  15  16   15    15
15  132454     220.60    970    14   16  16  16   15    16
16  23272940   21847.21  63691  15   15  15  16   16    15
16  22604125   21133.88  55278  15   16  16  16   16    16
16  19706327   18380.20  50     15   16  16  16   16    16
G   219.7      1.84      6.1    6.7  7.7 7.3 7.3  7.3   7.2
fail to find solutions close to optimal. Basis pursuit always finds the optimal solutions for N ≤ 5. The maximum size of N for which this is guaranteed by Theorem 2 is (√2 − 1/2)/M ≈ 0.91 · 4 = 3.64. Hence, in this case, the
theoretical bound is quite close to the empirical bound. The other heuristics
all improve upon basis pursuit and often find optimal solutions for N ≤ 9.
For large N , their performance does not show a clear picture; quite frequently
they only find trivial solutions of size 16. There is no clear winner among
the heuristics (excluding basis pursuit), although the bilinear method seems
to be slightly better than the other heuristics (the geometric mean of its
solution values is 7.2, compared to 7.3 for the others). Furthermore, the
bilinear method finds 29 optimal solutions, compared to 22 for basis pursuit,
26 for Mangasarian’s method, 26 for OMP, and 25 for GOMP.
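For reference, a bare-bones version of the OMP greedy loop compared here (a sketch only; the paper's implementation and stopping rules may differ):

```python
import numpy as np

def omp(A, b, tol=1e-10):
    """Orthogonal matching pursuit: greedily grow the support by the column
    most correlated with the residual, then least-squares fit on that support."""
    m, n = A.shape
    support = []
    x_s = np.zeros(0)
    r = b.astype(float).copy()
    for _ in range(m):
        if np.linalg.norm(r) <= tol:
            break
        support.append(int(np.argmax(np.abs(A.T @ r))))
        x_s, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        r = b - A[:, support] @ x_s
    x = np.zeros(n)
    x[support] = x_s
    return x

x = omp(np.eye(4), np.array([0.0, 2.0, 0.0, 1.0]))
```

On this trivial example, OMP picks the two active coordinates in two iterations and reproduces the right-hand side exactly.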
In the second experiment, we allow a deviation of δ = 0.1. Table 2 shows
the results, the notation being the same as in Table 1. Column “nz” gives
the number of nonzeros of the computed solution, and we additionally report
in column “p”, for each heuristic, the number of nonzeros obtained by the
postprocessing method of Section 3.5. Unlike the δ = 0 case, here this
method can often improve the solution. Hence, it seems to be a good idea
to use it after any of the heuristics. The computation times of the heuristics
are again negligible.
Because of the allowed deviation, the optimal solution values are usually smaller than N. For instance, no optimal solution has more than 10 nonzeros. As a consequence, each instance can be solved by the branch-and-cut
algorithm within two seconds.
The optimal solutions are unique only for N ≤ 5. An explanation is that admitting solutions in a neighborhood around the solution with N nonzeros allows more candidates. On the other hand, the number of optimal
solutions stays small. It seems that these solutions are more isolated, since
their support is smaller (because of the deviation), and solutions to the
equations with few nonzeros tend to be unique as observed above.
Again the heuristics perform quite well; for N ≤ 10, at least one heuristic (with postprocessing) finds the optimal solution. When comparing the
heuristics, it again turns out that basis pursuit, although its results can be
considerably improved by postprocessing, in general produces worse solutions
than the other heuristics. The best heuristic with respect to the geometric
mean over the number of produced nonzeros is GOMP, followed by OMP,
Mangasarian’s method, the bilinear approach, and finally basis pursuit, but
the numbers are quite close together for the first four. Note that the geometric mean of 5.3 for GOMP with postprocessing is not far from the geometric
mean over all optimal solutions, which is 5.1, emphasizing the good quality of the
produced solutions. Furthermore, basis pursuit finds 21 optimal solutions after postprocessing, Mangasarian’s method 32, OMP 35, GOMP 39, and the
bilinear method finds 27 optimal solutions after postprocessing. Hence, for
these instances, GOMP performs better than the other heuristics, but as we
will see later, it is also quite time consuming for larger instances.
4.1.2. The case m = 128. We experimented with the branch-and-cut algorithm for larger instances. It turned out that already for m = 32, it
cannot solve the instances with large N to optimality within any reasonable
Table 2: Results of the branch-and-cut algorithm and heuristics for Hadamard matrices
with m = 16 and δ = 0.1. The notation is as in Table 1. Column “nz” gives the number
of nonzeros in a solution and column “p” the number of nonzeros after the postprocessing
method of Section 3.5.
[Table 2 data not reproduced: 48 rows (three instances for each N = 1, . . . , 16) with columns N, nodes, time, sols, opt, and nz/p for each of BP, MG, OMP, GOMP, and BL. Geometric means (last row): nodes 18.5, time 0.14, sols 3.3, opt 5.1; nz/p: BP 9.5/6.1, MG 5.6/5.5, OMP 5.6/5.4, GOMP 5.3/5.3, BL 5.9/5.7.]
N    BP nz  time    MG nz  time    OMP nz  time    p      GOMP nz  time     p      BL nz  time
10   10.0   (0.07)  10.0   (0.10)  10.0    (0.01)  10.0   10.0     (0.11)   10.0   10.0   (0.09)
15   15.0   (0.06)  15.0   (0.10)  15.0    (0.01)  15.0   15.0     (0.33)   15.0   15.0   (0.10)
20   20.0   (0.07)  20.0   (0.12)  20.0    (0.01)  20.0   20.0     (0.72)   20.0   20.0   (0.11)
25   25.0   (0.07)  25.0   (0.12)  25.0    (0.02)  25.0   25.0     (1.31)   25.0   25.0   (0.12)
30   30.0   (0.07)  30.0   (0.12)  30.0    (0.02)  30.0   30.0     (2.13)   30.0   30.0   (0.12)
35   35.0   (0.08)  35.0   (0.12)  35.0    (0.03)  35.0   35.0     (3.24)   35.0   35.0   (0.13)
40   40.0   (0.08)  40.0   (0.13)  40.0    (0.03)  40.0   40.0     (4.63)   40.0   40.0   (0.14)
45   55.7   (0.08)  45.0   (0.13)  45.0    (0.04)  45.0   45.0     (6.33)   45.0   45.0   (0.13)
50   70.1   (0.09)  50.9   (0.15)  50.0    (0.05)  50.0   50.1     (8.41)   50.0   50.0   (0.15)
55   100.3  (0.09)  55.0   (0.15)  56.7    (0.07)  56.0   56.6     (11.62)  55.9   56.0   (0.15)
60   127.2  (0.08)  60.0   (0.17)  60.4    (0.08)  60.0   60.3     (13.61)  60.0   65.6   (0.16)
65   127.9  (0.08)  67.7   (0.18)  80.5    (0.20)  79.5   78.0     (28.87)  77.2   69.7   (0.14)
70   128.0  (0.08)  74.9   (0.22)  91.2    (0.27)  89.6   99.9     (50.85)  98.8   94.2   (0.18)
75   128.0  (0.08)  118.8  (0.33)  117.8   (0.46)  117.2  113.2    (65.26)  113.0  123.4  (0.18)
80   128.0  (0.08)  113.7  (0.33)  122.5   (0.51)  121.9  120.2    (73.78)  119.8  127.0  (0.17)
G    51.1   (0.08)  41.1   (0.15)  42.4    (0.05)  42.2   42.4     (5.26)   42.3   42.5   (0.14)
Table 3: Results of the heuristics for Hadamard matrices with m = 128 and δ = 0.
Columns “p” give number of nonzeros after postprocessing. The entries are averages over
ten instances. The last row contains the geometric mean of the entries in each column.
time. Furthermore, the dual bounds can only provide meaningful information for very small N (approx. N ≤ 35). We therefore decided to test
only the heuristics on larger instances and present results for m = 128 and
N = {10, 15, 20, . . . , 80} for δ = 0 and N = {10, 15, . . . , 100} for δ = 0.1.
For each N we generated ten instances (i.e., random right hand sides). The
entries in the tables report average results.
Table 3 shows the results of the heuristics for δ = 0. We skipped the postprocessing columns for those heuristics whose results could not be improved. The last row again contains the geometric mean of each column.
We observe that all heuristics could find a solution of value N for N ≤ 40.
It turns out that Mangasarian’s approach produces the sparsest solutions
in geometric mean, closely followed by OMP, GOMP, the bilinear method,
and finally basis pursuit. Note that GOMP does not improve upon OMP
in the geometric mean, but this is mainly because of the bad performance
for N = 70. For all other N , GOMP is better than OMP on average. GOMP,
however, needs much larger computation times. Note also that basis pursuit
only produces useless results for N ≥ 55. Furthermore, the geometric mean
of Mangasarian’s approach with 41.1 is not too far from the geometric mean
over all N with 38.6, indicating the good performance of this heuristic.
Table 4 shows the results for the case δ = 0.1. Because of space constraints, we skipped the information about the running times, except for GOMP; the other running times are similarly small, as in Table 3. The results again show that postprocessing pays off. Altogether, GOMP produces the sparsest solutions in geometric mean after postprocessing, followed by Mangasarian's approach, OMP, and the bilinear method; basis pursuit performs
clearly worse than the other methods. All heuristics produce solutions whose number of nonzeros in the geometric mean is below the geometric mean over all N, which is 46.4.
N    BP nz  p     MG nz  p     OMP nz  p     GOMP nz  time     p     BL nz  p
10   12.7   7.2   7.3    7.2   8.4     7.3   7.3      (0.51)   7.3   8.1    7.4
15   20.2   10.8  11.1   10.9  12.6    11.2  10.9     (0.94)   10.8  11.0   10.9
20   26.7   16.4  15.6   15.6  18.0    15.7  15.2     (1.55)   15.2  15.5   15.2
25   34.4   20.1  20.7   19.9  22.0    19.5  19.6     (2.33)   19.6  21.0   20.2
30   42.0   23.7  23.9   23.4  26.3    23.5  23.1     (2.83)   23.1  23.7   23.4
35   53.8   31.5  29.3   29.3  32.0    29.0  28.5     (4.55)   28.5  29.1   28.7
40   57.9   35.3  31.6   31.3  34.3    30.9  30.2     (4.94)   30.2  30.7   30.2
45   67.8   42.9  35.9   35.5  39.4    35.5  35.0     (6.32)   35.0  37.8   36.8
50   76.4   50.6  39.2   38.9  43.1    39.1  38.3     (8.03)   38.1  39.6   39.2
55   82.1   53.2  43.2   42.7  48.9    43.9  41.8     (8.89)   41.7  43.4   42.6
60   91.4   65.3  48.6   48.3  51.8    47.4  47.3     (11.07)  47.2  50.5   49.8
65   96.5   70.7  53.2   53.0  61.8    56.3  52.1     (12.68)  51.9  55.2   54.6
70   100.4  78.6  57.8   57.4  65.9    59.1  59.4     (15.09)  59.1  61.9   60.5
75   103.5  84.9  62.9   62.6  70.4    65.4  62.1     (15.71)  62.1  69.1   68.3
80   105.1  85.6  65.8   65.8  71.4    66.7  63.6     (17.72)  63.6  68.6   67.3
85   108.0  87.0  70.8   70.3  77.2    71.7  68.3     (17.83)  68.3  73.6   72.3
90   109.7  89.8  72.7   72.6  76.7    71.4  69.7     (19.40)  69.5  75.7   75.0
95   112.2  90.5  73.1   72.6  78.3    72.9  71.7     (20.46)  71.5  76.1   75.3
100  114.0  93.9  72.8   72.7  78.7    73.4  72.6     (19.79)  72.6  77.4   76.5
G    63.7   43.5  36.9   36.6  40.7    36.9  36.0     (6.63)   36.0  38.1   37.3
Table 4: Results of the heuristics for Hadamard matrices with m = 128 and δ = 0.1.
N    BP nz  time    MG nz  time    OMP nz  time    p      GOMP nz  time     p      BL nz  time
10   10.0   (0.16)  10.0   (0.29)  10.0    (0.00)  10.0   10.0     (0.11)   10.0   10.0   (0.24)
15   15.0   (0.16)  15.0   (0.29)  15.0    (0.01)  15.0   15.0     (0.33)   15.0   15.0   (0.24)
20   20.0   (0.18)  20.0   (0.35)  20.1    (0.01)  20.0   20.1     (0.72)   20.0   20.0   (0.31)
25   25.0   (0.17)  25.0   (0.31)  25.3    (0.01)  25.0   25.2     (1.32)   25.0   25.0   (0.28)
30   30.0   (0.18)  30.0   (0.31)  30.4    (0.02)  30.0   30.1     (2.16)   30.0   30.0   (0.28)
35   35.0   (0.19)  35.0   (0.34)  35.9    (0.02)  35.0   35.2     (3.25)   35.0   35.0   (0.31)
40   40.0   (0.20)  40.0   (0.36)  41.6    (0.03)  40.0   41.5     (5.05)   40.0   40.0   (0.32)
45   45.0   (0.21)  45.0   (0.40)  48.0    (0.05)  45.0   47.6     (7.38)   45.0   45.0   (0.36)
50   89.0   (0.21)  50.0   (0.38)  59.8    (0.10)  57.7   52.4     (9.42)   50.0   50.0   (0.36)
55   120.7  (0.20)  55.0   (0.43)  82.4    (0.22)  76.4   65.2     (19.11)  62.1   62.2   (0.44)
60   128.0  (0.20)  60.0   (0.45)  108.7   (0.40)  106.9  102.2    (53.99)  99.3   80.3   (0.42)
65   128.0  (0.20)  71.3   (0.57)  121.3   (0.49)  121.0  125.9    (78.55)  125.9  96.3   (0.42)
70   128.0  (0.20)  81.6   (0.67)  127.2   (0.53)  127.2  122.0    (73.96)  120.1  104.6  (0.44)
75   128.0  (0.20)  106.8  (0.87)  127.3   (0.54)  127.3  126.0    (78.63)  126.0  117.4  (0.45)
80   128.0  (0.20)  113.5  (0.92)  127.3   (0.53)  127.3  125.7    (78.24)  125.7  127.7  (0.44)
G    51.8   (0.19)  41.2   (0.43)  49.0    (0.09)  48.0   47.4     (6.81)   46.6   44.5   (0.35)
Table 5: Results of the heuristics for random matrices with m = 128 and δ = 0.
4.2. Experiments with Random Matrices
In our final two experiments, we constructed random matrices by taking 256 random unit vectors of size 128. Similar random matrices and experiments have been used in many articles; see, e.g., Candès and Tao [11].
We ran the heuristics for ten instances each and report the averages in Table 5 for the case of δ = 0 and in Table 6 for δ = 0.1. The notation is similar
to Tables 3 and 4.
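Such a matrix can be drawn, for instance, by normalizing Gaussian columns (one common convention; the paper does not specify the distribution beyond unit length):

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 128, 256
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)  # scale every column to unit Euclidean length
```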
N    BP nz  p     MG nz  p     OMP nz  p     GOMP nz  time     p     BL nz  p
10   13.7   5.8   6.0    5.6   5.3     5.1   5.3      (1.53)   5.3   6.4    5.4
15   21.6   10.5  7.7    7.6   8.4     7.6   7.6      (2.20)   7.6   7.8    7.7
20   30.3   15.0  11.1   10.9  12.5    10.6  10.8     (3.38)   10.6  11.0   10.7
25   37.3   25.3  14.9   14.9  17.8    15.2  15.3     (4.97)   15.2  15.8   15.3
30   43.3   27.8  17.9   17.6  20.7    18.2  17.9     (6.49)   17.9  19.2   18.1
35   47.0   34.1  20.6   20.2  24.3    21.4  21.0     (8.33)   20.7  21.5   20.7
40   54.9   40.0  23.4   23.1  29.1    24.6  24.2     (10.37)  24.0  24.6   24.3
45   60.2   45.1  27.6   26.9  33.6    29.3  29.0     (12.97)  28.8  29.5   28.3
50   62.9   50.5  30.6   30.4  36.4    31.3  32.8     (15.88)  31.9  32.6   31.4
55   70.2   54.1  32.2   31.9  44.2    36.8  36.3     (19.12)  35.8  34.6   33.4
60   75.9   60.5  36.6   36.2  51.7    44.4  38.8     (21.44)  38.6  38.1   37.7
65   78.8   61.9  38.9   38.6  54.6    46.6  44.3     (26.54)  43.4  40.2   39.6
70   80.4   63.4  42.6   42.3  60.6    51.6  49.6     (31.65)  48.6  44.8   43.9
75   83.2   67.8  44.3   44.2  60.6    53.0  50.0     (31.85)  49.7  46.3   45.0
80   86.0   70.3  46.4   46.1  63.2    55.1  51.7     (34.56)  50.6  49.1   48.7
85   90.2   74.3  47.0   46.4  66.8    57.7  55.3     (37.69)  53.7  51.0   50.1
90   89.1   69.8  48.9   48.7  66.8    57.1  53.6     (36.07)  52.7  51.2   50.3
95   90.8   73.1  51.0   50.5  66.0    58.7  55.4     (38.13)  53.9  53.8   52.6
100  94.3   75.8  53.1   52.7  71.1    60.5  56.4     (40.34)  55.4  54.1   53.3
G    57.1   40.5  26.8   26.5  33.7    29.3  28.5     (13.89)  28.1  28.2   27.3
Table 6: Results of the heuristics for random matrices with m = 128 and δ = 0.1.
The general outcome of the experiments for δ = 0 is similar to the deterministic case. Basis pursuit fails to recover all solutions of size N beginning at N = 50, while Mangasarian's approach yields perfect recovery for N ≤ 60.
The best performance in the geometric mean is achieved by Mangasarian’s
approach. Recall that the geometric mean over all N is 38.6, and hence
Mangasarian’s approach with 41.2 yields good results.
The results for δ = 0.1 confirm the above results. Clearly the best heuristic
for these random instances is Mangasarian’s approach, followed by the bilinear method, GOMP (which is again much more expensive than the other
heuristics), OMP, and, clearly the worst heuristic, basis pursuit. A comparison to the geometric mean of 46.4 over all N shows that the solutions
of the heuristics are of considerably smaller size than N, in the geometric
mean. The results also show that the heuristics, in general, produce very
good results (up to about N = 70 for δ = 0 and generally for δ = 0.1).
Comparing the results for Hadamard and random matrices, the picture is
not clear. Surprisingly, for δ = 0, the heuristics produce solutions with more
nonzeros in the geometric mean for random matrices than for Hadamard
matrices, while for δ = 0.1 the reverse holds. The results for δ = 0 are not in
conflict with theory, since the known good behavior of random matrices, for instance in Theorem 3, holds only asymptotically and with high probability.
5. Conclusions
The results of this paper can be summarized as follows. An exact solution
of the sparse representation problem by branch-and-cut seems to be hard,
especially if the optimal solutions have large support. Nevertheless, the
optimal solutions obtained in this way allow us to evaluate the performance
[Figure 2: four bar charts of the geometric means of BP, MG, OMP, GOMP, and BL for (a) Hadamard matrices, δ = 0; (b) Hadamard matrices, δ = 0.1; (c) random matrices, δ = 0; (d) random matrices, δ = 0.1.]
Figure 2: Comparison of the geometric means for the experiments of Sections 4.1.2 and 4.2
with m = 128.
of heuristics. Furthermore, this exact approach can be used to investigate
the uniqueness of the optimal solutions. The results show that uniqueness
occurs for instances with a much larger number of nonzeros than guaranteed
by theory.
Concerning the heuristics, Figure 2 shows a comparison of the results of
all experiments with m = 128. It turns out that basis pursuit is consistently the worst heuristic. Overall, Mangasarian's approach seems to be the
best heuristic. GOMP improves the results of OMP, but is computationally
too expensive for larger instances. OMP itself yields quite good results as
well. The bilinear method is somewhat in between the other approaches; for
random instances it works better than for Hadamard matrices. If deviations
are allowed, postprocessing usually helps to reduce the number of nonzeros
for all heuristics.
Finally, the computation times of the heuristics (except GOMP) are quite
reasonable for the problem sizes we consider. This of course might change
for larger practically relevant instances. OMP, which does not use linear
programming, is in general faster than the other approaches and might be
of use for these cases. But OMP is also a bit worse in terms of quality of
solutions.
References
[1] T. Achterberg, SCIP – A framework to integrate constraint and mixed integer programming, Report 04–19, ZIB, 2004. http://www.zib.de/Publications/abstracts/
ZR-04-19/.
[2] E. Amaldi, On the complexity of designing compact perceptrons and some consequences, in Electronic Proc. Fifth International Symposium on Artificial Intelligence
and Mathematics, Fort Lauderdale, Florida, E. Boros and R. Greiner, eds., RUTCOR,
1999.
[3] E. Amaldi and V. Kann, On the approximability of minimizing nonzero variables
or unsatisfied relations in linear systems, Theor. Comput. Sci. 209, no. 1–2 (1998),
pp. 237–260.
[4] E. Amaldi, M. E. Pfetsch, and L. E. Trotter, Jr., On the maximum feasible
subsystem problem, IISs, and IIS-hypergraphs, Math. Program. 95, no. 3 (2003),
pp. 533–554.
[5] K. Bennett and O. Mangasarian, Bilinear separation of two sets in n-space,
Comput. Optim. Appl. 2 (1993), pp. 207–227.
[6] K. P. Bennett and E. J. Bredensteiner, A parametric optimization method for
machine learning, INFORMS J. Comput. 9, no. 3 (1997), pp. 311–318.
[7] E. J. Candès and J. Romberg, Quantitative robust uncertainty principles and
optimally sparse decompositions, Found. Comput. Math. 6, no. 2 (2006), pp. 227–
254.
[8] E. J. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inform.
Theory 52, no. 2 (2006), pp. 489–509.
[9] E. J. Candès, J. Romberg, and T. Tao, Stable signal recovery from incomplete
and inaccurate measurements, Comm. Pure Appl. Math. 59, no. 8 (2006), pp. 1207–
1223.
[10] E. J. Candés, M. Rudelson, T. Tao, and R. Vershynin, Error correction via
linear programming, in Proc. 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS05), IEEE, 2005, pp. 295–308.
[11] E. J. Candès and T. Tao, Decoding by linear programming, IEEE Trans. Inform.
Theory 51, no. 12 (2005), pp. 4203–4215.
[12] S. S. Chen, Basis Pursuit, PhD thesis, Stanford University, 1995.
[13] S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic decomposition by basis
pursuit, SIAM J. Sci. Comput. 20, no. 1 (1999), pp. 33–61.
[14] G. Davis, S. Mallat, and Z. Zhang, Adaptive time-frequency decompositions with
matching pursuits, Opt. Eng. 33, no. 7 (1994), pp. 2183–2191.
[15] D. L. Donoho, Compressed sensing, IEEE Trans. Inform. Theory 52, no. 4 (2006),
pp. 1289–1306.
[16] D. L. Donoho, For most large underdetermined systems of equations, the minimal
ℓ1 -norm near-solution approximates the sparsest near-solution, Comm. Pure Appl.
Math. 59, no. 7 (2006), pp. 907–934.
[17] D. L. Donoho, For most large underdetermined systems of linear equations the
minimal ℓ1 -norm solution is also the sparsest solution, Comm. Pure Appl. Math. 59,
no. 6 (2006), pp. 797–829.
[18] D. L. Donoho and M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization, Proc. Natl. Acad. Sci. USA 100, no. 5
(2003), pp. 2197–2202.
[19] D. L. Donoho and M. Elad, On the stability of the basis pursuit in the presence
of noise, Signal Process. 86 (2006), pp. 511–532.
[20] D. L. Donoho, M. Elad, and V. Temlyakov, Stable recovery of sparse overcomplete representations in the presence of noise, IEEE Trans. Inform. Theory 52, no. 1
(2006), pp. 6–18.
[21] D. L. Donoho and X. Huo, Uncertainty principles and ideal atomic decomposition,
IEEE Trans. Inform. Theory 47, no. 7 (2001), pp. 2845–2862.
[22] D. L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck, Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit, Tech. Report
2006-02, Stanford, Department of Statistics, 2006.
[23] M. Elad, Sparse representations are most likely to be the sparsest possible, EURASIP
J. Appl. Signal Process. 2006 (2006), pp. 1–12.
[24] M. Elad and A. M. Bruckstein, A generalized uncertainty principle and sparse
representation in pairs of bases, IEEE Trans. Inform. Theory 48, no. 9 (2002),
pp. 2558–2567.
[25] J. J. Fuchs, Recovery of exact sparse representations in the presence of bounded
noise, IEEE Trans. Inform. Theory 51, no. 10 (2005).
[26] J.-J. Fuchs, Recovery conditions of sparse representations in the presence of noise,
in Proc. ICASSP 2006, IEEE, 2006, pp. 337–340.
[27] M. R. Garey and D. S. Johnson, Computers and Intractability. A Guide to the
Theory of NP-Completeness, W. H. Freeman and Company, New York, 1979.
[28] S. Mallat and Z. Zhang, Matching pursuit in a time-frequency dictionary, IEEE
Trans. Inform. Theory 41 (1993), pp. 3397–3415.
[29] O. L. Mangasarian, Minimum-support solutions of polyhedral concave programs,
Optimization 45 (1999), pp. 149–162.
[30] B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM J. Comput.
24, no. 2 (1995), pp. 227–234.
[31] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decompositions,
in Proc. 27th Asilomar Conference on Signals, Systems and Computers, A. Singh,
ed., 1993.
[32] M. E. Pfetsch, The Maximum Feasible Subsystem Problem and Vertex-Facet Incidence of Polyhedra, PhD thesis, TU Berlin, 2002.
[33] M. E. Pfetsch, Branch-and-cut for the maximum feasible subsystem problem, ZIB-Report 05-46, Zuse Institute Berlin, 2005.
[34] A. Schrijver, Theory of linear and integer programming, John Wiley & Sons, Chichester, 1986. Reprint 1998.
[35] T. Strohmer and R. W. Heath, Jr., Grassmannian frames with applications to
coding and communication, Appl. Comput. Harmon. Anal. 14, no. 3 (2003), pp. 257–
275.
[36] J. Tropp and A. C. Gilbert, Signal recovery from partial information via orthogonal matching pursuit. Preprint, Apr. 2005.
[37] J. A. Tropp, Greed is good: Algorithmic results for sparse approximations, IEEE
Trans. Inform. Theory 50, no. 10 (2004), pp. 2231–2242.
[38] J. A. Tropp, Just relax: Convex programming methods for identifying sparse signals
in noise, IEEE Trans. Inform. Theory 52, no. 3 (2006), pp. 1030–1051.
[39] G. Yarmish, Wavelet decomposition via the standard tableau simplex method of linear
programming. Preprint, available at www.optimization-online.org, 2006.
Sadegh Jokar, Institute of Mathematics, TU Berlin, Straße des 17. Juni
136, 10623 Berlin, Germany
E-mail address: jokar@math.tu-berlin.de
Marc E. Pfetsch, Zuse Institute Berlin, Takustr. 7, 14195 Berlin, Germany
E-mail address: pfetsch@zib.de