[go: up one dir, main page]

Academia.eduAcademia.edu
EXACT AND APPROXIMATE SPARSE SOLUTIONS OF UNDERDETERMINED LINEAR EQUATIONS SADEGH JOKAR AND MARC E. PFETSCH Abstract. In this paper, we empirically investigate the NP-hard problem of finding sparsest solutions to linear equation systems, i.e., solutions with as few nonzeros as possible. This problem has received considerable interest in the sparse approximation and signal processing literature, recently. We use a branch-and-cut approach via the maximum feasible subsystem problem to compute optimal solutions for small instances and investigate the uniqueness of the optimal solutions. We furthermore discuss five (modifications of) heuristics for this problem that appear in different parts of the literature. For small instances, the exact optimal solutions allow us to evaluate the quality of the heuristics, while for larger instances we compare their relative performance. One outcome is that the so-called basis pursuit heuristic performs worse, compared to the other methods. Among the best heuristics are a method due to Mangasarian and a bilinear approach. 1. Introduction The area of sparse approximations has seen tremendous research progress in the recent years. At the core of many of these activities are results that ensure the efficient recovery of sparse solutions, i.e., solutions having few nonzeros, to linear systems Ax = b with A ∈ m×n , b ∈ m . The basic idea is to let A be a composition of several bases of m and hence n > m. This implies that b has several representations, often guaranteeing sparse solutions to the equation system: Under appropriate assumptions on A and if b is known to have a sufficiently sparse representation, Donoho and Huo [21] showed that the sparsest representation is unique. Moreover, a striking result states that, under slightly stronger assumptions, one can efficiently find this representation by solving a linear program (LP), which models the minimization of the ℓ1 -norm of x subject to Ax = b. This fact was first investigated empirically by Chen, Donoho, and Saunders [13] and then theoretically studied by Donoho and Huo [21]. This approach is commonly called basis pursuit. Many strengthenings of this type of result have been obtained, see Section 1.3 for more details. A generalization to the sparse representation problem is to allow Ax deviate from b with respect to some ℓp -norm, i.e., to look for the sparsest solution x satisfying kAx − bkp ≤ δ for given δ > 0, where p ∈ [1, ∞]. The literature investigates conditions on A and b such that a solution minimizing the ℓ1 -norm under such constraints is near a sparsest solution. Donoho, R R R Date: March 2007. Supported by the DFG Research Center Matheon “Mathematics for key technologies” in Berlin. 1 2 SADEGH JOKAR AND MARC E. PFETSCH Elad, and Temlyakov [20], Tropp [38], Donoho [16], Fuchs [25], and Candès, Romberg, and Tao [9] deal with the case p = 2. Donoho and Elad [19] and Fuchs [26] investigate p = ∞. Another type of algorithm has its origin in the so-called matching pursuit approach, see Mallat and Zhang [28]. Similar results to the above mentioned can be proved for orthogonal matching pursuit, a variant of matching pursuit, developed by Pati, Rezaiifar, and Krishnaprasad [31], and Davis, Mallat, and Zhang [14]. For the results see Tropp [37], Donoho, Elad, and Temlyakov [20], and Tropp and Gilbert [36]. The sparse representation problem is closely connected to the maximum feasible subsystem problem (Max FS), in which one searches for a largest feasible subsystem of an infeasible linear inequality system; see Section 2 for a more detailed description and literature. In this context several heuristics have been proposed, e.g., a method using a bilinear formulation by Bennett and Bredensteiner [6] and the work of Mangasarian [29]. Unfortunately, the combinatorial optimization problem of finding a sparsest solution to an equation system is computationally hard, i.e., NP-hard (the same holds for Max FS). The above mentioned results investigate conditions under which sparse solutions can nevertheless be found efficiently and the analyses have become increasingly more exact. In practical applications, however, one needs sparse solutions also when the conditions investigated in the literature do not hold. One approach to this problem is to develop an exact combinatorial optimization method that can compute an optimal sparsest solution for practical problem sizes. The theoretical hardness results do not exclude such methods, since computational complexity captures worst-case and asymptotical behavior only. It seems, however, that the literature on sparse representations generally believes such an approach to be impossible. A second approach is to settle for algorithms that efficiently compute good approximate solutions to the above problem, i.e., they find solutions that have not much more nonzeros than the optimal solutions. In this paper, we empirically investigate an exact branch-and-cut algorithm that was originally developed for Max FS. We furthermore investigate the practical behavior of five approximate methods: Besides the above mentioned basis pursuit and orthogonal matching pursuit, we propose a variant of orthogonal matching pursuit that improves upon its results, but is computationally more expensive. We also use the nonlinear optimization method due to Mangasarian, mentioned above. Finally, we adapt the bilinear optimization approach of Bennett and Bredensteiner to the case of finding sparse solutions. We consider sparse solutions satisfying the given equation system Ax = b and also solutions that satisfy kAx − bk∞ ≤ δ for a given δ > 0. The availability of the exact approach enables us to estimate the quality of heuristics via the optimum or the lower bound that is produced by the branch-and-cut approach. The exact algorithm can also be used to check whether the sparsest solution is unique, a key feature of the investigated problem. The findings of this paper can be summarized as follows. We confirm the common pessimism about the optimal solvability of the exact sparse representation problem. It is – both theoretically and empirically – a very EXACT AND APPROXIMATE SPARSE SOLUTIONS OF LINEAR EQUATIONS 3 hard problem: the exact branch-and-cut algorithm can only solve very small instances to optimality, or instances in which the optimal solution has few nonzeros. On the positive side, the heuristics perform quite well, in particular, Mangasarian’s approach. It is also a good idea to greedily decrease the support of a solution produced by a heuristic, if possible. Basis pursuit turns out to be the worst of the studied heuristics. If one wants to improve upon basis pursuit or orthogonal matching pursuit, however, one has to pay the price of solving several LPs. The structure of this article is as follows. After fixing some notation and introducing the basic problems of this paper, in Section 1.3 we review results from the literature. This includes results on the computational complexity of the problems and examples of the representation theorems that can be proved in our context. In Section 2, we discuss our exact approach via the maximum feasible subsystem problem. Section 3 contains the introduction of the five heuristics mentioned above. In Section 4, we present computational experiments with the exact and approximate algorithms. We close with some final remarks. 1.1. Notation We depict vectors in bold font, and for two vectors x and y, we write xT y for their inner product. We denote by 1 the vector of all ones and by 0 the zero vector of appropriate dimensions. For a positive integer n, we define [n] := {1, . . . , n}. For x ∈ n , we denote by supp(x) = {j ∈ [n] : xj 6= 0} the support of x and define the ℓ0 quasi-norm to be the size of the support, i.e., kxk0 := #{supp(x)}. We will also use the ℓ1 , Euclidean ℓ2 , and max ℓ∞ -norm: n n 1 X X 2 |xj |, kxk2 = x2j , and kxk∞ = max |xj |, kxk1 = R j=1 j∈[n] j=1 respectively. 1.2. Basic Problems The basic problem of this paper is to find the sparsest solution to Ax = b, where A ∈ m×n is of full rank and b ∈ m . We therefore want to solve: R R (P0 ) min {kxk0 : Ax = b}. We assume that m < n, and hence the solutions of Ax = b are underdetermined. We also consider the variant of this problem, where we allow deviations from equality, i.e., we allow solutions x that satisfy kAx − bk∞ ≤ δ for some δ ≥ 0. This problem can be posed as follows. (P0δ ) min {kxk0 : b − δ · 1 ≤ Ax ≤ b + δ · 1}. Clearly, problem (P0 ) is equivalent to (P00 ), but we will often distinguish the two, since specialized algorithms for (P0 ) might be more efficient. For convenience, we define the sets of feasible points Rn : Ax = b} and Qδ := {x ∈ Rn : b − δ · 1 ≤ Ax ≤ b + δ · 1}. Q := {x ∈ 4 SADEGH JOKAR AND MARC E. PFETSCH We will always assume that Qδ 6= ∅ (for δ = 0 this is implied by the full rank of A). The above problems only become interesting if 0 ∈ / Qδ . For 0 Q = Q this is equivalent to the assumption that b 6= 0. For Qδ we need δ < max{|b1 |, . . . , |bm |} = kbk∞ . Remark. We take the ℓ∞ -norm as a measure for deviations of equality in this paper, since it is the most suitable for linear programming based approaches. With some effort one could include deviations w.r.t. the ℓ1 -norm, but using the ℓ2 -norm would turn (P0δ ) into a nonlinear mixed-integer problem, which would make our exact solution approach even more difficult. The ℓ2 -norm is usually used in the literature, see, for instance, Donoho, Elad, and Temlyakov [20], and Tropp [38]. Replacing the quasi-norm k·k0 by k·k1 in (P0 ) and (P0δ ) yields the following two problems. (P1 ) (P1δ ) min {kxk1 : x ∈ Q} min {kxk1 : x ∈ Qδ }. These problems are efficiently solvable by linear programming methods, see Section 3.1. The problems (P1 ) and (P1δ ) will be used several times later on. 1.3. Results from the Literature In this section, we collect results from the literature concerning sparse solutions of Ax = b. We first deal with the computational complexity properties of the problem. These seem hardly to be known in the sparse representation literature. The decision version of (P0 ) is one of the classical NP-complete problems, see Garey and Johnson [27], i.e., if P 6= NP there exists no polynomial time algorithm that for every instance computes the optimal solution of (P0 ). Since (P0δ ) contains (P0 ) as a special case, (P0δ ) is NP-hard as well; Natarajan [30] presents an NP-hardness proof for δ > 0. Moreover, both problems are hard to approximate, i.e., if a very likely assumption is true (see below), there is no polynomial time algorithm that 1−ρ computes a solution with a support that is at most 2log n times as large as the support of an optimal solution, for any ρ > 0, see Amaldi and Kann [3] for (P0 ) and Amaldi [2] for (P0δ ). The precise formulation is as follows. Let DTIME(spolylog s ) be the class of problems whose instances of size s can be solved in deterministic time O(spolylog s ), where polylog s refers to any polynomial in log s. The assumption NP * DTIME(spolylog s ) is stronger than P 6= NP, but it is also believed to be extremely likely; it amounts to the claim that not all problems in NP can be solved in quasi-polynomial time. Then, assuming NP * DTIME(spolylog s ), (P0 ) and (P0δ ) cannot be approximated in poly1−ρ nomial time within a factor 2log n , for any ρ > 0 (n is the number of columns of A). This shows that (P0 ) and (P0δ ) are indeed theoretically hard optimization problems. Having the above negative results in mind, it is quite surprising that strong statements about efficient solutions to the sparse representation problem exist. We give a flavor of these results, for which we assume that A = [B 1 B 2 ], EXACT AND APPROXIMATE SPARSE SOLUTIONS OF LINEAR EQUATIONS 5 where B 1 = [p1 , . . . , pm ] and B 2 = [q 1 , . . . , q m ] are two orthonormal bases of m (and hence A ∈ m×2m ). The following number, called mutual incoherence, is central for the results: R R M := M (B 1 , B 2 ) := max{|pT i q j | : 1 ≤ i, j ≤ m}. √ Note that M ≤ 1, because B 1 and B 2 are normalized and M ≥ 1/ m (see, e.g., Donoho and Huo [21]). It turns out that the smaller the mutual incoherence, the nicer the properties that the matrix has with respect to sparse representation problems. Matrices with a minimum mutual incoherence are also called Grassmannian frames, see Strohmer and Heath [35]. R Theorem 1 (Elad and Bruckstein [24]). Let A be as above, b ∈ m , and 1 . Then x⋆ is the let x⋆ be an optimal solution of (P0 ) with α := kx⋆ k0 < M unique solution of Ax = b with kxk0 ≤ α. If we slightly decrease the bound on the number of nonzeros, we can recover the unique optimal solution by solving (P1 ). R Theorem 2 (Elad and Bruckstein [24]). Let A be as above, b√∈ m , and let x⋆ be the (unique) optimal solution of (P0 ) with kx⋆ k0 < ( 2 − 21 )/M . Then the optimal solution of (P1 ) coincides with x⋆ . 1 ) obtained by Donoho This bound is an improvement of the bound 21 (1+ M √ 1 and Huo [21]. The bound ( 2 − 2 )/M ≈ 0.9142/M is only slightly smaller 1 of Theorem 2. than the bound M The statement of Theorem 2 can be strengthened for random matrices. For instance, we have the following. R Theorem 3 (Donoho [17]). Let A ∈ m×n , with m < n, be a matrix whose columns are unit vectors taken uniformly at random. Then there exists a constant ρ(n) such that the following holds: Let x⋆ be the optimal solution of (P0 ) for the system Ax = b, and assume that kx⋆ k0 < ρ(n) · n. Then the optimal solution of (P1 ) coincides with x⋆ . Hence, this results shows that for random matrices one can find optimal solutions x⋆ of (P0 ) for appropriate α ∈ O(n) if kx⋆ k0 < α by√linear programming. In contrast, Theorem 2 only works for α = 1/M ∈ O( n). Thus, asymptotically Theorem 3 is stronger than Theorem 2. However, known bounds on ρ(n) are very small, see also Candès and Tao [11]. For further extensions and strengthenings of these result see, for instance, Donoho and Elad [18], Candès and Romberg [7], Candès, Romberg, and Tao [8], and Donoho [15]. R Remark. A complementary problem to (P0 ) is as follows. Let B ∈ r×s with r > s. Suppose that b := Bx′ + e, where x′ is some given vector and e is a vector of errors. The goal is to recover x′ , given B and b. It turns out that, if B has full rank, this problem is equivalent to (P0 ) and, similarly to (P0 ), it can be solved by linear programming under certain assumptions on B and if e is sparse. For more information see Candès et al. [10], and Candès and Tao [11]. As mentioned in the introduction, there are several articles that deal with the problem min {kxk0 : kAx − bk2 ≤ δ}. For the case with the ℓ∞ -norm 6 SADEGH JOKAR AND MARC E. PFETSCH instead of the ℓ2 -norm (i.e., (P0δ )) there are much fewer results. We cite the following. Theorem 4 (Elad and Donoho [19]). Let A = [B1 B2 ] be as above. For x̃ ∈ n and e ∈ m , with kek∞ ≤ ε, define b := Ax̃ + e. If for some γ ≥ 0 and δ ≥ ε, we have 1+M √ kx̃k0 < , 2M + 2 m(ε + δ)/γ R R then the solution x⋆ to (P1δ ) satisfies kx⋆ − x̃k1 ≤ γ. In other words, if there exists a sparse solution x̃ to the disturbed system, then the optimal solution to (P1δ ) is not far from x̃. For the ℓ2 -norm case mentioned above, it can also be proved that under appropriate conditions the support of x⋆ is contained in the support of x̃. 2. Integer Programming Approaches The sparse representation problem (P0δ ) can be formulated as a mixed integer linear program as follows: min 1T z b − δ · 1 ≤ Ax ≤ b + δ · 1 −Z · z ≤ x ≤ Z · z z ∈ {0, 1}n . If Z > 0 is large enough, then zj = 1, j ∈ [n], places no restrictions on xj , while zj = 0 forces xj = 0. This shows that the above model indeed finds optimal solutions for (P0δ ). This approach, however, has usually suffers from serious numerical difficulties. On the one hand, Z has to be large enough to guarantee the correct behavior of the model. On the other hand, Z has to be as small as possible, since large values of Z can lead to wrong results, because of numerical problems. Furthermore, the larger Z is, the worse the LP-bounds in LP-based integer programming solvers; this has negative effects on the performance. While for special data A and b, good bounds on kxk∞ – and hence Z – may be possible, a priori bounds are impractically large, see e.g. Schrijver [34]. The problems with the previous approach can be avoided by transformation to the so-called maximum feasible subsystem problem (Max FS). Here, one is given an infeasible linear inequality system Dx ≤ d and wants to find a feasible subsystem of maximal cardinality. See Amaldi, Pfetsch, and Trotter [4] or Pfetsch [32] for more information. For the case δ = 0, the transformation of an instance Ax = b of (P0 ) to an instance of Max FS works as follows, see Amaldi and Kann [3]. Since A has full rank, we can use Gaussian elimination to obtain the equivalent system u + Rv = r, where R ∈ Rm×(m−n) , r ∈ m , u ∈ m , and v ∈ m−n . We now consider the system R R R u + Rv = r, u = 0, v = 0, which is infeasible (recall that we assumed b 6= 0 and hence r 6= 0). Eliminating u, we obtain Rv = r, v = 0. The Max FS instance is then Rv ≤ r, −Rv ≤ −r, v ≤ 0, −v ≤ 0. (1) EXACT AND APPROXIMATE SPARSE SOLUTIONS OF LINEAR EQUATIONS 7 This system has 4m − 2n inequalities and m − n variables. Since every point v satisfies at least one of the opposite inequalities, at most one of each pair of inequalities has to be removed in order to obtain a feasible subsystem. Moreover, if k inequalities are removed to yield a feasible system, a solution v to the remaining system yields a solution (u, v) of u + Rv = r having at most k nonzeros, where u = r − Rv. Conversely, a solution to u+Rv = r with k nonzeros yields a feasible subsystem of (1) with k removed inequalities. If δ > 0, it seems that there exists no easy reduction to Max FS. We can, however, proceed as follows. We consider the infeasible system Ax ≤ b + δ · 1, −Ax ≤ −b + δ · 1, x ≤ 0, −x ≤ 0, where we declare the first 2m inequalities as mandatory and hence only the last 2n nonnegativity inequalities can be removed to obtain a feasible system. Similar to above, a feasible subsystem including the first 2m constraints and skipping k of the last 2n inequalities yields a solution x ∈ Qδ with at most k nonzeros. Conversely, as solution x ∈ Qδ with k nonzeros yields a feasible subsystem containing the first 2m constraints and skipping k of the last 2n inequalities. Computing optimal solutions to Max FS can be done by a branch-andcut algorithm. The handling of mandatory inequalities is straightforward. We refer to Pfetsch [33] for an extensive discussion of this approach. In principle, a direct branch-and-cut algorithm for the sparse representation problem could avoid the doubling of variables arising in the transformation to Max FS. However, more research would be necessary to turn this into a practical approach. 3. Heuristic Approaches In this section we present five heuristic approaches to solve (P0δ ). We discuss basis pursuit, orthogonal matching pursuit, a nonlinear optimization approach due to Mangasarian, and a bilinear formulation. 3.1. Basis Pursuit The approach to replace the ℓ0 quasi-norm in (P0 ) and (P0δ ) by the more tractable ℓ1 -norm to obtain problems (P1 ) and (P1δ ) is called basis pursuit (BP) and was pioneered by Chen, Donoho, and Saunders [13], see also Chen [12]. There are several ways to rewrite these problems as linear programs. We will only present the different formulations for (P1δ ) – the changes for (P1 ) are straightforward. A first easy representation is min 1T y b − δ · 1 ≤ Ax ≤ b + δ · 1 −y ≤ x ≤ y. Note that here y is automatically nonnegative. This formulation has 2n variables and 2(m + n) constraints. For many LP-solvers, e.g., simplex algorithm based solvers, the number of (ordinary) constraints is crucial for speed, while bounds on variables are implicitly handled and provide little 8 SADEGH JOKAR AND MARC E. PFETSCH additional computational cost. Therefore the following formulation, which we use in our implementations, should be more efficient. min 1T (x+ + x− ) A(x+ − x− ) + s = b −δ · 1 ≤ s ≤ δ · 1 x+ , x− ≥ 0. This system has 2n + m variables, m equations, and 2(n + m) bounds on the variables. It arises by splitting x into its positive part x+ and its negative part x− and introducing slack variables s. Note that in the optimum, for − every j ∈ [n], not both x+ j and xj are positive; hence, we have the corre− + − spondence x+ j − xj = xj and xj + xj = |xj |. Clearly, in a formulation for (P0 ), variables s are not necessary. Remark. A general problem with all linear programming based approaches is that the matrix A is usually dense in applications. This has negative effects on the performance of LP-solvers, since the speed of solving the equation systems in LP-solvers approaches its worst case cubic behavior, and techniques for handling sparse data do not help here. See Section 4 for more information and, for instance, Yarmish [39] for an implementation of a dense LP-solver. 3.2. Orthogonal Matching Pursuit The orthogonal matching pursuit (OMP) approach to heuristically solve (P0 ) was developed by Pati, Rezaiifar, and Krishnaprasad [31] and Davis, Mallat, and Zhang [14]. We will not discuss the earlier proposed matching pursuit introduced by Mallat and Zhang [28]. The idea of OMP to find a sparse solution of Ax = b is the following. At the ith iteration, one is given an approximate point xi with residue r i = b − Axi . We want to enlarge the support S i of xi by a single index and therefore compute i i j ⋆ = argmax {|AT j r | : j ∈ [n] \ S }, where Aj is the jth column of A. Then one sets S i+1 = S i ∪ {j ⋆ } and solves the least squares problem X λj Aj 2 : λj = 0 for j ∈ / S i+1 }. (2) minn { b − R λ∈ j∈S i+1 The solution is taken as the new point xi+1 and one sets r i+1 = b − Axi+1 . We stop the process if kr i k∞ is small enough. The number of iterations gives the number of nonzeros of the computed solution to Ax = b. To get a solution for (P0δ ), we perform OMP as just explained, but stop as soon as kr i k∞ ≤ δ. Remark. We use the ℓ∞ -norm for the stopping criterion in order get a fair comparison with linear programming based approaches: In standard LPsolvers feasibility is checked w.r.t. the ℓ∞ -norm, using a feasibility tolerance of around ε = 10−6 . We use this tolerance in all implementations. EXACT AND APPROXIMATE SPARSE SOLUTIONS OF LINEAR EQUATIONS 9 There are examples in which OMP produces solutions arbitrarily far away from the optimal solution of (P0 ), see Chen, Donoho, and Saunders [13]. Nevertheless, one can prove interesting results about the quality of OMP solutions. For instance, Tropp and Gilbert [36] showed that if A consists of independent random unit vectors and b has a sparse enough representation, then OMP computes this representation (which is unique) with high probability. Recently Donoho, Tsaig, Drori, and Starck [22] proposed stagewise orthogonal matching pursuit, a variant of OMP in which many coefficients can enter the model at each step. 3.2.1. Generalizations of OMP. Orthogonal matching pursuit can be generalized as follows. We replace (2) by the following problem with S = S i+1 . min 1T (u + w) b − δ · 1 − Ax ≤ u Ax − b − δ · 1 ≤ w xj = 0 j ∈ /S u, w ≥ 0. (3) This LP computes a solution closest to Qδ with respect to the ℓ1 -norm that uses only variables in S (note that in an optimal solution at most one of uj and wj is positive). The process is terminated if the objective function value of (3) is small enough. Similarly, one can replace (2) by X λj Aj ∞ : λj = 0 for j ∈ / S i+1 }. (4) minn { b − R λ∈ j∈S i+1 This leads to an LP that is similar to (3): min z b − δ · 1 − Ax ≤ z · 1 Ax − b − δ · 1 ≤ z · 1 xj = 0 j∈ / S, where S = S i+1 again denotes the current support set. The process is terminated when the optimal solution value is small enough. In computational experiments it turned out that the results of these two methods do not significantly differ from the results produced by OMP. We therefore do not investigate these two variants further. Another generalization yields a new method that we call greedy orthogonal matching pursuit (GOMP). We present it for the case δ > 0 only; the modifications for δ = 0 are straightforward. At the ith iteration, we are given an approximate solution xi with support S i and compute the best extension of S i by solving (3) for each S = S i ∪ {j}, j ∈ / S i . We choose j ⋆ to be an index j that produces the smallest value in (3). The new point xi+1 is given by the solution of the corresponding LP. GOMP naturally increases the computational effort with respect to OMP, since it has to solve many LPs in each iteration. See Section 4 for more information. 10 SADEGH JOKAR AND MARC E. PFETSCH 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -3 -2 -1 0 1 2 3 Figure 1: Plot of f2 (x), f4 (x), f8 (x), and f16 (x) for n = 1. 3.3. Mangasarian’s Approach We now present an adaptation of a general method due to Mangasarian [29]. The idea is to replace kxk0 by the approximation fα (x) = n X j=1  1 − e−α|xj | , for some α > 0, see Figure 1. We clearly have fα (x) ≤ kxk0 and for x ∈ Rn , lim fα (x) = kxk0 . α→∞ For a fixed α and δ, we consider the problem (Mαδ ) where S(δ) := {(x, y) ∈ min{fα (y) : (x, y) ∈ S(δ)}, Rn × Rn : b − δ · 1 ≤ Ax ≤ b + δ · 1, −y ≤ x ≤ y}. Mangasarian [29] proved that a stationary point of (Mαδ ) can be reached by a successive linearization algorithm in finite time. Furthermore, there exists some α such that (Mαδ ) optimally solves (P0δ ). The successive linearization algorithm works as follows. Given a solution (xi , y i ) ∈ S(δ), we find a (vertex) solution (xi+1 , y i+1 ) of min { cT y : (x, y) ∈ S(δ) }, where i cj = α e−αyj , j = 1, . . . , n. Here, we replace |xj | by yj and use that d dyj (1 − e−α yj ) = αe−α yj . In the above linear program the constant part of the linearization is skipped, because it is irrelevant. We terminate the process once ky i+1 − y i k∞ is small enough. Although it is guaranteed that there exists α such that (Mαδ ) computes an optimal solution to (P0δ ), there is no efficient way to compute it (as long as P 6= NP). We therefore use the following simple approach. We start with α = 0.1 and double it after every of 20 iterations, each time solving the linearization with the successive linearization algorithm and storing the sparsest solution found. EXACT AND APPROXIMATE SPARSE SOLUTIONS OF LINEAR EQUATIONS 11 3.4. A Bilinear Formulation In this section, we introduce another formulation for the problem (P0δ ) and a heuristic way to solve this model. This approach follows ideas of Bennett and Bredensteiner [6]. Lemma 5. The minimization problem (P0δ ) is equivalent to the following bilinear program (with equilibrium constraints): min 1T z b − δ · 1 ≤ Ax ≤ b + δ · 1 xj (1 − zj ) = 0, j ∈ [n] 0 ≤ z ≤ 1. Proof. Suppose that x⋆ is an optimal solution of (P0δ ) and (x̂, ẑ) is an optimal solution of the above bilinear program. We want to show that 1T ẑ = kx⋆ k0 . Defining zj⋆ = 1 if x⋆j 6= 0 and zj⋆ = 0 if x⋆j = 0, we have kx⋆ k0 = kz ⋆ k0 = 1T z ⋆ ≥ 1T ẑ, since (x⋆ , z ⋆ ) is a feasible point for the above problem. Conversely, since if x̂j 6= 0 then ẑj = 1, we have kx̂k0 ≤ 1T ẑ. Hence, it follows that kx⋆ k0 ≤ kx̂k0 ≤ 1T ẑ, since x̂ is a feasible solution for (P0δ ).  Note that the set of feasible points of the bilinear formulation is not convex anymore. Hence, there may exist many local optima. A direct successive linearization method does not help for the above model, since fixing x or z forces hard bounds for the remaining variables that cannot be changed. We can, however, use a trick similar to the approach of Bennett and Bredensteiner [6]. The idea is to introduce a bound β ∈ + on the objective function and move the bilinear constraints to the objective. To make the problem bounded, we note that xj (1 − zj ) = 0 if and only if |xj |(1−zj ) = 0. Hence, we again introduce variables y modeling the absolute values of x (see Section 3.1). This leads to the following formulation: Z (Bβδ ) Z min y T (1 − z) (x, y) ∈ S(δ) 0≤z ≤1 1T z ≤ β. Given a fixed β ∈ + , a natural successive linearization method for this formulation is as follows. At the ith iteration we have points (xi , y i ) ∈ S(δ) and z i ∈ [0, 1]n . We then solve the linear program δ ) (Bβ,1 min y T (1 − z i ) (x, y) ∈ S(δ) to obtain xi+1 and y i+1 . Then we solve δ ) (Bβ,2 min (y i+1 )T (1 − z) 0≤z≤1 1T z ≤ β, 12 SADEGH JOKAR AND MARC E. PFETSCH to get z i+1 . The process is repeated as long as the objective function improves, i.e., (y i+1 )T (1 − z i+1 ) < (y i)T (1 − z i). δ ) has a simple closed form In fact, the second linearization problem (Bβ,2 solution, see also [6]; we skip the easy proof. R Lemma 6. Given the nonnegative vector y i+1 ∈ n , let j1 , . . . , jn be a . Then the vector permutation of [n], such that yji+1 ≥ yji+1 ≥ · · · ≥ yji+1 n 1 2 defined by zjk = 1 for k = 1, . . . , β and zjk = 0 for k = β + 1, . . . , n is an δ ). optimal solution to (Bβ,2 Bennett and Mangasarian [5] prove that (Bβδ ) has a vertex solution and then provide the following finiteness result for the successive linearization algorithm. Theorem 7 (Bennett and Mangasarian [5]). The successive linearization algorithm terminates in a finite number of steps at a global solution or a point (xi+1 , y i+1 , z i) that satisfies the minimum principle necessary optimality condition (1 − z i )T (y − y i+1 ) + (y i+1 )T (z i − z) ≥ 0, δ ) and z which is feasible for (B δ ). for all (x, y) feasible for (Bβ,1 β,2 On top of the above linearization process one adds a binary search routine to determine the optimal β. We use the secant method described in [6]. 3.5. Postprocessing Having obtained a solution x⋆ by a heuristic method, we can try to iteratively reduce the support of given points xi ∈ Qδ , where x0 := x⋆ . For this we let S = supp(xi ) and check for each k ∈ S whether {x ∈ Qδ : xj = 0, j ∈ [n] \ S, xk = 0} = 6 ∅. If this is the case for some k, we let xi+1 be the solution of this linear program. This approach sometimes helps to further reduce the support of the solution. 4. Computational Results We conducted several computational experiments with the algorithms discussed in this paper. We use deterministic and random matrices and construct instances that are guaranteed to have small support. The heuristics are implemented as described in Section 3 and the branchand-cut algorithm as indicated in Section 2. All algorithms are implemented in C/C++. The linear programs for the heuristics are solved via the primal simplex implementation of CPLEX 10.11. The implementation of the branch-and-cut algorithm is described in detail in Pfetsch [33]; it is based on the framework SCIP (Solving Constraint Integer Programs) version 0.90 by Achterberg [1], using CPLEX 10.11 for solving the intermediate LPs. The computations were performed on a 3.4 GHz Pentium 4 machine with 3 GB of main memory and 2 MB cache, running Linux. EXACT AND APPROXIMATE SPARSE SOLUTIONS OF LINEAR EQUATIONS 13 4.1. Experiments with Deterministic Matrices In this section, we use deterministic matrices constructed as follows. We take A = [I H] ∈ m×2m , where H is the normalized Hadamard matrix of size m, and I is the m × m identity matrix; this matrix was also used in Donoho, Elad, and Temlyakov [20]. By construction √ of the Hadamard matrix, m is a power of 2. The incoherence is M := 1/ m. R 4.1.1. The case m = 16. We first investigate the case in which the number of rows of A is m = 16. Hence, the number of columns is n = 32, and the number of variables in the Max FS formulation is 64. In a first experiment, we study the case in which we allow no deviation, i.e., δ = 0. For each N = 1, 2, . . . , 16, we generated three random instances as follows. Take a random vector x0 with N nonzeros. The support entries are chosen uniformly at random, and the values are drawn from a normal distribution with zero mean and variance 1. Then take b = Ax0 . Hence, it is guaranteed that Ax = b has a solution with N nonzeros, but it is possible that there exist solutions with fewer nonzeros – see the results below. Note that from N = 16 on, there trivially are solutions with at most 16 nonzeros. Table 1 shows the results of the branch-and-cut algorithm and the heuristics. The columns have the following meaning: “nodes” gives the number of nodes/subproblems in the branch-and-cut algorithm, “time” is the CPU time of the branch-and-cut algorithm in seconds, “sols” lists the number of optimal solutions, “opt” gives the number of nonzeros in an optimal solution, “BP” shows the results of basis pursuit, “MG” of Mangasarian’s approach, “OMP” of orthogonal matching pursuit, “GOMP” of the greedy OMP, and “BL” of the bilinear approach. Postprocessing never improved the solutions. Heuristical solutions that are optimal are marked in bold face. For computing the number of optimal solutions we set a time limit of 10 hours. Numbers appearing in italics in the corresponding column were obtained when the time limit was reached. The last row contains the geometric means over each column. The computation times of the heuristics are well below one second and are therefore not listed in the table. We now discuss these results in more detail. The results show that the larger N gets, the more time the branch-andcut algorithms needs. For N ≤ 13 all instances can be solved in a couple of seconds, while for N = 16 the execution almost needs up to six hours. In fact, the solution time is correlated to the size of the optimal solution – if it is at least 15, the instances get very hard to solve. Note that for N ≤ 12 the size of each optimal solution is N , while for larger N there often exists a solution of size smaller than N . We used a modification of the branch-and-cut algorithm to count the number of optimal solutions. Table 1 shows that for N ≤ 11 (with one exception for N = 9) the optimal solutions are unique, although Theorem 1 guaran√ tees uniqueness only for N ≤ 1/M = 16 = 4. These results complement the experiments of Elad [23], who investigated uniqueness for matrices of size 8 × 20 by enumeration. The heuristics perform very well for small N , usually finding the optimal solution; an exception is the second instance for N = 6, where all heuristics 14 SADEGH JOKAR AND MARC E. PFETSCH Table 1: Results of the branch-and-cut algorithm (columns 2–5) and heuristics (BP, MG, OMP, GOMP, and BL) for Hadamard matrices with m = 16 and δ = 0. Column “opt” gives the optimal solution, and numbers in bold are heuristical solutions that are optimal. The last row contains the geometric mean of the entries in each column. N nodes time sols opt BP MG OMP GOMP BL 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 1 1 1 1 2 1 1 2 2 4 1 2 1 1 4 1 17 1 1 4 15 13 14 53 167 123 227 915 633 296 2751 1539 719 8827 4772 2784 20505 3248 18007 178872 190032 171931 22553939 162437 132454 23272940 22604125 19706327 0.01 0.00 0.00 0.00 0.00 0.01 0.01 0.01 0.01 0.01 0.00 0.01 0.01 0.01 0.03 0.02 0.15 0.02 0.04 0.08 0.09 0.18 0.20 0.34 0.56 0.48 1.09 4.58 3.40 1.42 14.92 10.37 3.50 38.17 28.90 14.39 63.26 14.50 51.11 286.82 313.60 293.62 20881.82 320.93 220.60 21847.21 21133.88 18380.20 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 8 1 1 1 1 1 1 1 1 8 1 48 48 288 1296 1296 48 54609 1296 970 63691 55278 50 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10 11 11 11 12 12 12 13 12 13 14 14 14 15 14 14 15 15 15 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 8 14 6 7 7 7 8 11 10 9 12 13 14 15 13 16 16 14 16 16 16 14 16 15 16 16 16 16 16 16 15 16 16 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 11 6 7 7 7 8 8 9 9 9 12 11 13 12 13 16 11 14 16 13 14 13 15 15 16 15 16 15 16 15 16 16 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 9 6 7 7 7 9 8 8 9 9 11 12 13 10 14 16 15 14 16 14 16 12 16 15 16 16 16 16 16 16 16 16 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 9 6 7 7 7 8 10 8 12 9 11 12 13 13 14 12 14 14 15 14 14 12 16 14 16 16 16 15 15 16 16 16 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 11 6 7 7 7 8 8 9 9 9 12 11 14 12 12 16 11 14 15 15 13 12 15 14 16 15 16 15 16 15 16 16 G 219.7 1.84 6.1 6.7 7.7 7.3 7.3 7.3 7.2 EXACT AND APPROXIMATE SPARSE SOLUTIONS OF LINEAR EQUATIONS 15 fail to find solutions close to optimal. Basis pursuit always finds the optimal solutions for N ≤ √ 5. The maximum size of N for which this is guaranteed by Theorem 2 is ( 2 − 21 )/M ≈ 0.91 · 4 = 3.64. Hence, in this case, the theoretical bound is quite close to the empirical bound. The other heuristics all improve upon basis pursuit and often find optimal solutions for N ≤ 9. For large N , their performance does not show a clear picture; quite frequently they only find trivial solutions of size 16. There is no clear winner among the heuristics (excluding basis pursuit), although the bilinear method seems to be slightly better than the other heuristics (the geometric mean of its solution values is 7.2, compared to 7.3 for the others). Furthermore, the bilinear method finds 29 optimal solutions, compared to 22 for basis pursuit, 26 for Mangasarian’s method, 26 for OMP, and 25 for GOMP. In the second experiment, we allow a deviation of δ = 0.1. Table 2 shows the results, the notation being the same as in Table 1. Column “nz” gives the number of nonzeros of the computed solution, and we additionally report in column “p”, for each heuristic, the number of nonzeros obtained by the postprocessing method of Section 3.5. Unlike the δ = 0 case, here this method can often improve the solution. Hence, it seems to be a good idea to use it after any of the heuristics. The computation times of the heuristics are again negligible. Because of the allowed deviation, the optimal solution values are usually smaller than N . For instance, no optimal solution has more than 10 nonzeros. As a consequence, each instance can be solved by the branch-and-cut algorithm within two seconds. The optimal solutions are unique only for N ≤ 5. An explanation is that the admittance of solutions in a neighborhood around the solution with N nonzeros allows more candidates. On the other hand, the number of optimal solutions stays small. It seems that these solutions are more isolated, since their support is smaller (because of the deviation), and solutions to the equations with few nonzeros tend to be unique as observed above. Again the heuristics perform quite well; for N ≤ 10, at least one heuristic (with postprocessing) finds the optimal solution. When comparing the heuristics, it again turns out that basis pursuit, although its results can be considerably improved by postprocessing, in general produces worse solutions than the other heuristics. The best heuristic with respect to the geometric mean over the number of produced nonzeros is GOMP, followed by OMP, Mangasarian’s method, the bilinear approach, and finally basis pursuit, but the numbers are quite close together for the first four. Note that the geometric mean of 5.3 for GOMP with postprocessing is not far from the geometric mean over all optimal solutions with 5.1, emphasizing the good quality of the produced solutions. Furthermore, basis pursuit finds 21 optimal solutions after postprocessing, Mangasarian’s method 32, OMP 35, GOMP 39, and the bilinear method finds 27 optimal solutions after postprocessing. Hence, for these instances, GOMP performs better than the other heuristics, but as we will see later, it is also quite time consuming for larger instances. 4.1.2. The case m = 128. We experimented with the branch-and-cut algorithm for larger instances. It turned out that already for m = 32, it cannot solve the instances with large N to optimality within any reasonable 16 SADEGH JOKAR AND MARC E. PFETSCH Table 2: Results of the branch-and-cut algorithm and heuristics for Hadamard matrices with m = 16 and δ = 0.1. The notation is as in Table 1. Column “nz” gives the number of nonzeros in a solution and column “p” the number of nonzeros after the postprocessing method of Section 3.5. BP nz p MG nz p OMP nz p GOMP nz p BL nz p 1 0 1 2 2 2 2 3 3 3 4 3 3 4 2 5 5 6 5 6 4 7 6 6 6 7 7 7 9 8 7 7 8 9 8 10 9 10 9 10 10 9 9 10 10 9 10 10 5 0 4 6 6 2 6 8 7 8 7 6 5 13 6 9 9 12 10 13 11 10 8 12 8 11 12 13 15 12 13 11 13 14 15 11 12 16 14 15 14 13 15 16 14 14 16 16 1 0 1 2 2 2 2 3 3 3 4 3 3 4 3 5 5 8 5 7 4 9 7 8 6 7 7 7 9 8 7 9 10 13 9 10 10 15 9 10 11 10 10 12 11 13 10 11 1 1 1 2 2 2 2 3 3 3 4 3 3 4 3 5 5 6 5 7 4 9 6 9 7 7 7 8 10 8 7 8 10 10 9 11 11 12 10 12 11 11 11 11 10 10 11 12 1 1 1 2 2 2 2 3 3 3 4 3 3 4 2 5 5 6 5 6 4 7 6 6 6 7 7 7 9 8 7 8 9 9 9 10 11 11 9 10 10 10 10 10 11 10 10 11 1 0 1 2 2 2 2 3 3 3 4 3 3 4 4 5 5 9 5 10 7 9 7 11 6 7 10 7 10 8 9 9 9 11 11 10 11 11 13 11 11 12 12 13 10 11 11 11 5.1 9.5 6.1 N nodes time sols opt 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 1 1 1 2 2 1 2 4 1 2 2 2 4 2 2 4 4 10 4 31 4 4 15 31 11 27 5 15 165 35 27 27 62 176 185 431 183 388 103 377 430 160 181 352 369 153 463 375 0.00 0.00 0.00 0.01 0.01 0.00 0.01 0.02 0.02 0.01 0.02 0.01 0.01 0.02 0.01 0.03 0.03 0.05 0.03 0.11 0.02 0.09 0.07 0.10 0.05 0.11 0.15 0.10 0.53 0.21 0.09 0.09 0.32 0.52 0.56 1.44 0.57 1.56 0.45 1.73 1.69 0.49 0.57 1.47 1.22 0.72 1.78 1.42 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 10 3 2 1 2 1 1 9 1 2 2 3 8 42 5 11 1 2 10 1 45 3 87 6 44 78 3 2 12 18 2 76 32 G 18.5 0.14 3.3 1 0 1 2 2 2 2 3 3 3 4 3 3 4 2 5 5 9 5 7 7 10 7 11 6 7 12 10 10 9 10 9 9 13 12 10 11 13 11 13 11 13 13 13 12 13 11 14 1 0 1 2 2 2 2 3 3 3 4 3 3 4 3 5 5 6 5 7 4 7 7 8 6 7 7 7 9 8 7 8 10 11 9 10 10 14 9 10 11 10 10 12 10 12 10 11 5.6 5.5 1 0 1 2 2 2 2 3 3 3 4 3 3 4 2 5 5 6 5 7 4 9 6 8 6 7 7 7 9 8 7 8 10 9 9 11 11 12 9 10 10 9 10 11 10 9 11 11 5.6 5.4 1 0 1 2 2 2 2 3 3 3 4 3 3 4 2 5 5 6 5 6 4 7 6 6 6 7 7 7 9 8 7 8 9 9 9 10 11 11 9 10 10 10 10 10 11 9 10 11 5.3 5.3 1 0 1 2 2 2 2 3 3 3 4 3 3 4 3 5 5 8 5 8 7 7 7 11 6 7 9 7 9 8 7 8 9 11 11 10 11 11 11 10 11 12 12 12 10 11 11 11 5.9 5.7 EXACT AND APPROXIMATE SPARSE SOLUTIONS OF LINEAR EQUATIONS N 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 G BP nz time 10.0 15.0 20.0 25.0 30.0 35.0 40.0 55.7 70.1 100.3 127.2 127.9 128.0 128.0 128.0 (0.07) (0.06) (0.07) (0.07) (0.07) (0.08) (0.08) (0.08) (0.09) (0.09) (0.08) (0.08) (0.08) (0.08) (0.08) 51.1 (0.08) MG nz time 10.0 15.0 20.0 25.0 30.0 35.0 40.0 45.0 50.9 55.0 60.0 67.7 74.9 118.8 113.7 (0.10) (0.10) (0.12) (0.12) (0.12) (0.12) (0.13) (0.13) (0.15) (0.15) (0.17) (0.18) (0.22) (0.33) (0.33) 41.1 (0.15) nz OMP time p nz GOMP time p 10.0 15.0 20.0 25.0 30.0 35.0 40.0 45.0 50.0 56.7 60.4 80.5 91.2 117.8 122.5 (0.01) (0.01) (0.01) (0.02) (0.02) (0.03) (0.03) (0.04) (0.05) (0.07) (0.08) (0.20) (0.27) (0.46) (0.51) 10.0 15.0 20.0 25.0 30.0 35.0 40.0 45.0 50.0 56.0 60.0 79.5 89.6 117.2 121.9 10.0 15.0 20.0 25.0 30.0 35.0 40.0 45.0 50.1 56.6 60.3 78.0 99.9 113.2 120.2 (0.11) (0.33) (0.72) (1.31) (2.13) (3.24) (4.63) (6.33) (8.41) (11.62) (13.61) (28.87) (50.85) (65.26) (73.78) 10.0 15.0 20.0 25.0 30.0 35.0 40.0 45.0 50.0 55.9 60.0 77.2 98.8 113.0 119.8 42.4 (0.05) 42.2 42.4 (5.26) 42.3 17 BL nz time 10.0 15.0 20.0 25.0 30.0 35.0 40.0 45.0 50.0 56.0 65.6 69.7 94.2 123.4 127.0 (0.09) (0.10) (0.11) (0.12) (0.12) (0.13) (0.14) (0.13) (0.15) (0.15) (0.16) (0.14) (0.18) (0.18) (0.17) 42.5 (0.14) Table 3: Results of the heuristics for Hadamard matrices with m = 128 and δ = 0. Columns “p” give number of nonzeros after postprocessing. The entries are averages over ten instances. The last row contains the geometric mean of the entries in each column. time. Furthermore, the dual bounds can only provide meaningful information for very small N (approx. N ≤ 35). We therefore decided to test only the heuristics on larger instances and present results for m = 128 and N = {10, 15, 20, . . . , 80} for δ = 0 and N = {10, 15, . . . , 100} for δ = 0.1. For each N we generated ten instances (i.e., random right hand sides). The entries in the tables report average results. Table 3 shows the results of the heuristics for δ = 0. We skipped the columns for the postprocessing method, in which no result could be improved. The last row again contains the geometric mean of each column. We observe that all heuristics could find a solution of value N for N ≤ 40. It turns out that Mangasarian’s approach produces the sparsest solutions in geometric mean, closely followed by OMP, GOMP, the bilinear method, and finally basis pursuit. Note that GOMP does not improve upon OMP in the geometric mean, but this is mainly because of the bad performance for N = 70. For all other N , GOMP is better than OMP on average. GOMP, however, needs much larger computation times. Note also that basis pursuit only produces useless results for N ≥ 55. Furthermore, the geometric mean of Mangasarian’s approach with 41.1 is not too far from the geometric mean over all N with 38.6, indicating the good performance of this heuristic. Table 4 shows the results for the case δ = 0.1. Because of space constraints, we skipped the information about the running times, except for GOMP; the other running times are similarly small as in Table 3. The results again show that postprocessing pays off. Altogether, GOMP produces the sparsest solutions in geometric mean after postprocessing, followed Mangasarian’s approach, OMP, and the bilinear method; basis pursuit performs clearly worse than the other methods. All heuristics produce solutions whose number of nonzeros in the geometric mean are below the geometric mean over all N with 46.4. 18 SADEGH JOKAR AND MARC E. PFETSCH BP nz N 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 G 12.7 20.2 26.7 34.4 42.0 53.8 57.9 67.8 76.4 82.1 91.4 96.5 100.4 103.5 105.1 108.0 109.7 112.2 114.0 MG nz p p 7.2 10.8 16.4 20.1 23.7 31.5 35.3 42.9 50.6 53.2 65.3 70.7 78.6 84.9 85.6 87.0 89.8 90.5 93.9 7.3 11.1 15.6 20.7 23.9 29.3 31.6 35.9 39.2 43.2 48.6 53.2 57.8 62.9 65.8 70.8 72.7 73.1 72.8 63.7 43.5 OMP nz p 7.2 10.9 15.6 19.9 23.4 29.3 31.3 35.5 38.9 42.7 48.3 53.0 57.4 62.6 65.8 70.3 72.6 72.6 72.7 8.4 12.6 18.0 22.0 26.3 32.0 34.3 39.4 43.1 48.9 51.8 61.8 65.9 70.4 71.4 77.2 76.7 78.3 78.7 36.9 36.6 7.3 11.2 15.7 19.5 23.5 29.0 30.9 35.5 39.1 43.9 47.4 56.3 59.1 65.4 66.7 71.7 71.4 72.9 73.4 40.7 36.9 nz GOMP time p 7.3 10.9 15.2 19.6 23.1 28.5 30.2 35.0 38.3 41.8 47.3 52.1 59.4 62.1 63.6 68.3 69.7 71.7 72.6 (0.51) (0.94) (1.55) (2.33) (2.83) (4.55) (4.94) (6.32) (8.03) (8.89) (11.07) (12.68) (15.09) (15.71) (17.72) (17.83) (19.40) (20.46) (19.79) 7.3 10.8 15.2 19.6 23.1 28.5 30.2 35.0 38.1 41.7 47.2 51.9 59.1 62.1 63.6 68.3 69.5 71.5 72.6 36.0 (6.63) 36.0 BL nz 8.1 11.0 15.5 21.0 23.7 29.1 30.7 37.8 39.6 43.4 50.5 55.2 61.9 69.1 68.6 73.6 75.7 76.1 77.4 p 7.4 10.9 15.2 20.2 23.4 28.7 30.2 36.8 39.2 42.6 49.8 54.6 60.5 68.3 67.3 72.3 75.0 75.3 76.5 38.1 37.3 Table 4: Results of the heuristics for Hadamard matrices with m = 128 and δ = 0.1. N 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 G BP nz time 10.0 15.0 20.0 25.0 30.0 35.0 40.0 45.0 89.0 120.7 128.0 128.0 128.0 128.0 128.0 (0.16) (0.16) (0.18) (0.17) (0.18) (0.19) (0.20) (0.21) (0.21) (0.20) (0.20) (0.20) (0.20) (0.20) (0.20) 51.8 (0.19) MG nz time 10.0 15.0 20.0 25.0 30.0 35.0 40.0 45.0 50.0 55.0 60.0 71.3 81.6 106.8 113.5 (0.29) (0.29) (0.35) (0.31) (0.31) (0.34) (0.36) (0.40) (0.38) (0.43) (0.45) (0.57) (0.67) (0.87) (0.92) 41.2 (0.43) nz OMP time p nz GOMP time p 10.0 15.0 20.1 25.3 30.4 35.9 41.6 48.0 59.8 82.4 108.7 121.3 127.2 127.3 127.3 (0.00) (0.01) (0.01) (0.01) (0.02) (0.02) (0.03) (0.05) (0.10) (0.22) (0.40) (0.49) (0.53) (0.54) (0.53) 10.0 15.0 20.0 25.0 30.0 35.0 40.0 45.0 57.7 76.4 106.9 121.0 127.2 127.3 127.3 10.0 15.0 20.1 25.2 30.1 35.2 41.5 47.6 52.4 65.2 102.2 125.9 122.0 126.0 125.7 (0.11) (0.33) (0.72) (1.32) (2.16) (3.25) (5.05) (7.38) (9.42) (19.11) (53.99) (78.55) (73.96) (78.63) (78.24) 10.0 15.0 20.0 25.0 30.0 35.0 40.0 45.0 50.0 62.1 99.3 125.9 120.1 126.0 125.7 49.0 (0.09) 48.0 47.4 (6.81) 46.6 BL nz time 10.0 15.0 20.0 25.0 30.0 35.0 40.0 45.0 50.0 62.2 80.3 96.3 104.6 117.4 127.7 (0.24) (0.24) (0.31) (0.28) (0.28) (0.31) (0.32) (0.36) (0.36) (0.44) (0.42) (0.42) (0.44) (0.45) (0.44) 44.5 (0.35) Table 5: Results of the heuristics for random matrices with m = 128 and δ = 0. 4.2. Experiments with Random Matrices In our final two experiments, we constructed random matrices by taking 256 unit random vectors of size 128. Similar random matrices and experiments have been applied in many articles, see, e.g., Candès and Tao [11]. We ran the heuristics for ten instances each and report the averages in Table 5 for the case of δ = 0 and in Table 6 for δ = 0.1. The notation is similar to Tables 3 and 4. EXACT AND APPROXIMATE SPARSE SOLUTIONS OF LINEAR EQUATIONS N 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 G BP nz p 13.7 21.6 30.3 37.3 43.3 47.0 54.9 60.2 62.9 70.2 75.9 78.8 80.4 83.2 86.0 90.2 89.1 90.8 94.3 5.8 10.5 15.0 25.3 27.8 34.1 40.0 45.1 50.5 54.1 60.5 61.9 63.4 67.8 70.3 74.3 69.8 73.1 75.8 57.1 40.5 MG nz p 6.0 7.7 11.1 14.9 17.9 20.6 23.4 27.6 30.6 32.2 36.6 38.9 42.6 44.3 46.4 47.0 48.9 51.0 53.1 5.6 7.6 10.9 14.9 17.6 20.2 23.1 26.9 30.4 31.9 36.2 38.6 42.3 44.2 46.1 46.4 48.7 50.5 52.7 26.8 26.5 OMP nz p 5.3 8.4 12.5 17.8 20.7 24.3 29.1 33.6 36.4 44.2 51.7 54.6 60.6 60.6 63.2 66.8 66.8 66.0 71.1 5.1 7.6 10.6 15.2 18.2 21.4 24.6 29.3 31.3 36.8 44.4 46.6 51.6 53.0 55.1 57.7 57.1 58.7 60.5 33.7 29.3 nz GOMP time p 5.3 7.6 10.8 15.3 17.9 21.0 24.2 29.0 32.8 36.3 38.8 44.3 49.6 50.0 51.7 55.3 53.6 55.4 56.4 (1.53) (2.20) (3.38) (4.97) (6.49) (8.33) (10.37) (12.97) (15.88) (19.12) (21.44) (26.54) (31.65) (31.85) (34.56) (37.69) (36.07) (38.13) (40.34) 5.3 7.6 10.6 15.2 17.9 20.7 24.0 28.8 31.9 35.8 38.6 43.4 48.6 49.7 50.6 53.7 52.7 53.9 55.4 28.5 (13.89) 28.1 19 BL nz p 6.4 7.8 11.0 15.8 19.2 21.5 24.6 29.5 32.6 34.6 38.1 40.2 44.8 46.3 49.1 51.0 51.2 53.8 54.1 5.4 7.7 10.7 15.3 18.1 20.7 24.3 28.3 31.4 33.4 37.7 39.6 43.9 45.0 48.7 50.1 50.3 52.6 53.3 28.2 27.3 Table 6: Results of the heuristics for random matrices with m = 128 and δ = 0.1. The general outcome of the experiments for δ = 0 are similar to the deterministic case. Basis pursuit fails to recover all solutions of size N beginning at N = 50, while Mangasarian’s approach yields perfect recovery for N ≤ 60. The best performance in the geometric mean is achieved by Mangasarian’s approach. Recall that the geometric mean over all N is 38.6, and hence Mangasarian’s approach with 41.2 yields good results. The results for δ = 0.1 confirm the above results. Clearly the best heuristic for these random instances is Mangasarian’s approach, followed by the bilinear method, GOMP (which is again much more expensive than the other heuristics), OMP, and, clearly the worst heuristic, basis pursuit. A comparison to the geometric mean of 46.4 over all N shows that the solutions of the heuristics are of considerable smaller size than N , in the geometric mean. The results also show that the heuristics, in general, produce very good results (up to about N = 70 for δ = 0 and generally for δ = 0.1). Comparing the results for Hadamard and random matrices, the picture is not clear. Surprisingly, for δ = 0, the heuristics produce solutions with more nonzeros in the geometric mean for random matrices than for Hadamard matrices, while for δ = 0.1 the reverse holds. The results for δ = 0 are not in conflict with theory, since the known good behavior of random matrices, for instance in Theorem 3, hold only asymptotically and with high probability. 5. Conclusions The results of this paper can be summarized as follows. An exact solution of the sparse representation problem by branch-and-cut seems to be hard, especially if the optimal solutions have large support. Nevertheless, the optimal solutions obtained in this way allow to evaluate the performance 20 SADEGH JOKAR AND MARC E. PFETSCH 52 44 42 48 40 44 38 36 40 BP MG OMP GOMP BL BP MG OMP GOMP (a) Hadamard matrices, δ = 0 (b) Hadamard matrices, δ = 0.1 52 44 BL 40 48 36 32 44 28 40 28 BP MG OMP GOMP BL BP MG OMP GOMP BL (c) Random matrices, δ = 0 (d) Random matrices, δ = 0.1 Figure 2: Comparison of the geometric means for the experiments of Section 4.1.2 and 4.2 with m = 128. of heuristics. Furthermore, this exact approach can be used to investigate the uniqueness of the optimal solutions. The results show that uniqueness occurs for instances with a much larger number of nonzeros than guaranteed by theory. Concerning the heuristics, Figure 2 shows a comparison of the results of all experiments with m = 128. It turns out that basis pursuit consistently is the worst heuristic. In total, Mangasarian’s approach seems to be the best heuristic. GOMP improves the results of OMP, but is computationally too expensive for larger instances. OMP itself yields quite good results as well. The bilinear method is somewhat in between the other approaches; for random instances it works better than for Hadamard matrices. If deviations are allowed, postprocessing usually helps to reduce the number of nonzeros for all heuristics. Finally, the computation times of the heuristics (except GOMP) are quite reasonable for the problem sizes we consider. This of course might change for larger practically relevant instances. OMP, which does not use linear programming, is in general faster than the other approaches and might be of use for these cases. But OMP is also a bit worse in terms of quality of solutions. References [1] T. Achterberg, SCIP – A framework to integrate constraint and mixed integer programming, Report 04–19, ZIB, 2004. http://www.zib.de/Publications/abstracts/ ZR-04-19/. [2] E. Amaldi, On the complexity of designing compact perceptrons and some consequences, in Electronic Proc. Fifth International Symposium on Artificial Intelligence and Mathematics, Fort Lauderdale, Florida, E. Boros and R. Greiner, eds., RUTCOR, 1999. EXACT AND APPROXIMATE SPARSE SOLUTIONS OF LINEAR EQUATIONS 21 [3] E. Amaldi and V. Kann, On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems, Theor. Comput. Sci. 209, no. 1–2 (1998), pp. 237–260. [4] E. Amaldi, M. E. Pfetsch, and L. E. Trotter, Jr., On the maximum feasible subsystem problem, IISs, and IIS-hypergraphs, Math. Program. 95, no. 3 (2003), pp. 533–554. [5] K. Bennett and O. Mangasarian, Bilinear separation of two sets in n-space, Comput. Optim. Appl. 2 (1993), pp. 207–227. [6] K. P. Bennett and E. J. Bredensteiner, A parametric optimization method for maschine learning, INFORMS J. Comput. 9, no. 3 (1997), pp. 311–318. [7] E. J. Candès and J. Romberg, Quantitative robust uncertainty principles and optimally sparse decompositions, Found. Comput. Math. 6, no. 2 (2006), pp. 227– 254. [8] E. J. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inform. Theory 52, no. 2 (2006), pp. 489–509. [9] E. J. Candès, J. Romberg, and T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Comm. Pure Appl. Math. 59, no. 8 (2006), pp. 1207– 1223. [10] E. J. Candés, M. Rudelson, T. Tao, and R. Vershynin, Error correction via linear programming, in Proc. 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS05), IEEE, 2005, pp. 295–308. [11] E. J. Candès and T. Tao, Decoding by linear programming, IEEE Trans. Inform. Theory 51, no. 12 (2005), pp. 4203–4215. [12] S. S. Chen, Basis Pursuit, PhD thesis, Standford University, 1995. [13] S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic decmposition by basis pursuit, SIAM J. Sci. Comput. 20, no. 1 (1999), pp. 33–61. [14] G. Davis, S. Mallat, and Z. Zhang, Adaptive time-frequency decompositions with matching pursuits, Opt. Eng. 33, no. 7 (1994), pp. 2183–2191. [15] D. L. Donoho, Compressed sensing, IEEE Trans. Inform. Theory 52, no. 4 (2006), pp. 1289–1306. [16] D. L. Donoho, For most large underdetermined systems of equations, the minimal ℓ1 -norm near-solution approximates the sparsest near-solution, Comm. Pure Appl. Math. 59, no. 7 (2006), pp. 907–934. [17] D. L. Donoho, For most large underdetermined systems of linear equations the minimal ℓ1 -norm solution is also the sparsest solution, Comm. Pure Appl. Math. 59, no. 6 (2006), pp. 797–829. [18] D. L. Donoho and M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization, Proc. Natl. Acad. Sci. USA 100, no. 5 (2003), pp. 2197–2202. [19] D. L. Donoho and M. Elad, On the stability of the basis pursuit in the presence of noise, Signal Process. 86 (2006), pp. 511ŋ–532. [20] D. L. Donoho, M. Elad, and V. Temlyakov, Stable recovery of sparse overcomplete representations in the presence of noise, IEEE Trans. Inform. Theory 52, no. 1 (2006), pp. 6–18. [21] D. L. Donoho and X. Huo, Uncertainty principles and ideal atomic decomposition, IEEE Trans. Inform. Theory 47, no. 7 (2001), pp. 2845–2862. [22] D. L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck, Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit, Tech. Report 2006-02, Standford, Department of Statistics, 2006. [23] M. Elad, Sparse representations are most likely to be the sparsest possible, EURASIP J. Appl. Signal Process. 2006 (2006), pp. 1ŋ–12. [24] M. Elad and A. M. Bruckstein, A generalized uncertainty principle and sparse representation in pairs of bases, IEEE Trans. Inform. Theory 48, no. 9 (2002), pp. 2558–2567. [25] J. J. Fuchs, Recovery of exact sparse representations in the presence of bounded noise, IEEE Trans. Inform. Theory 51, no. 10 (2005). 22 SADEGH JOKAR AND MARC E. PFETSCH [26] J.-J. Fuchs, Recovery conditions of sparse representations in the presence of noise, in Proc. ICASSP 2006, IEEE, 2006, pp. 337–340. [27] M. R. Garey and D. S. Johnson, Computers and Intractability. A Guide to the Theory of NP-Completeness, W. H. Freeman and Company, New York, 1979. [28] S. Mallat and Z. Zhang, Matching pursuit in a time-frequency dictionary, IEEE Trans. Inform. Theory 41 (1993), pp. 3397–3415. [29] O. L. Mangasarian, Minimum-support solutions of polyhedral concave programs, Optimization 45 (1999), pp. 149–162. [30] B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM J. Comput. 24, no. 2 (1995), pp. 227–234. [31] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decompositions, in Proc. 27th Asilomar Conference on Signals, Systems and Computers, A. Singh, ed., 1993. [32] M. E. Pfetsch, The Maximum Feasible Subsystem Problem and Vertex-Facet Incidence of Polyhedra, PhD thesis, TU Berlin, 2002. [33] M. E. Pfetsch, Branch-and-cut for the maximum feasible subsystem problem, ZIBReport 05-46, Zuse Institute Berlin, 2005. [34] A. Schrijver, Theory of linear and integer programming, John Wiley & Sons, Chichester, 1986. Reprint 1998. [35] T. Strohmer and j. Robert W. Heath, Grassmannian frames with applications to coding and communication, Appl. Comput. Harmon. Anal. 14, no. 3 (2003), pp. 257– 275. [36] J. Tropp and A. C. Gilbert, Signal recovery from partial information via orthogonal matching pursuit. Preprint, Apr. 2005. [37] J. A. Tropp, Greed is good: Algorithmic results for sparse approximations, IEEE Trans. Inform. Theory 50, no. 10 (2004), pp. 2231–2242. [38] J. A. Tropp, Just relax: Convex programming methods for identifying sparse signals in noise, IEEE Trans. Inform. Theory 52, no. 3 (2006), pp. 1030–1051. [39] G. Yarmish, Wavelet decomposition via the standard tableau simplex method of linear programming. Preprint, available at www.optimization-online.org, 2006. Sadegh Jokar, Insitute of Mathematics, TU Berlin, Straße des 17. Juni 136, 10623 Berlin, Germany E-mail address: jokar@math.tu-berlin.de Marc E. Pfetsch, Zuse Institute Berlin, Takustr. 7, 14195 Berlin, Germany E-mail address: pfetsch@zib.de View publication stats