IEEE TRANSACTIONS ON INFORMATION THEORY
Rank Minimization over Finite Fields: Fundamental
Limits and Coding-Theoretic Interpretations
arXiv:1104.4302v4 [cs.IT] 1 Dec 2011
Vincent Y. F. Tan, Laura Balzano, Student Member, IEEE, Stark C. Draper, Member, IEEE
Abstract—This paper establishes information-theoretic limits
for estimating a finite field low-rank matrix given random linear
measurements of it. These linear measurements are obtained
by taking inner products of the low-rank matrix with random
sensing matrices. Necessary and sufficient conditions on the
number of measurements required are provided. It is shown
that these conditions are sharp and the minimum-rank decoder
is asymptotically optimal. The reliability function of this decoder
is also derived by appealing to de Caen’s lower bound on the
probability of a union. The sufficient condition also holds when
the sensing matrices are sparse – a scenario that may be amenable
to efficient decoding. More precisely, it is shown that if the
n × n sensing matrices contain, on average, Ω(n log n) entries, the
number of measurements required is the same as that when the
sensing matrices are dense and contain entries drawn uniformly
at random from the field. Analogies are drawn between the above
results and rank-metric codes in the coding theory literature.
In fact, we are also strongly motivated by understanding when
minimum rank distance decoding of random rank-metric codes
succeeds. To this end, we derive minimum distance properties
of equiprobable and sparse rank-metric codes. These distance
properties provide a precise geometric interpretation of the fact
that the sparse ensemble requires as few measurements as the
dense one.
Index Terms—Rank minimization, Finite fields, Reliability
function, Sparse parity-check matrices, Rank-metric codes, Minimum rank distance properties
I. INTRODUCTION
This paper considers the problem of rank minimization over
finite fields. Our work attempts to connect two seemingly disparate areas of study that have, by themselves, become popular
in the information theory community in recent years: (i) the
theory of matrix completion [2]–[4] and rank minimization [5],
[6] over the reals and (ii) rank-metric codes [7]–[12], which
are the rank distance analogs of binary block codes endowed
with the Hamming metric. The work herein provides a starting
point for investigating the potential impact of the low-rank
assumption on information and coding theory. We provide a
brief review of these two areas of study.
This work is supported in part by the Air Force Office of Scientific Research
under grant FA9550-09-1-0140 and by the National Science Foundation under
grant CCF 0963834. V. Y. F. Tan is also supported by A*STAR Singapore.
This paper was presented in part at the IEEE International Symposium on
Information Theory (ISIT), St. Petersburg, Russia, August 2011 [1].
The authors are with the Department of Electrical and Computer Engineering (ECE), University of Wisconsin, Madison, WI, 53706, USA (emails:
vtan@wisc.edu; sunbeam@ece.wisc.edu; sdraper@ece.wisc.edu). The first
author is also affiliated to the Laboratory for Information and Decision
Systems (LIDS), Massachusetts Institute of Technology (MIT), Cambridge,
MA, 02139, USA (email: vtan@mit.edu).
Copyright (c) 2011 IEEE. Personal use of this material is permitted.
However, permission to use this material for any other purposes must be
obtained from the IEEE by sending a request to pubs-permissions@ieee.org.
The problem of matrix completion [2]–[4] can be stated as
follows: One is given a subset of noiseless or noisy entries
of a low-rank matrix (with entries over the reals), and is then
required to estimate all the remaining entries. This problem
has a variety of applications from collaborative filtering (e.g.,
Netflix prize [13]) to obtaining the minimal realization of a
linear dynamical system [14]. Algorithms based on the nuclear
norm (sum of singular values) convex relaxation of the rank
function [14], [15] have enjoyed tremendous successes. A
generalization of the matrix completion problem is the rank
minimization problem [5], [6] where, instead of being given
entries of the low-rank matrix, one is given arbitrary linear
measurements of it. These linear measurements are obtained
by taking inner products of the unknown matrix with sensing
matrices. The nuclear norm heuristic has also been shown
to be extremely effective in estimating the unknown low-rank
matrix. Theoretical results [5], [6] are typically of the
following flavour: If the number of measurements (also known
as the measurement complexity) exceeds a small multiple
of the product of the dimension of the matrix and its rank,
then optimizing the nuclear-norm heuristic yields the same
(optimal) solution as the rank minimization problem under
certain conditions on the sensing matrices. Note that in the
case of real matrices, if the observations (or the entries) are
noisy, perfect reconstruction is impossible. As we shall see in
Section V, this is not the case in the finite field setting. We
can recover the underlying matrix exactly albeit at the cost of
a higher measurement complexity.
Rank-metric codes [7]–[12] are subsets of finite field matrices endowed with the rank-metric. We will be concerned
with linear rank-metric codes, which may be characterized by
a family of parity-check matrices, which are equivalent to the
sensing matrices in the rank minimization problem.
A. Motivations
Besides analyzing the measurement complexity for rank
minimization over finite fields, this paper is also motivated
by two applications in coding. The first is index coding with
side information [16]. In brief, a sender wants to communicate
the l-th coordinate of a length-L bit string to the l-th of L
receivers. Furthermore, each of the L receivers knows a subset
of the coordinates of the bit string. These subsets can be
represented by (the neighbourhoods of) a graph. Bar-Yossef
et al. [16] showed that the linear version of this problem
reduces to a rank minimization problem. In previous works,
the graph is deterministic. Our work, and in particular the
rank minimization problem considered herein, can be cast as
the solution of a linear index coding problem with a random
side information graph.
Second, we are interested in properties of the rank-metric
coding problem [10]. Here, we are given a set of matrix-valued
codewords that form a linear rank-metric code C . A codeword
C∗ ∈ C is transmitted across a noisy finite field matrix-valued
channel which induces an additive error matrix X. This error
matrix X is assumed to be low rank. For example, X could
be a matrix induced by the crisscross error model in data
arrays [17]. In the crisscross error model, X is a sparse low
rank matrix in which the non-zero elements are restricted to
a small number of rows and columns. The received matrix is
R := C∗ + X. The minimum distance decoding problem is
given by the following:

    Ĉ := arg min_{C ∈ C} rank(R − C).    (1)
We would like to study when problem (1) succeeds (i.e.,
uniquely recovers the true codeword C∗) with high probability^1 (w.h.p.) given that C is a random code characterized
by either dense or sparse random parity-check matrices and
X is a deterministic error matrix. But why analyze random
codes? Our study of random (instead of deterministic) codes is
motivated by the fact that data arrays that arise in applications
are often corrupted by crisscross error patterns [17]. Decoding
techniques used in the rank-metric literature such as error
trapping [11], [18] are unfortunately not able to correct such
error patterns because they are highly structured and hence
the “error traps” would miss (or not be able to correct) a
non-trivial subset of errors. Indeed, the success of such an error
trapping strategy hinges strongly on the assumption that the
underlying low-rank error matrix X is drawn uniformly at
random over all matrices whose rank is r [18, Sec. IV] (so
subspaces can be trapped). The decoding technique in [17]
is specific to correcting crisscross error patterns. In contrast,
in this work, we are able to derive distance properties of
random rank-metric codes and to show that given sufficiently
many constraints on the codewords, all error patterns of rank
no greater than r can be successfully corrected. Although
our derivations are similar in spirit to those in Barg and
Forney [19], our starting point is rather different. In particular,
we combine the use of techniques from [20] and those in [19].
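To see why crisscross patterns are automatically low rank, note that a matrix whose non-zero entries are confined to u rows and v columns has rank at most u + v. A tiny F2 sketch of ours (not from the paper; the row/column choices are arbitrary) illustrating this bound:

```python
import random

def f2_rank(mat):
    """Rank of a 0/1 matrix over F2 via Gaussian elimination mod 2."""
    rows = [list(r) for r in mat]
    rank = 0
    for col in range(len(rows[0])):
        piv = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [(a + b) % 2 for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

random.seed(0)
n = 6
# Crisscross error: non-zero entries confined to rows {1, 4} and column {3}.
X = [[0] * n for _ in range(n)]
for j in range(n):
    X[1][j] = random.randint(0, 1)
    X[4][j] = random.randint(0, 1)
for i in range(n):
    X[i][3] = random.randint(0, 1)

# Support on u = 2 rows and v = 1 column implies rank(X) <= u + v = 3.
assert f2_rank(X) <= 3
```

The bound holds regardless of the random fill: X splits as the sum of a matrix supported on u rows (rank ≤ u) and one supported on v columns (rank ≤ v), and rank is subadditive.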
We are also motivated by the fact that error exponent-like results for matrix-valued finite field channels are, to the
best of the authors’ knowledge, not available in the literature.
Such channels have been popularized by the seminal work
in [21]. Capacity results for specific channel models such as
the uniform given rank (u.g.r.) multiplicative noise model [22]
have recently been derived. In this work, we derive the error
exponent E(R) of the minimum-rank decoder (for the additive noise model). This fills an important gap in the literature.
B. Main Contributions
We summarize our four main contributions in this work.
Firstly, by using a standard converse technique (Fano’s
inequality), we derive a necessary condition on the number
^1 Here and in the following, with high probability means with probability
tending to one as the problem size tends to infinity.
of measurements required for estimating a low-rank matrix.
Furthermore, under the assumption that the linear measurements are obtained by taking inner products of the unknown
matrix with sensing matrices containing independent entries
that are equiprobable (in Fq ), we demonstrate an achievability
procedure, called the min-rank decoder, that matches the
information-theoretic lower bound on the number of measurements required. Hence, the sufficient condition is sharp.
Extensions to the noisy case are also discussed. Note that in
this paper, we are less concerned with the computational
complexity of recovering the unknown low-rank matrix than
with the fundamental limits of doing so.
Secondly, we derive the reliability function (error exponent)
E(R) of the min-rank decoder by using de Caen’s lower bound
on the probability of a union [23]. The use of de Caen’s bound
to obtain estimates of the reliability function (or probability
of error) is not new. See the works by Séguin [24] and Cohen
and Merhav [25] for example. However, by exploiting pairwise
independence of constituent error events, we not only derive
upper and lower bounds on E(R), we show that these bounds
are, in fact, tight for all rates (for the min-rank decoder).
We derive the corresponding error exponents for codes in [7]
and [18] and make comparisons between the error exponents.
Thirdly, we show that if the fraction of non-zero entries
of the sensing or measurement matrices scales (on average)
as Ω((log n)/n) (where the matrix is of size n × n), the min-rank
decoder achieves the information-theoretic lower bound. Thus,
if the average number of entries in each sparse sensing matrix
is Ω(n log n) (which is much smaller than n^2), we can show that,
very surprisingly, the number of linear measurements required
for reliable reconstruction of the unknown low-rank matrix is
exactly the same as that for the equiprobable (dense) case. This
main result of ours opens the possibility for the development
of efficient, message-passing decoding algorithms based on
sparse parity-check matrices [26].
Finally, we draw analogies between the above results and
rank-metric codes [7]–[12] in the coding theory literature. We
derive minimum (rank) distance properties of the equiprobable
random ensemble and the sparse random ensemble. Using
elementary techniques, we derive an analog of the Gilbert-Varshamov distance for the random rank-metric code. We also
compare and contrast our result to classical binary linear block
codes with the Hamming metric [19]. From our analyses in
this section, we obtain geometric intuitions to explain why
minimum rank decoding performs well even when the sensing
matrices are sparse. We also use these geometric intuitions to
guide our derivation of strong recovery guarantees along the
lines of the recent work by Eldar et al. [27].
C. Related Work
There is a wealth of literature on rank minimization to which
we will not be able to do justice here. See for example the
seminal works by Fazel et al. [14], [15] and the subsequent
works by other authors [2]–[4] (and the references therein).
However, all these works focus on the case where the unknown
matrix is over the reals. We are interested in the finite field
setting because such a problem has many connections with
TABLE I
COMPARISON OF OUR WORK (TAN-BALZANO-DRAPER, TBD) TO EXISTING
CODING-THEORETIC TECHNIQUES FOR RANK MINIMIZATION

               | Parity-check matrix Ha  | Code Structure  | Decoding Technique
Gabidulin [7]  | Deterministic, dense    | Algebraic       | Berlekamp-Massey
SKK [10]       | Deterministic, dense    | Algebraic       | Extended Berlekamp-Massey
MU [11]        | Random, sparse          | Factor Graph    | Error Trapping & Message Passing
SKK [18]       | Random, dense           | Error Trapping  | Error Trapping
GLS [33]       | Deterministic, sparse   | Perfect Graph   | Semidefinite Program (Ellipsoid)
TBD            | Random, dense & sparse  | See Table II    | Min-Rank Decoder (Section VIII)

TABLE II
COMPARISONS BETWEEN THE RESULTS IN VARIOUS SECTIONS OF THIS
PAPER AND OTHER RELATED WORKS

Random low-rank matrix X  | Deterministic low-rank matrix X
Section IV                | Section IV
Section IV, [18]          | Section VII-C, [7], [10]
Section VI                | Section VI
Section VI, [11], [18]    | Section VII-C
and applications to coding and information theory [16], [17],
[28]. The analogous problem for the reals was considered by
Eldar et al. [27]. The results in [27], developed for dense
sensing matrices with i.i.d. Gaussian entries, mirror those in
this paper but only achievability results (sufficient conditions)
are provided. We additionally analyze the sparse setting.
Our work is partially inspired by [29] where fundamental
limits for compressed sensing over finite fields were derived.
To the best of our knowledge, Vishwanath’s work [30] is
the only one that employs information-theoretic techniques to
derive necessary and sufficient conditions on the number of
measurements required for reliable matrix completion (or rank
minimization). It was shown using typicality arguments that
the number of measurements required is within a logarithmic
factor of the lower bound. Our setting is different because we
assume that we have linear measurements instead of randomly
sampled entries. We are able to show that the achievability
and converse match for a family of random sensing matrices.
Emad and Milenkovic [31] recently extended the analyses in
the conference version [1] of this paper to the tensor case,
where the rank, the order of the tensor and the number of
measurements grow simultaneously with the size of the matrix.
We compare and contrast our decoder and analysis for the
noisy case to that in [31]. Another recent related work is that
by Kakhaki et al. [32] where the authors considered the binary
erasure channel (BEC) and binary symmetric channel (BSC)
and empirically studied the error exponents for codes whose
generator matrices are random and sparse. For the BEC, the
authors showed that there exist capacity-achieving codes with
generator matrices whose sparsity factor (density) is O((log n)/n)
(similar to this work). However, motivated by the fact that
sparse parity-check matrices may make decoding amenable to
lower complexity message-passing type decoders, we analyze
the scenario where the parity-check matrices are sparse.
The family of codes known as rank-metric codes [7]–[12],
which are the rank-distance analogs of binary block codes
equipped with the Hamming metric, bears a striking similarity
to the rank minimization problem over finite fields. Comparisons between this work and related works in the coding theory
literature are summarized in Table I. Our contributions in the
various sections of this paper, and other pertinent references,
are summarized in Table II. We will further elaborate on these
comparisons in Section IX-A.
D. Outline of Paper

Section II details our notational choices, describes the
measurement models and states the problem. In Section III, we
use Fano’s inequality to derive a lower bound on the number
of measurements for reconstructing the unknown low-rank
matrix. In Section IV, we consider the uniformly at random
(or equiprobable) model where the entries of the measurement
matrices are selected independently and uniformly at random
from Fq . We derive a sufficient condition for reliable recovery
and the reliability function of the min-rank decoder using de
Caen’s lower bound. The results are then extended to the noisy
scenario in Section V. Section VI, which contains our main
result, considers the case where the measurement matrices are
sparse. We derive a sufficient condition on the sparsity factor
(density) as well as the number of measurements for reliable
recovery. Section VII is devoted to understanding and interpreting the above results from a coding-theoretic perspective.
In Section VIII, we provide a procedure to search for the
low-rank matrix by exploiting indeterminacies in the problem.
Discussions and conclusions are provided in Section IX. The
lengthier proofs are deferred to the appendices.
II. PROBLEM SETUP AND MODEL
In this section, we state our notational conventions, describe
the system model and state the problem. We also distinguish
between the two related notions of weak and strong recovery.
A. Notation
In this paper we adopt the following set of notations:
Serif font and san-serif font denote deterministic and random
quantities respectively. Bold-face upper-case and bold-face
lower-case denote matrices and (column) vectors respectively.
Thus, y, y, X and X denote a deterministic scalar, a scalarvalued random variable, a deterministic matrix and a random
matrix respectively. Random functions will also be denoted in
san-serif font. Sets (and events) are denoted with calligraphic
font (e.g., U or C ). The cardinality of a finite set U is
denoted as |U|. For a prime power q, we denote the finite
(Galois) field with q elements as Fq . If q is prime, one can
identify Fq with Zq = {0, . . . , q − 1}, the set of the integers
modulo q. The set of m × n matrices with entries in Fq is
denoted as Fq^(m×n). For simplicity, we let [k] := {1, . . . , k}
and y^k := (y1, . . . , yk). For a matrix M, the notations ||M||_0
and rank(M) respectively denote the number of non-zero
elements in M (the Hamming weight) and the rank of M
in Fq. For a matrix M ∈ Fq^(m×n), we also use the notation
vec(M) ∈ Fq^(mn) to denote the vectorization of M with its
columns stacked on top of one another. For a real number b,
the notation |b|+ is defined as max{b, 0}. Asymptotic notation
such as O(·), Ω(·) and o(·) will be used throughout. See [34,
Sec. I.3] for definitions. For the reader's convenience, we have
summarized the symbols used in this paper in Table III.

TABLE III
TABLE OF SYMBOLS USED IN THIS PAPER

Notation              | Definition                       | Section
k                     | Number of measurements           | Section II-B
r/n → γ               | Rank-dimension ratio             | Section II-B
σ = ||w||_0 / n^2     | Deterministic noise parameter    | Section V-A
α = k / n^2           | Measurement scaling parameter    | Section V-B
p = E||w||_0 / k      | Random noise parameter           | Section V-B
δ = E||Ha||_0 / n^2   | Sparsity factor                  | Section VI
NC(r)                 | Num. of matrices of rank r in C  | Section VII
d(C)                  | Minimum rank distance of C       | Section VII

B. System Model

We are interested in the following model: Let X be an
unknown (deterministic or random) square^2 matrix in Fq^(n×n)
whose rank is less than or equal to r, i.e., rank(X) ≤ r. The
upper bound on the rank r is allowed to be a function of n,
i.e., r = rn. We assume that r/n → γ and we say that the
limit γ ∈ [0, 1] is the rank-dimension ratio.^3 We would like to
recover or estimate X from k linear measurements

    ya = ⟨Ha, X⟩ := Σ_{(i,j) ∈ [n]^2} [Ha]_{i,j} [X]_{i,j},    a ∈ [k],    (2)

i.e., ya is the trace of Ha X^T. In (2), the sensing or measurement matrices Ha ∈ Fq^(n×n), a ∈ [k], are random matrices
chosen according to some probability mass function (pmf).
The k scalar measurements ya ∈ Fq, a ∈ [k], are available
for estimating X. We will operate in the so-called high-dimensional setting and allow the number of measurements
k to depend on n, i.e., k = kn. Multiplication and addition
in (2) are performed in Fq. In the subsequent sections, we will
also be interested in a generalization of the model in (2) where
the measurements ya, a ∈ [k], may not be noiseless, i.e.,

    ya = ⟨Ha, X⟩ + wa,    a ∈ [k],    (3)

where wa, a ∈ [k], represents random or deterministic noise.
We will specify precise noise models in Section V.

The measurement models we are concerned with in this
paper, (2) and (3), are somewhat different from the matrix
completion problem [2]–[4]. In the matrix completion setup,
a subset of entries Ω ⊂ [n]^2 in the matrix X is observed and
one would like to "fill in" the rest of the entries assuming
the matrix is low-rank. This model can be captured by (2) by
choosing each sensing matrix Ha to be non-zero only in a
single position. Assuming Ha ≠ Ha′ for all a ≠ a′, the number
of measurements is k = |Ω|. In contrast, our measurement
models in (2) and (3) do not assume that ||Ha||_0 = 1. The
sensing matrices are, in general, dense although in Section VI,
we also analyze the scenario where Ha is relatively sparse.
Our setting is more similar in spirit to the rank minimization
problems analyzed in Recht et al. [5], Meka et al. [6] and
Eldar et al. [27]. However, these works focus on problems in
the reals whereas our focus is the finite field setting.

^2 Our results are not restricted to the case where X is square but for the
most part in this paper, we assume that X is square for ease of exposition.
^3 Our results also include the regime where r = o(n) but the case where
r = Θ(n) (and γ is the proportionality constant) is of greater interest and
significance. This is because the rank r grows as rapidly as possible and hence
this regime is the most challenging. Note that if r/n → γ = 1, then we would
need n^2 measurements to recover X since we are not making any low rank
assumptions on it. This is corroborated by the converse in Proposition 2.

C. Problem Statement

Our objective is to estimate the unknown low-rank matrix
X given y^k (and the measurement matrices Ha, a ∈ [k]). In
general, given the measurement model in (2) and without
any assumptions on X, the problem is ill-posed and it is
not possible to recover X if k < n^2. However, because X
is assumed to have rank no larger than r (and r/n → γ),
we can exploit this additional information to estimate X
with k < n^2 measurements. Our goal in this paper is to
characterize necessary and sufficient conditions on the number
of measurements k as n becomes large assuming a particular
pmf governing the sensing matrices Ha, a ∈ [k] and under
various (random and deterministic) models on X.
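To make the measurement model concrete, here is a minimal sketch of ours (q = 2, with small hypothetical dimensions): X is planted as a product UV of random n × r and r × n factors, so rank(X) ≤ r, and each ya is the F2 inner product of X with a dense sensing matrix Ha whose entries are equiprobable, as in (2).

```python
import random

random.seed(1)
q, n, r, k = 2, 4, 1, 8  # hypothetical toy parameters

# Plant X = U V over F2 so that rank(X) <= r.
U = [[random.randrange(q) for _ in range(r)] for _ in range(n)]
V = [[random.randrange(q) for _ in range(n)] for _ in range(r)]
X = [[sum(U[i][t] * V[t][j] for t in range(r)) % q for j in range(n)]
     for i in range(n)]

# Dense sensing matrices Ha with equiprobable entries, and the
# measurements y_a = <Ha, X> of (2), computed in F2.
H = [[[random.randrange(q) for _ in range(n)] for _ in range(n)]
     for _ in range(k)]
y = [sum(Ha[i][j] * X[i][j] for i in range(n) for j in range(n)) % q
     for Ha in H]
```

Each ya lives in Fq; recovering X from the pair (H, y) alone is the estimation problem studied in the sequel.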
D. Weak Versus Strong Recovery
In this paper, we will focus (in Sections III to VI) on the
so-called weak recovery problem where the unknown low-rank
matrix X is fixed and we ask how many measurements k are
sufficient to recover X (and what the procedure is for doing
so). However, there is also a companion problem known as
the strong recovery problem, where one would like to recover
all matrices in Fqn×n with rank no larger than r. A familiar
version of this distinction also arises in compressed sensing.^4
More precisely, given k sensing matrices Ha , a ∈ [k], we
define the linear operator H : Fq^(n×n) → Fq^k as

    H(X) := [⟨H1, X⟩, ⟨H2, X⟩, . . . , ⟨Hk, X⟩]^T.    (4)
Then, a necessary and sufficient condition for strong recovery
is that the operator H is injective when restricted to the set
of all matrices of rank-2r (or less). In other words, there are
no rank-2r (or less) matrices in the nullspace of the operator
H [27, Sec. 2]. This can be observed by noting that for two
matrices X1 and X2 of rank-r (or less) that generate the same
linear observations (i.e., H(X1 ) = H(X2 )), their difference
X1 − X2 has rank at most 2r by the triangle inequality.^5 We
would thus like to find conditions on k (via, for example, the
geometry of the random code) such that the following subset
of Fq^(n×n)

    R_2r^(n) := {X ∈ Fq^(n×n) : rank(X) ≤ 2r}    (5)
is disjoint from the nullspace of H with probability tending to
one as n grows. As mentioned in Section II-B, we allow r to
^4 Analogously in compressed sensing, consider the combinatorial ℓ0-norm
optimization problem min_{x̃ ∈ F^n} {||x̃||_0 : Ax̃ = y}, where the field F can
either be the reals R [27] or a finite field Fq [29]. It can be seen that if
we want to recover a fixed but unknown s-sparse vector x (weak recovery),
s + 1 linear measurements suffice w.h.p. However, for strong recovery, where
we would like to guarantee recovery for all s-sparse vectors, we need to
ensure that the nullspace of the measurement matrix A is disjoint from the
set of 2s-sparse vectors. Thus, w.h.p., 2s measurements are required for strong
recovery [27], [29].
^5 Note that (A, B) ↦ rank(A − B) is a metric on the space of matrices.
grow linearly with n (with proportionality constant γ). Under
the condition that R_2r^(n) ∩ nullspace(H) = ∅, the solution to the
rank minimization problem [stated precisely in (12) below] is
unique and correct for all low-rank matrices with probability
tending to one as n grows. As we shall see in Section VII-C,
the conditions on k for strong recovery are more stringent
than those for weak recovery. See the recent paper by Eldar et
al. [27, Sec. 2] for further discussions on weak versus strong
recovery in the real field setting.
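The strong recovery condition can be verified exhaustively at toy scale. In the sketch below (ours; q = 2, n = 3, r = 1) we take the k = n^2 single-entry sensing matrices of the matrix-completion extreme, for which H is injective on all of F2^(3×3), so certainly no non-zero matrix of rank at most 2r lies in its nullspace; swapping in random dense Ha with smaller k lets one probe when the condition starts to hold.

```python
import itertools

q, n, r = 2, 3, 1

def f2_rank(mat):
    """Rank of a 0/1 matrix over F2 via Gaussian elimination mod 2."""
    rows = [list(row) for row in mat]
    rank = 0
    for col in range(len(rows[0])):
        piv = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [(a + b) % 2 for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

# k = n^2 single-entry sensing matrices: the matrix-completion extreme.
H = []
for a in range(n * n):
    Ha = [[0] * n for _ in range(n)]
    Ha[a // n][a % n] = 1
    H.append(Ha)

def in_nullspace(Z):
    return all(sum(Ha[i][j] * Z[i][j] for i in range(n) for j in range(n)) % q == 0
               for Ha in H)

# Strong recovery holds iff no non-zero rank <= 2r matrix is in nullspace(H).
bad = []
for bits in itertools.product(range(q), repeat=n * n):
    Z = [list(bits[i * n:(i + 1) * n]) for i in range(n)]
    if any(bits) and f2_rank(Z) <= 2 * r and in_nullspace(Z):
        bad.append(Z)

assert not bad  # the nullspace contains no non-zero low-rank matrix
```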
E. Bounds on the number of low-rank matrices

In the sequel, we will find it useful to leverage the following
lemma, which is a combination of results stated in [21,
Lemma 4], [9, Proposition 1] and [12, Lemma 5].

Lemma 1 (Bounds on the number of low-rank matrices). Let
Φq(n, r) and Ψq(n, r) respectively be the number of matrices
in Fq^(n×n) of rank exactly r and the number of matrices in
Fq^(n×n) of rank less than or equal to r. Note that Ψq(n, r) =
Σ_{l=0}^r Φq(n, l). The following bounds hold:

    q^((2n−2)r − r^2) ≤ Φq(n, r) ≤ 4 q^(2nr − r^2),    (6)
    q^(2nr − r^2) ≤ Ψq(n, r) ≤ 4 q^(2nr − r^2).        (7)

In other words, we have from (7) and the fact that r/n → γ
that |(1/n^2) log_q Ψq(n, r) − 2γ(1 − γ/2)| → 0.

III. A NECESSARY CONDITION FOR RECOVERY

This section presents a necessary condition on the scaling of
k with n for the matrix X to be recovered reliably, i.e., for the
error probability in estimating X to tend to zero as n grows. As
with most other converse statements in information theory, it is
necessary to assume a statistical model on the unknown object,
in this case X. Hence, in this section, we denote the unknown
low-rank matrix as X (a random variable). We also assume
that X is drawn uniformly at random from the set of matrices
in Fq^(n×n) of rank less than or equal to r. For an estimator
(deterministic or random function) X̂ : Fq^k × (Fq^(n×n))^k → Fq^(n×n)
whose range is the set of all Fq^(n×n)-matrices whose rank is less
than or equal to r, we define the error event:

    Ẽn := {X̂(y^k, H^k) ≠ X}.    (8)

This is the event that the estimate X̂(y^k, H^k) is not equal to the
true low-rank matrix X. We emphasize that the estimator can
either be deterministic or random. In addition, the arguments
(y^k, H^k) are random so X̂(y^k, H^k) in the definition of Ẽn is a
random matrix. We can demonstrate the following:

Proposition 2 (Converse). Fix ε > 0 and assume that X is
drawn uniformly at random from all matrices of rank less than
or equal to r. Also, assume X is independent of H^k. If

    k < (2 − ε) γ(1 − γ/2) n^2    (9)

then for any estimator X̂ whose range is the set of Fq^(n×n)-matrices whose rank is less than or equal to r, P(Ẽn) ≥ ε/4 >
0 for all n sufficiently large.

Proposition 2 states that the number of measurements k
must exceed 2nr − r^2 (which is approximately 2γ(1 − γ/2)n^2)
for recovery of X to be reliable, i.e., for the probability of Ẽn
to tend to zero as n grows. From a linear algebraic perspective,
this means we need at least as many measurements as there
are degrees of freedom in the unknown object X. Clearly, the
bound in (9) applies to both the noisy and the noiseless models
introduced in Section II-B. The proof involves an elementary
application of Fano's inequality [35, Sec. 2.10].

Proof: Consider the following lower bounds on the probability of error P(Ẽn):

    P(X̂ ≠ X) ≥(a) [H(X | y^k, H^k) − 1] / log_q Ψq(n, r)
              = [H(X) − I(X; y^k, H^k) − 1] / log_q Ψq(n, r)
              =(b) [H(X) − I(X; y^k | H^k) − 1] / log_q Ψq(n, r)
              ≥ [H(X) − H(y^k | H^k) − 1] / log_q Ψq(n, r)
              ≥(c) [H(X) − k − 1] / log_q Ψq(n, r)
              =(d) 1 − k / log_q Ψq(n, r) − o(1),    (10)

where (a) is by Fano's inequality (estimating X given y^k and
H^k), (b) is because H^k is independent of X so I(X; y^k, H^k) =
I(X; y^k | H^k) + I(X; H^k) = I(X; y^k | H^k). Inequality (c) is due
to the fact that ya is q-ary for all a ∈ [k] so

    H(y^k | H^k) ≤ H(y^k) ≤ Σ_{a=1}^k H(ya) ≤ k log_q q = k,    (11)

and finally, (d) is due to the uniformity of X. It can be
easily verified that if k satisfies (9) for some ε > 0, then
k / log_q Ψq(n, r) ≤ 1 − ε/3 for n sufficiently large by the lower
bound in (7) and the convergence r/n → γ. Hence, (10) is
larger than ε/4 for all n sufficiently large.
We emphasize that the assumption that the sensing matrices
Ha, a ∈ [k] are statistically independent of the unknown low-rank matrix X is important. This is to ensure the validity of
equality (b) in (10). This assumption is not a restrictive one in
practice since the sensing mechanism is usually independent
of the unknown matrix.
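Lemma 1's bounds are easy to confirm by exhaustive enumeration at tiny scale. Our sketch below counts Φ2(3, 1) and Ψ2(3, 1) over all 512 matrices in F2^(3×3) and checks (6) and (7):

```python
import itertools

q, n, r = 2, 3, 1

def f2_rank(mat):
    """Rank of a 0/1 matrix over F2 via Gaussian elimination mod 2."""
    rows = [list(row) for row in mat]
    rank = 0
    for col in range(len(rows[0])):
        piv = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [(a + b) % 2 for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

# Enumerate all q^(n^2) = 512 matrices and tally their ranks.
counts = {}
for bits in itertools.product(range(q), repeat=n * n):
    M = [list(bits[i * n:(i + 1) * n]) for i in range(n)]
    rk = f2_rank(M)
    counts[rk] = counts.get(rk, 0) + 1

phi = counts.get(r, 0)                             # Phi_q(n, r): rank exactly r
psi = sum(counts.get(l, 0) for l in range(r + 1))  # Psi_q(n, r): rank <= r
print(phi, psi)  # 49 50

# Lower and upper bounds (6) and (7) of Lemma 1.
assert q ** ((2 * n - 2) * r - r ** 2) <= phi <= 4 * q ** (2 * n * r - r ** 2)
assert q ** (2 * n * r - r ** 2) <= psi <= 4 * q ** (2 * n * r - r ** 2)
```

Here Φ2(3, 1) = 49 because every rank-one binary matrix factors uniquely as u v^T with non-zero u, v ∈ F2^3, giving 7 × 7 choices.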
IV. UNIFORMLY RANDOM SENSING MATRICES: THE NOISELESS CASE
In this section, we assume the noiseless linear model in (2)
and provide sufficient conditions for the recovery of a fixed X
(a deterministic low-rank matrix) given yk , where rank(X) ≤
r. We will also provide the functional form of the reliability
function (error exponent) for this recovery problem. To do so
we first consider the following optimization problem:
    minimize    rank(X̃)
    subject to  ⟨Ha, X̃⟩ = ya,    a ∈ [k].    (12)
The optimization variable is X̃ ∈ Fq^(n×n). Thus among all the
matrices that satisfy the linear constraints in (2), we select one
whose rank is the smallest. We call the optimization problem
in (12) the min-rank decoder, denoting the set of minimizers
as S ⊂ Fq^(n×n). If S is a singleton set, we also denote the unique
optimizer to (12), a random quantity, as X∗. We analyze the
error probability that either S is not a singleton set or X∗ does
not equal the true matrix X, i.e., the error event

    En := {|S| > 1} ∪ ({|S| = 1} ∩ {X∗ ≠ X}).    (13)
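At toy scale the min-rank decoder can simply be run by brute force. A sketch of ours (q = 2, n = 3, a planted rank-one X, and a hypothetical k = 8 dense random measurements): enumerate all 512 candidates, keep the feasible ones, and return those of minimum rank.

```python
import itertools
import random

random.seed(3)
q, n, k = 2, 3, 8  # hypothetical toy parameters

def f2_rank(mat):
    """Rank of a 0/1 matrix over F2 via Gaussian elimination mod 2."""
    rows = [list(row) for row in mat]
    rank = 0
    for col in range(len(rows[0])):
        piv = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [(a + b) % 2 for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

# Planted rank-one matrix X = u v^T over F2.
u, v = [1, 0, 1], [1, 1, 0]
X = [[u[i] * v[j] for j in range(n)] for i in range(n)]

H = [[[random.randrange(q) for _ in range(n)] for _ in range(n)] for _ in range(k)]
y = [sum(Ha[i][j] * X[i][j] for i in range(n) for j in range(n)) % q for Ha in H]

def feasible(Z):
    return all(sum(Ha[i][j] * Z[i][j] for i in range(n) for j in range(n)) % q == ya
               for Ha, ya in zip(H, y))

# Min-rank decoder: the minimum-rank matrices among the feasible candidates.
feas = [Z for bits in itertools.product(range(q), repeat=n * n)
        if feasible(Z := [list(bits[i * n:(i + 1) * n]) for i in range(n)])]
best = min(f2_rank(Z) for Z in feas)
S = [Z for Z in feas if f2_rank(Z) == best]

# X is always feasible, so the minimum rank never exceeds rank(X).
assert X in feas and best <= f2_rank(X)
```

For some draws of the Ha the set S may contain spurious minimum-rank matrices; the analysis below quantifies how many measurements rule this out w.h.p.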
The optimization in (12) is, in general, intractable (in fact
NP-hard) unless there is additional structure on the sensing
matrices Ha (See discussions in Section IX). Our focus, in this
paper, is on the information-theoretic limits for solving (12)
and its variants. We remark that the minimization problem
is reminiscent of Csiszár’s so-called α-decoder for linear
codes [36]. In [36], Csiszár analyzed the error exponent of
the decoder that minimizes a function α( · ) [e.g., the entropy
H( · )] of the type (or empirical distribution) of a sequence
subject to the sequence satisfying a set of linear constraints.
For this section and Section V, we assume that each element
in each sensing matrix is drawn independently and uniformly
at random from Fq , i.e., from the pmf
    Ph(h; q) = 1/q,    ∀ h ∈ Fq.    (14)
We call this the uniform or equiprobable measurement model.
For simplicity, throughout this section, we use the notation P to
denote the probability measure associated to the equiprobable
measurement model.
A. A Sufficient Condition for Recovery in the Noiseless Case

In this subsection, we assume the noiseless linear model in (2). We can now exploit ideas from [29] to demonstrate the following achievability (weak recovery) result. Recall that X is non-random and fixed, and we are asking how many measurements y1, . . . , yk are sufficient for recovering X.

Proposition 3 (Achievability). Fix ε > 0. Under the uniform measurement model as in (14), if

  k > (2 + ε)γ(1 − γ/2) n²,    (15)

then P(En) → 0 as n → ∞.

Note that the number of measurements stipulated by Proposition 3 matches the information-theoretic lower bound in (9). In this sense, the min-rank decoder prescribed by the optimization problem in (12) is asymptotically optimal, i.e., the bounds are sharp. Note also that in the converse (Proposition 2), the range of the decoder X̂( · ) is constrained to be the set of matrices whose rank does not exceed r. Hence, the decoder in the converse has additional side information, namely the upper bound on the rank. For the min-rank decoder in (12), no such knowledge of the rank is required and yet it meets the lower bound. We remark that the packing-like achievability proof is much simpler than the typicality-based argument presented by Vishwanath in [30] (albeit in a different setting).

Proof: For each matrix Z ∈ Fq^{n×n} that is not equal to X and whose rank is no greater than rank(X), define the event

  A_Z := {⟨Z, Ha⟩ = ⟨X, Ha⟩, ∀ a ∈ [k]}.    (16)

Then we note that

  P(En) = P( ∪_{Z : Z≠X, rank(Z)≤rank(X)} A_Z ),    (17)

since an error occurs if and only if there exists a matrix Z ≠ X such that (i) Z satisfies the linear constraints, and (ii) its rank is less than or equal to the rank of X. Furthermore, we claim that P(A_Z) = q^{−k} for every Z ≠ X. This follows because

  P(A_Z) = P(⟨Z − X, Ha⟩ = 0, a ∈ [k]) =(a) P(⟨Z − X, H1⟩ = 0)^k =(b) q^{−k},    (18)

where (a) follows from the fact that the Ha are i.i.d. matrices and (b) from the fact that Z − X ≠ 0 and every non-zero element in a finite field has a (unique) multiplicative inverse, so P(⟨Z − X, H1⟩ = 0) = q^{−1} [29], [36]. More precisely, this is because ⟨Z − X, H1⟩ has distribution Ph by independence and uniformity of the elements in H1. Since r/n → γ, for any fixed η′ > 0, |r/n − γ| ≤ η′ for all n sufficiently large. By the uniform continuity of the function t ↦ 2t − t² on t ∈ [0, 1], for any η > 0, |(2nr − r²)/n² − 2γ(1 − γ/2)| ≤ η for all n ≥ N_η (an integer just depending on η). Now, by combining (18) with the union of events bound,

  P(En) ≤ Σ_{Z : Z≠X, rank(Z)≤rank(X)} q^{−k}
        ≤(c) Ψq(n, r) q^{−k}
        ≤(d) 4 q^{2nr−r²−k}
        ≤(e) 4 q^{−n²[−2γ(1−γ/2)−η+k/n²]},    (19)

where (c) follows because rank(X) ≤ r, (d) follows from the upper bound in (7), and (e) follows for all n sufficiently large as argued above. Thus, we see that if k satisfies (15), the exponent in (19) is positive if we choose η′ sufficiently small so that η < εγ(1 − γ/2). Hence P(En) → 0 as desired. ∎

Remark: Here and in the following, we can, without loss of generality, assume that r = ⌊γn⌋ (in place of r/n → γ). In this way, we can remove the effect of the small positive constant η as in the above argument. This simplification does not affect the precision of any of the arguments in the sequel.

B. The Reliability Function

We have shown in the previous section that the min-rank decoder is asymptotically optimal in the sense that the number of measurements required for it to decode X reliably with P(En) → 0 matches the lower bound (necessary condition) on k (Proposition 2). It is also interesting to analyze the rate of decay of P(En) for the min-rank decoder. For this purpose, we define the rate R of the measurement model.

Definition 1. The rate of (a sequence of) linear measurement models as in (2) is defined as

  R := lim_{n→∞} (n² − k)/n² = lim_{n→∞} 1 − k/n²,    (20)

assuming the limit exists. Note that R ∈ [0, 1].

The use of the term rate is in direct analogy to the use of the term in coding theory. The rate of the linear code

  C := {C ∈ Fq^{n×n} : ⟨C, Ha⟩ = 0, a ∈ [k]}    (21)

is Rn := 1 − dim(span{vec(H1), . . . , vec(Hk)})/n², which is lower bounded⁶ by 1 − k/n² for every k = 0, 1, . . . , n².

⁶ The lower bound is achieved when the vectors vec(H1), . . . , vec(Hk) are linearly independent over Fq. See Section VII, and in particular Proposition 14, for details when the sensing matrices are random.

We revisit the connection of the rank minimization problem to coding theory (and in particular to rank-metric codes) in detail in Section VII.
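The rate Rn of the code C in (21) can be computed by row-reducing the stacked vectorized parity-check matrices over Fq. A small sketch (our own; helper names are hypothetical, and q is assumed prime so that inverses come from Fermat's little theorem) confirms that linearly independent parity checks attain the bound 1 − k/n² with equality:

```python
def rank_mod_q(rows, q):
    """Rank over F_q (q prime) of a list of vectors, by Gaussian elimination."""
    rows = [list(r) for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        piv = next((i for i in range(rank, len(rows)) if rows[i][col] % q), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][col], q - 2, q)  # multiplicative inverse (q prime)
        rows[rank] = [(x * inv) % q for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] % q:
                c = rows[i][col]
                rows[i] = [(a - c * b) % q for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def code_rate(H_list, q):
    """R_n = 1 - dim span{vec(H_1), ..., vec(H_k)} / n^2, as below (21)."""
    n = len(H_list[0])
    vecs = [[x for row in H for x in row] for H in H_list]
    return 1 - rank_mod_q(vecs, q) / n ** 2


# Usage: k = 4 distinct matrix units over F_2 are linearly independent,
# so R_n equals the lower bound 1 - k/n^2 exactly.
n, q, k = 3, 2, 4
H_list = [[[1 if (i * n + j) == a else 0 for j in range(n)] for i in range(n)]
          for a in range(k)]
```

Repeating a parity check, on the other hand, strictly increases Rn above 1 − k/n², which is the footnoted caveat about linear dependence.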
Definition 2. If the limit exists, the reliability function or error exponent of the min-rank decoder (12) is defined as

  E(R) := lim_{n→∞} −(1/n²) log_q P(En).    (22)

We show in Corollary 7 that the limit in (22) indeed exists. Unlike the usual definition of the reliability function [37, Eq. (5.8.8)], the normalization in (22) is 1/n² since X is an n × n matrix.⁷ Also, we restrict our attention to the min-rank decoder. The following proposition provides an upper bound on the reliability function of the min-rank decoder when there is no noise in the measurements as in (2).

⁷ The "block-length" of the code C in (21) is n².

Proposition 4 (Upper bound on E(R)). Assume that rank(X)/n → γ̃ as n → ∞. Under the uniform measurement model in (14) and assuming the min-rank decoder is used,

  E(R) ≤ |(1 − R) − 2γ̃(1 − γ̃/2)|⁺.    (23)

The proof of this result hinges on the pairwise independence of the events A_Z and de Caen's inequality [23], which, for the reader's convenience, we restate here:

Lemma 5 (de Caen [23]). Let (Ω, F, Q) be a probability space. For a finite number of events B1, . . . , BM ∈ F, the probability of their union can be lower bounded as

  Q( ∪_{m=1}^{M} Bm ) ≥ Σ_{m=1}^{M} Q(Bm)² / ( Σ_{m′=1}^{M} Q(Bm ∩ Bm′) ).    (24)

We now prove Proposition 4.

Proof: In order to apply (24) to analyze the error probability in (17), we need to compute the probabilities P(A_Z) and P(A_Z ∩ A_{Z′}). The former is q^{−k} as argued in (18). The latter uses the following lemma, which is proved in Appendix A.

Lemma 6 (Pairwise Independence). For any two distinct matrices Z and Z′, neither of which is equal to X, the events A_Z and A_{Z′} (defined in (16)) are independent.

As a result of this lemma, P(A_Z ∩ A_{Z′}) = P(A_Z)P(A_{Z′}) = q^{−2k} if Z ≠ Z′ and P(A_Z ∩ A_{Z′}) = P(A_Z) = q^{−k} if Z = Z′. Now, we apply the lower bound (24) to P(En), noting from (17) that En is the union of all A_Z such that Z ≠ X and rank(Z) ≤ r̃ := rank(X). Then, for a fixed η > 0, we have

  P(En) ≥ Σ_{Z : Z≠X, rank(Z)≤rank(X)} q^{−2k} / [ q^{−k} ( 1 + Σ_{Z′ : Z′≠X, Z′≠Z, rank(Z′)≤rank(X)} q^{−k} ) ]
        ≥(a) (q^{2nr̃−r̃²} − 1) q^{−k} / ( 1 + 4 q^{2nr̃−r̃²−k} )
        ≥(b) ( q^{n²[2γ̃(1−γ̃/2)−η−k/n²]} − q^{−k} ) / ( 1 + 4 q^{n²[2γ̃(1−γ̃/2)+η−k/n²]} ),

where (a) is from the upper and lower bounds in (7) and (b) holds for all n sufficiently large since r̃/n → γ̃ (see the argument justifying inequality (e) in (19)). Assuming that 1 − R > 2γ̃(1 − γ̃/2), the normalized logarithm of the error probability can now be simplified as

  lim sup_{n→∞} −(1/n²) log_q P(En) ≤ −2γ̃(1 − γ̃/2) + η + lim_{n→∞} k/n²,    (25)

where we used the fact that 4q^{n²[2γ̃(1−γ̃/2)+η−k/n²]} → 0 for sufficiently small η > 0. The case where 1 − R ≤ 2γ̃(1 − γ̃/2) results in E(R) = 0 because P(En) fails to converge to zero as n → ∞. The proof of the upper bound of the reliability function is completed by appealing to the definition of R in (20) and the arbitrariness of η > 0. ∎

Corollary 7 (Reliability function). Under the assumptions of Proposition 4, the error exponent of the min-rank decoder is

  E(R) = |(1 − R) − 2γ̃(1 − γ̃/2)|⁺.    (26)

Proof: The lower bound on E(R) follows from the achievability in (19), which may be strengthened as follows:

  P(En) ≤ 4 q^{−n² |−2γ̃(1−γ̃/2)−η+k/n²|⁺},    (27)

since P(En) can also be upper bounded by unity. Now, because | · |⁺ is continuous, the lower limit of the normalized logarithm of the bound in (27) can be expressed as follows:

  lim inf_{n→∞} −(1/n²) log_q P(En) ≥ |−2γ̃(1 − γ̃/2) − η + lim_{n→∞} k/n²|⁺.    (28)

Combining the upper bound in Proposition 4 and the lower bound in (28), and noting that η > 0 is arbitrary, yields the reliability function in (26). ∎

We observe that pairwise independence of the events A_Z (Lemma 6) is essential in the proof of Proposition 4. Pairwise independence is a consequence of the linear measurement model in (2) and the uniformity assumption in (14). Note that the events A_Z are not jointly (nor triple-wise) independent. But the beauty of de Caen's bound is that it allows us to exploit the pairwise independence to lower bound P(En) and thus to obtain a tight upper bound on E(R). To draw an analogy, just as only pairwise independence is required to show that linear codes achieve capacity on symmetric DMCs, de Caen's inequality allows us to move the exploitation of pairwise independence into the error exponent domain to make statements about the error exponent behavior of ensembles of linear codes.

A natural question arises: Is E(R) given in (26) the largest possible exponent over all decoders X̂( · ) for the model in which Ha follows the uniform pmf? We conjecture that this is indeed the case, but a proof remains elusive.

1) Comparison of error exponents to existing works [38]: As mentioned in the Introduction, the preceding results can be interpreted from a coding-theoretic perspective. This is indeed what we will do in Section VII. In this subsection, we compare the reliability function derived in Corollary 7 with three other coding techniques present in the literature. First, we have the well-known construction of maximum rank distance (MRD) codes by Gabidulin [7]. Second, we have the error trapping technique [18] alluded to in Section I-A. Third, we have a combination of the two preceding code constructions, which is discussed in [18, Section VI.E].
To perform this comparison, we define another reliability function E1(R) that is "normalized by n". This is simply the quantity in (22) where the normalization is 1/n instead of 1/n². We now denote the reliability function normalized by n² as in (22) by E2(R). We also use various superscripts on E1 and E2 to denote different coding schemes. Hence, for our encoding and decoding strategy using random sensing and min-rank decoding (RSMR), E1^{RSMR}(R) = ∞ for all R ≤ (1 − γ)² and E2^{RSMR}(R) is given by (26).

Since Gabidulin codes are MRD, they achieve the Singleton bound [12, Section III] for rank-metric codes given by n² − k ≤ n(n − dR + 1), where dR is the minimum rank distance of the code in (21) [see the exact definitions in (48) and (49)]. Thus, it can be verified that for j = 1, 2,

  Ej^{Gab}(R) = ∞ if R ≤ 1 − 2γ, and 0 otherwise.    (29)

From [18, Section IV.B, Eq. (12)], it can also be checked that for the error trapping coding strategy, assuming the low-rank error matrix is uniformly distributed over those of rank r,

  E1^{ET}(R) = |1 − γ − √R|⁺,   E2^{ET}(R) = 0.    (30)

Finally, from [18, Section VI.E], for the combination of Gabidulin coding and error trapping, under the same condition of uniformity,

  E1^{GabET}(R) = |1 − γ − R/(1 − γ)|⁺,   E2^{GabET}(R) = 0.    (31)

Note that for the error exponents in (29), (30) and (31), the randomness is over the low-rank error matrix X and not the code construction, which is deterministic. In contrast, our coding strategy RSMR involves a random encoding scheme. It can be seen from (29) to (31) that there is a non-trivial interval of rates R := [1 − 2γ, (1 − γ)²] in which our reliability functions E1^{RSMR}(R) and E2^{RSMR}(R) are the best (largest). Indeed, in the interval R, E1^{RSMR}(R) = ∞ and our result in (22) implies that E2^{RSMR}(R) > 0, whereas all the abovementioned coding schemes give E2(R) = 0. Thus, using both a random code for encoding and min-rank decoding is advantageous from a reliability function standpoint in the regime R ∈ R. Furthermore, as we shall see from (40) in Section VI, which deals with the sparse sensing setting (SRSMR), E1^{SRSMR}(R) = ∞ and E2^{SRSMR}(R) = 0 for all R ≤ (1 − γ)². Such an encoding scheme using sparse parity-check matrices may be amenable to the design of low-complexity decoding strategies that also have good error exponent properties. In general, though, our min-rank decoder requires exhaustive search (though Section VIII proposes techniques to reduce the search space), while all the preceding techniques have polynomial-time decoding complexity.

V. UNIFORMLY RANDOM SENSING MATRICES: THE NOISY CASE

We now generalize the noiseless model and the accompanying results in Section IV to the case where the measurements y^k are noisy as in (3). As in Section IV, we assume that the elements of Ha are i.i.d. and uniform in Fq. The noise w is first assumed in Section V-A to be deterministic but unknown. We then extend our results to the situation where w is a random vector in Section V-B.

A. Deterministic Noise

In the deterministic setting, we assume that ‖w‖0 = ⌊σn²⌋ for some noise level σ ∈ (0, k/n²]. Instead of using the minimum entropy decoder as in [29] (see also [36]), we consider the following generalization of the min-rank decoder:

  minimize   rank(X̃) + λ‖w̃‖0
  subject to ⟨Ha, X̃⟩ + w̃a = ya,  a ∈ [k].    (32)

The optimization variables are X̃ ∈ Fq^{n×n} and w̃ ∈ Fq^k. The parameter λ = λn > 0 governs the tradeoff between the rank of the matrix X and the sparsity of the vector w. Let Hq(p) := −p log_q(p) − (1 − p) log_q(1 − p) be the (base-q) binary entropy.

Proposition 8 (Achievability under deterministic noisy measurement model). Fix ε > 0 and choose λ = 1/n. Assume the uniform measurement model and that ‖w‖0 = ⌊σn²⌋. If

  k > [ (3 + ε)(γ + σ)(1 − (γ + σ)/3) / ( 1 − H2[1/(3 − (γ + σ))] log_q 2 ) ] n²,    (33)

then P(En) → 0 as n → ∞.

The proof of this proposition is provided in Appendix B. Since the prefactor in (33) is a monotonically increasing function of the noise level σ, the number of measurements increases as σ increases, agreeing with intuition. Note that the regularization parameter λ is chosen to be 1/n and is thus independent of σ. Hence, the decoder does not need to know the true value of the noise level σ. The factor of 3 (instead of 2) in (33) arises in part due to the uncertainty in the locations of the non-zero elements of the noise vector w. We remark that Proposition 8 does not reduce to the noiseless case (σ = 0) in Proposition 3 because we assumed a different measurement model in (3) and employed a different bounding technique.

The measurement complexity in (33) is suboptimal, i.e., it does not match the converse in (9). This is because the decoder in (32) estimates both the matrix X and the noise w, whereas in the derivation of the converse, we are only concerned with reconstructing the unknown matrix X. By decoding (X, w) jointly, the analysis proceeds along the lines of the proof of Proposition 3. It is unclear whether a better parameter-free decoding strategy exists in the presence of noise and whether such a strategy is also amenable to analysis. The noisy setting was also analyzed in [31] but, as in our work, the number of measurements for achievability does not match the converse.

B. Random Noise

We now consider the case where the noise in (3) is random, i.e., w = (w1, . . . , wk) ∈ Fq^k is a random vector. We assume the noise vector w is i.i.d. and each component is distributed according to any pmf for which

  Pw(w; p) = 1 − p  if w = 0.    (34)
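For concreteness, one pmf consistent with (34) is the q-ary symmetric channel: each symbol is left untouched with probability 1 − p and otherwise replaced by one of the q − 1 other symbols uniformly. A small simulation of it (our own sketch, not from the paper), together with the base-q binary entropy Hq(p) that governs the measurement inflation factor 1/(1 − Hq(p)) in the converse below:

```python
import math
import random


def sample_noise(k, p, q, rng):
    """One draw of w in F_q^k: each symbol is nonzero w.p. p,
    uniform over the q - 1 nonzero field elements when flipped."""
    return [rng.randrange(1, q) if rng.random() < p else 0 for _ in range(k)]


def Hq(p, q):
    """Base-q binary entropy H_q(p) = -p log_q p - (1 - p) log_q (1 - p)."""
    if p in (0, 1):
        return 0.0
    return -p * math.log(p, q) - (1 - p) * math.log(1 - p, q)


rng = random.Random(1)
p, q, k = 0.1, 4, 50000
w = sample_noise(k, p, q, rng)
crossover = sum(x != 0 for x in w) / k       # empirical crossover, approx p
penalty = 1 / (1 - Hq(p, q))                 # inflation factor from the converse
```

As p grows toward its admissible range's upper end, Hq(p) grows and the penalty factor blows up, matching the qualitative discussion that follows.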
This pmf represents a noisy channel where every symbol is changed to some other (different) symbol independently with crossover probability p ∈ (0, 1/2). We can ask how many measurements are necessary and sufficient for recovering a fixed X in the presence of the additive stochastic noise w. Also, we are interested in how this measurement complexity depends on p. We leverage Propositions 2 and 8 to derive a converse result and an achievability result, respectively. We start with the converse, which is partially inspired by Theorem 3 in [31].

Corollary 9 (Converse under random noise model). Assume the setup in Proposition 2 and consider the noisy measurement model given by (3) and (34). Additionally, assume that X, H^k and w are jointly independent. If

  k < [ (2 − ε)γ(1 − γ/2) / (1 − Hq(p)) ] n²,    (35)

then for any estimator, P(Ẽn) ≥ ε/4 > 0 for all n sufficiently large, where Ẽn is defined in (8).

Note that the probability of error P(Ẽn) above is computed over both the randomness in the sensing matrices Ha and in the noise w. The proof is given in Appendix C. From (35), the number of measurements necessarily has to increase by a factor of 1/(1 − Hq(p)) for reliable recovery. As expected, for a fixed q, the larger the crossover probability p ∈ (0, 1/2), the more measurements are required. The converse is illustrated for different parameter settings in Figs. 1 and 2.

To present our achievability result compactly, we assume that k = ⌈αn²⌉ for some scaling parameter α ∈ (0, 1), i.e., the number of observations is proportional to n² and the constant of proportionality is α. We would like to find the range of values of the scaling parameter α such that reliable recovery is possible. Recall that the upper bound on the rank is r and the noise vector has expected weight pk ≈ pαn².

Corollary 10 (Achievability under random noisy measurement model). Fix ε > 0 and choose λ = 1/n. Assume the uniform measurement model and that k = ⌈αn²⌉. Define the function

  g(α; p, γ) := α[ 1 − (log_q 2) H2(p + γ/α) − 2p(1 − γ) ] + α²p².    (36)

If the tuple (α, p, γ) satisfies the following inequality:

  g(α; p, γ) ≥ (2 + ε)γ(1 − γ/2),    (37)

then P(En) → 0 as n → ∞.

Fig. 1. Plot of αcrit against p for q = 2. Both αcrit for the converse (con) in (35) and the achievability (ach) in (37) are shown, for γ = 0.025, 0.050, 0.075. All α's below the converse curves are not achievable.

Fig. 2. Plot of αcrit against p for q = 256. See Fig. 1 for the legend.

The proof of this corollary uses typicality arguments and is presented in Appendix D. As in the deterministic noise setting, the sufficient condition in (37) does not reduce to the noiseless case (p = 0) in Proposition 3. It also does not match the converse in (35). This is due to the different bounding technique employed to prove Corollary 10 [both X and w are decoded in (32)]. In addition, the inequality in (37) does not admit an analytical solution for α. Hence, we search for the critical α [the minimum one satisfying (37)] numerically for some parameter settings. See Figs. 1 and 2 for illustrations of how the critical α varies with (p, γ) when the field size is small (q = 2) and when it is large (q = 256).

From Fig. 1, we observe that the noise results in a significant increase in the critical value of the scaling parameter α when q = 2. We see that for a rank-dimension ratio of γ = 0.05 and with a crossover probability of p = 0.02, the critical scaling parameter is αcrit ≈ 0.32. Contrast this to the noiseless case (Proposition 3) and the converse result for the noisy case (Corollary 9), which stipulate that the critical scaling parameters are 2γ(1 − γ/2) ≈ 0.098 and 2γ(1 − γ/2)/(1 − H2(p)) ≈ 0.114, respectively. Hence, we incur roughly a threefold increase in the number of measurements to tolerate a noise level of p = 2%. This phenomenon is due to our incognizance of the locations of the non-zero elements of w (and hence of which measurements ya are reliable). In contrast to the reals, in the
finite field setting, there is no notion of the "size" of the noise (per measurement). Hence, estimation performance in the presence of noise does not degrade as gracefully as in the reals (cf. [6, Theorem 1.2]). However, when the field size is large (more akin to the reals), the degradation is not as severe. This is depicted in Fig. 2. Under the same settings as above, αcrit ≈ 0.114, which is not too far from the converse (2γ(1 − γ/2)/(1 − H256(p)) ≈ 0.099).

As a final remark, we compare the decoders for the noisy model in (32) and that in [31]. In [31], the authors considered the (analog of the) following decoder (for tensors):

  minimize   rank(X̃)
  subject to ‖y_{X̃} − y‖0 ≤ τ,    (38)

where y_{X̃} := [⟨H1, X̃⟩ . . . ⟨Hk, X̃⟩]^T and y = y^k is the noisy observation vector in (3). However, the threshold τ that constrains the Hamming distance between y_{X̃} and y is not straightforward to choose.⁸ Our decoder, in contrast, is parameter-free because the regularization constant λ in (32) can be chosen to be 1/n, independent of all other parameters. In addition, Fig. 3 shows that at high q, our decoder and analysis result in a better (smaller) αcrit than that in [31]. Our decoding scheme gives a bound that is closer to the converse at high q, while the decoding scheme in [31] is farther. The slight disadvantage of our decoder is that the number of measurements in (37) cannot be expressed in closed form.

⁸ In fact, the achievability result of Theorem 4 in [31] says that τ = ηk where η ∈ (p, (q − 1)/q), but for our optimization program in (32), the decoder does not need to know the crossover probability p.

Fig. 3. Plot of αcrit against log2(q) for our work (TBD, Corollary 10), the converse in Corollary 9 and Emad and Milenkovic (EM) [31], for p = 0.05 and p = 0.10.

VI. SPARSE RANDOM SENSING MATRICES

In the previous two sections, we focused exclusively on the case where the elements of the sensing matrices Ha, a ∈ [k], are drawn uniformly from Fq. However, there is substantial motivation to consider other ensembles of sensing matrices. For example, in low-density parity-check (LDPC) codes, the parity-check matrix (analogous to the set of Ha matrices) is sparse. The sparsity aids in decoding via the sum-product algorithm [39] as the resulting Tanner (factor) graph is sparse [26]. In [32], the authors considered the case where the generator matrices are sparse and random, but their setting is restricted to the BSC and BEC channel models.

In this section, we revisit the noiseless model in (2) and analyze the scenario where the sensing matrices are sparse on average. More precisely, each element of Ha, a ∈ [k], is assumed to be an i.i.d. random variable with associated pmf

  Ph(h; δ, q) := 1 − δ        if h = 0,
                 δ/(q − 1)    if h ∈ Fq \ {0}.    (39)
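A sampler for the sparse ensemble in (39), together with a quick check that each Ha has δn² non-zero entries on average, can be sketched as follows (our own code; names are hypothetical):

```python
import math
import random


def sample_sparse_H(n, delta, q, rng):
    """Draw an n x n matrix with i.i.d. entries from the pmf (39):
    0 w.p. 1 - delta, and each nonzero element of F_q w.p. delta/(q - 1)."""
    return [[rng.randrange(1, q) if rng.random() < delta else 0
             for _ in range(n)] for _ in range(n)]


rng = random.Random(7)
n, q = 30, 5
delta = 2 * math.log(n) / n   # sparsity on the Omega(log n / n) scale of Theorem 11
trials = 400
avg_nnz = sum(sum(x != 0 for row in sample_sparse_H(n, delta, q, rng) for x in row)
              for _ in range(trials)) / trials   # should be close to delta * n^2
```

With δ on this scale, each matrix carries on the order of n log n non-zero entries out of n², which is the regime analyzed next.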
Note that if δ is small, then the probability that an entry in Ha is zero is close to unity. The problem of deriving a sufficient condition for reliable recovery is more challenging than in the equiprobable case since (18) no longer holds (compare to Lemma 21). Roughly speaking, the matrix X is not sensed as much as in the equiprobable case and the measurements y^k are not as informative because the Ha, a ∈ [k], are sparse. In the rest of this section, we allow the sparsity factor δ to depend on n, but we do not make the dependence of δ on n explicit for ease of exposition. The question we would like to answer is: How fast can δ decay with n such that the min-rank decoder is still reliable for weak recovery?

Theorem 11 (Achievability under sparse measurement model). Fix ε > 0 and let δ be any sequence in Ω((log n)/n) ∩ o(1). Under the sparse measurement model as in (39), if the number of measurements k satisfies (15) for all n > N_{ε,δ}, then P(En) → 0 as n → ∞.

The proof of Theorem 11, our main result, is detailed in Appendix E. It utilizes a "splitting" technique to partition the set of misleading matrices {Z ≠ X : rank(Z) ≤ rank(X)} into those with low Hamming distance from X and those with high Hamming distance from X.

Observe that the sparsity factor δ is allowed to tend to zero, albeit at a controlled rate of Ω((log n)/n). Thus, each Ha is allowed to have, on average, Ω(n log n) non-zero entries (out of n² entries). The scaling rate is reminiscent of the number of trials required for success in the so-called coupon collector's problem. Indeed, it seems plausible that we need at least one entry in each row and one entry in each column of X to be sensed (by a sensing matrix Ha) for the min-rank decoder to succeed. It can easily be seen that if δ = o((log n)/n), there will be at least one row and one column in Ha of zero Hamming weight w.h.p. Quite surprisingly, the number of measurements required in the δ = Ω((log n)/n)-sparse sensing case is exactly the same as in the case where the elements of Ha are drawn uniformly at random from Fq in Proposition 3. In fact, it also matches the information-theoretic lower bound in Proposition 2 and hence is asymptotically optimal. We will analyze this weak recovery sparse setting (and understand why it works) in greater detail by studying minimum distance properties of sparse parity-check rank-metric codes in Section VII-B. The sparse scenario may be extended to the noisy case by combining the proof techniques in Proposition 8 and Theorem 11.
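The coupon-collector heuristic above can be made concrete with a short calculation (our own, for illustration): a given row of Ha is all-zero with probability (1 − δ)^n, so the expected number of all-zero rows is n(1 − δ)^n ≈ n e^{−δn}. When δ = c (log n)/n this is roughly n^{1−c}, which vanishes for c > 1, whereas for δ decaying faster than (log n)/n it diverges:

```python
import math


def expected_zero_rows(n, delta):
    """E[# of all-zero rows of an n x n matrix whose i.i.d. entries
    are zero w.p. 1 - delta], i.e. n * (1 - delta)^n."""
    return n * (1 - delta) ** n


n = 10_000
heavy = expected_zero_rows(n, 2 * math.log(n) / n)  # delta = 2 log n / n: ~ n^{-1}
light = expected_zero_rows(n, 1 / n)                # delta = o(log n / n): ~ n / e
```

So at the Ω((log n)/n) threshold, every row (and by symmetry every column) of each Ha is sensed with high probability, consistent with the discussion above.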
There are two natural questions at this point. Firstly, can the reliability function be computed for the min-rank decoder assuming the sparse measurement model? The events A_Z, defined in (16), are no longer pairwise independent. Thus, it is not straightforward to compute P(A_Z ∩ A_{Z′}) as in the proof of Proposition 4. Further, de Caen's lower bound may not be tight, as in the case where the entries of the sensing matrices are drawn uniformly at random from Fq. Our bounding technique for Theorem 11 only ensures that

  lim sup_{n→∞} (1/(n log n)) log_q P(En) ≤ −C    (40)

for some non-trivial C ∈ (0, ∞). Thus, instead of having a speed⁹ of n² in the large-deviations upper bound, we have a speed of n log n. This is because δ is allowed to decay to zero. Whether the speed n log n is optimal is open. Secondly, is δ = Ω((log n)/n) the best (smallest) possible sparsity factor? Is there a fundamental tradeoff between the sparsity factor δ and (a bound on) the number of measurements k? We leave these questions for further research.
VII. CODING-THEORETIC INTERPRETATIONS AND MINIMUM RANK DISTANCE PROPERTIES

This section is devoted to understanding the coding-theoretic interpretations and analogs of the rank minimization problem in (12). In particular, we would like to understand the geometry of the random linear rank-metric codes that underpin the optimization problem in (12) for both the equiprobable ensemble in (14) and the sparse ensemble in (39).

As mentioned in the Introduction, there is a natural correspondence between the rank minimization problem and rank-metric decoding [7]–[12]. In the former, we solve a problem of the form (12). In the latter, the code C typically consists of length-n vectors¹⁰ whose elements belong to the extension field F_{q^n}, and these vectors in F_{q^n}^n belong to the kernel of some linear operator H. A particular vector codeword c ∈ C is transmitted. The received word is r = c + x, where x is assumed to be a low-rank "error" vector. (By the rank of a vector we mean the following: there exists a fixed basis of F_{q^n} over Fq, and the rank of a vector a ∈ F_{q^n}^n is defined as the rank of the matrix A ∈ Fq^{n×n} whose elements are the coefficients of a in the basis. See [10, Sec. VI.A] for details of this isomorphic map.) The optimization problem for decoding c given r is then

  minimize   rank(r − c)
  subject to c ∈ C,    (41)

which is identical to the min-rank problem in (12) with the identification of the low-rank error vector x ≡ r − c. Note that the matrix version of the vector r (assuming a fixed basis), denoted as R, satisfies the linear constraints in (2). Since the assignment (A, B) ↦ rank(A − B) is a metric on the space of matrices [10, Sec. II.B], the problem in (41) can be interpreted as a minimum (rank) distance decoder.

⁹ The term speed is in direct analogy to the theory of large deviations [40], where Pn is said to satisfy a large-deviations upper bound with speed an and rate function J( · ) if lim sup_{n→∞} (1/an) log Pn(E) ≤ − inf_{x ∈ cl(E)} J(x).

¹⁰ We abuse notation by using a common symbol C to denote both a code consisting of vectors with elements in F_{q^n} and a code consisting of matrices with elements in Fq.
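The metric property of (A, B) ↦ rank(A − B) rests on the subadditivity of rank, rank(M + N) ≤ rank(M) + rank(N). As a sanity check (our own sketch, not from the paper), the triangle inequality can be verified exhaustively over all triples of 2 × 2 matrices over F_2, where subtraction coincides with XOR:

```python
import itertools


def rank_f2(M):
    """Rank over F_2 by Gaussian elimination."""
    rows = [list(r) for r in M]
    rank = 0
    for col in range(len(rows[0])):
        piv = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [(a + b) % 2 for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def rank_dist(A, B):
    """Rank distance d(A, B) = rank(A - B); over F_2, A - B is elementwise XOR."""
    return rank_f2([[a ^ b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)])


# All 16 binary 2x2 matrices; check d(A, C) <= d(A, B) + d(B, C) for every triple.
mats = [tuple(tuple(bits[2 * i:2 * i + 2]) for i in range(2))
        for bits in itertools.product([0, 1], repeat=4)]
triangle_ok = all(rank_dist(A, C) <= rank_dist(A, B) + rank_dist(B, C)
                  for A in mats for B in mats for C in mats)
```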
A. Distance Properties of Equiprobable Rank-Metric Codes

We formalize the notion of an equiprobable linear code and analyze its rank distance properties in this section. The results we derive here are the rank-metric analogs of the results of Barg and Forney [19] and will prove useful in shedding light on the geometry involved in the sufficient condition for recovering the unknown low-rank matrix X in Proposition 3.

Definition 3. A rank-metric code is a non-empty subset of Fq^{n×n} endowed with the rank distance (A, B) ↦ rank(A − B).

Definition 4. We say that C ⊂ Fq^{n×n} is an equiprobable linear rank-metric code if

  C := {C ∈ Fq^{n×n} : ⟨C, Ha⟩ = 0, a ∈ [k]},    (42)

where the Ha, a ∈ [k], are random matrices in which each entry is statistically independent of the other entries and equiprobable in Fq, i.e., with pmf given in (14). Each matrix C ∈ C is called a codeword. Each matrix Ha is said to be a parity-check matrix.

Recall that the inner product is defined as ⟨C, Ha⟩ = Tr(C Ha^T). We reiterate that in the coding theory literature [7]–[12], rank-metric codes usually consist of length-n vectors c ∈ C whose elements belong to the extension field F_{q^n}. We refrain from adopting this approach here as we would like to make direct comparisons to the rank minimization problem, where the measurements are generated as in (2).¹¹ Hence, the term codewords will always refer to matrices in C.

¹¹ The usual approach to defining linear rank-metric codes [7], [8] is the following: Every codeword in the codebook, c ∈ F_{q^N}^n, is required to satisfy the m parity-check constraints Σ_{i=1}^{n} h_{a,i} c_i = 0 ∈ F_{q^N} for a ∈ [m], where h_{a,i} ∈ F_{q^N} and c_i ∈ F_{q^N} are, respectively, the i-th elements of h_a and c. Note that in this paper we focus on the case N = n, but make the distinction here to connect directly with the coding literature. We can re-express each of these m constraints as N matrix trace constraints in Fq, per (42), as follows. Consider any basis B = {b1, . . . , bN} for F_{q^N} over Fq, where bj ∈ F_{q^N}. We represent h_{a,i} and c_i in this basis as h_{a,i} = Σ_{j=1}^{N} h_{a,i,j} bj and c_i = Σ_{k=1}^{N} c_{i,k} bk, respectively. Let H̃a be the n × N matrix whose (i, j)-th entry is the coefficient h_{a,i,j} ∈ Fq, and let C be similarly defined by the c_{i,k} ∈ Fq. Now define ω_{j,k,l} as the coefficients in Fq of the representation of bj bk, i.e., bj bk = Σ_{l=1}^{N} ω_{j,k,l} bl. Define Ωl to be the symmetric N × N matrix whose (j, k)-th entry is ω_{j,k,l}. By substituting the expansions for h_a and c into the standard parity-check definition and making use of the fact that the basis elements bj are linearly independent, we discover the following: the constraint Σ_{i=1}^{n} h_{a,i} c_i = 0 is equivalent to the N constraints Tr(C Ωl H̃a^T) = 0 ∈ Fq for l ∈ [N]. If we define H̃a Ωl for each a ∈ [m], l ∈ [N] to be one of the constraint matrices in (42), we get that the set of matrices C satisfying (42) is the rank-metric code defined by the h_a, a ∈ [m]. A simple relation between the Ωl matrices holds if the basis is chosen to be a normal basis [41, Def. 2.32].

Definition 5. The number of codewords in the code C of rank r (r = 0, 1, . . . , n) is denoted as N_C(r).

Note that N_C(r) is a random variable since C ⊂ Fq^{n×n} is a random subspace. This quantity can also be expressed as

  N_C(r) := Σ_{M ∈ Fq^{n×n} : rank(M)=r} I{M ∈ C},    (43)

where I{M ∈ C} is the (indicator) random variable which takes on the value one if M ∈ C and zero otherwise. Note that the matrix M is deterministic, while the code C is random. We remark that the decomposition of N_C(r) in (43) is different
from that in Barg and Forney [19, Eq. (2.3)], where the authors considered and analyzed the analog of the sum

Ñ_C(r) := Σ_{j ∈ {1,...,|C|} : C_j ≠ 0} I{rank(C_j) = r},   (44)

where j ∈ {1, ..., |C|} indexes the (random) codewords in C. Note that Ñ_C(r) = N_C(r) for all r ≥ 1, but they differ when r = 0 (Ñ_C(0) = 0 while N_C(0) = 1). It turns out that the sum in (43) is more amenable to analysis given that our parity-check (sensing) matrices H_a, a ∈ [k], are random (as in Gallager's work in [20, Theorem 2.1]), whereas in [19, Sec. II.C], the generators are random.12 Recall that the rank-dimension ratio γ is the limit of the ratio r/n as n → ∞. Using (43), we can show the following:
Lemma 12 (Moments of N_C(r)). For r = 0, N_C(r) = 1. For 1 ≤ r ≤ n, the mean of N_C(r) satisfies

q^{−k+2rn−r²−2r} ≤ E N_C(r) ≤ 4 q^{−k+2rn−r²}.   (45)

Furthermore, the variance of N_C(r) satisfies

var(N_C(r)) ≤ E N_C(r).   (46)
The proof of Lemma 12 is provided in Appendix F. Observe from (45) that the average number of codewords with rank r, namely E N_C(r), is exponentially large (in n²) if k < (2 − ε)γ(1 − γ/2)n² (compare to the converse in Proposition 2) and exponentially small if k > (2 + ε)γ(1 − γ/2)n² (compare to the achievability in Proposition 3). By Chebyshev's inequality, an immediate corollary of Lemma 12 is the following:
Corollary 13 (Concentration of number of codewords of rank r). Let f_n be any sequence such that lim_{n→∞} f_n = ∞. Then,

lim_{n→∞} P( |N_C(r) − E N_C(r)| ≥ f_n √(E N_C(r)) ) = 0.   (47)

Thus, N_C(r) concentrates to its mean in the sense of (47).
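As a small-scale illustration (our own toy experiment, not from the paper), the mean in Lemma 12 and the concentration in Corollary 13 can be observed directly for n = 3, q = 2 by exhaustive enumeration: each non-zero matrix belongs to the code with probability q^{−k}, so the average of N_C(r) over random draws of the H_a should be close to W_r q^{−k}, where W_r counts rank-r matrices.

```python
# Toy experiment (illustrative): for n = 3, q = 2, draw k uniform sensing
# matrices H_a, form C = {M : <H_a, M> = 0 for all a}, and compare the
# empirical average of N_C(r) with the exact mean W_r * 2^{-k}, where W_r
# is the number of 3 x 3 rank-r matrices over F_2.
import itertools, random

n, k, TRIALS = 3, 4, 200

def rank_f2(rows):
    """Rank over F_2 of a tuple of n-bit row masks (Gaussian elimination)."""
    rows = list(rows)
    r = 0
    for col in range(n):
        piv = next((i for i in range(r, len(rows)) if (rows[i] >> col) & 1), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and (rows[i] >> col) & 1:
                rows[i] ^= rows[r]
        r += 1
    return r

def inner(H, M):
    """<H, M> over F_2: parity of the overlap of the two supports."""
    return sum(bin(h & m).count("1") for h, m in zip(H, M)) % 2

ALL = list(itertools.product(range(2**n), repeat=n))   # all 3x3 F_2 matrices
W = [0] * (n + 1)                                      # rank enumerator
for M in ALL:
    W[rank_f2(M)] += 1

rng = random.Random(1)
avg = [0.0] * (n + 1)
for _ in range(TRIALS):
    Hs = [tuple(rng.randrange(2**n) for _ in range(n)) for _ in range(k)]
    for M in ALL:
        if all(inner(H, M) == 0 for H in Hs):
            avg[rank_f2(M)] += 1.0 / TRIALS

expected = [W[r] / 2**k for r in range(n + 1)]   # mean of N_C(r) for r >= 1
```

For r = 0 the empirical count is always exactly 1 (the zero matrix is in every linear code), matching the lemma's convention N_C(0) = 1 rather than W_0 q^{−k}.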
A similar result for the random generator case was developed
in [9, Corollary 1]. Also, our derivations based on Lemma 12
are cleaner and require fewer assumptions. We now define the
notion of the minimum rank distance of a rank-metric code.
Definition 6. The minimum rank distance of a rank-metric code C is defined as

d_R(C) := min_{C_1, C_2 ∈ C : C_1 ≠ C_2} rank(C_1 − C_2).   (48)

By linearity of the code C, it can be seen that the minimum rank distance in (48) can also be written as

d_R(C) = min_{C ∈ C : C ≠ 0} rank(C).   (49)

Thus, the minimum rank distance of a linear code is equal to the minimum rank over all non-zero matrix codewords.

Definition 7. The relative minimum rank distance of a code C ⊂ F_q^{n×n} is defined as d_R(C)/n.
Note that the relative minimum rank distance is a random variable taking on values in the unit interval. In this section, we assume there exists some α ∈ (0, 1) such that k/n² → α (cf. Section V-B). This is the scaling regime of interest.

12 Indeed, if the generators are random, it is easier to derive the statistics of the number of codewords of rank r using (44) instead of (43).
Proposition 14 (Asymptotic linear independence). Assume that each random matrix H_a ∈ F_q^{n×n} consists of independent entries that are drawn according to the pmf in (39). Let m := dim(span{vec(H_1), ..., vec(H_k)}). If δ ∈ Ω(log n / n), then m/k → 1 almost surely (a.s.).
The proof of this proposition is a consequence of a result
by Blömer et al. [42]. We provide the details in Appendix G.
We would now like to define the notion of the rate of a random code. Strictly speaking, since C is a random linear code, the rate of the code should be defined as the random variable R̃_n := 1 − m/n². However, a consequence of Proposition 14 is that R̃_n / (1 − k/n²) → 1 a.s. if δ ∈ Ω(log n / n). Note that this prescribed rate of decay of δ subsumes the equiprobable model (of interest in this section) as a special case. (Take δ = (q − 1)/q to be constant.) In light of Proposition 14, we adopt the following definition:
Definition 8. The rate of the linear rank-metric code C [as in (42)] is defined as

R_n := (n² − k)/n² = 1 − k/n².   (50)

The limit of R_n in (50) is denoted as R ∈ [0, 1]. Note also that R̃_n / R → 1 a.s.
Proposition 15 (Lower bound on relative minimum distance). Fix ε > 0. For any R ∈ [0, 1], the probability that the equiprobable linear code in (42) has relative minimum rank distance less than 1 − √R − ε goes to zero as n → ∞.
Proof: Assume13 ε ∈ (0, 2(1 − γ)) and define the positive constant ε′ := 2ε(1 − γ) − ε². Consider a sequence of ranks r such that r/n → γ ≤ 1 − √R − ε. Fix η = ε′/2 > 0. Then, by Markov's inequality and (45), we have

P(N_C(r) ≥ 1) ≤ E N_C(r) ≤ 4 q^{−n²[k/n² − 2γ(1−γ/2) − η]},   (51)

for all n > N_{ε′}. Since γ ≤ 1 − √R − ε, we may assert by invoking the definition of R that k ≥ (2γ(1 − γ/2) + ε′)n². Hence, the exponent in square brackets in (51) is no smaller than ε′/2. This implies that P(N_C(r) ≥ 1) → 0 or, equivalently, P(N_C(r) = 0) → 1. In other words, there are no matrices of rank r in the equiprobable linear code C with probability at least 1 − 4q^{−ε′n²/2} for all n > N_{ε′}.
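The key algebraic step in this proof, namely that γ ≤ 1 − √R − ε forces the exponent margin k/n² − 2γ(1 − γ/2) to be at least ε′ = 2ε(1 − γ) − ε², can be verified numerically. The following check is our own illustration (parameter grids are arbitrary choices), exploiting the identity (1 − R) − 2γ(1 − γ/2) = (1 − γ)² − R.

```python
# Numeric check (illustrative) of the exponent margin used in (51):
# with alpha = k/n^2 = 1 - R, gamma <= 1 - sqrt(R) - eps implies
# alpha - 2*gamma*(1 - gamma/2) >= eps' = 2*eps*(1 - gamma) - eps^2,
# with equality exactly at gamma = 1 - sqrt(R) - eps.
import math

def margin(R, gamma):
    """Exponent slack k/n^2 - 2*gamma*(1 - gamma/2) at alpha = 1 - R."""
    return (1.0 - R) - 2.0 * gamma * (1.0 - gamma / 2.0)

ok = True
for R in [j / 20 for j in range(21)]:
    for eps in (0.01, 0.05, 0.1):
        gmax = 1.0 - math.sqrt(R) - eps
        for gamma in [gmax * t / 10 for t in range(11) if gmax > 0]:
            eps_prime = 2 * eps * (1 - gamma) - eps**2
            if margin(R, gamma) < eps_prime - 1e-12:
                ok = False
```

Since margin = (1 − γ)² − R and ε′ = 2ε(1 − γ) − ε², the inequality margin ≥ ε′ rearranges to (1 − γ − ε)² ≥ R, i.e., exactly the hypothesis γ ≤ 1 − √R − ε.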
We now introduce some additional notation. We say that two positive sequences {a_n}_{n∈N} and {b_n}_{n∈N} are equal to second order in the exponent (denoted a_n ≐ b_n) if

lim_{n→∞} (1/n²) log_q (a_n / b_n) = 0.   (52)
Proposition 16 (Concentration of relative minimum distance). Fix ε > 0. For any R ∈ [0, 1], if r is a sequence of ranks such that r/n → γ ≥ 1 − √R + ε, then the probability that N_C(r) ≐ q^{−k+2γ(1−γ/2)n²} goes to one as n → ∞.
13 The restriction that ε < 2(1−γ) is not a serious one since the validity of
the claim in Proposition 15 for some ε0 > 0 implies the same for all ε > ε0 .
Proof: If the sequence of ranks r is such that r/n → γ ≥ 1 − √R + ε, then the average number of matrices in the code of rank r, namely E N_C(r), is exponentially large. By Markov's inequality and the triangle inequality,

P(|N_C(r) − E N_C(r)| ≥ t) ≤ E|N_C(r) − E N_C(r)| / t ≤ 2 E N_C(r) / t.   (53)

Choose t := q^{−k+(2γ(1−γ/2)+η)n²+n}, where η is given in the proof of Proposition 15. Then, applying (45) to (53) yields

P(|N_C(r) − E N_C(r)| ≥ t) ≤ 8 q^{−n} → 0.   (54)

Hence, N_C(r) ∈ (E N_C(r) − t, E N_C(r) + t) with probability exceeding 1 − 8q^{−n}. Furthermore, it is easy to verify that E N_C(r) ± t ≐ q^{−k+2γ(1−γ/2)n²}, as desired.

Propositions 15 and 16 allow us to conclude that with probability approaching one (exponentially fast) as n → ∞, the relative minimum rank distance of the equiprobable linear code in (42) is contained in the interval (1 − √R − ε, 1 − √R + ε) for all R ∈ [0, 1]. The analog of the Gilbert-Varshamov (GV) distance [19, Sec. II.C] is thus

γ_GV(R) := 1 − √R.   (55)

Indeed, by substituting the definition of R into N_C(r) in Proposition 16, we see that a typical (in the sense of [19]) equiprobable linear rank-metric code has distance distribution

N_typ(r) ≐ q^{n²[R−(1−γ)²]}   if γ ≥ γ_GV(R) + ε,
N_typ(r) = 0                   if γ ≤ γ_GV(R) − ε.   (56)

We again remark that Loidreau in [9, Sec. 5] also derived results for uniformly random linear codes in the rank metric that are somewhat similar to Propositions 15 and 16. However, our derivations are more straightforward and require fewer assumptions. As mentioned above, we assume that the parity-check matrices H_a, a ∈ [k], are random (akin to [20, Theorem 2.1]), while the assumption in [9, Sec. 5] is that the generators are random and linearly independent. Furthermore, to the best of our knowledge, there are no previous studies on the minimum distance properties for the sparse parity-check matrix setting. We do this in Section VII-B.

From the rank distance properties, we can re-derive the achievability (weak recovery) result in Proposition 3 by using the definition of R and solving the following inequality for k:

1 − √R − ε ≥ γ.   (57)

This provides geometric intuition as to why the min-rank decoder succeeds on average; the typical relative minimum rank distance of the code should exceed the rank-dimension ratio for successful error correction. We derive a stronger condition (known as the strong recovery condition) in Section VII-C.

B. Distance Properties of Sparse Rank-Metric Codes

In this section, we derive the analog of Proposition 15 for the case where the code C is characterized by sparse sensing (or measurement or parity-check) matrices H_a, a ∈ [k].

Definition 9. We say that C is a δ-sparse linear rank-metric code if C is as in (42) and the H_a, a ∈ [k], are random matrices in which each entry is statistically independent and drawn from the pmf P_h( · ; δ, q) defined in (39).

To analyze the number of matrices of rank r in this random ensemble, N_C(r), we partition the sum in (43) into subsets of matrices based on their Hamming weight, i.e.,

N_C(r) = Σ_{d=0}^{n²} Σ_{M ∈ F_q^{n×n} : rank(M)=r, ‖M‖_0=d} I{M ∈ C}.   (58)
Define θ(d; δ, q, k) := [q^{−1} + (1 − q^{−1})(1 − δ/(1 − q^{−1}))^d]^k.
As shown in Lemma 21 in Appendix E, this is the probability
that a non-zero matrix M of Hamming weight d belongs to the
δ-sparse code C . We can demonstrate the following important
bound for the δ-sparse linear rank-metric code:
Lemma 17 (Mean of N_C(r) for sparse codes). For r = 0, N_C(r) = 1. If 1 ≤ r ≤ n and η > 0,

E N_C(r) ≤ 2^{n²H₂(β)} (q − 1)^{βn²} (1 − δ)^k + 4n² q^{n²[2γ(1−γ/2) + η + (1/n²) log_q θ(⌈βn²⌉; δ, q, k)]},   (59)

for all β ∈ [0, 1/2] and all n ≥ N_η.
By using the sum in (58), one sees that this lemma can be justified in exactly the same way as Theorem 11 (see the steps leading to (81) and (82) in Appendix E). Hence, we omit its proof. Lemma 17 allows us to find a tight upper bound on the expectation of N_C(r) for the sparse linear rank-metric code by optimizing over the free parameter β ∈ [0, 1/2]. It turns out that β = Θ(δ/log n) is optimum. In analogy to Proposition 15 for the equiprobable linear rank-metric code, we can demonstrate the following for the sparse linear rank-metric code.
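The tradeoff behind the choice of β can be made concrete numerically. The following scan is our own illustration (the parameter values n, γ and the multiple of the weak-recovery threshold are arbitrary, and polynomial prefactors and the slack η in (59) are ignored): it evaluates the normalized base-q exponents of the two terms on the right-hand side of (59) and shows that a cutoff β on the order of δ/log n drives both to be negative when k exceeds the weak-recovery threshold.

```python
# Illustrative scan of the two normalized exponents in (59): the
# "low-weight" term 2^{n^2 H_2(beta)} (q-1)^{beta n^2} (1-delta)^k and the
# "high-weight" term governed by theta(ceil(beta n^2); delta, q, k).
import math

def H2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def exponents(n, q, k, delta, gamma, beta):
    """(low-weight, high-weight) exponents of (59), divided by n^2, base q."""
    lg = math.log(q)
    d = math.ceil(beta * n * n)
    low = (H2(beta) * math.log(2) / lg
           + beta * math.log(q - 1) / lg          # zero when q = 2
           + (k / n**2) * math.log(1 - delta) / lg)
    theta_base = 1 / q + (1 - 1 / q) * (1 - delta / (1 - 1 / q)) ** d
    high = 2 * gamma * (1 - gamma / 2) + (k / n**2) * math.log(theta_base) / lg
    return low, high

n, q, gamma = 1000, 2, 0.2
delta = math.log(n) / n                          # sparsest regime Omega(log n / n)
k = int(2.5 * gamma * (1 - gamma / 2) * n**2)    # above the weak threshold
scan = [(max(exponents(n, q, k, delta, gamma, c * delta / math.log(n))), c)
        for c in (0.1, 0.2, 0.3, 0.5, 1.0)]
best = min(scan)                                 # smallest worst-case exponent
```

Too small a cutoff leaves the high-weight term dominant, too large a cutoff makes the low-weight term grow; an intermediate constant times δ/log n makes both exponents negative for these parameters.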
Proposition 18 (Lower bound on relative minimum distance for sparse codes). Fix ε > 0 and assume that δ = Ω(log n / n) ∩ o(1). For any R ∈ [0, 1], the probability that the sparse linear code has relative minimum distance less than 1 − √R − ε goes to zero as n → ∞.
Proof: The condition on the minimum distance implies that k > (2 + ε̃)γ(1 − γ/2)n² for some ε̃ > 0 (for sufficiently small ε); see the detailed argument in the proof of Proposition 15. This implies, from Theorem 11, Lemma 17 and Markov's inequality, that P(N_C(r) ≥ 1) → 0.
Proposition 18 asserts that the relative minimum rank distance of a δ = Ω(log n / n)-sparse linear rank-metric code is at least 1 − √R − ε w.h.p. Remarkably, this property is exactly the same as that of a (dense) linear code (cf. Proposition 15) in which the entries in the parity-check matrices H_a are statistically independent and equiprobable in F_q. The fact that the (lower bounds on the) minimum distances of both ensembles of codes coincide explains why the min-rank decoder matches the information-theoretic lower bound (Proposition 2) in the sparse setting (Theorem 11) just as in the dense one (Proposition 3). Note that only an upper bound on E N_C(r), as in (59), is required to make this claim.
C. Strong Recovery
We now utilize the insights gleaned from this section
to derive results for strong recovery (See Section II-D and
also [27, Sec. 2] for definitions) of low-rank matrices from
linear measurements. Recall that in strong recovery, we are
interested in recovering all matrices whose ranks are no larger
than r. We contrast this to weak recovery where a matrix X (of
low rank) is fixed and we ask how many random measurements
are needed to estimate X reliably.
Proposition 19 (Strong recovery for uniform measurement model). Fix ε > 0. Under the uniform measurement model, the min-rank decoder recovers all matrices of rank less than or equal to r with probability approaching one as n → ∞ if

k > (4 + ε)γ(1 − γ)n².   (60)
We contrast this to the weak achievability result (Proposition 3), in which X with rank(X) ≤ r was fixed and we showed that if k > (2 + ε)γ(1 − γ/2)n², the min-rank decoder recovers X w.h.p. Thus, Proposition 19 says that if γ is small, roughly twice as many measurements are needed for strong recovery vis-à-vis weak recovery. These fundamental limits (and the increase by a factor of 2 for strong recovery) are exactly analogous to those developed by Draper and Malekpour in [29] in the context of compressed sensing over finite fields and by Eldar et al. [27] for the problem of rank minimization over the reals. Given our derivations in the preceding subsections, the proof of this result is straightforward.
Proof: We showed in Proposition 15 that with probability approaching one (exponentially fast), the relative minimum distance of C is no smaller than 1 − √R − ε̃ for any ε̃ > 0. As such, to guarantee strong recovery, we need the decoding regions (associated to each codeword in C) to be disjoint. In other words, the rank distance between any two distinct codewords C_1, C_2 ∈ C must exceed 2r. See Fig. 4 for an illustration. In terms of the relative minimum rank distance 1 − √R − ε̃, this requirement translates to14

1 − √R − ε̃ ≥ 2γ.   (61)

Rearranging this inequality and using the definition of R [the limit of R_n in (50)] as we did in Proposition 15 yields the prescribed number of measurements.
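The factor-of-2 gap between weak and strong recovery can be checked with a few lines of arithmetic (our own illustration): rearranging 1 − √R ≥ 2γ with R = 1 − k/n² gives exactly the threshold 4γ(1 − γ) in (60), and its ratio to the weak threshold 2γ(1 − γ/2) tends to 2 as γ → 0.

```python
# Quick comparison (illustrative) of the weak-recovery threshold of
# Proposition 3 and the strong-recovery threshold of (60).
def weak(gamma):
    """Weak-recovery threshold k/n^2 > 2*gamma*(1 - gamma/2)."""
    return 2 * gamma * (1 - gamma / 2)

def strong(gamma):
    """Strong-recovery threshold k/n^2 > 4*gamma*(1 - gamma) from (60)."""
    return 4 * gamma * (1 - gamma)

# strong(gamma) is exactly 1 - (1 - 2*gamma)^2, i.e., the alpha obtained by
# rearranging 1 - sqrt(R) >= 2*gamma with alpha = 1 - R.
ratios = {g: strong(g) / weak(g) for g in (0.01, 0.1, 0.25)}
```

The ratio is 2(1 − γ)/(1 − γ/2), so for small rank-dimension ratios strong recovery costs about twice as many measurements, while the gap narrows for larger γ.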
In analogy to Proposition 19, we can show the following
for the sparse model.
Proposition 20 (Strong recovery for sparse measurement model). Fix ε > 0. Under the δ = Ω(log n / n)-sparse measurement model, the min-rank decoder recovers all matrices of rank less than or equal to r with probability approaching one as n → ∞ if (60) holds.
Proof: The proof uses Proposition 18 and follows along
the exact same lines as that of Proposition 19.
14 The strong recovery requirement in (61) is analogous to the well-known fact that in the binary Hamming case, in order to correct any vector r = c + e corrupted with t errors (i.e., ‖e‖_0 = t) using minimum distance decoding, we must use a code with minimum distance at least 2t + 1.
Fig. 4. For strong recovery, the decoding regions associated to each codeword C ∈ C have to be disjoint, resulting in the criterion in (61).
VIII. REDUCTION IN THE COMPLEXITY OF THE MIN-RANK DECODER
In this section, we devise a procedure to reduce the complexity for min-rank decoding (vis-à-vis exhaustive search).
This procedure is inspired by techniques in the cryptography
literature [43], [44]. We adapt the techniques for our problem
which is somewhat different. As we mentioned in Section VII,
the codewords in this paper are matrices rather than vectors
whose elements belong to an extension field [43], [44].
Recall that in min-rank decoding (12), we search for a matrix X ∈ F_q^{N×n} of minimum rank that satisfies the linear constraints. In this section, for clarity of exposition, we differentiate between the number of rows (N) and the number of columns (n) in X. The vector y^k is known as the syndrome. We first suppose that the minimum rank in (12) is known to be equal to some integer r ≤ min{N, n}. Since our proposed algorithm requires exponentially many elementary operations (addition and multiplication) in F_q, this assumption does not affect the time complexity significantly. Then the problem in (12) reduces to a satisfiability problem: given an integer r, a collection of parity-check matrices H_a, a ∈ [k], and a syndrome vector y^k, find (if possible) a matrix X ∈ F_q^{N×n} of rank exactly equal to r that satisfies the linear constraints in (12). Note that the constraints in (12) are equivalent to ⟨vec(H_a), vec(X)⟩ = y_a, a ∈ [k].
We first claim that we can, without loss of generality, assume that y^k = 0^k, i.e., that the constraints in (12) read

⟨H_a, X⟩ = 0,   a ∈ [k].   (62)

We justify this claim as follows: consider the new syndrome-augmented vectors [vec(H_a); y_a]^T ∈ F_q^{Nn+1} for every a ∈ [k]. Then, every solution vec(X′) of the system of equations

⟨[vec(H_a); y_a], vec(X′)⟩ = 0,   a ∈ [k],   (63)

can be partitioned into two parts, vec(X′) = [vec(X_1); x_2], where vec(X_1) ∈ F_q^{Nn} and x_2 ∈ F_q. Thus, every solution of (63) satisfies one of two conditions:
• x_2 = 0. In this case, X_1 is a solution to the linear equations in (12).
• x_2 ≠ 0. In this case, X_1 solves ⟨H_a, X_1⟩ = −x_2 y_a. Thus, −x_2^{−1} X_1 solves (12).
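This homogenization step can be sanity-checked numerically. The sketch below is our own toy verification over F_5 (a prime field chosen so that inverses are easy via Fermat's little theorem; all sizes and names are arbitrary): solutions of the augmented homogeneous system with x_2 ≠ 0 are rescaled back to solutions of the original inhomogeneous system.

```python
# Toy verification (illustrative, q = 5) of the coset-decoding reduction:
# any solution [vec(X1); x2] of the augmented homogeneous system (63) with
# x2 != 0 yields the solution -x2^{-1} X1 of <H_a, X> = y_a.
import random

q = 5
rng = random.Random(3)
N = n = 3
k = 4

def inner(A, B):
    """Inner product of two flattened matrices over F_q."""
    return sum(a * b for a, b in zip(A, B)) % q

Hs = [[rng.randrange(q) for _ in range(N * n)] for _ in range(k)]
X = [rng.randrange(q) for _ in range(N * n)]          # planted solution
y = [inner(H, X) for H in Hs]

# [rho * X ; -rho] solves the augmented homogeneous system for any rho != 0.
for rho in range(1, q):
    X1 = [(rho * x) % q for x in X]
    x2 = (-rho) % q
    for H, ya in zip(Hs, y):
        assert (inner(H, X1) + x2 * ya) % q == 0      # (63) holds
    inv = pow(x2, q - 2, q)                           # x2^{-1} by Fermat
    Xrec = [(-inv * v) % q for v in X1]               # -x2^{-1} X1
    assert all(inner(H, Xrec) == ya for H, ya in zip(Hs, y))
```

Note the recovered matrix need not equal the planted X in general; the check only confirms that it satisfies the original constraints, which is all the reduction claims.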
This is also known as coset decoding. Now, observe that since it is known that X has rank equal to r (which is assumed known), it can be written as

X = Σ_{l=1}^{r} u_l v_l^T = U V^T,   (64)

where each of the vectors u_l ∈ F_q^N and v_l ∈ F_q^n. The matrices U ∈ F_q^{N×r} and V ∈ F_q^{n×r} are of (full) rank r and are referred to as the basis matrix and the coefficient matrix, respectively. The linear system of equations in (62) can be expanded as

Σ_{l=1}^{r} Σ_{i=1}^{N} Σ_{j=1}^{n} [H_a]_{i,j} u_{l,i} v_{l,j} = 0,   a ∈ [k],   (65)

where u_l = [u_{l,1}, ..., u_{l,N}]^T and v_l = [v_{l,1}, ..., v_{l,n}]^T. Thus, we need to solve a system of quadratic equations in the basis elements u_{l,i} and the coefficients v_{l,j}.

A. Naïve Implementation

A naïve way to find a consistent U and V for (65) is to employ the following algorithm:
1) Start with r = 1.
2) Enumerate all bases U = {u_{l,i} : i ∈ [N], l ∈ [r]}.
3) For each basis, solve (if possible) the resulting linear system of equations in V = {v_{l,j} : j ∈ [n], l ∈ [r]}.
4) If a consistent set of coefficients V exists [i.e., (65) is satisfied], terminate and set X = U V^T. Else, increment r ← r + 1 and go to step 2.

The third step can be solved easily if the number of equations is less than or equal to the number of unknowns, i.e., if nr ≥ k. However, this is usually not the case since, for successful recovery, k has to satisfy (15); so, in general, there are more equations (linear constraints) than unknowns. We attempt to solve for (if possible) a consistent V; otherwise, we increment the guessed rank r. The computational complexity of this naïve approach (assuming r is known and so no iterations over r are needed) is O((nr)³ q^{Nr}), since there are q^{Nr} distinct bases and solving the linear system via Gaussian elimination requires at most O((nr)³) operations in F_q.

B. Simple Observations to Reduce the Search for the Basis U

We now use ideas from [43], [44] and make two simple observations to dramatically reduce the search for the basis in step 2 of the above naïve implementation.

Observation (A): Note that if X̃ solves (62), so does ρX̃ for any ρ ∈ F_q. Hence, without loss of generality, we can scale the (1,1) element of U to be equal to 1. The number of bases we need to enumerate may thus be reduced by a factor of q.

Observation (B): Note that the decomposition X = U V^T is not unique. Indeed, if X = U V^T, we may also decompose X as X = Ũ Ṽ^T, where Ũ = UT, Ṽ = V T^{−T} and T is any invertible r × r matrix over F_q. We say that two bases U, Ũ are equivalent, denoted U ∼ Ũ, if there exists an invertible matrix T such that U = ŨT. The equivalence relation ∼ induces a partition of the set of F_q^{N×r} matrices. Let [U] := {Ũ ∈ F_q^{N×r} : Ũ ∼ U} be the equivalence class of matrices containing the matrix U. From the preceding discussion on the indeterminacies in the decomposition of the low-rank matrix X, we observe that the complexity involved in the enumeration of all F_q^{N×r} matrices in step 2 of the naïve implementation can be reduced by enumerating only the different equivalence classes induced by ∼. More precisely, we find (if possible) coefficients V for a basis U from each equivalence class, e.g., U_1 ∈ [U_1], ..., U_m ∈ [U_m]. Note that the number of equivalence classes (by Lagrange's theorem) is

m = q^{Nr} / Φ_q(r, r) ≤ 4 q^{r(N−r)},   (66)

where recall from Section II-E that Φ_q(r, r) is the number of non-singular matrices in F_q^{r×r}. The inequality arises from the fact that Φ_q(r, r) ≥ (1/4) q^{r²}, a simple consequence of [43, Cor. 4]. Algorithmically, we can enumerate the equivalence classes by first considering all matrices of the form

U = [I_{r×r} ; Q],   (67)

where I_{r×r} is the identity matrix of size r, Q takes on all possible values in F_q^{(N−r)×r}, and [· ; ·] denotes vertical stacking. Note that if Q and Q̃ are distinct, the corresponding U = [I; Q^T]^T and Ũ = [I; Q̃^T]^T belong to different equivalence classes. However, the top r rows of U may not be linearly independent, so we have not yet considered all equivalence classes. Hence, we subsequently permute the rows of each previously considered U to ensure that every equivalence class is considered.

From the considerations in (A) and (B), the computational complexity can be reduced from O((nr)³ q^{Nr}) to O((nr)³ q^{r(N−r)−1}). By further noting that there is symmetry between the basis matrix U and the coefficient matrix V, we see that the resulting computational complexity is O((max{n, N} r)³ q^{r(min{n,N}−r)−1}). Finally, to incorporate the fact that r is unknown, we start the procedure assuming r = 1, proceed to r ← r + 1 if there does not exist a consistent solution, and so on, until a consistent (U, V) pair is found. The resulting computational complexity is thus O(r (max{n, N} r)³ q^{r(min{n,N}−r)−1}).
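The procedure above can be sketched in code. The following is our own toy implementation for q = 2 and tiny sizes (it enumerates only the normalized representatives [I_r ; Q] of (67), omitting the row permutations needed for classes whose top r rows are singular, and it also omits the scaling of Observation (A), which is vacuous for q = 2); it is meant to illustrate the structure of the search, not to be an efficient decoder.

```python
# Toy sketch (q = 2, illustrative sizes) of the naive min-rank decoder:
# for r = 1, 2, ..., enumerate candidate bases U = [I_r ; Q] as in (67) and
# solve the resulting F_2-linear system for the coefficient matrix V.
import itertools, random

def solve_f2(eqs):
    """Return one solution (as a bitmask) of F_2 equations (mask, b), or None."""
    pivots = {}                                   # pivot column -> (mask, b)
    for mask, b in eqs:
        for col, (pmask, pb) in pivots.items():   # reduce against known pivots
            if (mask >> col) & 1:
                mask ^= pmask
                b ^= pb
        if mask == 0:
            if b:
                return None                       # 0 = 1: inconsistent
            continue
        col = mask.bit_length() - 1
        for c in list(pivots):                    # keep rows mutually reduced
            m2, b2 = pivots[c]
            if (m2 >> col) & 1:
                pivots[c] = (m2 ^ mask, b2 ^ b)
        pivots[col] = (mask, b)
    x = 0                                         # free variables set to zero
    for col, (_, b) in pivots.items():
        x |= b << col
    return x

def min_rank_decode(Hs, y, N, n):
    for r in range(1, min(N, n) + 1):
        for bits in itertools.product([0, 1], repeat=(N - r) * r):
            U = [[1 if i == l else 0 for l in range(r)] for i in range(r)] + \
                [[bits[i * r + l] for l in range(r)] for i in range(N - r)]
            # <H_a, U V^T> = y_a is linear in V: the coefficient of V[j][l]
            # is (H_a^T U)[j][l]
            eqs = []
            for H, ya in zip(Hs, y):
                mask = 0
                for j in range(n):
                    for l in range(r):
                        if sum(H[i][j] * U[i][l] for i in range(N)) % 2:
                            mask |= 1 << (j * r + l)
                eqs.append((mask, ya))
            sol = solve_f2(eqs)
            if sol is not None:
                V = [[(sol >> (j * r + l)) & 1 for l in range(r)] for j in range(n)]
                X = [[sum(U[i][l] * V[j][l] for l in range(r)) % 2
                      for j in range(n)] for i in range(N)]
                return X, r
    return None, None

# Plant a rank-2 matrix whose decomposition already has the normalized form.
rng = random.Random(7)
N = n = 4
U0 = [[1, 0], [0, 1], [1, 1], [0, 1]]
V0 = [[1, 0], [0, 1], [1, 1], [1, 0]]
X0 = [[sum(U0[i][l] * V0[j][l] for l in range(2)) % 2 for j in range(n)]
      for i in range(N)]
Hs = [[[rng.randrange(2) for _ in range(n)] for _ in range(N)] for _ in range(10)]
y = [sum(H[i][j] * X0[i][j] for i in range(N) for j in range(n)) % 2 for H in Hs]
Xhat, rhat = min_rank_decode(Hs, y, N, n)
```

Any returned matrix satisfies all k constraints by construction, and the returned rank never exceeds the planted rank, since the planted decomposition itself is among the enumerated candidates.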
IX. DISCUSSION AND CONCLUSION
In this section, we elaborate on connections of our work to the related works mentioned in the introduction and in Tables I and II. We will also conclude the paper by summarizing our main contributions and suggesting avenues for future research.
A. Comparison to existing coding-theoretic techniques for
rank minimization over finite fields
In general, solving the min-rank decoding problem (41) is intractable (NP-hard). However, it is known that if the linear operator H [in (4), characterizing the code C] admits a favorable algebraic structure, then one can estimate a sufficiently low-rank x (a vector with elements in the extension field F_{q^n} or a matrix with elements in F_q), and thus the codeword c, from the received word r efficiently (i.e., in polynomial time). For instance, the class of Gabidulin codes [7], [8], which are rank-metric analogs of Reed-Solomon codes, not only achieves the Singleton bound and thus has maximum rank distance
IEEE TRANSACTIONS ON INFORMATION THEORY
✛
✲
n
✻
16
r
r
r
n
r
r
r
r
r
r
r
r
❄
r
Fig. 5. Probabilistic crisscross error patterns [17]: The figure shows an error
matrix X. The non-zero values (indicated as black dots) are restricted to two
columns and one row. Thus, the rank of the error matrix X is at most three.
(MRD), but decoding can be achieved using a modified form
of the Berlekamp-Massey algorithm (See [45] for example).
However, the algebraic structure of the codes (and in particular
the mutual dependence between the equivalent Ha matrices)
does not permit the line of analysis we adopted. Thus it is
unclear how many linear measurements would be required in
order to guarantee recovery using the suggested code structure.
Silva, Kschischang and Kötter [10] extended the Berlekamp-Massey-based algorithm to handle errors and erasures for the
purpose of error control in linear random network coding. In
both these cases, the underlying error matrix is assumed to be
deterministic and the algebraic structure on the parity check
matrix permitted efficient decoding based on error locators.
In another related work, Montanari and Urbanke [11] assumed that the error matrix X is drawn uniformly at random
from all matrices of known rank r. The authors then constructed a sparse parity check code (based on a sparse factor
graph). Using an “error-trapping” strategy by constraining
codewords to have rows that have zero Hamming weight
without any loss of rate, they first learned the rowspace of
X before adopting a (subspace) message passing strategy to
complete the reconstruction. However, the dependence across
rows of the parity check matrix (caused by lifting) violates
the independence assumptions needed for our analyses to
hold. The ideas in [11] were subsequently extended by Silva,
Kschischang and Kötter [18] where the authors computed the
information capacity of various (additive and/or multiplicative)
matrix-valued channels over finite fields. They also devised
“error-trapping” codes to achieve capacity. However, unlike
this work, it is assumed in [18] that the underlying low-rank
error matrix is chosen uniformly. As such, their guarantees do
not apply to so-called crisscross error patterns [17], [45] (see
Fig. 5), which are of interest in data storage applications.
Our work in this paper is focused primarily on understanding the fundamental limits of rank-metric codes that are
random. More precisely, the codes are characterized by either
dense or sparse sensing (parity-check) matrices. This is in contrast to the literature on rank-metric codes (except [9, Sec. 5]),
in which deterministic constructions predominate. The codes
presented in Section VII are random. However, in analogy to
the random coding argument for channel coding [35, Sec. 7.7],
if the ensemble of random codes has low average error
probability, there exists a deterministic code that has low
error probability. In addition, the strong recovery results in
Section VII-C allow us to conclude that our analyses apply
to all low-rank matrices X in both equiprobable and sparse
settings. This completes all remaining entries in Table II.
Yet another line of research on rank minimization over finite
fields (in particular over F2 ) has been conducted by the combinatorial optimization and graph theory communities. In [33,
Sec. 6] and [46, Sec. 1] for example, it was demonstrated that
if the code (or set of linear constraints) is characterized by
a perfect graph,15 then the rank minimization problem can be
solved exactly and in polynomial time by the ellipsoid method
(since the problem can be stated as a semidefinite program). In
fact, the rank minimization problem is also intimately related
to Lovász’s θ function [47, Theorem 4], which characterizes
the Shannon capacity of a graph.
B. Conclusion and Future Directions
In this paper, we derive information-theoretic limits for
recovering a low-rank matrix with elements over a finite field
given noiseless or noisy linear measurements. We show that
even if the random sensing (or parity-check) matrices are very
sparse, decoding can be done with exactly the same number
of measurements as when the sensing matrices are dense. We
then adopt a coding-theoretic approach and derive minimum
rank distance properties of sparse random rank-metric codes.
These results provide geometric insights as to how and why
decoding succeeds when sufficiently many measurements are
available. The work herein could potentially lead to the design
of low-complexity sparse codes for rank-metric channels.
It is also of interest to analyze whether the sparsity factor of Θ(log n / n) is the smallest possible and whether there is a fundamental tradeoff between this sparsity factor and the number of measurements required for reliable recovery of the low-rank matrix. Additionally, in many of the applications that motivate this problem, the sensing matrices are fixed by the application and will not be random; take, for example, deterministic parity-check matrices that might define a rank-metric code. In rank minimization over the real field, there are properties of the sensing matrices, and of the underlying matrix being estimated, that can be checked (for example, the restricted isometry property [6, Eq. (1)], or random point sampling together with incoherence of the low-rank matrix) and that, if satisfied, guarantee that the true matrix of interest can be recovered using convex programming. It is of interest to identify an analog in the finite field, that is, a necessary (or sufficient) condition on the sensing matrices and the underlying matrix such that recovery is guaranteed. We would like to develop tractable algorithms along the lines of those in Table I or in the work by Baron et al. [26] to solve the min-rank optimization problem approximately for particular classes of sensing matrices, such as the sparse random ensemble.
Finally, Dimakis and Vontobel [48] make an intriguing
connection between linear programming (LP) decoding for
channel coding and LP decoding for compressed sensing.
They reach known compressed sensing results via a new path
15 A perfect graph G is one in which each induced subgraph H ⊂ G has
a chromatic number χ(H) that is the same as its clique number ω(H).
– channel coding. Analogously, we wonder whether known
rank minimization results can be derived using rank-metric
coding tools, thereby providing novel interpretations. And just
as in [48], the reverse direction is also open. That is, whether
the growing literature and understanding of rank minimization
problems could be leveraged to design more tractable and
interesting decoding approaches for rank-metric codes.
Acknowledgements

We would like to thank Associate Editor Erdal Arıkan and the reviewers for their suggestions to improve the paper, and to acknowledge discussions with Ron Roth, Natalia Silberstein and especially Danilo Silva, who made the insightful points in Section IV-B1 [38]. We would also like to thank Ying Liu and Huili Guo for detailed comments and for help in generating Fig. 4, respectively.

APPENDIX A
PROOF OF LEMMA 6

Proof: It suffices to show that the conditional probability P(A_{Z′} | A_Z) = P(A_{Z′}) = q^{−k} for Z ≠ Z′. We define the non-zero matrices M := X − Z and M′ := X − Z′. Let K := supp(M′ − M) and L := supp(M). The idea of the proof is to partition the joint support K ∪ L into disjoint sets. More precisely, consider

P(A_{Z′} | A_Z) (a)= P(⟨M′, H_1⟩ = 0 | ⟨M, H_1⟩ = 0)^k (b)= P(⟨M′ − M, H_1⟩ = 0 | ⟨M, H_1⟩ = 0)^k,   (68)

where (a) is from the definition of A_Z := {⟨X − Z, H_a⟩ = 0, ∀ a ∈ [k]} and the independence of the random matrices H_a, a ∈ [k], and (b) is by linearity. It suffices to show that the probability in (68) is q^{−1}. Indeed,

P(⟨M′ − M, H_1⟩ = 0 | ⟨M, H_1⟩ = 0)
(c)= P( Σ_{(i,j)∈K} [M′ − M]_{i,j} [H_1]_{i,j} = 0 | Σ_{(i,j)∈L} [M]_{i,j} [H_1]_{i,j} = 0 )
(d)= P( Σ_{(i,j)∈K} [H_1]_{i,j} = 0 | Σ_{(i,j)∈L} [H_1]_{i,j} = 0 ),   (69)

where (c) is from the definition of the inner product and of the sets K and L, and (d) is from the fact that [M]_{i,j} [H_1]_{i,j} has the same distribution as [H_1]_{i,j} since [M]_{i,j} ≠ 0 and [H_1]_{i,j} is uniformly distributed in F_q. Now, we split the sets K and L in (69) into two disjoint subsets each, obtaining

P(⟨M′ − M, H_1⟩ = 0 | ⟨M, H_1⟩ = 0)
= P( Σ_{(i,j)∈K\L} [H_1]_{i,j} + Σ_{(i,j)∈L∩K} [H_1]_{i,j} = 0 | Σ_{(i,j)∈L\K} [H_1]_{i,j} + Σ_{(i,j)∈L∩K} [H_1]_{i,j} = 0 )
(e)= P( Σ_{(i,j)∈K\L} [H_1]_{i,j} = Σ_{(i,j)∈L\K} [H_1]_{i,j} )
(f)= q^{−1}.

Equality (e) is by using the condition Σ_{(i,j)∈L\K} [H_1]_{i,j} = −Σ_{(i,j)∈L∩K} [H_1]_{i,j}, and finally (f) is from the fact that the sets K\L, L\K and L∩K are mutually disjoint, so the probability is q^{−1} by independence and uniformity of the [H_1]_{i,j}, (i, j) ∈ [n]².

APPENDIX B
PROOF OF PROPOSITION 8

Proof: Recall the optimization problem for the noisy case in (32), where the optimization variables are X̃ and w̃. Let S^{noisy} ⊂ F_q^{n×n} × F_q^k be the set of optimizers. In analogy to (13), we define the "noisy" error event

E_n^{noisy} := {|S^{noisy}| > 1} ∪ ({|S^{noisy}| = 1} ∩ {(X*, w*) ≠ (X, w)}).

Note that when (E_n^{noisy})^c occurs, both the matrix X and the noise vector w are recovered; so, in fact, we are decoding two objects when we are only interested in X. Clearly, E_n ⊂ E_n^{noisy}, so it suffices to upper bound P(E_n^{noisy}) to obtain an upper bound on P(E_n). For this purpose, consider the event

A_{Z,v}^{noisy} := {⟨Z, H_a⟩ = ⟨X, H_a⟩ + v_a, ∀ a ∈ [k]},   (70)

defined for each matrix-vector pair (Z, v) ∈ F_q^{n×n} × F_q^k such that rank(Z) + λ‖v‖_0 ≤ rank(X) + λ‖w‖_0. The error event E_n^{noisy} occurs if and only if there exists a pair (Z, v) ≠ (X, w) such that (i) rank(Z) + λ‖v‖_0 ≤ rank(X) + λ‖w‖_0 and (ii) the event A_{Z,v}^{noisy} occurs. By the union of events bound, the error probability can be bounded as

P(E_n^{noisy}) ≤ Σ_{(Z,v) : rank(Z)+λ‖v‖_0 ≤ rank(X)+λ‖w‖_0} P(A_{Z,v}^{noisy}) (a)= Σ_{(Z,v) : rank(Z)+λ‖v‖_0 ≤ rank(X)+λ‖w‖_0} q^{−k} (b)≤ q^{−k} |U_{r,s}|,   (71)

where (a) is from the same argument as in the noiseless case [see (18)] and in (b) we defined the set U_{r,s} := {(Z, v) : rank(Z) + λ‖v‖_0 ≤ rank(X) + λ‖w‖_0}, where the subscripts r and s index, respectively, the upper bound on the rank of X and the sparsity of w. Note that s = ‖w‖_0 = ⌊σn²⌋ ≤ σn². It remains to bound the cardinality of U_{r,s}. In the following, we partition the counting argument into disjoint subsets by fixing the sparsity of the vector v to be equal to l for all possible l's. Note that 0 ≤ l ≤ (‖v‖_0)_max := r/λ + s. The cardinality of U_{r,s} is bounded as follows:

|U_{r,s}| = Σ_{l=0}^{(‖v‖_0)_max} |{v ∈ F_q^k : ‖v‖_0 = l}| × |{Z ∈ F_q^{n×n} : rank(Z) ≤ r + λ(s − l)}|
(a)≤ Σ_{l=0}^{(‖v‖_0)_max} C(k, l) (q − 1)^l · 4 q^{2n[r+λ(s−l)] − [r+λ(s−l)]²}
(b)≤ (r/λ + s + 1) C(k, r/λ + s) q^{r/λ+s} · 4 q^{2n(r+λs) − (r+λs)²}
(c)≤ (r/λ + s + 1) 2^{k H₂((r/λ+s)/k)} q^{r/λ+s} · 4 q^{2n(r+λs) − (r+λs)²},

where C(k, l) denotes the binomial coefficient, (a) follows by bounding the number of vectors which are non-zero in l positions and the number of matrices whose rank is no greater than r + λ(s − l) (Lemma 1), and (b) follows by first noting that the assignment r ↦ 2nr − r² is monotonically increasing in r = 0, 1, ..., n and second by upper bounding the summands by their largest possible values. Observe that (33) ensures that r/λ + s ≤ k/2, which is needed to upper bound the binomial coefficient, since l ↦ C(k, l) is monotonically increasing iff l ≤ k/2. Inequality (c) uses the fact that the binomial coefficient is upper bounded by a function of the binary entropy [35, Theorem 11.1.3]. Now, note that since r/n → γ, for every η > 0 we have |r/n − γ| < η for n sufficiently large. Define γ̃_η := γ + η + σ. From (c) above, |U_{r,s}| can be further upper bounded as

|U_{r,s}| (d)≤ 4(γ̃_η n² + 1) 2^{k H₂(γ̃_η n²/k)} q^{γ̃_η n²} q^{2γ̃_η n² − γ̃_η² n²}   (72)
(e)≤ O(n²) 2^{k H₂(1/(3−γ̃_η))} q^{γ̃_η n² + 2γ̃_η n² − γ̃_η² n²}.   (73)

Inequality (d) follows from the problem assumption that rank(X) ≤ r ≤ (γ + η)n for n sufficiently large, from ‖w‖_0 = s ≤ σn², and from the choice of the regularization parameter λ = 1/n. Inequality (e) follows from the fact that, since k satisfies (33), k > 3γ̃_η(1 − γ̃_η/3)n², and hence the binary entropy term in (72) can be upper bounded as in (73). By combining (71) and (73), we observe that the error probability P(E_n^{noisy}) can be upper bounded as

P(E_n^{noisy}) ≤ O(n²) q^{−n²[(k/n²)(1 − (log_q 2) H₂(1/(3−γ̃_η))) − 3γ̃_η + γ̃_η²]}.   (74)

Now, again by using the assumption that k satisfies (33), the exponent in (74) is positive for η sufficiently small (γ̃_η → γ + σ as η → 0), and hence P(E_n^{noisy}) → 0 as n → ∞.

APPENDIX C
PROOF OF COROLLARY 9

Proof: Fano's inequality can be applied to obtain inequality (a) as in (10). We lower bound the term H(X | y^k, H^k) in (10) differently, taking into account the stochastic noise. By the chain rule, it can be expressed as

H(X | y^k, H^k) = H(X | H^k) − H(y^k | H^k) + H(y^k | X, H^k).

The second term can be upper bounded as H(y^k | H^k) ≤ k
by (11). The third term, which is zero in the noiseless case,
can be (more tightly) lower bounded as follows:
k
k
H(X|y , H ) = H(X) − H(y |H ) + H(y |H , X).
(75)
H(yk |Hk , X) = kH(y1 |H1 , X) = kH(w1 ) ≥ kHq (p), (76)
where (a) follows by the independence of (X, H1 ) and w1 and
(b) follows from the fact that the entropy of w with pmf in (34)
is lower bounded by putting all the remaining probability mass
p on a single symbol in Fq \ {0} (i.e., a Bern(p) distribution).
Note that logarithms are to the base q. The result in (35)
follows by uniting (75), (76) and the lower bound in (7).
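Step (c) in the bound on |U_{r,s}| rests on the standard entropy bound (k choose l) ≤ 2^{kH₂(l/k)} for l ≤ k/2 [35, Theorem 11.1.3], together with the monotonicity of l ↦ (k choose l) on that range. A minimal numerical sanity check of both facts (the value k = 40 is an arbitrary illustration, not a quantity from the paper):

```python
import math

def H2(x):
    """Binary entropy in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

k = 40  # hypothetical illustration value
for l in range(0, k // 2 + 1):
    # C(k, l) <= 2^{k H2(l/k)} for l <= k/2 (tiny slack guards float rounding)
    assert math.comb(k, l) <= 2 ** (k * H2(l / k)) * (1 + 1e-12)
    if l >= 1:
        # l -> C(k, l) is monotonically increasing for l <= k/2
        assert math.comb(k, l) >= math.comb(k, l - 1)
```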
APPENDIX D
PROOF OF COROLLARY 10

Proof: The main idea in the proof is to reduce the problem to the deterministic case and apply Proposition 8. For this purpose, we define the ζ-typical set (for the length-k = ⌈αn²⌉ noise vector w) as

  T_ζ = T_ζ(w) := { w ∈ F_q^k : |‖w‖₀/(αn²) − p| ≤ ζ }.

We choose ζ to be dependent on n in the following way (cf. the Delta-convention [49]): ζ_n → 0 and nζ_n → ∞ (e.g., ζ_n = n^{−1/2}). By Chebyshev’s inequality, P(w ∉ T_{ζ_n}) → 0 as n → ∞. We now bound the probability of error that the estimated matrix is not the same as the true one by using the law of total probability to condition the error event E_n^noisy on the event {w ∈ T_{ζ_n}} and its complement:

  P(E_n^noisy) ≤ P(E_n^noisy | w ∈ T_{ζ_n}) + P(w ∉ T_{ζ_n}).   (77)

Since the second term in (77) converges to zero, it suffices to prove that the first term also converges to zero. For this purpose, we can follow the steps of the proof of Proposition 8, in particular the steps leading to (72) and (74). Doing so and defining p_ζ := p + ζ, we arrive at the upper bound

  P(E_n^noisy | w ∈ T_{ζ_n})
    ≤ O(n²) 2^{kH₂((γn² + p_{ζ_n}αn²)/(αn²))} × q^{2n²(γ + p_{ζ_n}α) − (γn + p_{ζ_n}αn)² − αn²}
    ≤ O(n²) q^{−n²[α − α(log_q 2)H₂(p_{ζ_n} + γ/α) − 2αp_{ζ_n}(1−γ) + α²p_{ζ_n}² − 2γ + γ²]}
    = O(n²) q^{−n²[g(α; p_{ζ_n}, γ) − 2γ(1 − γ/2)]}.   (78)

Since ζ_n → 0 and g defined in (36) is continuous in the second argument, g(α; p_{ζ_n}, γ) → g(α; p, γ). Thus, if α satisfies (37), the exponent in (78) is positive. Hence, P(E_n^noisy) → 0 as n → ∞ as desired.
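The Chebyshev step above can be illustrated numerically. The sketch below uses hypothetical values of p, α and n, and models the noise weight ‖w‖₀ as Binomial(k, p); it checks that the empirical probability of falling outside the ζ-typical set is within the Chebyshev bound p(1−p)/(kζ²), which vanishes because kζ_n² = αn when ζ_n = n^{−1/2}:

```python
import numpy as np

rng = np.random.default_rng(0)

p, alpha, n = 0.1, 1.0, 100       # hypothetical noise level and measurement rate
k = int(np.ceil(alpha * n ** 2))  # k = ceil(alpha n^2) noise symbols
zeta = n ** (-0.5)                # Delta-convention: zeta_n -> 0, n*zeta_n -> inf

# ||w||_0 ~ Binomial(k, p) when each noise symbol is nonzero w.p. p
trials = 2000
weights = rng.binomial(k, p, size=trials)
emp = np.mean(np.abs(weights / k - p) > zeta)

# Chebyshev: P(| ||w||_0 / k - p | > zeta) <= p(1-p)/(k zeta^2) = p(1-p)/(alpha n)
cheb = p * (1 - p) / (k * zeta ** 2)
assert emp <= cheb
```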
APPENDIX E
PROOF OF THEOREM 11

Proof: We first state a lemma which will be proven at the end of this section.

Lemma 21. Define d := ‖X − Z‖₀. The probability of A_Z, defined in (16), under the δ-sparse measurement model, denoted as θ(d; δ, q, k), is a function of d and is given as

  θ(d; δ, q, k) := [ q⁻¹ + (1 − q⁻¹)(1 − δ/(1 − q⁻¹))^d ]^k.   (79)

Lemma 21 says that the probability P(A_Z) depends on Z only through the number of entries in which it differs from X, namely d. Furthermore, it is easy to check that the probability in (79) satisfies the following two properties:
1) θ(d; δ, q, k) ≤ (1 − δ)^k ≤ exp(−kδ) for all d ∈ [n²],
2) θ(d; δ, q, k) is a monotonically decreasing function of d.
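Both properties follow directly from (79); a small numerical sketch (with arbitrary illustrative values of q, k, δ and n, not quantities from the paper) confirms them:

```python
import math

def theta(d, delta, q, k):
    """theta(d; delta, q, k) as given in (79)."""
    return (1 / q + (1 - 1 / q) * (1 - delta / (1 - 1 / q)) ** d) ** k

q, k, delta, n = 4, 50, 0.05, 10          # hypothetical illustration values
vals = [theta(d, delta, q, k) for d in range(1, n * n + 1)]

# Property 1: theta(d) <= (1 - delta)^k <= exp(-k delta); d = 1 attains (1 - delta)^k
bound = (1 - delta) ** k
assert bound <= math.exp(-k * delta)
assert all(v <= bound * (1 + 1e-9) for v in vals)

# Property 2: theta is monotonically decreasing in d
assert all(a >= b for a, b in zip(vals, vals[1:]))
```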
We upper bound the probability in (17). To do so, we partition all possibly misleading matrices Z into subsets based on their Hamming distance from X. Our idea is to separately bound those partitions with low Hamming distance (which are few, and so for which a loose upper bound on θ(d; δ, q, k) suffices) and those further from X (which are many, but for which we can get a tight upper bound on θ(d; δ, q, k), a bound that is only a function of the Hamming distance ⌈βn²⌉). Then we optimize the split over the free parameter β:

  P(E_n) ≤ Σ_{d=1}^{n²} Σ_{Z : Z≠X, rank(Z)≤rank(X), ‖X−Z‖₀=d} P(A_Z)
    (a)= Σ_{d=1}^{⌊βn²⌋} Σ_{Z : Z≠X, rank(Z)≤rank(X), ‖X−Z‖₀=d} θ(d; δ, q, k)
         + Σ_{d=⌈βn²⌉}^{n²} Σ_{Z : Z≠X, rank(Z)≤rank(X), ‖X−Z‖₀=d} θ(d; δ, q, k)
    (b)≤ Σ_{d=1}^{⌊βn²⌋} Σ_{Z : Z≠X, rank(Z)≤rank(X), ‖X−Z‖₀=d} exp(−kδ)
         + Σ_{d=⌈βn²⌉}^{n²} Σ_{Z : Z≠X, rank(Z)≤rank(X), ‖X−Z‖₀=d} θ(⌈βn²⌉; δ, q, k)
    (c)≤ |{Z : ‖Z − X‖₀ ≤ ⌊βn²⌋}| exp(−kδ) + n² |{Z : rank(Z) ≤ rank(X)}| θ(⌈βn²⌉; δ, q, k).   (80)
In (a), we used the definition of θ(d; δ, q, k) in Lemma 21. The fractional parameter β, which we choose later, may depend on n. In (b), we used the fact that θ(d; δ, q, k) ≤ exp(−kδ) and that θ(d; δ, q, k) is monotonically decreasing in d, so θ(d; δ, q, k) ≤ θ(⌈βn²⌉; δ, q, k) for all d ≥ ⌈βn²⌉. In (c), we upper bounded the cardinality of the set {Z ≠ X : rank(Z) ≤ rank(X), ‖X − Z‖₀ ≤ ⌊βn²⌋} by the cardinality of the set of matrices that differ from X in no more than ⌊βn²⌋ locations (neglecting the rank constraint). For the second term, we upper bounded the cardinality of each set Md := {Z ≠ X : rank(Z) ≤ rank(X), ‖X − Z‖₀ = d} by the cardinality of the set of matrices whose rank is no more than rank(X) (neglecting the Hamming weight constraint).
We denote the first and second terms in (80) as An and Bn
respectively. Now,
  An := |{Z : ‖Z − X‖₀ ≤ ⌊βn²⌋}| exp(−kδ)
    (a)≤ 2^{n²H₂(β)} (q − 1)^{βn²} exp(−kδ)
       ≤ 2^{n²[H₂(β) + β log₂(q−1) − (k/n²)δ log₂(e)]},   (81)

where (a) used the fact that the number of matrices that differ from X in no more than ⌊βn²⌋ positions is upper bounded by 2^{n²H₂(β)}(q − 1)^{βn²}. Note that this upper bound
is independent of X. Now fix η > 0 and consider Bn :
  Bn := n² |{Z : rank(Z) ≤ rank(X)}| θ(⌈βn²⌉; δ, q, k)
    (a)≤ 4n² q^{(2γ(1−γ/2)+η)n²} θ(⌈βn²⌉; δ, q, k)
    (b)= 4n² q^{n²[2γ(1−γ/2) + η + (k/n²) log_q( q⁻¹ + (1 − q⁻¹)(1 − δ/(1 − q⁻¹))^{⌈βn²⌉} )]}.   (82)

In (a), we used the fact that the number of matrices of rank no greater than r is bounded above by 4q^{(2γ(1−γ/2)+η)n²} (Lemma 1) for n sufficiently large (depending on η by the convergence of r/n to γ). Equality (b) is obtained by applying (79) in Lemma 21.
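The counting estimate behind (81) can be exercised numerically: the number of matrices within Hamming distance m of a fixed X, which is Σ_{j≤m} (N choose j)(q−1)^j with N = n² entries, is dominated by 2^{NH₂(m/N)}(q−1)^m for m ≤ N/2. A sketch with small hypothetical N and q:

```python
import math

def H2(x):
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

q, N = 3, 30  # hypothetical small values; N plays the role of n^2
for m in range(1, N // 2 + 1):
    # exact number of matrices within Hamming distance m of a fixed X
    ball = sum(math.comb(N, j) * (q - 1) ** j for j in range(m + 1))
    # the bound used in (81): 2^{N H2(m/N)} (q - 1)^m
    assert ball <= 2 ** (N * H2(m / N)) * (q - 1) ** m
```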
Our objective in the rest of the proof is to find sufficient conditions on k and β so that (81) and (82) both converge to zero. We start with Bn. From (82) we observe that if, for every ε > 0, there exists an N1,ε ∈ ℕ such that

  k > (1 + ε/5) · 2γ(1 − γ/2)n² / [ −log_q( q⁻¹ + (1 − q⁻¹)(1 − δ/(1 − q⁻¹))^{⌈βn²⌉} ) ]   (83)

for all n > N1,ε, then Bn → 0, since the exponent in (82) is negative (for η sufficiently small). Now, we claim that if lim_{n→∞} ⌈βn²⌉δ = +∞, then the denominator in (83) tends to 1 from below. This is justified as follows: Consider the term

  (1 − δ/(1 − q⁻¹))^{⌈βn²⌉} ≤ exp( −⌈βn²⌉δ/(1 − q⁻¹) ) → 0 as n → ∞,

so the argument of the logarithm in (83) tends to q⁻¹ from above if lim_{n→∞} ⌈βn²⌉δ = +∞.
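The display above uses the elementary inequality (1 − x)^m ≤ exp(−mx); a brief numerical check with hypothetical stand-ins for δ and β confirms both the inequality and the decay of the left-hand side:

```python
import math

q, delta, beta = 2, 0.01, 0.02    # hypothetical values for illustration
x = delta / (1 - 1 / q)           # delta / (1 - q^{-1})
for n in (10, 50, 100, 500):
    m = math.ceil(beta * n ** 2)  # plays the role of ceil(beta n^2)
    assert (1 - x) ** m <= math.exp(-m * x)

# with m*x -> infinity the left-hand side indeed vanishes
assert (1 - x) ** math.ceil(beta * 500 ** 2) < 1e-40
```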
Since δ ∈ Ω(log n / n), by definition, there exists a constant C ∈ (0, ∞) and an integer Nδ ∈ ℕ such that

  δ = δn ≥ C log₂(n)/n,   (84)

for all n > Nδ. Let β be defined as

  β = βn := 2γ(1 − γ/2) log₂(e) δ / log₂(n).   (85)

Then ⌈βn²⌉δ ≥ 2γ(1 − γ/2) log₂(e) C² log₂(n) = Θ(log n)
and so the condition lim_{n→∞} ⌈βn²⌉δ = +∞ is satisfied. Thus, for sufficiently large n, the denominator in (83) exceeds 1/(1 + ε/5) < 1. As such, the condition in (83) can be equivalently written as: Given the choice of β in (85), if there exists an N2,ε ∈ ℕ such that

  k > 2(1 + ε/5)² γ(1 − γ/2)n²   (86)
for all n > N2,ε , then Bn → 0.
We now revisit the upper bound on An in (81). The
inequality says that, for every ε > 0, if there exists an
N3,ε ∈ N such that
  k > (1 + ε/5) · [H₂(β) + β log₂(q − 1)] n² / (δ log₂(e)),   (87)

for all n > N3,ε, then An → 0, since the exponent in (81) is negative. Note that H₂(β)/(−β log₂ β) ↓ 1 as β ↓ 0. Hence, if β is chosen as in (85), then by using (84), we obtain

  lim_{n→∞} [H₂(β) + β log₂(q − 1)] / (δ log₂(e)) ≤ 2γ(1 − γ/2).   (88)
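The claim that H₂(β)/(−β log₂ β) ↓ 1 as β ↓ 0, which drives this limit, can be checked numerically (illustrative values of β only):

```python
import math

def H2(x):
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

betas = [0.1, 0.01, 0.001, 1e-6]
ratios = [H2(b) / (-b * math.log2(b)) for b in betas]

# the ratio decreases monotonically toward 1 as beta decreases to 0
assert all(r1 > r2 > 1.0 for r1, r2 in zip(ratios, ratios[1:]))
assert ratios[-1] < 1.1
```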
In particular, for n sufficiently large, the terms in the sequence
in (88) and its limit (which exists) differ by less than 2γ(1 −
γ/2)ε/5. Hence (87) is equivalent to the following: Given the
choice of β in (85), if there exists an N4,ε ∈ N such that
  k > 2(1 + ε/5)² γ(1 − γ/2)n²   (89)
for all n > N4,ε , the sequence An → 0. The choice of β
in (85) “balances” the two sums An and Bn in (80). Also note that 2(1 + ε/5)² < 2 + ε for all ε ∈ (0, 5/2).
Hence, if the number of measurements k satisfies (15) for
all n > Nε,δ := max{N1,ε , N2,ε , N3,ε , N4,ε , Nδ }, both (86)
and (89) will also be satisfied and consequently, P(En ) ≤ An +
Bn → 0 as n → ∞ as desired. We remark that the restriction
of ε ∈ (0, 5/2) is not a serious one, since the validity of the
claim in Theorem 11 for some ε0 > 0 implies the same for
all ε > ε0 . This completes the proof of Theorem 11.
It now remains to prove Lemma 21.

Proof: Recall that d = ‖X − Z‖₀ and θ(d; δ, q, k) = P(⟨H_a, X⟩ = ⟨H_a, Z⟩, ∀ a ∈ [k]). By the i.i.d. nature of the random matrices H_a, a ∈ [k], it is true that

  θ(d; δ, q, k) = P(⟨H₁, X⟩ = ⟨H₁, Z⟩)^k.

It thus remains to demonstrate that

  P(⟨H₁, X⟩ = ⟨H₁, Z⟩) = q⁻¹ + (1 − q⁻¹)(1 − δ/(1 − q⁻¹))^d.   (90)

This may be proved using induction on d, but we prove it using more direct transform-domain ideas. Note that (90) is simply the d-fold q-point circular convolution of the δ-sparse pmf in (39). Let F ∈ ℂ^{q×q} and F⁻¹ ∈ ℂ^{q×q} be the discrete Fourier transform (DFT) and the inverse DFT matrices respectively. We use the convention in [50]. Let

  p := P_h(· ; δ, q) = [1 − δ, δ/(q − 1), …, δ/(q − 1)]^T

be the vector of probabilities defined in (39). Then, by properties of the DFT, (90) is simply given by F⁻¹[(Fp)^{.d}] evaluated at the vector’s first element. (The notation v^{.d} := [v₀^d … v_{q−1}^d]^T denotes the vector in which each component of the vector v is raised to the d-th power.) We split p into two vectors whose DFTs can be evaluated in closed-form:

  p = [1 − δ − δ/(q − 1), 0, …, 0]^T + [δ/(q − 1), δ/(q − 1), …, δ/(q − 1)]^T.

Let the first and second vectors above be p₁ and p₂ respectively. Then, by linearity of the DFT, Fp = Fp₁ + Fp₂, where

  Fp₁ = [1 − δ − δ/(q − 1), …, 1 − δ − δ/(q − 1)]^T,   Fp₂ = [qδ/(q − 1), 0, …, 0]^T.

Summing these up yields

  Fp = [1, 1 − δ/(1 − q⁻¹), …, 1 − δ/(1 − q⁻¹)]^T.

Raising Fp to the d-th power yields

  (Fp)^{.d} = [1, (1 − δ/(1 − q⁻¹))^d, …, (1 − δ/(1 − q⁻¹))^d]^T.

Now using the same splitting technique, (Fp)^{.d} can be decomposed into

  (Fp)^{.d} = [1 − (1 − δ/(1 − q⁻¹))^d, 0, …, 0]^T + [(1 − δ/(1 − q⁻¹))^d, …, (1 − δ/(1 − q⁻¹))^d]^T.

Let s₁ and s₂ denote each vector on the right-hand side above. Define ϕ := (1 − δ/(1 − q⁻¹))^d. Then, the inverse DFTs of s₁ and s₂ can be evaluated analytically as

  F⁻¹s₁ = [q⁻¹(1 − ϕ), …, q⁻¹(1 − ϕ)]^T,   F⁻¹s₂ = [ϕ, 0, …, 0]^T.

Summing the first elements of F⁻¹s₁ and F⁻¹s₂ completes the proof of (90) and hence of Lemma 21.
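The transform-domain identity (90) can also be verified numerically: the first element of the d-fold circular convolution of the δ-sparse pmf, computed with an FFT, matches the closed form. The sketch below uses illustrative values of δ and q; it is a check of the derivation, not part of the proof:

```python
import numpy as np

def theta_single(d, delta, q):
    """Right-hand side of (90): q^{-1} + (1 - q^{-1})(1 - delta/(1 - q^{-1}))^d."""
    return 1 / q + (1 - 1 / q) * (1 - delta / (1 - 1 / q)) ** d

q, delta = 5, 0.3                 # hypothetical illustration values
p = np.full(q, delta / (q - 1))   # delta-sparse pmf of (39)
p[0] = 1 - delta

for d in range(1, 8):
    # d-fold q-point circular convolution via the DFT, evaluated at the first element
    conv0 = np.fft.ifft(np.fft.fft(p) ** d)[0].real
    assert abs(conv0 - theta_single(d, delta, q)) < 1e-12
```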
APPENDIX F
PROOF OF LEMMA 12

Proof: The only matrix for which the rank r = 0 is the zero matrix, which is in C since C is a linear code (i.e., a subspace). Hence, the sum in (43) consists of only a single term, which is one. Now, for 1 ≤ r ≤ n, we start from (43) and, by the linearity of expectation, we have

  E NC(r) = Σ_{M ∈ F_q^{n×n} : rank(M)=r} E[ I{M ∈ C} ]
          = Σ_{M ∈ F_q^{n×n} : rank(M)=r} P(M ∈ C)
       (a)= Σ_{M ∈ F_q^{n×n} : rank(M)=r} q^{−k} = Φ_q(n, r) q^{−k},

where (a) is because M ≠ 0 (since 1 ≤ r ≤ n); hence, as in (18), P(M ∈ C) = q^{−k}. The proof is completed by appealing to (6), which provides upper and lower bounds on the number of matrices of rank exactly r. For the variance, note that the random variables in the set {I{M ∈ C} : rank(M) = r} are pairwise independent (see Lemma 6). As a result, the variance of the sum in (43) is a sum of variances, i.e.,

  var(NC(r)) = Σ_{M ∈ F_q^{n×n} : rank(M)=r} var(I{M ∈ C})
             = Σ_{M ∈ F_q^{n×n} : rank(M)=r} ( E[I{M ∈ C}²] − [E I{M ∈ C}]² )
             ≤ Σ_{M ∈ F_q^{n×n} : rank(M)=r} E[I{M ∈ C}] = E NC(r),

as desired.

APPENDIX G
PROOF OF PROPOSITION 14

Proof: We first restate a beautiful result from [42]. For each positive integer k, define the interval I_k := [ (log_e k)/k, (q−1)/q ].

Theorem 22 (Corollary 2.4 in [42]). Let M be a random k×k matrix over the finite field F_q, where each element is drawn independently from the pmf in (39) with δ, a sequence in k, belonging to I_k for each k ∈ ℕ. Then, for every l ≤ k,

  P(k − rank(M) ≥ l) ≤ A q^{−l},   (91)

where A is a constant. Moreover, if A is considered as a function of δ, then it is monotonically decreasing on the interval I_k.

To prove Proposition 14, first define N := n² and let h_a := vec(H_a) ∈ F_q^N be the vectorized versions of the random sensing matrices. Also let H := [h₁ … h_k] ∈ F_q^{N×k} be the matrix with columns h_a. Finally, let H_{[k×k]} ∈ F_q^{k×k} be the square sub-matrix of H consisting only of its top k rows. Clearly, the dimension of the column span of H, denoted as m, satisfies m ≥ rank(H_{[k×k]}). Note that m is a sequence of random variables and k is a sequence of integers, but we suppress their dependences on n. Fix 0 < ε < 1 and consider

  P( |m/k − 1| ≥ ε ) = P( m/k ≤ 1 − ε )
                     ≤ P( rank(H_{[k×k]})/k ≤ 1 − ε )
                     = P( k − rank(H_{[k×k]}) ≥ εk )
                  (a)≤ A q^{−εk},   (92)

where for (a) recall that k ∈ Θ(n²) and δ ∈ Ω(log n / n). These facts imply that δ (as a sequence in n) belongs to the interval I_k for all sufficiently large n [because any function in Ω(log n / n) dominates the lower bound (log_e k)/k for k ∈ Θ(n²)], so the hypothesis of Theorem 22 is satisfied and we can apply (91) (with l = εk) to get inequality (a). Since (92) is a summable sequence, by the Borel–Cantelli lemma, the sequence of random variables m/k → 1 a.s.
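The behavior promised by Theorem 22 — that a k × k matrix with i.i.d. δ-sparse entries is, with high probability, nearly full rank once δ comfortably exceeds (log_e k)/k — can be simulated over F₂. The sketch below uses hypothetical values of k, δ and the trial count, with GF(2) rank computed by Gaussian elimination:

```python
import numpy as np

def rank_gf2(M):
    """Rank of a 0/1 matrix over GF(2) via Gaussian elimination."""
    M = M.copy()
    rows, cols = M.shape
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i, c]), None)
        if pivot is None:
            continue
        M[[r, pivot]] = M[[pivot, r]]   # swap pivot row into place
        for i in range(rows):
            if i != r and M[i, c]:
                M[i] ^= M[r]            # eliminate column c in other rows
        r += 1
    return r

rng = np.random.default_rng(1)
k, delta = 120, 0.2   # delta = 0.2 is well inside I_k = [log_e(k)/k, 1/2] for q = 2
deficiencies = [k - rank_gf2(rng.binomial(1, delta, size=(k, k)))
                for _ in range(20)]
# Theorem 22 gives P(k - rank >= l) <= A 2^{-l}: large deficiencies are very rare
assert max(deficiencies) <= 8
```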
REFERENCES
[1] V. Y. F. Tan, L. Balzano, and S. C. Draper, “Rank minimization over
finite fields,” in Intl. Symp. Inf. Th., (St Petersburg, Russia), Aug 2011.
[2] E. J. Candès and T. Tao, “The power of convex relaxation: near-optimal
matrix completion,” IEEE Trans. on Inf. Th., vol. 56, pp. 2053–2080,
May 2010.
[3] E. J. Candès and B. Recht, “Exact matrix completion via convex
optimization,” Foundations of Computational Mathematics, vol. 9, no. 6,
pp. 717–772, 2009.
[4] B. Recht, “A simpler approach to matrix completion,” To appear in J.
Mach. Learn. Research, 2009. arXiv:0910.0651v2.
[5] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank
solutions of linear matrix equations via nuclear norm minimization,”
SIAM Rev., vol. 2, no. 52, pp. 471–501, 2009.
[6] R. Meka, P. Jain, and I. S. Dhillon, “Guaranteed rank minimization via
singular value projection,” in Proc. of Neural Information Processing
Systems, 2010. arXiv:0909.5457.
[7] E. M. Gabidulin, “Theory of codes with maximum rank distance,” Probl.
Inform. Transm., vol. 21, no. 1, pp. 1–12, 1985.
[8] R. M. Roth, “Maximum-rank array codes and their application to
crisscross error correction,” IEEE Trans. on Inf. Th., vol. 37, pp. 328–
336, Feb 1991.
[9] P. Loidreau, “Properties of codes in rank metric,” 2006. arXiv:0610057.
[10] D. Silva, F. R. Kschischang, and R. Kötter, “A rank-metric approach
to error control in random network coding,” IEEE Trans. on Inf. Th.,
vol. 54, pp. 3951 – 3967, Sep 2008.
[11] A. Montanari and R. Urbanke, “Coding for network coding,” 2007.
arXiv:0711.3935.
[12] M. Gadouleau and Z. Yan, “Packing and covering properties of rank
metric codes,” IEEE Trans. on Inf. Th., vol. 54, pp. 3873–3883, Sep
2008.
[13] ACM SIGKDD and Netflix, Proceedings of KDD Cup and Workshop, (San Jose, CA), Aug 2007. Proceedings available online at
http://www.cs.uic.edu/~liub/KDD-cup-2007/proceedings.html.
[14] M. Fazel, H. Hindi, and S. P. Boyd, “A Rank Minimization Heuristic
with Application to Minimum Order System Approximation,” in American Control Conference, 2001.
[15] M. Fazel, H. Hindi, and S. P. Boyd, “Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices,” in American Control Conference, 2003.
[16] Z. Bar-Yossef, Y. Birk, T. S. Jayram, and T. Kol, “Index coding with
side information,” IEEE Trans. on Inf. Th., vol. 57, pp. 1479 – 1494,
Mar 2011.
[17] R. M. Roth, “Probabilistic crisscross error correction,” IEEE Trans. on
Inf. Th., vol. 43, pp. 1425–1438, May 1997.
[18] D. Silva, F. R. Kschischang, and R. Kötter, “Communication over finite-field matrix channels,” IEEE Trans. on Inf. Th., vol. 56, pp. 1296–1305, Mar 2010.
[19] A. Barg and G. D. Forney, “Random codes: Minimum distances and
error exponents,” IEEE Trans. on Inf. Th., vol. 48, pp. 2568–2573, Sep
2002.
[20] R. G. Gallager, Low density parity check codes. MIT Press, 1963.
[21] R. Kötter and F. R. Kschischang, “Coding for errors and erasures in
random network coding,” IEEE Trans. on Inf. Th., vol. 54, pp. 3579 –
3591, Aug 2008.
[22] R. W. Nóbrega, B. F. Uchôa-Filho, and D. Silva, “On the capacity of
multiplicative finite-field matrix channels,” in Intl. Symp. Inf. Th., (St
Petersburg, Russia), Aug 2011.
[23] D. de Caen, “A lower bound on the probability of a union,” Discrete
Math., vol. 69, pp. 217–220, May 1997.
[24] G. E. Séguin, “A lower bound on the error probability for signals in
white Gaussian noise,” IEEE Trans. on Inf. Th., vol. 44, pp. 3168–3175,
Jul 1998.
[25] A. Cohen and N. Merhav, “Lower bounds on the error probability of
block codes based on improvements on de Caen’s inequality,” IEEE
Trans. on Inf. Th., vol. 50, pp. 290–310, Feb 2004.
[26] D. Baron, S. Sarvotham, and R. G. Baraniuk, “Bayesian compressive
sensing via belief propagation,” IEEE Trans. on Sig. Proc., vol. 51,
pp. 269 – 280, Jan 2010.
[27] Y. C. Eldar, D. Needell, and Y. Plan, “Unicity conditions for low-rank
matrix recovery,” Preprint, Apr 2011. arXiv:1103.5479 (Submitted to
SIAM Journal on Mathematical Analysis).
[28] D. S. Papailiopoulos and A. G. Dimakis, “Distributed storage codes meet
multiple-access wiretap channels,” in Proc. of Allerton, 2010.
[29] S. C. Draper and S. Malekpour, “Compressed sensing over finite fields,”
in Intl. Symp. Inf. Th., (Seoul, Korea), July 2009.
[30] S. Vishwanath, “Information theoretic bounds for low-rank matrix
completion,” in Intl. Symp. Inf. Th., (Austin, TX), July 2010.
[31] A. Emad and O. Milenkovic, “Information theoretic bounds for
tensor rank minimization,” in Proc. of Globecomm, Dec 2011.
arXiv:1103.4435.
[32] A. Kakhaki, H. K. Abadi, P. Pad, H. Saeedi, K. Alishahi, and F. Marvasti,
“Capacity achieving random sparse linear codes,” Preprint, Aug 2011.
arXiv:1102.4099v3.
[33] M. Grötchel, L. Lovász, and A. Schrijver, “The ellipsoid method and
its consequences in combinatorial optimization,” Combinatorica, vol. 1,
no. 2, pp. 169–197, 1981.
[34] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to
Algorithms. McGraw-Hill Science/Engineering/Math, 2nd ed., 2003.
[35] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley-Interscience, 2nd ed., 2006.
[36] I. Csiszár, “Linear codes for sources and source networks: Error exponents, universal coding,” IEEE Trans. on Inf. Th., vol. 28, pp. 585–592,
Apr 1982.
[37] R. G. Gallager, Information Theory and Reliable Communication. Wiley,
1968.
[38] D. Silva. Personal communication, Sep 2011.
[39] F. Kschischang, B. Frey, and H.-A. Loeliger, “Factor graphs and the
sum-product algorithm,” IEEE Trans. on Inf. Th., vol. 47, pp. 498–519,
Feb 2001.
[40] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications. Springer, 2nd ed., 1998.
[41] R. Lidl and H. Niederreiter, Introduction to Finite Fields and their
Applications. Cambridge University Press, 1994.
[42] J. Blömer, R. Karp and E. Welzl, “The Rank of Sparse Random Matrices
over Finite Fields,” Random Structures and Algorithms, vol. 10, no. 4,
pp. 407–419, 1997.
[43] F. Chabaud and J. Stern, “The cryptographic security of the syndrome
decoding problem for rank distance codes,” in ASIACRYPT, pp. 368–
381, 1996.
[44] A. V. Ourivski and T. Johansson, “New technique for decoding codes in
the rank metric and its cryptography applications,” Probl. Inf. Transm.,
vol. 38, pp. 237–246, July 2002.
[45] G. Richter and S. Plass, “Error and erasure of rank-codes with a modified
Berlekamp-Massey algorithm,” in Proceedings of ITG Conference on
Source and Channel Coding, Jan 2004.
[46] R. Peeters, “Orthogonal representations over finite fields and the chromatic number of graphs,” Combinatorica, vol. 16, no. 3, pp. 417–431,
1996.
[47] L. Lovász, “On the Shannon capacity of a graph,” IEEE Trans. on Inf.
Th., vol. IT-25, pp. 1–7, Jan 1981.
[48] A. G. Dimakis and P. O. Vontobel, “LP Decoding meets LP Decoding:
A Connection between Channel Coding and Compressed Sensing,” in
Proc. of Allerton, 2009.
[49] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Akadémiai Kiadó, 1997.
[50] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal
Processing. Prentice Hall, 1999.
Vincent Y. F. Tan received the B.A. and M.Eng. degrees in Electrical and
Information Engineering from Sidney Sussex College, Cambridge University
in 2005. He received the Ph.D. degree in Electrical Engineering and Computer
Science (EECS) from the Massachusetts Institute of Technology (MIT) in
2011. He is currently a postdoctoral researcher in the Electrical and Computer
Engineering Department at the University of Wisconsin (UW), Madison as
well as a research affiliate at the Laboratory for Information and Decision Systems (LIDS) at MIT. He has held summer research internships at Microsoft
Research in 2008 and 2009. His research is supported by A*STAR, Singapore.
His research interests include network information theory, detection and
estimation, and learning and inference of graphical models.
Dr. Tan is a recipient of the 2005 Charles Lamb Prize, a Cambridge University Engineering Department prize awarded annually to the top candidate in
Electrical and Information Engineering. He also received the 2011 MIT EECS
Jin-Au Kong outstanding doctoral thesis prize. He has served as a reviewer
for the IEEE Transactions on Signal Processing, the IEEE Transactions on
Information Theory, and the Journal of Machine Learning Research.
Laura Balzano is a Ph.D. candidate in Electrical and Computer Engineering,
working with Professor Robert Nowak at the University of Wisconsin (UW),
Madison, with the degree expected in May 2012. Laura received her B.S. and M.S. in Electrical Engineering from Rice University in 2002 and the University of California, Los Angeles in 2007, respectively. She received the Outstanding
M.S. Degree of the year award from UCLA. She has worked as a software
engineer at Applied Signal Technology, Inc. Her Ph.D. is being supported by a
3M fellowship. Her main research focus is on low-rank modeling for inference
and learning with highly incomplete or corrupted data, and its applications to
communications, biological, and sensor networks, and collaborative filtering.
Stark C. Draper (S’99-M’03) is an Assistant Professor of Electrical and
Computer Engineering at the University of Wisconsin (UW), Madison. He
received the M.S. and Ph.D. degrees in Electrical Engineering and Computer
Science from the Massachusetts Institute of Technology (MIT), and the B.S.
and B.A. degrees in Electrical Engineering and History, respectively, from
Stanford University.
Before moving to Wisconsin, Dr. Draper worked at the Mitsubishi Electric
Research Laboratories (MERL) in Cambridge, MA. He held postdoctoral positions in the Wireless Foundations, University of California, Berkeley, and in
the Information Processing Laboratory, University of Toronto, Canada. He has
worked at Arraycomm, San Jose, CA, the C. S. Draper Laboratory, Cambridge,
MA, and Ktaadn, Newton, MA. His research interests include communication
and information theory, error-correction coding, statistical signal processing
and optimization, security, and application of these disciplines to computer
architecture and semiconductor device design.
Dr. Draper has received an NSF CAREER Award, the UW ECE Gerald
Holdridge Teaching Award, the MIT Carlton E. Tucker Teaching Award,
an Intel Graduate Fellowship, Stanford’s Frederick E. Terman Engineering
Scholastic Award, and a U.S. State Department Fulbright Fellowship.