A canonical form for testing Boolean function properties
Dana Dachman-Soled∗
Columbia University
dg2342@columbia.edu
Rocco A. Servedio†
Columbia University
rocco@cs.columbia.edu
April 13, 2011
Abstract
In a well-known result on graph property testing, [GT03] showed that every testable graph property
has a “canonical” testing algorithm in which a set of vertices is selected uniformly at random and the
edges queried are the complete graph over the selected vertices. In this paper we define a similar-in-spirit
canonical form for Boolean function testing algorithms, and show that under some mild conditions on
the function class and testing algorithm, property testers for Boolean functions can be transformed into
this canonical form.
We establish two main results. The first shows, roughly speaking, that every “nice” family of Boolean
functions that has low noise sensitivity and is testable by an “independent tester,” has a canonical testing
algorithm. The second result is similar but holds instead for families of Boolean functions that are
closed under ID-negative minors. Taken together, these two results cover almost all of the constant-query Boolean function testing algorithms that we know of in the literature, and show that all of these
testing algorithms can be automatically converted into a canonical form.
∗ Supported in part by an FFSEAS Presidential Fellowship.
† Supported by NSF grants CCF-0347282, CCF-0523664 and CNS-0716245, and by DARPA award HR0011-08-1-0069.
1 Introduction
Over more than a decade, property testing has emerged as an exciting and intensively studied area of theoretical computer science, with close connections to diverse topics such as sublinear-time algorithms and
probabilistically checkable proofs in complexity theory. As the field has matured several distinct strands of
research have emerged corresponding to different types of objects to be tested: graphs, Boolean functions,
error-correcting codes, probability distributions, and so on. Over the years each of these sub-areas has developed its own body of standard tools and proof techniques, and the results that have been obtained have
different flavors across the different areas. As one example, in graph property testing powerful and general
characterizations [AFNS06, AT08, AS05a, AS05b] have been given for what properties are constant-query
testable with a number of queries depending only on the error parameter ǫ but not on the number of vertices
n. In Boolean function testing, on the other hand, many specific properties have been investigated (see
e.g. [BLR93, BCH+ 96, BGS98, PRS02, AKK+ 03], [Sam07, FKR+ 04, Bla08, Bla09, MORS10]) but no
general characterizations have been given of what makes a property of Boolean functions constant-query
testable. Given this state of affairs, a natural goal is to obtain a more unified view of property testing by
uncovering deeper underlying similarities between general results on testing different kinds of objects. This
high-level goal provides the impetus behind the current work.
The aim of this paper is to obtain “canonical” testers for testable Boolean function properties, similar
to the canonical testers that have been established for testable graph properties by Goldreich and Trevisan
[GT03].¹ Specialized to properties that are testable with a constant number of queries independent of n, the
[GT03] result is essentially as follows: Let P be any graph property² that has a q(ǫ)-query testing algorithm,
independent of the number of vertices n. The [GT03] result states that property P is efficiently testable by an
algorithm that follows a simple prescribed “canonical form”: it draws q(ǫ) vertices independently at random,
queries all $\binom{q(\epsilon)}{2}$ edges between those vertices, does some deterministic computation on the q(ǫ)-node graph
thus obtained, and outputs “accept” or “reject.”
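The query pattern of this canonical graph tester can be sketched as follows. This is only an illustration and is not taken from [GT03]: the names `canonical_graph_queries` and `edge_oracle` are hypothetical, vertices are sampled without replacement for simplicity, and the final deterministic decision is left to the caller.

```python
import itertools
import random

def canonical_graph_queries(n, q, edge_oracle, rng=random):
    """Sample q distinct vertices of an n-vertex graph and query every pair.

    edge_oracle(u, v) -> bool is an assumed black-box adjacency query.
    Returns the sampled vertices and the induced subgraph as a dict
    mapping each queried pair to the oracle's answer.
    """
    vertices = rng.sample(range(n), q)
    induced = {
        (u, v): edge_oracle(u, v)
        for u, v in itertools.combinations(vertices, 2)
    }
    return vertices, induced
```

A caller would then apply its own deterministic rule to `induced` to decide between “accept” and “reject.”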
Our work addresses the following natural question: is there a similar “canonical form” for Boolean
function property testing algorithms? Such a result would presumably say that any property of Boolean
functions that is constant-query testable is in fact constant-query testable by an algorithm that works roughly
as follows: it tosses some fair coins, exhaustively queries the function on “all inputs defined by those
coin tosses” (in some suitable sense), does some deterministic computation on the resulting query-response
pairs, and outputs “accept” or “reject.” (Note that in this first investigation we consider only constant-query
testable properties; indeed, the number of queries our canonical tester makes will be doubly exponential in
the number of queries made by the original tester.) We elaborate below in Section 1.1, where we give a
precise definition of a “canonical Boolean function testing algorithm”. But first it is useful to explain how
we may view any Boolean function property testing algorithm as a collection of probability distributions, as
a prelude to explaining our notion of a canonical tester for Boolean function properties.
Viewing a testing algorithm as a collection of distributions. Let P be any class of Boolean functions that
has a q(ǫ)-query testing algorithm A, with query complexity independent of the number of input variables
n. We may assume without loss of generality that A is nonadaptive, since if it is adaptive we can convert it
to a nonadaptive algorithm in a standard way (this incurs an exponential penalty in the query complexity but
it remains independent of n).
¹ We note that P. Valiant [Val08] has given a “Canonical Tester” for a wide class of properties of discrete probability distributions.
Our setting of testing Boolean functions is much closer to graph property testing (in both scenarios the tester actively chooses
inputs and queries them) than it is to testing probability distributions (where the tester passively receives independent draws from
the distribution).
² Recall that by definition, a graph property is closed under relabeling vertices.
Since A is nonadaptive, it first generates its entire sequence of q(ǫ) query strings and then queries
them and performs some computation on the results. We may view the first (query generation) stage of
the algorithm as proceeding in the following way: The first query string x1 is drawn from a probability
distribution D∅ (which may be arbitrary) over {0, 1}n . The outcome x1 ∈ {0, 1}n of this draw determines
a probability distribution Dx1 over {0, 1}n from which the second query string x2 is drawn. The outcomes
x1 , x2 of the first two draws determine a distribution Dx1 ,x2 from which the third query string x3 is drawn,
and so on. (Note that while later query strings do not depend on the answers to earlier queries, they may
depend on the outcome of the randomness that was used to construct earlier queries.) In the second stage,
once the q(ǫ) strings x1 , . . . , xq(ǫ) have been generated, the algorithm makes its queries on those strings
and gets a response bit f (xi ) for each query string. It then performs a computation on the q(ǫ) query-response pairs, and outputs the result (“accept/reject”) of that computation. This computation may a priori
be randomized, but a straightforward argument (given in Section 4.2.3 of [GT03]) shows that without loss
of generality it may be assumed to be deterministic.
Thus (ignoring for the moment the second “deterministic computation” stage that is performed once all
the queries have been made), any nonadaptive testing algorithm that makes q(ǫ) queries corresponds to the
collection of all distributions Dx1 ,...,xt described above, where t ranges from 1 to q(ǫ) and each xi ranges
over all possible n-bit strings. This collection is somewhat complicated and cumbersome to reason about;
different testing algorithms may correspond to different collections of probability distributions. Is there a
simpler “canonical form” for the query generation stage of every Boolean function testing algorithm?
1.1 A canonical form for Boolean function testing algorithms
In the [GT03] result, intuitively there is only one type of distribution over queries for all testing algorithms
(and this distribution is very simple) – all of the difference between two q(ǫ)-query testing algorithms comes
from the deterministic computation they do once all the query-answer pairs have been obtained. We would
like an analogous result for testing Boolean functions, which similarly involves only one kind of (simple)
distribution over queries, and where the difference between different testers comes from the deterministic
computation that is done once all query-answer pairs are in hand.
Motivated by these considerations, we consider the following canonical form for a testing algorithm (we
make this precise in Section 3):
• First stage (query generation): Let $z^1, \ldots, z^k$ be independent and uniform random strings from
$\{0,1\}^n$. This defines a natural partition of [n] into $2^k$ blocks, which are expected to be of approximately equal size: an element i ∈ [n] lies in block $B_{(b_1,\ldots,b_k)}$ if $(z^1_i, \ldots, z^k_i) = (b_1, \ldots, b_k)$, i.e. in
each query string $z^j$, the i-th bit is set to $b_j$.
• We say that a string $x = x_1 \ldots x_n \in \{0,1\}^n$ “respects the partition” if within each block $B_b$, either
all variables $x_i$ are set to 0 or all are set to 1. There are $2^{2^k}$ strings in $\{0,1\}^n$ that respect the partition;
these are the $2^{2^k}$ queries that a canonical tester makes.
• Second stage: With these $2^{2^k}$ query-answer pairs in hand, the algorithm does some (deterministic)
computation and outputs “accept” or “reject.”
We view this canonical form for Boolean function testers as both simple and natural.
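The query generation stage described in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `canonical_queries` is hypothetical, coordinates are 0-indexed, and when some block happens to be empty the returned list contains repeated strings (it always has $2^{2^k}$ entries).

```python
import itertools
import random

def canonical_queries(n, k, rng=random):
    """Return (blocks, queries) for one run of the canonical query stage.

    blocks maps each label (b_1, ..., b_k) to the list of coordinates i
    whose bits in z^1, ..., z^k spell out that label.
    queries lists all 2^(2^k) strings that are constant on every block.
    """
    # k independent uniform strings z^1, ..., z^k from {0,1}^n.
    z = [[rng.randint(0, 1) for _ in range(n)] for _ in range(k)]
    labels = list(itertools.product((0, 1), repeat=k))
    blocks = {b: [] for b in labels}
    for i in range(n):
        blocks[tuple(z[j][i] for j in range(k))].append(i)
    queries = []
    # One query per assignment of a single bit to each of the 2^k blocks.
    for values in itertools.product((0, 1), repeat=len(labels)):
        x = [0] * n
        for label, v in zip(labels, values):
            for i in blocks[label]:
                x[i] = v
        queries.append(tuple(x))
    return blocks, queries
```

Because the number of queries is doubly exponential in k, this enumeration is only practical for very small k.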
Some examples of known testers that can easily be converted to canonical form as described above
include the tester of [BLR93] for GF (2) linear functions and the tester of [AKK+ 03] for degree k polynomials over GF (2). Let us consider the [AKK+ 03] tester and see how to convert it to canonical form.
The [AKK+ 03] tester works by choosing k + 1 strings $z^1, \ldots, z^{k+1} \in \{0,1\}^n$ uniformly at random and
then querying all points in the induced subspace. If a string x is in the induced subspace of $z^1, \ldots, z^{k+1}$
then it must also “respect the partition” induced by $z^1, \ldots, z^{k+1}$. So to convert the [AKK+ 03] tester to our
canonical form, all we have to do is ask some more queries.
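The containment claim used here can be checked mechanically. The sketch below assumes that “induced subspace” means the GF(2) linear span of the chosen strings; the helper names `span_gf2` and `respects_partition` are hypothetical. The reason the check succeeds is that two coordinates in the same block agree in every $z^j$, so they agree in every GF(2) combination of the $z^j$.

```python
import itertools
import random

def span_gf2(z):
    """All GF(2) linear combinations of the bit-vectors in z."""
    n = len(z[0])
    points = set()
    for coeffs in itertools.product((0, 1), repeat=len(z)):
        x = tuple(
            sum(c * zj[i] for c, zj in zip(coeffs, z)) % 2 for i in range(n)
        )
        points.add(x)
    return points

def respects_partition(x, z):
    """True if x is constant on every block of the partition induced by z."""
    seen = {}
    for i, xi in enumerate(x):
        label = tuple(zj[i] for zj in z)
        # The first coordinate seen in a block fixes that block's value.
        if seen.setdefault(label, xi) != xi:
            return False
    return True
```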
A natural first hope is to generalize the above examples and show that every Boolean function property
that is testable using constantly many queries has a “canonical form” constant-query testing algorithm of
the above sort. However, E. Blais [Bla10] has observed that there is a simple property that is testable with
O(1/ǫ) queries but does not have a constant-query canonical tester of the above sort: this is the property
of being a symmetric Boolean function. Let SYM be the set of all symmetric Boolean functions (i.e., all
functions where f (x) is determined by |x|, the Hamming weight of x). SYM can be tested with a constant
number of queries with the following algorithm:
• Pick O(1/ǫ) pairs of points $(x^i, y^i) \in \{0,1\}^n \times \{0,1\}^n$ by choosing $x^i$ uniformly at random from
$\{0,1\}^n$ and then choosing $y^i$ uniformly at random from all inputs with the same weight as $x^i$.
• Check that $f(x^i) = f(y^i)$ for each pair. Accept if this holds, and otherwise reject.
It is clear that if f ∈ SYM then the above test accepts with probability 1. On the other hand, for any f
that is ǫ-far from SYM, with probability at least ǫ the string xi is one of the “bad” inputs that has the minority
output in its level, and with probability at least 1/2 the string y i is one of the inputs with the majority output
for the same level. So with probability at least ǫ/2, (xi , y i ) is a witness to the fact that f is not symmetric,
and O(1/ǫ) queries are sufficient to reject f with probability at least 2/3.
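The symmetry test analyzed above can be sketched as follows. The function name `sym_tester` and the constant `c = 4` in the number of pairs are illustrative assumptions, not values fixed by the text; a uniformly random shuffle of x is used to draw y uniformly from the strings of the same Hamming weight.

```python
import math
import random

def sym_tester(f, n, eps, rng=random, c=4):
    """One-sided test for symmetry of f: {0,1}^n -> {0,1}.

    Draws ceil(c / eps) pairs (x, y) with |x| = |y| and rejects on the
    first pair with f(x) != f(y). Returns True for "accept".
    """
    for _ in range(math.ceil(c / eps)):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = x[:]
        rng.shuffle(y)  # uniform over strings with the same Hamming weight
        if f(tuple(x)) != f(tuple(y)):
            return False
    return True
```

A symmetric function such as parity or majority is always accepted, since f(x) = f(y) whenever x and y have the same weight.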
To show that SYM cannot be tested by a constant-query canonical tester, it suffices to show that for
$k = o(\log \log n)$, with high probability each of the $2^{2^k} = n^{o(1)}$ queries generated by the tester has different
Hamming weight. This can be established by a straightforward but somewhat tedious argument which we
omit here (the main ingredients are the observation that each query string x generated by the canonical tester
has Hamming weight distributed according to a binomial distribution $B(n, j/2^k)$ for some integer j, together
with standard anti-concentration bounds on binomial distributions with sufficiently large variance).
The example above shows that, unlike the graph testing setting, it is not the case that every constant-query Boolean function tester can be “canonicalized.” So in order to obtain meaningful results on “canonicalizing” Boolean function testers, one must restrict the types of properties and/or testers that are considered;
this is precisely what we do in our results, as explained below.
1.2 Our results
Our main results are that certain “nice” testing algorithms, for certain “nice” types of Boolean function
properties, can automatically be converted into the above-described canonical form. Roughly speaking, the
testing algorithms we can handle are ones for which every distribution Dx1 ,...,xt in the query generation
phase is a product of n Bernoulli distributions over the n coordinates (with some slight additional technical
restrictions that we describe later). This is a restricted class of algorithms, but it includes many different
testing algorithms that have been proposed and analyzed in the Boolean function property testing literature.
We call such testing algorithms “Independent testers” (see Section 2.1 for a precise definition), and we give
two results showing that independent testers for certain types of properties can be “canonicalized.”
Our first result applies to classes C that are closed under negating variables and contain only functions
with low noise sensitivity. We say that such a class C is closed under Noisy-Neg minors (see Definition 1).
For such classes C we show the following:
Theorem 1 (Informal) If C is closed under Noisy-Neg minors and there exists a (two-sided) independent
tester for C, then there exists a (two-sided) canonical tester for C.
Our second result applies to classes C that are closed under identification of variables, negation of
variables, and adding or removing irrelevant variables. Following [HR05], we say that such a class is closed
under ID-Neg minors (see Definition 2). For such classes C we show the following:
Theorem 2 (Informal) If C is closed under ID-Neg minors and there exists a one-sided independent tester
for C, then there exists a one-sided canonical tester for C.
As we describe in Section A, these two results allow us to give “canonical” versions of many different
Boolean function property testing algorithms that have appeared in the literature.
1.3 Our approach
Developing a canonical tester for Boolean function properties seems to be significantly more challenging
than for graph properties. The high-level idea behind the [GT03] graph testing canonicalization result is that
if k edges have been queried so far, then all “untouched” vertices (that are not adjacent to any of the k queried
edges) are equally good candidates for the next vertex to be involved in the query set. For Boolean function
testing the situation is more complicated because of the structure imposed by the Boolean hypercube; for
example, if the two strings $0^n$ and $1^n$ have been queried so far, then it is clearly not the case that all possible
third query strings are “created equal” in relation to these first two query strings.
A natural first effort to canonicalize a Boolean function property tester is to design a canonical tester
that makes its queries in the first stage, and then simply directly uses those queries to “internally” simulate
a run of the independent tester in its second stage. Ideally, in such an “internal” simulation, each time the
original independent tester makes a query the canonical tester would use a query-response pair obtained in
its first stage that corresponds reasonably well to the string queried by the independent tester. However,
this naive approach does not seem to suffice, since an independent tester can easily make queries which do
not correspond well to any query made by the canonical tester. As a simple example, the first query of an
independent tester could independently set each variable xi to 1 with probability 1/3. The number of 1s in
the first query of this independent tester is distributed as a draw from the binomial distribution B(n, 1/3),
but the number of 1s in any query made by a q-query canonical tester is distributed as a draw from the
binomial distribution B(n, p), where p is of the form $(\text{integer})/2^q$. If q is a constant independent of n, these
two distributions have variation distance nearly 1.
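The gap between these two binomial distributions can be computed numerically. The following sketch is an illustration only: the function names are hypothetical, and it evaluates the total variation distance exactly by summing over all outcomes, using log-gamma to avoid overflow.

```python
import math

def binom_log_pmf(n, p, m):
    """log Pr[B(n, p) = m] for 0 < p < 1, via log-gamma to avoid overflow."""
    return (
        math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
        + m * math.log(p) + (n - m) * math.log(1 - p)
    )

def tv_distance(n, p1, p2):
    """Total variation distance between B(n, p1) and B(n, p2)."""
    return 0.5 * sum(
        abs(math.exp(binom_log_pmf(n, p1, m)) - math.exp(binom_log_pmf(n, p2, m)))
        for m in range(n + 1)
    )

def closest_dyadic_tv(n, q, p=1 / 3):
    """Smallest TV distance from B(n, p) to any B(n, j / 2^q), 0 < j < 2^q."""
    return min(tv_distance(n, p, j / 2 ** q) for j in range(1, 2 ** q))
```

For fixed q, calling `closest_dyadic_tv(n, q)` with increasing n shows the distance approaching 1, in line with the claim above.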
The high-level idea of both our constructions is that instead of trying to approximately simulate an
execution of the independent tester on the n-variable function f (which it cannot do), the canonical tester
perfectly simulates an execution of the independent tester on a different function f ′ over n′ relevant variables.
Since this simulation is perfect, the canonical tester successfully tests whether f ′ has property C. For the
case of Noisy-Neg minors the analysis shows that w.h.p. the independent tester’s view looks the same
whether the target function is f ′ or f . Therefore, a “good” answer for f ′ must also be a good answer for f .
For the case of ID-Neg minors, the analysis shows that because of the way f ′ is determined, we have that
(1) if f belongs to C then so does f ′ ; and (2) if f is far from C, then f ′ is at least slightly far from C. Along
with the fact that the canonical tester tests f ′ successfully, these conditions imply that the canonical tester
tests f successfully.
2 Preliminaries
A Boolean function property is simply a class of Boolean functions. Throughout the paper we write $F_n$ to
denote the class of all $2^{2^n}$ Boolean functions mapping $\{0,1\}^n$ to {0, 1}. We write $C_n$ to denote a class of
n-variable Boolean functions, i.e. functions from $\{0,1\}^n$ to {0, 1}.
We adopt all the standard definitions of Boolean function property testing (see e.g. [PRS02, FKR+ 04,
MORS10]) and do not repeat them here because of space limitations. As mentioned in the Introduction,
we may view any nonadaptive testing algorithm T as consisting of two phases: an initial “query generation
phase” T1 in which the query strings are selected (at the end of this phase the queries are performed), and a
subsequent “computation” phase T2 in which some computation is performed on the query-answer pairs and
the algorithm either accepts or rejects. Throughout the paper we will describe and analyze testing algorithms
in these terms.
The classes we consider. Our first main result deals with classes of Boolean functions that are closed under
Noisy-Neg minors; we give the relevant definitions below.
Definition 1 (Noise Sensitivity of f ) Let $f : \{0,1\}^n \to \{0,1\}$, let ǫ ∈ [0, 1/2], and let (x, y) be a pair of
(1 − 2ǫ)-correlated random inputs (i.e. x is uniform from $\{0,1\}^n$ and y is formed by independently setting
each $y_i$ to equal $x_i$ with probability 1 − 2ǫ, and to be uniform random otherwise). The noise sensitivity of f
at noise rate ǫ is defined to be $\mathrm{NS}_\epsilon(f) := \Pr[f(x) \ne f(y)]$.
(Noise Sensitivity of a class C) Let $C = \cup_{n \ge 1} C_n$ be a class of Boolean functions. We define
$\mathrm{NS}_\epsilon(C) := \max_n \max_{f \in C_n} \{\mathrm{NS}_\epsilon(f)\}$, the noise sensitivity of C at noise rate ǫ, to be the maximum noise sensitivity of
any f ∈ C.
(C is closed under Noisy-Neg minors) Let $C = \cup_{n \ge 1} C_n$ be a class of Boolean functions. We say that
C is closed under Noisy-Neg Minors if C is closed under negating input variables and there is a function
g(ǫ) (not depending on n) which is such that $\lim_{\epsilon \to 0^+} g(\epsilon) = 0$ and $\mathrm{NS}_\epsilon(C) \le g(\epsilon)$.
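Noise sensitivity as defined above can be computed exactly for small n, or estimated by sampling. The sketch below is illustrative and its function names are hypothetical. It uses the fact that under Definition 1 each bit of y differs from the corresponding bit of x with probability ǫ (a bit is rerandomized with probability 2ǫ and then flips with probability 1/2).

```python
import itertools
import random

def noise_sensitivity_exact(f, n, eps):
    """Exact NS_eps(f) by enumeration (only feasible for small n)."""
    total = 0.0
    points = list(itertools.product((0, 1), repeat=n))
    for x in points:
        for y in points:
            if f(x) != f(y):
                flips = sum(a != b for a, b in zip(x, y))
                total += eps ** flips * (1 - eps) ** (n - flips)
    return total / 2 ** n

def noise_sensitivity_estimate(f, n, eps, samples, rng=random):
    """Monte Carlo estimate of NS_eps(f) from (1 - 2*eps)-correlated pairs."""
    disagreements = 0
    for _ in range(samples):
        x = [rng.randint(0, 1) for _ in range(n)]
        # Keep x_i with probability 1 - 2*eps, otherwise draw a fresh uniform bit.
        y = [xi if rng.random() < 1 - 2 * eps else rng.randint(0, 1) for xi in x]
        disagreements += f(tuple(x)) != f(tuple(y))
    return disagreements / samples
```

For example, a single-variable (dictator) function has noise sensitivity exactly ǫ, while parity on n bits has noise sensitivity $(1 - (1 - 2\epsilon)^n)/2$, which does not stay small as n grows.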
Our second main result deals with classes C that are closed under ID-Neg Minors.
Definition 2 (ID-Neg Minors) Let f ∈ Fn and let f ′ ∈ Fn′ . We say that f ′ is an ID-Neg Minor of f if f ′
can be produced from f by a (possibly empty) sequence of the following operations: (i) Adding/Removing
irrelevant variables (recall that variable xi is irrelevant if there is no input string where flipping xi changes
the value of f ); (ii) Identifying input variables (e.g. the function f (x1 , x1 , x3 ) is obtained by identifying
variable x2 with x1 ); and (iii) Negating input variables.
(C is closed under ID-Neg Minors) Let C = ∪n≥1 Cn be a class of Boolean functions, let f ∈ Fn , and
let f ′ ∈ Fn′ . We say that C is closed under ID-Neg Minors if the following holds: If f ∈ Cn and f ′ is an
ID-Neg Minor of f , then f ′ ∈ C.
The class of GF (2) degree-d polynomials is an example of a class closed under ID-Neg minors. The class
of halfspaces is an example of a class closed under Noisy-Neg minors. For more examples and discussion,
see Section A.
We close this preliminaries section with two definitions that will be useful:
Definition 3 Let f be a function in Fn and let F+ , F− be two disjoint subsets of [n]. We define Noisy(f, F+ , F− ) ∈
Fn to be the function Noisy(f, F+ , F− )(x1 , . . . , xn ) = f (t1 , . . . , tn ), where ti := 1 if i ∈ F+ , ti := 0 if
i ∈ F− ; and ti := xi otherwise.
Intuitively, given a target function f , our canonical tester for classes C that are closed under Noisy-Neg
minors will choose F+ , F− according to some distribution (defined later) and will instead test the target
function f ′ = Noisy(f, F+ , F− ).
Definition 4 Let f be a function in Fn , F+ and F− be two disjoint subsets of [n], and id be an element of
F+ . For n′ = n − |F+ | − |F− | + 1, we define the function ID-Neg(f, F+ , F− , id) ∈ Fn′ to be the function
ID-Neg(f, F+ , F− , id)(x1 , . . . , xn′ ) = f (t1 , . . . , tn ), where ti := xid if i ∈ F+ ; ti := ¬xid (the negation of xid ) if i ∈ F− ; and
ti := xi otherwise.
Similarly to the case above, given a target function f our canonical tester for classes C that are closed
under ID-Neg minors will choose F+ , F− , id according to some distribution (defined later) and will instead
test the target function f ′ = ID-Neg(f, F+ , F− , id).
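The two operators of Definitions 3 and 4 can be sketched as higher-order functions. This is an illustration under stated assumptions: coordinates are 0-indexed, the names `noisy` and `id_neg` are hypothetical, coordinates in F− are read as receiving the negation of the identified variable, and for simplicity `id_neg` keeps n-bit inputs (ignoring the identified coordinates other than `idx`) rather than re-indexing down to n′ variables.

```python
def noisy(f, f_plus, f_minus):
    """Restriction of f fixing coordinates in f_plus to 1 and in f_minus to 0."""
    def restricted(x):
        t = [1 if i in f_plus else 0 if i in f_minus else xi
             for i, xi in enumerate(x)]
        return f(tuple(t))
    return restricted

def id_neg(f, f_plus, f_minus, idx):
    """Identify coordinates in f_plus with x[idx]; set those in f_minus to 1 - x[idx]."""
    assert idx in f_plus
    def minor(x):
        t = [x[idx] if i in f_plus else 1 - x[idx] if i in f_minus else xi
             for i, xi in enumerate(x)]
        return f(tuple(t))
    return minor
```

For instance, with f the XOR of three variables, F+ = {0, 1}, F− = {2} and idx = 0, the resulting function depends only on x[0] and equals its negation.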
2.1 The testing algorithms we can canonicalize: Independent Testers
Definition 5 A q(ǫ)-query independent tester for class C is a probabilistic oracle machine T = (T1 , T2 )
which takes as input a distance parameter ǫ and is given access to a black-box oracle for an arbitrary
function f : {0, 1}n → {0, 1}.
(First Stage) The query generation algorithm T1 chooses q(ǫ) query strings in the following way: To
choose the i-th string, the algorithm partitions the set [n] into $2^{i-1}$ blocks. The block $B_{b_1,\ldots,b_{i-1}}$ contains
those indices that were set to $b_j$ in the j-th query string $x^j$ for all j = 1, . . . , i − 1. For each block $B_{b_1,\ldots,b_{i-1}}$,
for each $m \in B_{b_1,\ldots,b_{i-1}}$, the algorithm sets $x^i_m$ to 1 with probability $p_{b_1,\ldots,b_i}$ and to 0 with probability
$1 - p_{b_1,\ldots,b_i}$. The resulting string $x^i$ is the i-th query string. After choosing all the strings, T1 queries all
q(ǫ) strings $x^1, \ldots, x^{q(\epsilon)}$ and gets back responses $f(x^1), \ldots, f(x^{q(\epsilon)})$.
(Second Stage) The computation stage T2 gets as input the q(ǫ) query-answer pairs (x1 , f (x1 )), . . . ,
(xq(ǫ) , f (xq(ǫ) )), does some deterministic computation on this input, and outputs either “accept” or “reject.”
In an independent tester the query generation algorithm T1 must satisfy the following conditions:
• For each string b = (b1 , . . . , bt ) the probability pb = pb (ǫ) is a value 0 ≤ pb ≤ 1 (which may depend
on ǫ but is independent of n).
• For each t, the $2^t$ values $p_{b_1,\ldots,b_t}$ (as b ranges over $\{0,1\}^t$) are all rational numbers, and (over all
t) the denominator of each of these rational numbers is at most c = c(ǫ) (c may depend on ǫ but is
independent of n). We say that c(ǫ) is the granularity of the independent tester T.
If T is a one-sided tester then for any f : {0, 1}n → {0, 1}, if f belongs to C then Pr[T f = “accept”] =
1, and if f is ǫ-far from C then Pr[T f = “reject”] ≥ r(ǫ), where r(ǫ) > 0 is a positive-valued function of ǫ
only. We say that r(ǫ) is the rejection parameter of the tester.
If T is a two-sided tester then for any f : {0, 1}n → {0, 1}, if f belongs to C then Pr[T f = “accept”] =
1 − a(ǫ), and if f is ǫ-far from C then Pr[T f = “reject”] ≥ r(ǫ) where a and r are functions of ǫ only and
for 0 < ǫ < 1/2, a(ǫ) < r(ǫ). We say that a(ǫ) and r(ǫ) are the acceptance and rejection parameters of the
tester respectively.
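The first stage of an independent tester can be sketched as below. This is a minimal illustration, not the authors' construction: the name `independent_queries` is hypothetical, and as a convention of this sketch the bias used for the i-th query is looked up by the label of the block a coordinate currently belongs to, i.e. by the bits it received in the previous i − 1 queries.

```python
import random
from fractions import Fraction

def independent_queries(n, q, bias, rng=random):
    """Generate q query strings in the manner of Definition 5's first stage.

    bias maps a block label (a tuple of the bits a coordinate received in
    earlier queries) to the probability, as a Fraction, that a coordinate
    in that block is set to 1 in the next query.
    """
    labels = [() for _ in range(n)]  # block label of each coordinate so far
    queries = []
    for _ in range(q):
        x = []
        for m in range(n):
            p = bias[labels[m]]
            bit = 1 if rng.random() < p else 0  # independent biased coin
            x.append(bit)
            labels[m] = labels[m] + (bit,)
        queries.append(tuple(x))
    return queries, labels
```

Each coordinate is handled independently of the others, which is what makes every query distribution a product distribution that is constant within blocks.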
Given an independent tester T as described above, we let Prod(ǫ) denote the product of the denominators of all probabilities $p_{b_1,\ldots,b_t}(\epsilon)$ where t ranges over all possible values 1, 2, . . . , q(ǫ) and b = (b1 , . . . , bt )
ranges over all t-bit strings. If the tester T is c(ǫ)-granular, it is easy to see that Prod(ǫ) is at most $c(\epsilon)^{2^{q(\epsilon)+1}}$.
It is clear that each subset Bb1 ,...,bt of [n] described above has size binomially distributed according to
B(n, ℓ/Prod(ǫ)) for some integer ℓ.
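The upper bound on Prod(ǫ) for a c(ǫ)-granular tester follows by counting probabilities. Under the indexing used above (t ranges over 1, . . . , q(ǫ), with $2^t$ probabilities at level t, each having denominator at most c(ǫ)), the calculation is:

```latex
\[
\mathrm{Prod}(\epsilon)
  \;\le\; \prod_{t=1}^{q(\epsilon)} c(\epsilon)^{2^{t}}
  \;=\; c(\epsilon)^{\sum_{t=1}^{q(\epsilon)} 2^{t}}
  \;=\; c(\epsilon)^{2^{q(\epsilon)+1}-2}
  \;\le\; c(\epsilon)^{2^{q(\epsilon)+1}}.
\]
```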
3 A canonical form for testers, and our main results
Before stating our main results precisely, we give a precise description of the canonical form mentioned in
Section 1.1.
Definition 6 Let q ′ : [0, 1) → N. A q ′ -canonical tester for class C is a probabilistic oracle machine
T = (T1 , T2 ) which takes as input a distance parameter ǫ and is given access to a black-box oracle for an
arbitrary function f : {0, 1}n → {0, 1}, and performs as follows.
Given input parameter ǫ, the query generation algorithm T1 works as follows.
1. $z^1, \ldots, z^{q'(\epsilon)}$ are selected to be independent uniformly random n-bit strings. These strings define a
partition B of [n] into $2^{q'(\epsilon)}$ blocks: an element i ∈ [n] lies in block $B_{b_1,\ldots,b_{q'(\epsilon)}}$ if the i-th bit of string
$z^j$ equals $b_j$ for all j = 1, . . . , q′(ǫ).
2. Let $Q_B \subseteq \{0,1\}^n$ be the set of all strings x such that the following condition holds: ∀i, j ∈ [n], if i
and j are in the same partition subset $B_{b_1,\ldots,b_{q'(\epsilon)}} \in B$ then $x_i = x_j$.
3. Using the oracle for f , T1 queries all $2^{2^{q'(\epsilon)}}$ strings $x \in Q_B$.
The computation stage T2 gets as input the $2^{2^{q'(\epsilon)}}$ query-answer pairs $[(x, f(x))]_{x \in Q_B}$, does some deterministic computation, and outputs either “accept” or “reject.”
The success criteria for one-sided (two-sided, respectively) canonical testers are entirely similar to the
criteria defined above for independent testers. We note that a q′-canonical tester makes $2^{2^{q'(\epsilon)}}$ queries when
run with input parameter ǫ.
3.1 Main results
As our main results, we show that (i) any class that is closed under Noisy-Neg minors and is constant-query testable by a (two-sided) independent tester is also constant-query testable by a (two-sided) canonical
tester; and (ii) any class that is closed under ID-Neg minors and is constant-query testable by a one-sided
independent tester is also constant-query testable by a one-sided canonical tester. More precisely, we prove
the following:
Theorem 3 Let C be any class of functions closed under Noisy-Neg Minors and let g(ǫ) be as in Definition 1. Let T be a q(ǫ)-query independent tester for property C with acceptance and rejection parameters
a(ǫ), r(ǫ). Let $q_2'(\epsilon)$ be the smallest integer value that satisfies the following bound:
$$\mathrm{NS}_{\frac{\mathrm{Prod}(\epsilon)}{2} \cdot \frac{1}{2^{q_2'(\epsilon)}}}(C) \;\le\; \frac{r(\epsilon) - a(\epsilon)}{16\,q(\epsilon)}.$$
Let $\eta' = \frac{2^{q_2'(\epsilon)} \bmod \mathrm{Prod}(\epsilon)}{2^{q_2'(\epsilon)}}$, where Prod is as defined in Section 2.1, and let
$$q_1'(\epsilon) = \left\lceil \frac{32}{\mathrm{NS}_{\eta'}(C)} \cdot \ln \frac{8}{r(\epsilon) - a(\epsilon)} \right\rceil.$$
Let $q'(\epsilon) = q_2'(\epsilon) \cdot (q_1'(\epsilon) + 1)$. Then there is a q′-canonical tester Canon(T ) for C with acceptance and rejection
parameters a′(ǫ), r′(ǫ), where $a'(\epsilon) = \frac{3}{4} a(\epsilon) + \frac{1}{4} r(\epsilon)$ and $r'(\epsilon) = \frac{1}{4} a(\epsilon) + \frac{3}{4} r(\epsilon)$.
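The parameters of Canon(T ) in Theorem 3 remain separated whenever those of T are. Reading the constants in the theorem as a′ = (3/4)a + (1/4)r and r′ = (1/4)a + (3/4)r, a one-line calculation shows that the gap is halved:

```latex
\[
r'(\epsilon) - a'(\epsilon)
  = \Bigl(\tfrac{1}{4} a(\epsilon) + \tfrac{3}{4} r(\epsilon)\Bigr)
  - \Bigl(\tfrac{3}{4} a(\epsilon) + \tfrac{1}{4} r(\epsilon)\Bigr)
  = \frac{r(\epsilon) - a(\epsilon)}{2} \;>\; 0
  \qquad \text{whenever } a(\epsilon) < r(\epsilon).
\]
```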
Theorem 4 Let C be any class of functions closed under ID-Neg Minors. Let T be a one-sided independent
tester for property C that has query complexity q(ǫ), granularity c(ǫ), and rejection parameter r(ǫ). Let
$\epsilon_1 = \frac{r(\epsilon)}{4q(\epsilon)}$ and let q′(ǫ) be defined as $q'(\epsilon) = \lceil \log(\mathrm{Prod}(\epsilon) \cdot \mathrm{Prod}(\epsilon_1)) \rceil$, where Prod is as described
in Section 2.1. Then there is a one-sided q′-canonical tester Canon(T ) for property C which, on input
parameter ǫ, has rejection parameter $\left(\frac{3r(\epsilon)/4}{1 - r(\epsilon)/4}\right) \cdot r(\epsilon_1)$.
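To get a feel for the quantities in Theorem 4, the following sketch evaluates q′(ǫ) and the rejection parameter from given inputs. It is an illustration only: the function name `theorem4_parameters` is hypothetical, the values of Prod and r are supplied by the caller, and the logarithm is assumed to be base 2.

```python
import math

def theorem4_parameters(prod_eps, prod_eps1, r_eps, r_eps1):
    """Evaluate q'(eps) and the rejection parameter stated in Theorem 4.

    prod_eps, prod_eps1: the integers Prod(eps) and Prod(eps_1).
    r_eps, r_eps1: the values r(eps) and r(eps_1) of the original tester.
    Assumes the logarithm in the theorem is base 2.
    """
    q_prime = math.ceil(math.log2(prod_eps * prod_eps1))
    rejection = (0.75 * r_eps) / (1 - 0.25 * r_eps) * r_eps1
    return q_prime, rejection
```

The canonical tester then makes $2^{2^{q'}}$ queries, so even modest values of Prod lead to very large query counts.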
Throughout the rest of the paper whenever we write “T ” or “C” without any other specification, we are
referring to the tester and property from Theorem 3 or Theorem 4 (which one will be clear from context).
Applications of our main results. We can canonicalize known testers for many constant-query testable
Boolean function classes found in the literature by applying either our first or second result. Because of
space constraints we describe these applications in Appendix A.
4 Overview of the proofs of Theorems 3 and 4
In this section we give a high-level explanation of our arguments and of the constructions of our canonical
testers. Full details and complete proofs of Theorems 3 and 4 are given in the Appendix.
We first note that an execution of the Independent Tester T = (T1 , T2 ) (see Definition 5) with input
parameter ǫ creates a $2^{q(\epsilon)}$-way partition of the n variables by independently assigning each variable to a
randomly chosen subset in the partition with the appropriate probability (of course these probabilities need
not all be equal). All queries made by the independent tester then respect this partition.
Consider the following first attempt at constructing a canonical tester Canon(T ) = (Canon(T )1 , Canon(T )2 )
from an independent tester T . In the first stage, Canon(T )1 partitions the n variables into $2^{q'}$ subsets of
expected size $n/2^{q'}$, as specified in Definition 6, and makes all corresponding queries; it then passes both
the queries and the responses to the second stage, Canon(T )2 . The value q′ will be such that $2^{q'}$ equals
Prod(ǫ) · k + rem, where 0 ≤ rem < Prod(ǫ), and k is a positive integer. In the second stage, Canon(T )2
chooses the first Prod(ǫ) · k subsets of Canon(T )1 ’s partition (let us say these subsets collectively contain
n′ variables) and ignores the variables in the last rem subsets. For the n′ variables contained in these first
Prod(ǫ) · k subsets, Canon(T )2 can perfectly simulate a partition created by an execution of the independent tester T on these n′ variables with parameter ǫ, by “coalescing” these Prod(ǫ) · k subsets into $2^{q(\epsilon)}$
subsets of the appropriate expected sizes. (To create a subset whose size is binomially distributed according
to B(n′ , ℓ/Prod(ǫ)), Canon(T )2 “coalesces” a collection of kℓ of the Prod(ǫ) · k subsets.) To simulate each
of the q = q(ǫ) queries that T makes, Canon(T )2 sets the n′ variables as T1 would set them given this
partition.
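The coalescing step can be made concrete with a small sketch. The function name `coalesce_plan` and its interface are hypothetical. Given q′ and Prod(ǫ), it computes k and rem and assigns consecutive canonical subsets to each target subset, so that a variable landing among the first Prod(ǫ) · k canonical subsets falls in a target subset of weight ℓ/Prod(ǫ) with exactly that probability.

```python
from fractions import Fraction

def coalesce_plan(q_prime, prod, weights):
    """Group the first prod*k canonical subsets to mimic target probabilities.

    weights: target probabilities (Fractions whose denominators divide
    prod, summing to 1), one per subset of the independent tester's partition.
    Returns (k, rem, groups) where groups[i] is the range of canonical
    subset indices coalesced into target subset i.
    """
    k, rem = divmod(2 ** q_prime, prod)
    groups, start = [], 0
    for w in weights:
        ell = w * prod            # w = ell / prod
        assert ell.denominator == 1
        size = k * int(ell)       # coalesce k * ell canonical subsets
        groups.append(range(start, start + size))
        start += size
    assert start == prod * k
    return k, rem, groups
```

With q′ = 5 and Prod(ǫ) = 6, for example, there are 32 canonical subsets, of which 30 are used and rem = 2 are left over; those leftover subsets are exactly what the rest of the construction has to deal with.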
Obviously, the problem with the above simulation is how to set the extra variables in the remaining rem
subsets in each of the q queries. The n′ variables described above are faithfully simulating the distribution
over query strings that T would make if it were run on an n′ -variable function with input parameter ǫ, but
of course the actual queries that Canon(T ) makes have the additional rem variables, and the corresponding responses are according to the n-variable function f. Thus, we have no guarantee that T2 will answer
correctly w.h.p. when executed on the query-response strings generated by the simulator Canon(T )2 as
described above. Nevertheless, the simulation described above is a good starting point for our actual construction.
The underlying idea of our canonical testers is that instead of (imperfectly) simulating an execution of
the independent tester on the actual n-variable target function f , the canonical tester perfectly simulates
an execution of the independent tester on a related function f ′ . Our analysis shows that due to the special
properties of the independent tester and of the classes we consider, the response of the independent tester on
target function f ′ is also a legitimate response for f .
Below we describe the construction of a canonical tester for two different types of independent testers
and classes. The first construction shows how to transform T , where T is a two-sided independent tester for
a class C that is closed under Noisy-Neg Minors, into Canon(T ), a two-sided canonical tester for class C.
The second construction shows how to transform T , where T is a one-sided independent tester for a class C
closed under ID-Neg minors, into Canon(T ), a one-sided canonical tester for C.
4.1 Construction for two-sided independent testers and classes closed under Noisy-Neg Minors
We first note that it is easy to construct an algorithm that approximates $NS_\eta(f)$ of a target function f by non-adaptively drawing pairs of points (x, y) where x is chosen uniformly at random and y is $(1 - 2\eta)$-correlated with x. It is also easy to see that if η is a rational number $c_1/c_2$ where $c_2$ is a power of 2, then the distribution over queries made by such an algorithm can be simulated using a canonical tester.
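A minimal sketch of such an estimator is given below. It assumes the standard convention that a (1 − 2η)-correlated copy of x is obtained by flipping each bit of x independently with probability η, and that f is given as a Python callable on lists of 0/1 bits; the function names are hypothetical.

```python
import random


def noisy_copy(x, eta, rng):
    """Flip each bit of x independently with probability eta, so that the
    result is (1 - 2*eta)-correlated with x."""
    return [b ^ (rng.random() < eta) for b in x]


def estimate_noise_sensitivity(f, n, eta, num_pairs, rng=None):
    """Empirical estimate of NS_eta(f) = Pr[f(x) != f(y)] from num_pairs
    non-adaptively drawn pairs (x, y)."""
    rng = rng or random.Random()
    disagreements = 0
    for _ in range(num_pairs):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = noisy_copy(x, eta, rng)
        disagreements += f(x) != f(y)
    return disagreements / num_pairs


# Example: for a dictator function f(x) = x[0], NS_eta(f) equals eta.
estimate = estimate_noise_sensitivity(lambda x: x[0], n=5, eta=0.25,
                                      num_pairs=20000, rng=random.Random(0))
```

All pairs are drawn before any response is inspected, which is what makes the procedure non-adaptive.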
For ease of understanding we view our first canonical tester as having two parts (it will be clear that these two parts can be straightforwardly combined to obtain a canonical tester that follows the template of Definition 6). The first part is an algorithm that approximates $NS_{\eta'}(f)$ and rejects any f for which $NS_{\eta'}(f)$ is noticeably higher than $NS_{\eta'}(C)$ (here η′ is a parameter of the form (integer)/(power of 2) that will be specified later).
The second part of the tester simulates the partition generated by the independent tester T as described
at the start of Section 4. Let F+ contain the variables assigned to the first rem/2 subsets from the rem
“remaining” subsets, and let F− contain the variables assigned to the last rem/2 of those subsets. As a
thought experiment, we may imagine that the variables in F+ ∪ F− are each independently assigned to a
randomly selected one of the Prod(ǫ) partition subsets with the appropriate probability. In this thought
experiment, we have perfectly simulated a partition generated by running T1 over an n-variable function.
We now define f ′ based on the subsets F+ and F− . The function f ′ is simply the restriction of f under
which all variables in F+ are fixed to 1 and all variables in F− are fixed to 0.
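The restriction just defined can be sketched in a few lines of Python. The sketch assumes f is a callable on lists of 0/1 bits and that F+ and F− are given as disjoint sets of 0-based indices; the name `restrict` is hypothetical.

```python
def restrict(f, f_plus, f_minus):
    """Return f', the restriction of f in which every variable in f_plus is
    fixed to 1 and every variable in f_minus is fixed to 0."""
    def f_prime(x):
        y = list(x)
        for i in f_plus:
            y[i] = 1
        for i in f_minus:
            y[i] = 0
        return f(y)
    return f_prime


# Example with the parity function on three bits.
parity = lambda x: sum(x) % 2
f_prime = restrict(parity, f_plus={0}, f_minus={1})
```

Here f′ ignores the values supplied for coordinates in F+ ∪ F−, which is why a query to f′ can be answered by a single query to f.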
Now, we would like Canon(T ) to generate the q query-answer pairs for f that T1 would make given
the partition from the thought experiment described above. While Canon(T ) cannot do this, a crucial
observation is that Canon(T ) can perfectly simulate q query-answer pairs that T1 would make given the
above-described partition where the answers are generated according to f ′ . Moreover, our analysis will
show (using the fact that C is closed under negation) that we may assume w.l.o.g. that each of these q queries is individually uniformly distributed over $\{0,1\}^n$.
Thus for each individual (uniform random) query string x, we have that f′(x) is equivalent to f(y) where y is a random string that is $(1 - 2\eta')$-correlated with x, where $\eta' = \mathrm{rem}/2^{q'}$. Now since $NS_{\eta'}(C)$ depends only on η′, by choosing η′ small enough (and q′ large enough), we have by a union bound that with high probability f(x) equals f′(x) for all the queries x that were generated. Since this is the case, $T_2$ must with high probability generate the same output on target functions f and f′. So since T is (by hypothesis) an effective tester for f, it must be the case that T’s responses on f′ are also “good” for f.
4.2 Construction for one-sided independent testers and classes closed under ID-Neg minors
Our second canonical tester construction also begins by simulating a partition of the independent tester T over n′ variables as described above. However, now we will think of the parameters as being set somewhat differently: we view the canonical tester as partitioning the n variables into $2^{q'(\epsilon)}$ subsets where now q′ = q′(ǫ) is such that $2^{q'}$ equals $\mathrm{Prod}(\epsilon_1) \cdot k + \mathrm{rem}$, where $\epsilon_1 \ll \epsilon$ (more precisely $\epsilon_1 = \frac{r(\epsilon)}{4q(\epsilon)}$, though this exact expression is not important for now), $0 \le \mathrm{rem} < \mathrm{Prod}(\epsilon_1)$, and k is a positive integer. The canonical tester then defines a new function f′ over n′ variables by applying an operator $\mathcal{F}^{\epsilon_1}$ to the set X of $2^{2^{q'(\epsilon)}}$ query strings that $\mathrm{Canon}(T)_1$ generates; we now describe how this operator acts. Let F+ contain the variables assigned to the first rem/2 subsets from the rem “remaining” subsets, and let F− contain the variables assigned to the last rem/2 of the rem subsets. Given f, the function f′ that is obtained by applying $\mathcal{F}^{\epsilon_1}$ to X is chosen in the following way: f′ is the same as f except that
• A variable xid is chosen by taking the lexicographically first element of F+ ;
• All variables in F+ are identified with $x_{id}$, and all variables in F− are identified with $\overline{x_{id}}$ (the negation of $x_{id}$).
Canon(T ) places id at random in one of the remaining partition subsets and then selects the appropriate
set of query strings that T1 would make given the simulated partition over n′ variables described above, and
constructs query-answer pairs for these strings in which the answers are the corresponding values of f ′ on
these strings (note that similar to the crucial observation in Section 4.1, it is indeed possible for Canon(T )
to do this). Finally, Canon(T ) passes these queries and responses to T2 and responds as T2 does.
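The following Python sketch illustrates the function f′ used here. It assumes, as the name ID-Neg suggests, that variables in F+ are identified with $x_{id}$ and variables in F− with its negation, that F+ is non-empty, and that “lexicographically first” means the smallest index. For simplicity f′ is kept as a function on n-bit inputs that ignores the other coordinates of F+ ∪ F−, rather than being re-indexed over n′ variables; the name `id_neg_minor` is hypothetical.

```python
def id_neg_minor(f, f_plus, f_minus):
    """Return (f', rep): f' is f with all variables in f_plus replaced by
    x[rep] and all variables in f_minus replaced by 1 - x[rep], where rep is
    the smallest index in f_plus (assumed non-empty)."""
    rep = min(f_plus)

    def f_prime(x):
        y = list(x)
        for i in f_plus:
            y[i] = x[rep]
        for i in f_minus:
            y[i] = 1 - x[rep]
        return f(y)

    return f_prime, rep


# Example with parity on four bits: F+ = {0, 1}, F- = {2}.
parity = lambda x: sum(x) % 2
f_prime, rep = id_neg_minor(parity, f_plus={0, 1}, f_minus={2})
```

In this example f′(x) depends only on x[0] and x[3], since the two identified copies of x[0] cancel and the negated copy remains.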
The proof of correctness of this canonical tester proceeds in two parts. First, we show that with high
probability Canon(T ) successfully tests target function f ′ . (This is an easy consequence of the fact, mentioned above, that Canon(T ) perfectly simulates T ’s partition over the n′ variables.) Second, we show that
(1) if f ∈ C and C is closed under ID-Neg minors then f ′ ∈ C; and (2) if f is ǫ-far from C then w.h.p. f ′
is ǫ1 -far from C, where the value of ǫ1 depends only on ǫ. We note that (2) does not hold in general for f, f ′
where f ′ is an arbitrary ID-Neg minor of f . However, our analysis shows that assuming that there exists a
one-sided independent tester for class C, (2) holds for f ′ chosen in the way described above.
Organization of the rest of the paper. Appendix A presents applications of our main results. In Appendix B we describe how to normalize a tester, which is a useful preliminary stage employed in both of our
constructions. Then in Appendix C and Appendix D we present the construction for classes closed under
Noisy-Neg minors and the analysis. Finally, in Appendix E and Appendix F we present the construction for
classes closed under ID-Neg minors and the analysis.
Conclusions. Our work is the first attempt we know of to establish a canonical form for Boolean function
property testers. Our results show that a wide range of efficient testing algorithms for many well-studied
Boolean function classes can be transformed into a natural canonical form.
Building on our work, it would be nice to have more general results that do not require bounded noise sensitivity or closure under ID-Neg minors, or alternatively to have examples showing that some such conditions are necessary for canonicalization. Another natural goal is to improve the quantitative bounds that we obtain.
References
[AFNS06] N. Alon, E. Fischer, I. Newman, and A. Shapira. A combinatorial characterization of the testable graph properties: It’s all about regularity. In Proc. STOC, 2006.
[AKK+03] N. Alon, T. Kaufman, M. Krivelevich, S. Litsyn, and D. Ron. Testing low-degree polynomials over GF(2). In Proc. RANDOM, pages 188–199, 2003.
[AS05a] Noga Alon and Asaf Shapira. A characterization of the (natural) graph properties testable with one-sided error. In Proc. FOCS, pages 429–438, 2005.
[AS05b] Noga Alon and Asaf Shapira. Every monotone graph property is testable. In Proc. STOC, pages 128–137, 2005.
[AT08] Tim Austin and Terry Tao. On the testability and repair of hereditary hypergraph properties. Submitted to Random Structures and Algorithms, 2008.
[BCH+96] M. Bellare, D. Coppersmith, J. Hastad, M. Kiwi, and M. Sudan. Linearity testing in characteristic two. IEEE Trans. on Information Theory, 42(6):1781–1795, 1996.
[BGS98] M. Bellare, O. Goldreich, and M. Sudan. Free bits, PCPs and non-approximability: towards tight results. SIAM J. Comput., 27(3):804–915, 1998.
[Bla08] Eric Blais. Improved bounds for testing juntas. In Proc. RANDOM, pages 317–330, 2008.
[Bla09] Eric Blais. Testing juntas nearly optimally. In Proc. 41st Annual ACM Symposium on Theory of Computing (STOC), pages 151–158, 2009.
[Bla10] Eric Blais. Personal communication. 2010.
[BLR93] M. Blum, M. Luby, and R. Rubinfeld. Self-testing/correcting with applications to numerical problems. J. Comp. Sys. Sci., 47:549–595, 1993. Earlier version in STOC’90.
[DLM+07] I. Diakonikolas, H. Lee, K. Matulef, K. Onak, R. Rubinfeld, R. Servedio, and A. Wan. Testing for concise representations. In Proc. 48th Ann. Symposium on Foundations of Computer Science (FOCS), pages 549–558, 2007.
[FKR+04] E. Fischer, G. Kindler, D. Ron, S. Safra, and A. Samorodnitsky. Testing juntas. J. Computer & System Sciences, 68(4):753–787, 2004.
[GOS+09] P. Gopalan, R. O’Donnell, R. Servedio, A. Shpilka, and K. Wimmer. Testing Fourier dimensionality and sparsity. In Proc. 36th International Colloquium on Automata, Languages and Programming (ICALP), pages 500–512, 2009.
[GT03] Oded Goldreich and Luca Trevisan. Three theorems regarding testing graph properties. Random Structures and Algorithms, 23(1):23–57, August 2003.
[HR05] Lisa Hellerstein and Vijay Raghavan. Exact learning of DNF formulas using DNF hypotheses. Journal of Computer & System Sciences, 70(4):435–470, 2005.
[MORS10] K. Matulef, R. O’Donnell, R. Rubinfeld, and R. Servedio. Testing halfspaces. SIAM J. Comp., 39(5):2004–2047, 2010.
[PRS02] M. Parnas, D. Ron, and A. Samorodnitsky. Testing basic Boolean formulae. SIAM J. Disc. Math., 16:20–46, 2002.
[Sam07] A. Samorodnitsky. Low-degree tests at large distances. In Proc. 39th ACM Symposium on the Theory of Computing (STOC’07), pages 506–515, 2007.
[Val08] P. Valiant. Testing Symmetric Properties of Distributions. PhD thesis, M.I.T., 2008.
A Applications of our main results
In this section, we discuss some of the applications of our two main results (also see Table 1). Recall
that Theorem 4 applies to classes closed under ID-Neg minors and to one-sided independent testers, while
Theorem 3 applies to classes closed under Noisy-Neg minors and to (two-sided) independent testers. Many
natural classes of Boolean functions that have been studied in the property testing literature are closed
under either ID-Neg minors or Noisy-Neg minors. Classes closed under ID-Neg minors include J-juntas
[FKR+ 04, Bla08, Bla09] (for J independent of n), s-term DNFs [DLM+ 07] (for s independent of n),
halfspaces [MORS10], GF (2)-linear functions (i.e. parities and their negations) [BLR93], and degree-d
GF (2) polynomials [AKK+ 03]. Classes closed under Noisy-Neg minors include all the classes of Boolean
functions for which testing algorithms were given in [DLM+ 07]; these include decision lists, size-s decision
trees, size-s branching programs, s-term DNFs, size-s Boolean formulas, s-sparse polynomials over GF (2),
size-s Boolean circuits and functions with Fourier degree at most s, where throughout s is independent of n.
For each of these classes, using the fact that functions in the class are well-approximated by juntas (which is
the key to the testing results of [DLM+ 07]) it is not difficult to verify that the class has low noise sensitivity.
We further note that most of the Boolean function property testers found in the literature can be characterized as independent testers. These include the 1-sided “size test” for testing J-juntas of [FKR+ 04], (the nonadaptive version of) the junta tester of [Bla09], the parity tester of [BLR93], the testers of [PRS02] for Boolean literals, conjunctions and s-term monotone DNFs, the general “testing by implicit learning” algorithm of [DLM+ 07] for the classes listed above, and the tester of [MORS10] for halfspaces.
There are several classes that are both closed under Noisy-Neg minors and additionally are known to
have an independent tester, and so our Theorem 3 can be applied to “canonicalize” these algorithms; these
include the testers of [DLM+ 07] and [MORS10] for the classes listed above.
Since our ID-Neg result (Theorem 4) requires independent testers that are also one-sided, the setting is
more restrictive, but there are still several testers and classes found in the literature that satisfy our requirements. These include the tester for singletons of [PRS02], the GF (2)-linear function tester of [BLR93], the
[AKK+ 03] tester for degree-d GF (2) polynomials, and the one-sided junta testers given by [FKR+ 04].
Thus, we can canonicalize known testers for many constant-query testable Boolean function classes
found in the literature by applying either our first or second result. One exception comes from [GOS+ 09];
that work gives testing algorithms for the class of Boolean functions with Fourier dimension d (i.e. for
“d-juntas of parities”) and for the class of Boolean functions with s-sparse Fourier spectra. It is easy to
see that these classes are not closed under Noisy-Neg minors, since each class contains all parity functions,
and the testers of [GOS+ 09] do not have 1-sided error. (However, we note that inspection of the testers
provided in [GOS+ 09] shows that they can be straightforwardly “canonicalized” just like the [AKK+ 03]
tester discussed in Section 1.1.)
Function class | Reference | 1-sided | ID-Neg | Noisy-Neg | Theorem
--- | --- | --- | --- | --- | ---
literals (dictators) | [PRS02] | YES | YES | YES | Thm. 3, Thm. 4
GF(2)-linear functions | [BLR93] | YES | YES | NO | Thm. 4
GF(2)-deg-d functions | [AKK+03] | YES | YES | NO | Thm. 4
J-juntas | [FKR+04, Bla08, Bla09] | (some) | YES | YES | Thm. 3, Thm. 4
decision lists; size-s decision trees; size-s branching programs; s-term DNFs; size-s Boolean formulas; s-sparse GF(2) polynomials; size-s Boolean circuits; functions with Fourier degree d | [DLM+07] | NO | NO | YES | Thm. 3
halfspaces | [MORS10] | NO | YES | YES | Thm. 3
functions with Fourier dimension d; functions with s-sparse Fourier spectra | [GOS+09] | NO | YES | NO |
Table 1: An overview of some Boolean function property testing results in the literature and how they
relate to our results. All of the testing algorithms listed in the table are independent testers in the sense
of Definition 5. The parameters J, s, d are always viewed as independent of n. The “Theorem” column
indicates which of our theorems yields a canonical tester. The testers of [GOS+ 09] are the only algorithms
that cannot be canonicalized using either Theorem 4 or Theorem 3; these testers can be verified to already
essentially be in canonical form.
B Normalizing a tester
In this section we explain how any independent tester T can be “normalized;” having independent testers
that satisfy this normalization condition will be useful in both our constructions that follow.
We will use the following lemma:
Lemma 1 Let T and C be as described in Theorem 3 (resp. Theorem 4). Let S ⊆ [n] be any subset of [n]. Then the following algorithm $T(S) = (T(S)_1, T(S)_2)$ (with input parameter ǫ) is also a two-sided (resp. one-sided) non-adaptive tester for C with the same query and success parameters as T.
T (S)1 works as follows:
1. Run T1 with input parameter ǫ to generate the set of query strings Q.
2. For each query string x ∈ Q and each i ∈ S, flip the bit $x_i$. Let Q′ be the resulting set of query strings.
3. Submit the query strings in Q′ to the oracle for f and receive responses R.
T (S)2 works by running T2 with the set of queries Q and the responses R.
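The algorithm T(S) can be sketched as a thin wrapper around the two phases of T. In the sketch below, `T1` and `T2` are hypothetical callables standing in for the query-generation and decision phases of T, queries are lists of 0/1 bits, and S is a set of 0-based indices.

```python
def flip_queries(queries, S):
    """Flip bit i of every query string for each index i in S."""
    return [[b ^ 1 if i in S else b for i, b in enumerate(x)] for x in queries]


def run_T_S(T1, T2, f, S, eps):
    """One execution of T(S): generate queries with T1, flip the bits in S,
    query f on the flipped strings, and hand the ORIGINAL queries together
    with the received responses to T2."""
    Q = T1(eps)                       # step 1: queries chosen by T
    Q_flipped = flip_queries(Q, S)    # step 2: flip the coordinates in S
    R = [f(x) for x in Q_flipped]     # step 3: oracle responses
    return T2(Q, R)


# Toy example: T1 always asks two fixed queries and T2 just returns R.
toy_T1 = lambda eps: [[0, 0], [1, 1]]
toy_T2 = lambda Q, R: R
responses = run_T_S(toy_T1, toy_T2, lambda x: x[0], {0}, 0.1)
```

From the point of view of T2, the responses look like the values on Q of the function obtained from f by negating the inputs in S, which is the observation used in the proof.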
Proof: Let f be the function that is being tested. Define the following function f∗: $f^*(x_1, \ldots, x_n) = f(t_1, \ldots, t_n)$ where $t_i = \overline{x_i}$ for i ∈ S and $t_i = x_i$ for i ∉ S. Thus the responses R correspond to the output of f∗ on the strings in Q.
If f belongs to C, then f∗ also belongs to C since C is closed under negating input variables. Since the pair (Q, R) is consistent with f∗, we have $\Pr[(T(S))^{f} = \text{“accept”}] = \Pr[T^{f^*} = \text{“accept”}]$ as required.
On the other hand, if f is ǫ-far from C, then f∗ is ǫ-far from C. (If this were not the case, then f∗ would be ǫ-close to some function g∗ ∈ C. Then the function g = (g∗)∗ would belong to C, and would be such that f is ǫ-close to g.) Thus, $\Pr[(T(S))^{f} = \text{“reject”}] = \Pr[T^{f^*} = \text{“reject”}]$ as required.
Given T ,C as in Theorem 3 (resp. Theorem 4), we consider the following tester T ′ . (Intuitively, in each
of its executions T ′ randomly chooses the set S and runs T (S) with the chosen S.)
The first phase (T ′ )1 works as follows:
1. Choose a subset S ⊆ [n] uniformly at random (i.e. each i ∈ [n] has independent probability 1/2 of
being in S).
2. Run T (S)1 with input parameter ǫ to generate the set of queries Q′ .
3. Submit the queries in Q′ to the oracle and receive responses R.
The second phase (T ′ )2 works as follows:
1. Run T (S)2 with the set of queries Q′ and the responses R.
Note that by Lemma 1, T′ is a two-sided (resp. one-sided) non-adaptive tester for class C. In fact, T′ is a two-sided (resp. one-sided) independent tester for C since it can be viewed in the following way: T′ chooses a first query string according to probability $p_\emptyset = 1/2$. Thus T′ starts off with an initial partition of [n] into two subsets $B_0, B_1$. Now, T′ executes T on the set $B_0$ and executes the “negation” of T (i.e. all probability values $p_b$ are replaced with $1 - p_b$) on the set $B_1$. So henceforth it will be convenient to view T′ as a two-sided (resp. one-sided) independent tester for C, with a first query string whose probability $p_\emptyset$ is set to 1/2.
C Construction of Canon(T) for classes closed under Noisy-Neg Minors
In this section we describe how the canonical tester Canon(T ) for C is constructed from T (or more precisely, from T ′ ). Before going into details we give the idea behind the construction.
We describe the canonical tester as two separate testers that are run sequentially. It should be obvious
how the two testers can be combined into one tester that satisfies the canonical tester definition given in
Section 3.
The first part of Canon(T ) is denoted by NoiseTest(T ) and the second part of Canon(T ) is denoted
by Simulator(T ). Canon(T ) outputs “reject” iff either NoiseTest(T ) outputs “reject” or Simulator(T )
outputs “reject”, and outputs “accept” iff both NoiseTest(T ) and Simulator(T ) output “accept.”
We now describe NoiseTest(T ) = (NoiseTest(T )1 , NoiseTest(T )2 ):
Description of NoiseTest(T )1 for class C running on input function f with input parameter ǫ:
1. Let Prod(ǫ) be as defined in Section 2.1 for the tester T ′ with input parameter ǫ.
2. We define q′(ǫ) to be the smallest integer value that satisfies the following bound:
\[ 2\, NS_{\frac{\mathrm{Prod}(\epsilon)}{2} \cdot \frac{1}{2^{q'(\epsilon)}}}(C) \cdot q(\epsilon) \le \frac{r(\epsilon) - a(\epsilon)}{8}. \]
(Simulator(T) will subsequently make $2^{2^{q'(\epsilon)}}$ queries.)
3. Define $\eta' = \frac{\mathrm{rem}}{2 \cdot 2^{q'(\epsilon)}}$, where $\mathrm{rem} = 2^{q'(\epsilon)} \bmod \mathrm{Prod}(\epsilon)$. (We note that $\eta' < \frac{\mathrm{Prod}(\epsilon)}{2 \cdot 2^{q'(\epsilon)}}$, since rem < Prod(ǫ). We further recall that $NS_\delta(f)$ decreases as δ decreases for every non-constant function f; this is well known and is easy to verify from the Fourier expression for noise sensitivity.)
4. Let $m = \left\lceil \frac{32}{NS_{\eta'}(C)} \ln \frac{8}{r(\epsilon) - a(\epsilon)} \right\rceil$. (This choice of m will be used in a Chernoff bound later in the analysis; note that it gives $\exp((-1/16) \cdot NS_{\eta'}(C) \cdot m/2) \le \frac{r(\epsilon) - a(\epsilon)}{8}$.)
5. Let Q be a set of pairs $(x^1, y^1), \ldots, (x^m, y^m)$, where each $x^i$ is chosen uniformly at random, and $y^i$ is $(1 - 2\eta')$-correlated with $x^i$.
6. Submit these m pairs of queries and receive the set of responses R: $(f(x^1), f(y^1)), \ldots, (f(x^m), f(y^m))$.
Description of $\mathrm{NoiseTest}(T)_2$ for class C running on input function f with input parameter ǫ, queries Q and responses R: Output “reject” if the number of pairs such that $f(x^i) \ne f(y^i)$ is at least $(3/2)\, NS_{\eta'}(C) \cdot m$, and output “accept” otherwise.
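The sample size and the decision rule of NoiseTest(T) can be sketched as follows. The sketch reads the sample size in Step 4 as the ceiling of (32 / NS_{η′}(C)) · ln(8 / (r(ǫ) − a(ǫ))), takes the value NS_{η′}(C) as a given number `ns_class`, and assumes r(ǫ) > a(ǫ); the function names are hypothetical.

```python
import math


def num_pairs(ns_class, r, a):
    """Sample size m = ceil((32 / NS) * ln(8 / (r - a))); assumes r > a."""
    return math.ceil((32 / ns_class) * math.log(8 / (r - a)))


def noise_test_decision(responses, ns_class):
    """responses is a list of pairs (f(x_i), f(y_i)). Reject if the number of
    disagreeing pairs is at least (3/2) * NS * m, and accept otherwise."""
    m = len(responses)
    disagreements = sum(fx != fy for fx, fy in responses)
    return "reject" if disagreements >= 1.5 * ns_class * m else "accept"


# Example parameters: NS = 0.1, r = 0.9, a = 0.1.
m = num_pairs(0.1, 0.9, 0.1)
```

With this m, the quantity exp(−NS · m / 32) is at most (r − a)/8, which is the bound quoted in Step 4.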
This concludes our description of NoiseTest(T ). Before defining Simulator(T ), we define two useful
distributions:
• Simulator’s Queries: $S_n^{\epsilon}$ is the distribution over sets X of $2^{2^{q'(\epsilon)}}$ query strings to n-variable functions that a q′-canonical tester makes with parameter ǫ.
• Independent Tester’s Queries: $I_n^{\epsilon}$ is the distribution over sets of queries Y to n-variable functions that the independent tester T′ makes with input parameter ǫ.
We are now ready to define Simulator(T ) = (Simulator(T )1 , Simulator(T )2 ):
Description of Simulator(T )1 running on input function f with parameter ǫ:
1. Draw $X \sim S_n^{\epsilon}$. As described in Step 1 of Definition 6, this draw of X induces a partition of [n] into $2^{q'(\epsilon)}$ subsets: $P = \{P_1, \ldots, P_{2^{q'(\epsilon)}}\}$. Note that for each fixed j, each variable is independently placed in subset $P_j$ with probability $1/2^{q'(\epsilon)}$.
2. Ask all $2^{2^{q'(\epsilon)}}$ queries x ∈ X to the oracle and receive responses $[f(x)]_{x \in X}$.
Description of $\mathrm{Simulator}(T)_2$ running on input parameter ǫ with queries X and responses $[f(x)]_{x \in X}$:
1. Given X as above, draw $Y \sim \mathrm{AltI}_n^{\epsilon}(X)$ as follows. We will later show that for $X \sim S_n^{\epsilon}$, a set of queries Y generated by this draw is distributed identically to a set of queries Y generated by a draw from $I_n^{\epsilon}$. Moreover, Simulator(T) can always respond correctly to queries y ∈ Y with the value f′(y) by returning f(x) for some x ∈ X (as described below).
• Let Prod(ǫ) be as described in Section 4 and let rem denote $2^{q'(\epsilon)} \bmod \mathrm{Prod}(\epsilon)$ (note that rem is always divisible by 2 due to the way T′ chooses the set S).
• Choose the first rem subsets $P_1, \ldots, P_{\mathrm{rem}}$ of the partition P defined by X. Let $F_+ = \bigcup_{i=1}^{\mathrm{rem}/2} P_i$ and let $F_- = \bigcup_{i=\mathrm{rem}/2+1}^{\mathrm{rem}} P_i$.
• Partition F+ multinomially into Prod(ǫ) subsets $F_+^1, \ldots, F_+^{\mathrm{Prod}(\epsilon)}$, where each variable in F+ is assigned to subset $F_+^i$, for 1 ≤ i ≤ Prod(ǫ), independently with probability 1/Prod(ǫ).
• Partition F− multinomially into Prod(ǫ) subsets $F_-^1, \ldots, F_-^{\mathrm{Prod}(\epsilon)}$, where each variable in F− is assigned to subset $F_-^i$, for 1 ≤ i ≤ Prod(ǫ), independently with probability 1/Prod(ǫ).
• Recall that a run of T′ with parameter ǫ induces a partition of [n] into $2^{q(\epsilon)}$ subsets. For each $1 \le i \le 2^{q(\epsilon)}$, let $k_i$ be such that $k_i \cdot n/\mathrm{Prod}(\epsilon)$ is the expected size of the i-th subset in this partition. Let K equal $\lfloor 2^{q'(\epsilon)}/\mathrm{Prod}(\epsilon) \rfloor$.
• We now describe how to construct a partition $R = \{R_1, \ldots, R_{2^{q(\epsilon)}}\}$. To construct subset $R_1$, remove the first $Kk_1$ subsets $P_{\mathrm{rem}+1}, \ldots, P_{\mathrm{rem}+Kk_1}$ remaining in the partition P, the first $k_1$ subsets $F_+^1, \ldots, F_+^{k_1}$ in the partition of F+, and the first $k_1$ subsets $F_-^1, \ldots, F_-^{k_1}$ in the partition of F−, and place the elements of each of these sets in $R_1$. To construct subset $R_i$, for $2 \le i \le 2^{q(\epsilon)}$, remove the first $Kk_i$ remaining subsets in P, the first $k_i$ remaining subsets in the partition of F+, and the first $k_i$ remaining subsets in the partition of F−, and place the elements of each of these sets in $R_i$.
• Let Y be the set of queries asked by T′ running with input parameter ǫ given the partition R.
• Simulator(T) answers queries in Y of the form $(x_1, \ldots, x_n)$ with $f'(x_1, \ldots, x_n)$ where the function $f'(x_1, \ldots, x_n)$ equals $\mathrm{Noisy}(f, F_+, F_-)(x_1, \ldots, x_n)$ (recall Definition 3).
We denote by $\mathcal{F}^{\epsilon}$ the (randomized) operator that takes input X and returns the triple (f′, F+, F−).
2. $\mathrm{Simulator}(T)_2$ hands the query strings and responses to $T'_2$ and outputs whatever $T'_2$ does.
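The coalescing step that builds the partition R can be sketched as follows. The sketch assumes the subsets of P that remain after the first rem have been removed are given as a list `P_rest` of Python sets, that the multinomial sub-partitions of F+ and F− are given as lists of sets of length Prod(ǫ), and that `ks` lists the integers k_i; all of these names are hypothetical.

```python
def coalesce(P_rest, F_plus_parts, F_minus_parts, ks, K):
    """Build R = [R_1, R_2, ...]: R_i receives the next K * k_i subsets of
    P_rest, the next k_i parts of F+ and the next k_i parts of F-."""
    R = []
    p_pos = 0   # next unused subset of P_rest
    f_pos = 0   # next unused part of F+ and of F-
    for k_i in ks:
        block = set()
        for part in P_rest[p_pos:p_pos + K * k_i]:
            block |= part
        for part in F_plus_parts[f_pos:f_pos + k_i]:
            block |= part
        for part in F_minus_parts[f_pos:f_pos + k_i]:
            block |= part
        R.append(block)
        p_pos += K * k_i
        f_pos += k_i
    return R


# Toy instance with Prod = 2, K = 2 and k_1 = k_2 = 1.
R = coalesce(
    P_rest=[{0}, {1}, {2}, {3}],
    F_plus_parts=[{4}, {5}],
    F_minus_parts=[{6}, set()],
    ks=[1, 1],
    K=2,
)
```

In the toy instance every variable ends up in exactly one block, so R is a partition of the seven variables.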
The reader may have noticed that according to the above description the procedure Simulator(T )2 is a
randomized algorithm, whereas our definition of a canonical tester requires that the “computation stage” be
a deterministic algorithm. However, it is easy to derandomize the computation stage by directly applying
the argument of Section 4.2.3 of [GT03].
D Analysis of Canon(T) for classes closed under Noisy-Neg Minors
Theorem 5 Canon(T) is a canonical property tester for C which makes $2^{2^{q'(\epsilon)}}$ queries and satisfies the following:
• if f belongs to C then $\Pr[\mathrm{Canon}(T)^f = \text{“accept”}] \ge 1 - a'(\epsilon)$, and
• if f is ǫ-far from C then $\Pr[\mathrm{Canon}(T)^f = \text{“reject”}] \ge r'(\epsilon)$,
where $a'(\epsilon) = 3a(\epsilon)/4 + r(\epsilon)/4$ and $r'(\epsilon) = 3r(\epsilon)/4 + a(\epsilon)/4$.
Theorem 5 is proved via the following three intermediate lemmas:
Lemma 2 If $NS_{\eta'}(f) \ge 2NS_{\eta'}(C)$ then NoiseTest(T) outputs “accept” with probability at most $\frac{r(\epsilon) - a(\epsilon)}{8}$. If $NS_{\eta'}(f) \le NS_{\eta'}(C)$ then NoiseTest(T) outputs “reject” with probability at most $\frac{r(\epsilon) - a(\epsilon)}{8}$.
This is a straightforward consequence of the monotonicity of N Sδ as a function of δ, standard Chernoff
bounds, and the choice of m in Step 4 of NoiseTest(T )1 .
Lemma 3 The distributions $I_n^{\epsilon}$ and $(X \sim S_n^{\epsilon}, \mathrm{AltI}_n^{\epsilon}(X))$ are identical.
This lemma follows directly from inspection of the procedure described in Step 1 of $\mathrm{Simulator}(T)_2$ to obtain a draw from $\mathrm{AltI}_n^{\epsilon}(X)$. The i-th subset in a partition induced by a set of queries Y drawn from $I_n^{\epsilon}$ is a random subset of [n] obtained by independently including each variable with probability $k_i/\mathrm{Prod}(\epsilon)$ (so that its expected size is $k_i \cdot n/\mathrm{Prod}(\epsilon)$), and it is straightforward to check that the same is true for Y drawn from $\mathrm{AltI}_n^{\epsilon}(X)$; similar equivalences are easily seen to be true for all of the other subsets, and indeed for all collections of subsets.
To state the third lemma we need the following definition and claim: Indep(n, p1 , p2 ) is a distribution
over triples (f ′ , F+ , F− ), where f ′ is an n-variable function and F+ , F− ⊆ [n] are disjoint sets. A draw
from Indep(n, p1 , p2 ) is obtained in the following way:
• Each index i ∈ [n] is independently placed in F+ with probability p1 , placed in F− with probability
p2 , and placed in neither set with probability 1 − p1 − p2 .
• f ′ is set to be Noisy(f, F+ , F− )(x1 , . . . , xn ).
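The first step of a draw from Indep(n, p1, p2) can be sketched as below; the sketch only produces the pair of index sets (the function f′ is then determined by them), uses 0-based indices, and assumes p1 + p2 ≤ 1. The name `draw_indep` is hypothetical.

```python
import random


def draw_indep(n, p1, p2, rng):
    """Place each index independently in F+ with probability p1, in F- with
    probability p2, and in neither set with probability 1 - p1 - p2."""
    f_plus, f_minus = set(), set()
    for i in range(n):
        u = rng.random()          # uniform in [0, 1)
        if u < p1:
            f_plus.add(i)
        elif u < p1 + p2:
            f_minus.add(i)
    return f_plus, f_minus


f_plus, f_minus = draw_indep(50, 0.2, 0.2, random.Random(1))
```

Using a single uniform draw per index guarantees that the two sets are disjoint, as the definition requires.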
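Claim 4 The following two distributions are identical:
• Draw $Y \sim (X \sim S_n^{\epsilon}, \mathrm{AltI}_n^{\epsilon}(X))$. Draw $(f', F_+, F_-) \sim \mathrm{Indep}(n, \eta', \eta')$, where $\eta' = \frac{\mathrm{rem}}{2^{q'(\epsilon)+1}}$, and output $(Y, f', F_+, F_-)$; and
• Draw $X \sim S_n^{\epsilon}$. Draw $Y \sim \mathrm{AltI}_n^{\epsilon}(X)$ and output $(Y, \mathcal{F}^{\epsilon}(X))$.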
This claim follows from inspection of the specification of the operator F ǫ (X) given in Step 1 of the
description of Simulator(T )2 .
Now we give the third lemma needed to prove Theorem 5:
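Lemma 5 If $NS_{\eta'}(f) \le 2NS_{\eta'}(C)$ then
\[ \Pr_{X \sim S_n^{\epsilon},\ (f', F_+, F_-) = \mathcal{F}^{\epsilon}(X),\ Y \sim \mathrm{AltI}_n^{\epsilon}(X)} [f(y) \ne f'(y) \text{ for some } y \in Y] \le \frac{r(\epsilon) - a(\epsilon)}{8}. \]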
Proof: By Claim 4 we have that
\[ \Pr_{X \sim S_n^{\epsilon},\ (f', F_+, F_-) = \mathcal{F}^{\epsilon}(X),\ Y \sim \mathrm{AltI}_n^{\epsilon}(X)} [f(y) \ne f'(y) \text{ for some } y \in Y] = \Pr_{Y \sim (X \sim S_n^{\epsilon}, \mathrm{AltI}_n^{\epsilon}(X)),\ (f', F_+, F_-) \sim \mathrm{Indep}(n, \eta', \eta')} [f(y) \ne f'(y) \text{ for some } y \in Y]. \]
Thus, it is sufficient to consider
\[ \Pr_{Y \sim (X \sim S_n^{\epsilon}, \mathrm{AltI}_n^{\epsilon}(X)),\ (f', F_+, F_-) \sim \mathrm{Indep}(n, \eta', \eta')} [f(y) \ne f'(y) \text{ for some } y \in Y]. \]
We first observe that each individual query y ∈ Y above is uniformly distributed. Now given an individual query y ∈ Y, we define $y' = (y'_1, \ldots, y'_n)$ where $y'_i = 1$ if i ∈ F+, $y'_i = -1$ if i ∈ F− and $y'_i = y_i$ otherwise. Note that by definition of f′ we have f′(y) = f(y′). Moreover, since variables are placed in F+ (resp. F−) and set to 1 (resp. −1) independently with probability $\frac{\mathrm{rem}}{2^{q'(\epsilon)+1}} = \eta'$, we have that y and y′ are $(1 - 2\eta')$-correlated. Thus, since by assumption $NS_{\eta'}(f) \le 2NS_{\eta'}(C) \le \frac{r(\epsilon) - a(\epsilon)}{8q(\epsilon)}$, we have that for each individual query y, the probability that $f(y) \ne f(y') = f'(y)$ is at most $\frac{r(\epsilon) - a(\epsilon)}{8q(\epsilon)}$. By a union bound over all y ∈ Y, we have that the probability that $f(y) \ne f'(y)$ for some y ∈ Y is at most $\frac{r(\epsilon) - a(\epsilon)}{8}$.
Having established Lemmas 2, 3, and 5, we now show that they imply Theorem 5.
Proof of Theorem 5: First, suppose that f belongs to C. Thus we have that $NS_{\eta'}(f) \le NS_{\eta'}(C)$. Since by Lemma 3 the distributions $I_n^{\epsilon}$ and $(X \sim S_n^{\epsilon}, \mathrm{AltI}_n^{\epsilon}(X))$ are identical, we have that the output of Canon(T) can only differ from the output of T in two cases: (1) NoiseTest(T) outputs “reject”; (2) $f(y) \ne f'(y)$ for some y ∈ Y. Since f ∈ C, we have $NS_{\eta'}(f) \le NS_{\eta'}(C)$, so by Lemma 2, we have that (1) occurs with probability at most $\frac{r(\epsilon) - a(\epsilon)}{8}$. Moreover, since $NS_{\eta'}(f) \le NS_{\eta'}(C)$, we have by Lemma 5 that (2) occurs with probability at most $\frac{r(\epsilon) - a(\epsilon)}{8}$. Thus, we have that Canon(T) outputs “accept” with probability at least $1 - (3a(\epsilon)/4 + r(\epsilon)/4) = 1 - a'(\epsilon)$.
Next, suppose that f is ǫ-far from C. There are two cases to consider. The first case is that $NS_{\eta'}(f) \ge 2NS_{\eta'}(C)$. In this case, NoiseTest(T) outputs “reject” with probability at least $1 - \frac{r(\epsilon) - a(\epsilon)}{8}$ by Lemma 2. The second case is that $NS_{\eta'}(f) \le 2NS_{\eta'}(C)$. Since Canon(T) always outputs “reject” if NoiseTest(T) outputs “reject,” the probability that Canon(T) outputs “reject” is at least the probability that Canon(T) outputs “reject” given that NoiseTest(T) outputs “accept”. Since by Lemma 3 the distributions $I_n^{\epsilon}$ and $(X \sim S_n^{\epsilon}, \mathrm{AltI}_n^{\epsilon}(X))$ are identical, we have that if NoiseTest(T) outputs “accept”, the output of Canon(T) can only differ from the output of T if $f(y) \ne f'(y)$ for some y ∈ Y. By Lemma 5, we have that this occurs with probability at most $\frac{r(\epsilon) - a(\epsilon)}{8}$. Thus, Canon(T) outputs “reject” with probability at least $3r(\epsilon)/4 + a(\epsilon)/4 = r'(\epsilon)$, and the theorem is proved.
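For reference, the arithmetic behind a′(ǫ) and r′(ǫ) can be spelled out as follows. This is one way to read the argument, under the assumptions (suggested by the statement of Theorem 5 but not restated here) that T accepts every f ∈ C with probability at least 1 − a(ǫ), rejects every ǫ-far f with probability at least r(ǫ), and that a(ǫ) ≤ r(ǫ) ≤ 1.

```latex
% Completeness: T itself errs with probability at most a, and each of the
% events (1) and (2) adds at most (r - a)/8.
\Pr[\text{Canon}(T)^f = \text{reject}]
  \le a(\epsilon) + 2 \cdot \frac{r(\epsilon) - a(\epsilon)}{8}
  = \frac{3a(\epsilon)}{4} + \frac{r(\epsilon)}{4} = a'(\epsilon).

% Soundness, second case: T rejects with probability at least r, and the
% simulation differs from T with probability at most (r - a)/8.
\Pr[\text{Canon}(T)^f = \text{reject}]
  \ge r(\epsilon) - \frac{r(\epsilon) - a(\epsilon)}{8}
  \ge r(\epsilon) - \frac{r(\epsilon) - a(\epsilon)}{4}
  = \frac{3r(\epsilon)}{4} + \frac{a(\epsilon)}{4} = r'(\epsilon).

% Soundness, first case: since r <= 1,
1 - \frac{r(\epsilon) - a(\epsilon)}{8}
  \ge r(\epsilon) - \frac{r(\epsilon) - a(\epsilon)}{4} = r'(\epsilon).
```

Both bounds leave slack relative to the stated values, which is consistent with the theorem giving lower bounds rather than exact probabilities.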
E Construction of Canon(T) for classes closed under ID-Neg minors
In this section we describe how the canonical tester Canon(T ) for C is constructed from T (or more precisely, from T ′ ). Throughout this section q(ǫ) denotes the query complexity of T ′ on input parameter ǫ, and
c(ǫ) denotes the granularity of T ′ as defined in Definition 5.
We begin by defining two useful distributions (the same definitions that were used in Appendix C):
• Canonical Tester’s Queries: $S_n^{\epsilon}$ is the distribution over sets of $2^{2^{q'(\epsilon)}}$ queries X to n-variable functions that a q′-canonical tester makes when it is run with parameter ǫ.
• Independent Tester’s Queries: $I_n^{\epsilon}$ is the distribution over sets of q(ǫ) queries Y to n-variable functions that the independent tester T′ makes with input parameter ǫ.
We are now ready to define the canonical tester Canon(T ) = (Canon(T )1 , Canon(T )2 ):
Description of Canon(T )1 running on input function f with parameter ǫ:
1. Draw $X \sim S_n^{\epsilon}$. As described in Step 1 of Definition 6, this draw of X induces a partition of [n] into $2^{q'(\epsilon)}$ subsets: $P = \{P_1, \ldots, P_{2^{q'(\epsilon)}}\}$. Note that for each fixed j, each variable is independently placed in subset $P_j$ with probability $1/2^{q'(\epsilon)}$.
2. Ask all $2^{2^{q'(\epsilon)}}$ queries x ∈ X to the oracle and receive responses $[f(x)]_{x \in X}$.
Description of $\mathrm{Canon}(T)_2$ running on input parameter ǫ with queries X and responses $[f(x)]_{x \in X}$:
1. Given input X as above, the 4-tuple (f′, F+, F−, id) is obtained by applying $\mathcal{F}^{\epsilon_1}$ to X as follows (recall from the statement of Theorem 4 that $\epsilon_1 = \frac{r(\epsilon)}{4q(\epsilon)}$):
• Let $\mathrm{Prod}(\epsilon_1)$ be as described in Section 4. (Recall that as noted in Section 4 we have $\mathrm{Prod}(\epsilon_1) \le c(\epsilon_1)^{2^{q(\epsilon_1)}+1}$.) Let rem denote $2^{q'(\epsilon)} \bmod \mathrm{Prod}(\epsilon_1)$ (as before, rem is divisible by 2 due to the way T′ chooses the set S).
• Choose the first rem subsets $P_1, \ldots, P_{\mathrm{rem}}$ of the partition P defined by X. Let $F_+ = \bigcup_{i=1}^{\mathrm{rem}/2} P_i$ and let $F_- = \bigcup_{i=\mathrm{rem}/2+1}^{\mathrm{rem}} P_i$. Let id be the lexicographically first element of F+.
• Let n′ be the value such that $n' - 1 = n - |F_+| - |F_-|$. (Intuitively, n′ is the number of elements in [n] after the elements of F+ and F− are removed and then id is added back in.)
• Let $f' = \text{ID-Neg}(f, F_+, F_-, id)(x_1, \ldots, x_n)$. We shall view f′ as a function over n′ variables.
2. Given X, (f′, F+, F−, id) generated as above, draw $Y \sim \mathrm{AltI}_{n'}^{\epsilon_1}(X, id)$ as follows. We will later show that a set of n′-variable queries Y generated by this draw is distributed identically to a set of queries Y generated by a draw from $I_{n'}^{\epsilon_1}$. Moreover, Canon(T) can always respond correctly to queries y ∈ Y with the value f′(y) by returning f(x) for some x ∈ X as described below.
• Let $2^{q(\epsilon_1)}$ be the number of subsets in the partition $R = \{R_1, \ldots, R_{2^{q(\epsilon_1)}}\}$ induced by T′ running with parameter $\epsilon_1$. Let $k_i$ be the nonnegative integer such that the expected size of subset $R_i$ equals $k_i \cdot n/\mathrm{Prod}(\epsilon_1)$. Let $K = \lfloor 2^{q'(\epsilon)}/\mathrm{Prod}(\epsilon_1) \rfloor$.
• To construct subset $R_1$, remove the first $Kk_1$ subsets $P_{\mathrm{rem}+1}, \ldots, P_{\mathrm{rem}+Kk_1}$ remaining in the partition P and place the elements of each of the sets in $R_1$. To construct subset $R_i$, for $2 \le i \le 2^{q(\epsilon_1)}$, remove the first $Kk_i$ remaining subsets in P and place the elements of each of the sets in $R_i$.
• Randomly choose a subset $R_j$, where each subset $R_i$ has probability $k_i/\mathrm{Prod}(\epsilon_1)$ of being chosen. Place the variable $x_{id}$ in the subset $R_j$.
• Let Y be the set of queries asked by T′ running with input parameter $\epsilon_1$ given the partition R.
• Canon(T) answers queries in Y of the form $(x_1, \ldots, x_{n'})$ with $f'(x_1, \ldots, x_{n'})$ where the function f′ equals $\text{ID-Neg}(f, F_+, F_-, id)$ (recall Definition 4).
3. Canon(T) hands the query strings and responses to $T'_2$ and outputs whatever $T'_2$ does.
As at the end of Appendix C, the randomness in the computation stage can be eliminated using the
[GT03] approach.
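To make the bookkeeping in Steps 1 and 2 concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions rather than the construction itself: variables are indexed 0, ..., n−1, "lexicographically first" is taken to mean the smallest index, ID-Neg is read as identifying every variable in F+ with x_id and every variable in F− with the negation of x_id (Definition 4 is the authoritative statement), and the helper names step_one, id_neg and group_blocks are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Bits = Tuple[int, ...]


def step_one(partition: Sequence[Sequence[int]], rem: int, n: int):
    """Step 1 bookkeeping: F+ is the union of the first rem/2 blocks of P,
    F- the union of the next rem/2 blocks, id the smallest index in F+."""
    if rem % 2 != 0:
        raise ValueError("rem is expected to be even")
    half = rem // 2
    f_plus = sorted(i for block in partition[:half] for i in block)
    f_minus = sorted(i for block in partition[half:rem] for i in block)
    if not f_plus:
        raise ValueError("F+ is empty, so id is undefined in this sketch")
    ident = f_plus[0]
    n_prime = n - len(f_plus) - len(f_minus) + 1  # n' - 1 = n - |F+| - |F-|
    return f_plus, f_minus, ident, n_prime


def id_neg(f: Callable[[Bits], int], f_plus, f_minus, ident: int, n: int):
    """Assumed reading of ID-Neg: x_i := x_id on F+, x_i := 1 - x_id on F-.
    Returns f' over the surviving variables together with their indices."""
    removed = (set(f_plus) | set(f_minus)) - {ident}
    kept = [i for i in range(n) if i not in removed]

    def f_prime(y: Bits) -> int:
        x = [0] * n
        for pos, i in enumerate(kept):
            x[i] = y[pos]
        for i in f_plus:
            x[i] = x[ident]
        for i in f_minus:
            x[i] = 1 - x[ident]
        return f(tuple(x))

    return f_prime, kept


def group_blocks(partition, rem: int, big_k: int, ks: Sequence[int]) -> List[List[int]]:
    """Step 2 bookkeeping: R_i receives the next K * k_i blocks of P after
    the first rem blocks (before x_id is placed in a randomly chosen R_j)."""
    pos, groups = rem, []
    for k in ks:
        taken = partition[pos:pos + big_k * k]
        groups.append(sorted(i for block in taken for i in block))
        pos += big_k * k
    return groups
```

For instance, with n = 6, the partition [[0, 3], [1], [2], [4, 5]] and rem = 2, the sketch yields F+ = {0, 3}, F− = {1}, id = 0 and n′ = 4, and the restricted function is defined over the surviving indices 0, 2, 4, 5.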
F Analysis of Canon(T ) for classes closed under ID-Neg Minors
In this section we prove the following result which implies Theorem 4:
Theorem 6 Canon(T) is a canonical property tester for C which makes $2^{2^{q'(\epsilon)}}$ queries and satisfies the following:
• if f belongs to C then $\Pr[\mathrm{Canon}(T)^f = \text{“accept”}] = 1$; and
• if f is $\epsilon$-far from C then $\Pr[\mathrm{Canon}(T)^f = \text{“reject”}] \ge \left(1 - \frac{1-r(\epsilon)}{1-r(\epsilon)/4}\right) \cdot r(\epsilon_1)$.
We prove Theorem 6 using the following two intermediate lemmas. In Lemma 6 we write “id(X)” to denote
the element id that is determined by X (see Step 1 of the description of Canon(T )2 ).
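Lemma 6 Let T be a one-sided independent tester for C. Fix any possible outcome $\tilde{X}$ of draws from $S^{\epsilon}_n$ and let $(\tilde{f}', \tilde{F}_+, \tilde{F}_-, \widetilde{id})$ be the 4-tuple that is obtained from $\tilde{X}$ by applying the operator $F^{\epsilon_1}$ to $\tilde{X}$ as described in Section 4.2. Then we have
\[
\Pr_{X \sim S^{\epsilon}_n,\; Y \sim \mathrm{AltI}^{\epsilon_1}_{n'}(X, id(X))}
\left[\mathrm{Canon}(T) \text{ outputs “accept”} \;\middle|\; F^{\epsilon_1}(X) \text{ equals } (\tilde{f}', \tilde{F}_+, \tilde{F}_-, \widetilde{id}) \text{ and } \tilde{f}' \in C\right] = 1
\]
and
\[
\Pr_{X \sim S^{\epsilon}_n,\; Y \sim \mathrm{AltI}^{\epsilon_1}_{n'}(X, id(X))}
\left[\mathrm{Canon}(T) \text{ outputs “reject”} \;\middle|\; F^{\epsilon_1}(X) \text{ equals } (\tilde{f}', \tilde{F}_+, \tilde{F}_-, \widetilde{id}) \text{ and } \tilde{f}' \text{ is } \epsilon_1\text{-far from } C\right] \ge r(\epsilon_1).
\]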
Lemma 7 Suppose f is $\epsilon$-far from C. Let X be drawn from $S^{\epsilon}_n$ and let $(f', F_+, F_-, id)$ be obtained by applying $F^{\epsilon_1}$ to X. Then with probability at least $1 - \frac{1-r(\epsilon)}{1-r(\epsilon)/4}$, the function $f'$ (over $n'$ variables) is $\epsilon_1$-far from C.
We first show that Lemmas 6 and 7 imply Theorem 6.
Proof: First, assume f belongs to C. Then since C is closed under ID-Neg minors, we have that for any
sequence X̃ of draws, the function f˜′ resulting from F ǫ1 (X̃) is in C. Thus, by Lemma 6 we have that
Canon(T ) outputs “accept” with probability 1.
Next, assume f is $\epsilon$-far from C. Then by Lemma 7 we have that with probability at least $1 - \frac{1-r(\epsilon)}{1-r(\epsilon)/4}$, $f'$ is $\epsilon_1$-far from C. If $f'$ is $\epsilon_1$-far from C then we have by Lemma 6 that Canon(T) outputs “reject” with probability at least $r(\epsilon_1)$. Thus, Canon(T) outputs “reject” on f with overall probability at least $\left(1 - \frac{1-r(\epsilon)}{1-r(\epsilon)/4}\right) \cdot r(\epsilon_1)$.
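As a side calculation (not needed for the proof), the first factor of this bound has a simple closed form. Here r is shorthand for r(ǫ), introduced only for readability, and we assume 0 ≤ r ≤ 1:

```latex
\[
1 - \frac{1-r}{1-r/4}
  = \frac{(1 - r/4) - (1 - r)}{1 - r/4}
  = \frac{3r/4}{1 - r/4}
  = \frac{3r}{4-r}
  \;\ge\; \frac{3r}{4}.
\]
```

In particular, the rejection probability guaranteed for ǫ-far functions is at least (3/4) · r(ǫ) · r(ǫ1).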
We now prove Lemmas 6 and 7. To prove Lemma 6, we will use the following claim, the correctness of which follows from inspection of the specification of $\mathrm{AltI}^{\epsilon_1}_{\tilde{n}'}(X)$ as given in Step 2 of the description of Canon(T )2 in Appendix E.
Claim 8 Fix any possible outcome $\tilde{X}$ of draws from $S^{\epsilon}_n$ and let $(\tilde{f}', \tilde{F}_+, \tilde{F}_-, \widetilde{id})$ be the 4-tuple that is obtained from $\tilde{X}$ by applying the operator $F^{\epsilon_1}$ to $\tilde{X}$ as described in Section 4.2 (note that once $(\tilde{f}', \tilde{F}_+, \tilde{F}_-, \widetilde{id})$ has been determined this fixes the value of $\tilde{n}'$, the number of variables that $\tilde{f}'$ is defined over). Then the two distributions
• Draw $Y \sim I^{\epsilon_1}_{\tilde{n}'}$ and output Y; and
• Draw $X \sim S^{\epsilon}_n$, conditioned on $F^{\epsilon_1}(X) = (\tilde{f}', \tilde{F}_+, \tilde{F}_-, \widetilde{id})$. Draw $Y \sim \mathrm{AltI}^{\epsilon_1}_{\tilde{n}'}(X)$ and output Y
are identical.
We are now ready to prove Lemma 6.
Proof: By Claim 8, the distributions $I^{\epsilon_1}_{n'}$ and $\mathrm{AltI}^{\epsilon_1}_{n'}(X, id(X))$ are identical. Additionally, the canonical tester is able to answer correctly with respect to $f'$ every query in Y where $Y \sim \mathrm{AltI}^{\epsilon_1}_{n'}(X)$. Since $T'$ is a one-sided tester for class C, it must be the case that if $f' \in C$ then Canon(T) outputs “accept” with probability 1, and if $f'$ is $\epsilon_1$-far from C then Canon(T) outputs “reject” with probability at least $r(\epsilon_1)$.
Before proving Lemma 7, we define the following distribution:
Alternative view of distribution over functions f ′ : AltF ǫ1 (Y ) is a distribution over 4-tuples (f ′ , F+ , F− , id),
where f ′ is an n′ -variable function (for n′ = n − |F+ | − |F− | + 1), F+ and F− are disjoint subsets of [n],
id is the lexicographically first element of F+ , and Y is drawn from Inǫ1 .
Given Y drawn from Inǫ1 , a draw from AltF ǫ1 (Y ) is obtained in the following way:
• Let Prod(ǫ) and Prod(ǫ1 ) be defined according to the tester T ′ as described in Section 2.1.
• Let $R_+, R_-$ be a fixed pair of two subsets of the partition defined by Y that have the following property: for every query string $y \in Y$, for all pairs $(i, j)$ such that $i \in R_+$, $j \in R_-$, we have $y_i = \overline{y_j}$. (Note that such subsets are guaranteed to exist by the way that $T'$ performs its first query.) Let $k_+$ be such that the expected size of $R_+$ is $k_+ \cdot n$, and similarly let $k_-$ be such that the expected size of $R_-$ is $k_- \cdot n$.
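• Choose the subsets $F_+$ ($F_-$) from $R_+$ ($R_-$) by placing each variable from $R_+$ ($R_-$) in $F_+$ ($F_-$) independently with probability $p_{R_+} = \frac{rem}{2 \cdot k_+ \cdot 2^{q'(\epsilon)}}$ (respectively $p_{R_-} = \frac{rem}{2 \cdot k_- \cdot 2^{q'(\epsilon)}}$). Note that due to the definition of $q'(\epsilon)$ and the fact that $rem \le \mathrm{Prod}(\epsilon_1)$, $k_+ \ge 1/\mathrm{Prod}(\epsilon)$ ($k_- \ge 1/\mathrm{Prod}(\epsilon)$) we must have that $p_{R_+} \le 1$ ($p_{R_-} \le 1$).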
• Choose the element id to be the lexicographic first element of F+ .
• Set f ′ = ID-Neg(f, F+ , F− , id).
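As a sanity check on the inclusion probability in this description, the following Python sketch computes p_{R+} exactly and the resulting marginal probability that a fixed variable lands in F+. It assumes that each variable is placed in R+ with probability k+ (our reading of "the expected size of R+ is k+ · n" for an independent tester); the function names and the sample values are hypothetical. Under that assumption the marginal equals rem/(2 · 2^{q′(ǫ)}), independent of k+.

```python
from fractions import Fraction


def p_r_plus(rem, k_plus, q_prime):
    """Inclusion probability p_{R+} = rem / (2 * k_+ * 2^{q'})."""
    return Fraction(rem) / (2 * Fraction(k_plus) * 2 ** q_prime)


def marginal_f_plus(rem, k_plus, q_prime):
    """Pr[i in F+], assuming Pr[i in R+] = k_+ for each variable i."""
    return Fraction(k_plus) * p_r_plus(rem, k_plus, q_prime)


# Hypothetical sample values: rem = 2, k_+ = 1/3, q' = 3.
p = p_r_plus(2, Fraction(1, 3), 3)
m = marginal_f_plus(2, Fraction(1, 3), 3)
```

With these sample values p is 3/8 (a valid probability) and m is 1/8, which equals rem/(2 · 2^{q′}) = 2/16.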
The following claim shows that the distribution Y ∼ Inǫ , AltF ǫ1 (Y ) is identical to the distribution X ∼ Snǫ ,
F ǫ1 (X) used by Canon(T ).
Claim 9 The following two distributions are identical:
• F ǫ1 (X) (where X is drawn from Snǫ ) and
• AltF ǫ1 (Y ) (where Y is drawn from Inǫ ).
We require two more definitions:
• Restricting independent Tester’s queries to n′ variables: Restrict(Y, f ′ , F+ , F− , id) is an operator over sets of queries Z, where Y ∼ Inǫ , (f ′ , F+ , F− , id) ∼ AltF ǫ1 (Y ). A draw from Restrict(Y, f ′ , F+ , F− , id)
is obtained in the following way:
– Let Z be the restriction of Y to the variables in [n] \ (F+ ∪ F− ) ∪ {id}.
• Alternate view of distribution over independent Tester’s queries over n variables restricted
to n′ variables: AltRestrictǫ is a distribution over 5-tuples (f ′ , F+ , F− , id, Z). A draw from
AltRestrictǫ is obtained in the following way:
– Choose the subsets $F_+$, $F_-$, $[n] \setminus (F_+ \cup F_-)$ by independently placing each variable in $F_+$ with probability $\frac{rem}{2 \cdot 2^{q'(\epsilon)}}$, in $F_-$ with probability $\frac{rem}{2 \cdot 2^{q'(\epsilon)}}$, and otherwise in $[n] \setminus (F_+ \cup F_-)$.
– Fix id to be the lexicographically first element of F+ , and set f ′ to be f ′ = ID-Neg(f, F+ , F− , id).
– Let Tnǫ be the distribution over sets of q(ǫ) queries Y to n-variable functions that T makes with
input parameter ǫ. Note that for a draw of Y ∼ Tnǫ there is a subset R that corresponds to
R+ ∪ R− , where R− , R+ are the subsets described in the “Alternative view of distribution over
functions f ′ ” given above. Draw Y ∼ Tnǫ conditioned on the indices in F+ , F− being placed in
this partition subset R. Set Y ′ to be the restriction of Y to the variables in [n] \ (F+ ∪ F− ).
– Choose a subset S ′ ⊆ [n] \ (F+ ∪ F− ) ∪ {id} uniformly at random (i.e. each i ∈ [n] \ (F+ ∪
F− ) ∪ {id} has independent prob. 1/2 of being in S ′ , as in Section B).
– For each query in Y ′ , flip the value of xi for all i ∈ S ′ and add the query to Z.
We have the following claim:
Claim 10 The following two distributions over sets of queries Z are identical:
• Draw Y from Inǫ , draw (f ′ , F+ , F− , id) from AltF ǫ1 (Y ), and output
Z = Restrict(Y, f ′ , F+ , F− , id).
• Draw (f ′ , F+ , F− , id, Z) from AltRestrictǫ and output Z.
Finally, we define one more operator:
Alternate view of a single query made by the independent Tester restricted to n′ variables: For 1 ≤
i ≤ q(ǫ) and Z generated from AltRestrictǫ as described above, we define Query i (Z) as the single query
string obtained by returning the i-th query string from Z.
The following claim shows that a draw from Query i (Z) is uniformly distributed.
Claim 11 Fix any possible outcome $(\tilde{f}', \tilde{F}_+, \tilde{F}_-, \widetilde{id}, \tilde{Z})$ from $\mathrm{AltRestrict}^{\epsilon}$. Fix any $1 \le i \le q(\epsilon)$. The following two distributions are identical:
• Output $(\tilde{f}', U_{n'})$ (i.e. the second coordinate is a uniform random $n'$-bit string); and
• Draw $(f', F_+, F_-, id, Z)$ from $\mathrm{AltRestrict}^{\epsilon}$, conditioned on $(f', F_+, F_-, id)$ being identical to $(\tilde{f}', \tilde{F}_+, \tilde{F}_-, \widetilde{id})$. Output $(\tilde{f}', \mathrm{Query}^i(Z))$.
Proof: By the definition of AltRestrictǫ , no matter what is the outcome of (f ′ , F+ , F− , id), the final choice
of S ′ makes the i-th query string in Z uniform random.
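The reason the final choice of S′ suffices can be checked exhaustively for small n. The Python sketch below is only an illustration (the function names are hypothetical, and a mask is the 0/1 indicator vector of S′): flipping any fixed string by a uniformly random mask yields a uniformly random string, while using the same mask for all queries preserves their coordinatewise differences.

```python
from collections import Counter
from itertools import product


def flip(query, mask):
    """Flip the bits of `query` at the positions where `mask` is 1."""
    return tuple(b ^ m for b, m in zip(query, mask))


def flipped_distribution(query):
    """Count how often each string arises as `query` flipped by a mask,
    over all 2^n equally likely masks (the indicator vectors of S')."""
    n = len(query)
    return Counter(flip(query, mask) for mask in product((0, 1), repeat=n))


dist = flipped_distribution((1, 0, 1))
# Each of the 2^3 strings arises from exactly one mask, so the flipped
# query is uniform no matter which query we started from.
is_uniform = len(dist) == 8 and set(dist.values()) == {1}

# The same mask is applied to every query, so coordinatewise differences
# between two queries are preserved: the queries are individually uniform
# but not independent of one another.
q1, q2, mask = (1, 0, 1), (0, 0, 1), (1, 1, 0)
same_difference = flip(flip(q1, mask), flip(q2, mask)) == flip(q1, q2)
```

This is why the analysis that follows treats the queries in Z as individually uniform and combines them with a union bound, rather than treating them as independent.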
Using the defined distributions and corresponding claims, we are now ready to prove Lemma 7:
Proof of Lemma 7: Assume that f is ǫ-far from C. Let X be drawn from Snǫ , and let (f ′ , F+ , F− , id) be
F ǫ1 (X). Let p1 denote the probability that f ′ (a function over n′ variables) is ǫ1 -close to C. We will upper
bound p1 and thus prove the lemma.
Let Y be drawn from Inǫ , and let (f ′ , F+ , F− , id) be drawn from AltF ǫ1 (Y ). By Claim 9 the two
distributions X ∼ Snǫ , F ǫ1 (X) and Y ∼ Inǫ , AltF ǫ1 (Y ) are identical. So we view f ′ as chosen by
a draw from AltF ǫ1 (Y ). Let us now additionally consider a draw Z from the distribution Y ∼ Inǫ ,
(f ′ , F+ , F− , id) ∼ AltF ǫ1 (Y ), Restrict(Y, f ′ , F+ , F− , id).
By Claim 10, the two distributions over Z
\[
Y \sim I^{\epsilon}_n,\quad (f', F_+, F_-, id) \sim \mathrm{AltF}^{\epsilon_1}(Y),\quad Z \sim \mathrm{Restrict}(Y, f', F_+, F_-, id)
\]
and
\[
Z \sim \mathrm{AltRestrict}^{\epsilon}
\]
are identical, so we can alternatively view Z as a draw from $\mathrm{AltRestrict}^{\epsilon}$.
By Claim 11, for each $1 \le i \le q(\epsilon)$ we have $(f', Z \sim \mathrm{AltRestrict}^{\epsilon}, \mathrm{Query}^i(Z)) \equiv (f', U_{n'})$, so each query in Z is uniformly distributed. We note that if $f'$ over $n'$ variables is $\epsilon_1$-close to $C_{n'}$, then there is a function $g' \in C_{n'}$ such that $\Pr_{z \sim U_{n'}}[f'(z) \neq g'(z)] < \epsilon_1$.
By a union bound, since the queries in Z are individually uniformly distributed, the probability that some query z in Z satisfies $f'(z) \neq g'(z)$ is at most $q(\epsilon) \cdot \epsilon_1$. Alternatively, by viewing Z as a draw from the distribution $Y \sim I^{\epsilon}_n$, $(f', F_+, F_-, id) \sim \mathrm{AltF}^{\epsilon_1}(Y)$, $\mathrm{Restrict}(Y, f', F_+, F_-, id)$, we have that the probability that all the queries y in $Y \sim I^{\epsilon}_n$ satisfy $f'(y) = g(y)$ (where $g \in C_n$ is obtained from $g'$ by adding back the irrelevant variables corresponding to the indices in $(F_+ \cup F_-) \setminus \{id\}$, and we view $f'$ here as an n-variable function) is at least $1 - q(\epsilon) \cdot \epsilon_1$. Since $T'$ is one-sided, and since for any $Y \sim I^{\epsilon}_n$, $(f', F_+, F_-, id) \sim \mathrm{AltF}^{\epsilon_1}(Y)$ it must be the case that $f(y) = f'(y)$ for all $y \in Y$, this means that:
In the chain below, every probability after the first is taken over $Y \sim I^{\epsilon}_n$, $(f', F_+, F_-, id) \sim \mathrm{AltF}^{\epsilon_1}(Y)$:
\begin{align*}
\Pr_{Y \sim I^{\epsilon}_n}[T_2'(Y) \text{ accepts}]
&= \Pr[T_2'(Y) \text{ accepts}] \\
&\ge \Pr[T_2'(Y) \text{ accepts} \wedge f' \text{ is } \epsilon_1\text{-close to } C_{n'}] \\
&\ge \Pr[f' \text{ is } \epsilon_1\text{-close to } C_{n'} \wedge f(y) = g(y) \text{ for all } y \in Y] \\
&= \Pr[f' \text{ is } \epsilon_1\text{-close to } C_{n'} \wedge f'(y) = g(y) \text{ for all } y \in Y] \\
&= \Pr[f' \text{ is } \epsilon_1\text{-close to } C_{n'}] \cdot \Pr[f'(y) = g(y) \text{ for all } y \in Y \mid f' \text{ is } \epsilon_1\text{-close to } C_{n'}].
\end{align*}
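So $\Pr_{Y \sim I^{\epsilon}_n}[T_2'(Y) \text{ accepts}] \ge p_1 \cdot (1 - q(\epsilon) \cdot \epsilon_1)$. However, since f is $\epsilon$-far from C, $T'$ running with input parameter $\epsilon$ must accept with probability at most $1 - r(\epsilon)$. Thus, $p_1 \cdot (1 - q(\epsilon) \cdot \epsilon_1) \le 1 - r(\epsilon)$. Since $\epsilon_1 = \frac{r(\epsilon)}{4q(\epsilon)}$ we have $q(\epsilon) \cdot \epsilon_1 = r(\epsilon)/4$, and so $p_1 \le \frac{1-r(\epsilon)}{1-r(\epsilon)/4}$.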
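To see the quantities in this proof on concrete numbers, the following Python sketch evaluates ǫ1, the union-bound term q(ǫ) · ǫ1, and the resulting upper bound on p1 with exact rational arithmetic. The function name and the sample values r(ǫ) = 1/2, q(ǫ) = 10 are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def lemma7_quantities(r, q):
    """Return (eps1, union_bound, p1_upper) for r = r(eps), q = q(eps)."""
    r = Fraction(r)
    eps1 = r / (4 * q)               # eps1 = r(eps) / (4 q(eps))
    union_bound = q * eps1           # q(eps) * eps1, which equals r(eps) / 4
    p1_upper = (1 - r) / (1 - union_bound)
    return eps1, union_bound, p1_upper


eps1, union_bound, p1_upper = lemma7_quantities(Fraction(1, 2), 10)
```

With these sample values the sketch gives ǫ1 = 1/80, a union-bound term of 1/8, and p1 ≤ 4/7.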