A Canonical Form for Testing Boolean Function Properties

2011, Lecture Notes in Computer Science

Dana Dachman-Soled∗, Columbia University, dg2342@columbia.edu
Rocco A. Servedio†, Columbia University, rocco@cs.columbia.edu

April 13, 2011

Abstract

In a well-known result on graph property testing, [GT03] showed that every testable graph property has a “canonical” testing algorithm in which a set of vertices is selected uniformly at random and the edges queried are the complete graph over the selected vertices. In this paper we define a similar-in-spirit canonical form for Boolean function testing algorithms, and show that under some mild conditions on the function class and testing algorithm, property testers for Boolean functions can be transformed into this canonical form. We establish two main results. The first shows, roughly speaking, that every “nice” family of Boolean functions that has low noise sensitivity and is testable by an “independent tester” has a canonical testing algorithm. The second result is similar but holds instead for families of Boolean functions that are closed under ID-negative minors. Taken together, these two results cover almost all of the constant-query Boolean function testing algorithms that we know of in the literature, and show that all of these testing algorithms can be automatically converted into a canonical form.

∗ Supported in part by an FFSEAS Presidential Fellowship.
† Supported by NSF grants CCF-0347282, CCF-0523664 and CNS-0716245, and by DARPA award HR0011-08-1-0069.

1 Introduction

Over more than a decade property testing has emerged as an exciting and intensively studied area of theoretical computer science, with close connections to diverse topics such as sublinear-time algorithms and probabilistically checkable proofs in complexity theory. As the field has matured, several distinct strands of research have emerged corresponding to different types of objects to be tested: graphs, Boolean functions, error-correcting codes, probability distributions, and so on.
Over the years each of these sub-areas has developed its own body of standard tools and proof techniques, and the results that have been obtained have different flavors across the different areas. As one example, in graph property testing powerful and general characterizations [AFNS06, AT08, AS05a, AS05b] have been given of which properties are constant-query testable, i.e. testable with a number of queries that depends only on the error parameter $\epsilon$ and not on the number of vertices $n$. In Boolean function testing, on the other hand, many specific properties have been investigated (see e.g. [BLR93, BCH+96, BGS98, PRS02, AKK+03, Sam07, FKR+04, Bla08, Bla09, MORS10]) but no general characterizations have been given of what makes a property of Boolean functions constant-query testable.

Given this state of affairs, a natural goal is to obtain a more unified view of property testing by uncovering deeper underlying similarities between general results on testing different kinds of objects. This high-level goal provides the impetus behind the current work.

The aim of this paper is to obtain “canonical” testers for testable Boolean function properties, similar to the canonical testers that have been established for testable graph properties by Goldreich and Trevisan [GT03] (see footnote 1). Specialized to properties that are testable with a constant number of queries independent of $n$, the [GT03] result is essentially as follows: Let $P$ be any graph property (see footnote 2) that has a $q(\epsilon)$-query testing algorithm, independent of the number of vertices $n$.
The [GT03] result states that property $P$ is efficiently testable by an algorithm that follows a simple prescribed “canonical form”: it draws $q(\epsilon)$ vertices independently at random, queries all $\binom{q(\epsilon)}{2}$ edges between those vertices, does some deterministic computation on the $q(\epsilon)$-node graph thus obtained, and outputs “accept” or “reject.”

Our work addresses the following natural question: is there a similar “canonical form” for Boolean function property testing algorithms? Such a result would presumably say that any property of Boolean functions that is constant-query testable is in fact constant-query testable by an algorithm that works roughly as follows: it tosses some fair coins, exhaustively queries the function on “all inputs defined by those coin tosses” (in some suitable sense), does some deterministic computation on the resulting query-response pairs, and outputs “accept” or “reject.” (Note that in this first investigation we consider only constant-query testable properties; indeed, the number of queries our canonical tester makes will be doubly exponential in the number of queries made by the original tester.)

We elaborate below in Section 1.1, where we give a precise definition of a “canonical Boolean function testing algorithm”. But first it is useful to explain how we may view any Boolean function property testing algorithm as a collection of probability distributions, as a prelude to explaining our notion of a canonical tester for Boolean function properties.

Viewing a testing algorithm as a collection of distributions. Let $P$ be any class of Boolean functions that has a $q(\epsilon)$-query testing algorithm $A$, with query complexity independent of the number of input variables $n$. We may assume without loss of generality that $A$ is nonadaptive, since if it is adaptive we can convert it to a nonadaptive algorithm in a standard way (this incurs an exponential penalty in the query complexity but it remains independent of $n$).
Since $A$ is nonadaptive, it first generates its entire sequence of $q(\epsilon)$ query strings and then queries them and performs some computation on the results. We may view the first (query generation) stage of the algorithm as proceeding in the following way: The first query string $x^1$ is drawn from a probability distribution $D_\emptyset$ (which may be arbitrary) over $\{0,1\}^n$. The outcome $x^1 \in \{0,1\}^n$ of this draw determines a probability distribution $D_{x^1}$ over $\{0,1\}^n$ from which the second query string $x^2$ is drawn. The outcomes $x^1, x^2$ of the first two draws determine a distribution $D_{x^1,x^2}$ from which the third query string $x^3$ is drawn, and so on. (Note that while later query strings do not depend on the answers to earlier queries, they may depend on the outcome of the randomness that was used to construct earlier queries.)

In the second stage, once the $q(\epsilon)$ strings $x^1, \dots, x^{q(\epsilon)}$ have been generated, the algorithm makes its queries on those strings and gets a response bit $f(x^i)$ for each query string. It then performs a computation on the $q(\epsilon)$ query-response pairs, and outputs the result (“accept/reject”) of that computation. This computation may a priori be randomized, but a straightforward argument (given in Section 4.2.3 of [GT03]) shows that without loss of generality it may be assumed to be deterministic.

Footnote 1: We note that P. Valiant [Val08] has given a “Canonical Tester” for a wide class of properties of discrete probability distributions. Our setting of testing Boolean functions is much closer to graph property testing (in both scenarios the tester actively chooses inputs and queries them) than it is to testing probability distributions (where the tester passively receives independent draws from the distribution).
Footnote 2: Recall that by definition, a graph property is closed under relabeling vertices.
Thus (ignoring for the moment the second “deterministic computation” stage that is performed once all the queries have been made), any nonadaptive testing algorithm that makes $q(\epsilon)$ queries corresponds to the collection of all distributions $D_{x^1,\dots,x^t}$ described above, where $t$ ranges from 1 to $q(\epsilon)$ and each $x^i$ ranges over all possible $n$-bit strings. This collection is somewhat complicated and cumbersome to reason about; different testing algorithms may correspond to different collections of probability distributions. Is there a simpler “canonical form” for the query generation stage of every Boolean function testing algorithm?

1.1 A canonical form for Boolean function testing algorithms

In the [GT03] result, intuitively there is only one type of distribution over queries for all testing algorithms (and this distribution is very simple): all of the difference between two $q(\epsilon)$-query testing algorithms comes from the deterministic computation they do once all the query-answer pairs have been obtained. We would like an analogous result for testing Boolean functions, which similarly involves only one kind of (simple) distribution over queries, and where the difference between different testers comes from the deterministic computation that is done once all query-answer pairs are in hand.

Motivated by these considerations, we consider the following canonical form for a testing algorithm (we make this precise in Section 3):

• First stage (query generation): Let $z^1, \dots, z^k$ be independent and uniform random strings from $\{0,1\}^n$. This defines a natural partition of $[n]$ into $2^k$ blocks, which are expected to be of approximately equal size: an element $i \in [n]$ lies in block $B_{(b_1,\dots,b_k)}$ if $(z^1_i, \dots, z^k_i) = (b_1, \dots, b_k)$, i.e. in each query string $z^j$, the $i$-th bit is set to $b_j$.
• We say that a string $x = x_1 \dots x_n \in \{0,1\}^n$ “respects the partition” if within each block $B_b$, either all variables $x_i$ are set to 0 or all are set to 1.
There are $2^{2^k}$ strings in $\{0,1\}^n$ that respect the partition; these are the $2^{2^k}$ queries that a canonical tester makes.
• Second stage: With these $2^{2^k}$ query-answer pairs in hand, the algorithm does some (deterministic) computation and outputs “accept” or “reject.”

We view this canonical form for Boolean function testers as both simple and natural. Some examples of known testers that can easily be converted to canonical form as described above include the tester of [BLR93] for $GF(2)$ linear functions and the tester of [AKK+03] for degree $k$ polynomials over $GF(2)$. Let us consider the [AKK+03] tester and see how to convert it to canonical form. The [AKK+03] tester works by choosing $k+1$ strings $z^1, \dots, z^{k+1} \in \{0,1\}^n$ uniformly at random and then querying all points in the induced subspace. If a string $x$ is in the induced subspace of $z^1, \dots, z^{k+1}$ then it must also “respect the partition” induced by $z^1, \dots, z^{k+1}$. So to convert the [AKK+03] tester to our canonical form, all we have to do is ask some more queries.

A natural first hope is to generalize the above examples and show that every Boolean function property that is testable using constantly many queries has a “canonical form” constant-query testing algorithm of the above sort. However, E. Blais [Bla10] has observed that there is a simple property that is testable with $O(1/\epsilon)$ queries but does not have a constant-query canonical tester of the above sort: this is the property of being a symmetric Boolean function.

Let SYM be the set of all symmetric Boolean functions (i.e., all functions where $f(x)$ is determined by $|x|$, the Hamming weight of $x$). SYM can be tested with a constant number of queries with the following algorithm:

• Pick $O(1/\epsilon)$ pairs of points $(x^i, y^i) \in \{0,1\}^n \times \{0,1\}^n$ by choosing $x^i$ uniformly at random from $\{0,1\}^n$ then choosing $y^i$ uniformly at random from all inputs with the same weight as $x^i$.
• Check that for each pair $f(x^i) = f(y^i)$.
Accept if this holds, and otherwise reject.

It is clear that if $f \in$ SYM then the above test accepts with probability 1. On the other hand, for any $f$ that is $\epsilon$-far from SYM, with probability at least $\epsilon$ the string $x^i$ is one of the “bad” inputs that has the minority output in its level, and with probability at least 1/2 the string $y^i$ is one of the inputs with the majority output for the same level. So with probability at least $\epsilon/2$, $(x^i, y^i)$ is a witness to the fact that $f$ is not symmetric, and $O(1/\epsilon)$ queries are sufficient to reject $f$ with probability at least 2/3.

To show that SYM cannot be tested by a constant-query canonical tester, it suffices to show that for $k = o(\log \log n)$, with high probability each of the $2^{2^k} = n^{o(1)}$ queries generated by the tester has different Hamming weight. This can be established by a straightforward but somewhat tedious argument which we omit here (the main ingredients are the observation that each query string $x$ generated by the canonical tester has Hamming weight distributed according to a binomial distribution $B(n, j/2^k)$ for some integer $j$, together with standard anti-concentration bounds on binomial distributions with sufficiently large variance).

The example above shows that, unlike the graph testing setting, it is not the case that every constant-query Boolean function tester can be “canonicalized.” So in order to obtain meaningful results on “canonicalizing” Boolean function testers, one must restrict the types of properties and/or testers that are considered; this is precisely what we do in our results, as explained below.

1.2 Our results

Our main results are that certain “nice” testing algorithms, for certain “nice” types of Boolean function properties, can automatically be converted into the above-described canonical form.
Roughly speaking, the testing algorithms we can handle are ones for which every distribution $D_{x^1,\dots,x^t}$ in the query generation phase is a product of $n$ Bernoulli distributions over the $n$ coordinates (with some slight additional technical restrictions that we describe later). This is a restricted class of algorithms, but it includes many different testing algorithms that have been proposed and analyzed in the Boolean function property testing literature. We call such testing algorithms “independent testers” (see Section 2.1 for a precise definition), and we give two results showing that independent testers for certain types of properties can be “canonicalized.”

Our first result applies to classes $C$ that are closed under negating variables and contain only functions with low noise sensitivity. We say that such a class $C$ is closed under Noisy-Neg minors (see Definition 1). For such classes $C$ we show the following:

Theorem 1 (Informal) If $C$ is closed under Noisy-Neg minors and there exists a (two-sided) independent tester for $C$, then there exists a (two-sided) canonical tester for $C$.

Our second result applies to classes $C$ that are closed under identification of variables, negation of variables, and adding or removing irrelevant variables. Following [HR05], we say that such a class is closed under ID-Neg minors (see Definition 2). For such classes $C$ we show the following:

Theorem 2 (Informal) If $C$ is closed under ID-Neg minors and there exists a one-sided independent tester for $C$, then there exists a one-sided canonical tester for $C$.

As we describe in Section A, these two results allow us to give “canonical” versions of many different Boolean function property testing algorithms that have appeared in the literature.

1.3 Our approach

Developing a canonical tester for Boolean function properties seems to be significantly more challenging than for graph properties.
The high-level idea behind the [GT03] graph testing canonicalization result is that if $k$ edges have been queried so far, then all “untouched” vertices (that are not adjacent to any of the $k$ queried edges) are equally good candidates for the next vertex to be involved in the query set. For Boolean function testing the situation is more complicated because of the structure imposed by the Boolean hypercube; for example, if the two strings $0^n$ and $1^n$ have been queried so far, then it is clearly not the case that all possible third query strings are “created equal” in relation to these first two query strings.

A natural first effort to canonicalize a Boolean function property tester is to design a canonical tester that makes its queries in the first stage, and then simply directly uses those queries to “internally” simulate a run of the independent tester in its second stage. Ideally, in such an “internal” simulation, each time the original independent tester makes a query the canonical tester would use a query-response pair obtained in its first stage that corresponds reasonably well to the string queried by the independent tester. However, this naive approach does not seem to suffice, since an independent tester can easily make queries which do not correspond well to any query made by the canonical tester. As a simple example, the first query of an independent tester could independently set each variable $x_i$ to 1 with probability 1/3. The number of 1s in the first query of this independent tester is distributed as a draw from the binomial distribution $B(n, 1/3)$, but the number of 1s in any query made by a $q$-query canonical tester is distributed as a draw from the binomial distribution $B(n, p)$, where $p$ is of the form $(\text{integer})/2^q$. If $q$ is a constant independent of $n$, these two distributions have variation distance nearly 1.
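The variation-distance claim above can be checked numerically. The following Python sketch is illustrative only: the function names `binom_log_pmf`, `tv_distance` and `closest_dyadic` are our own and do not come from the paper, and the choice $q = 3$ is an arbitrary example. It computes the total variation distance between $B(n, 1/3)$ and $B(n, p)$ for the dyadic value $p = j/2^q$ closest to 1/3.

```python
import math


def binom_log_pmf(n, k, p):
    """Log of Pr[Bin(n, p) = k], using lgamma to avoid huge binomial coefficients."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def tv_distance(n, p1, p2):
    """Total variation distance between Bin(n, p1) and Bin(n, p2), for 0 < p1, p2 < 1."""
    total = 0.0
    for k in range(n + 1):
        a = math.exp(binom_log_pmf(n, k, p1))
        b = math.exp(binom_log_pmf(n, k, p2))
        total += abs(a - b)
    return total / 2


def closest_dyadic(p, q):
    """Value j / 2**q (with 0 < j < 2**q) that is closest to p."""
    j = min(range(1, 2 ** q), key=lambda j: abs(j / 2 ** q - p))
    return j / 2 ** q


if __name__ == "__main__":
    q = 3  # hypothetical constant, chosen only for illustration
    p = closest_dyadic(1 / 3, q)
    for n in (100, 2000, 20000):
        print(n, p, tv_distance(n, 1 / 3, p))
```

For fixed $q$ the printed distance grows toward 1 as $n$ increases, since the two means differ by a constant fraction of $n$ while the standard deviations grow only like $\sqrt{n}$.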
The high-level idea of both our constructions is that instead of trying to approximately simulate an execution of the independent tester on the $n$-variable function $f$ (which it cannot do), the canonical tester perfectly simulates an execution of the independent tester on a different function $f'$ over $n'$ relevant variables. Since this simulation is perfect, the canonical tester successfully tests whether $f'$ has property $C$. For the case of Noisy-Neg minors the analysis shows that w.h.p. the independent tester’s view looks the same whether the target function is $f'$ or $f$. Therefore, a “good” answer for $f'$ must also be a good answer for $f$. For the case of ID-Neg minors, the analysis shows that because of the way $f'$ is determined, we have that (1) if $f$ belongs to $C$ then so does $f'$; and (2) if $f$ is far from $C$, then $f'$ is at least slightly far from $C$. Along with the fact that the canonical tester tests $f'$ successfully, these conditions imply that the canonical tester tests $f$ successfully.

2 Preliminaries

A Boolean function property is simply a class of Boolean functions. Throughout the paper we write $F_n$ to denote the class of all $2^{2^n}$ Boolean functions mapping $\{0,1\}^n$ to $\{0,1\}$. We write $C_n$ to denote a class of $n$-variable Boolean functions, i.e. functions from $\{0,1\}^n$ to $\{0,1\}$. We adopt all the standard definitions of Boolean function property testing (see e.g. [PRS02, FKR+04, MORS10]) and do not repeat them here because of space limitations.

As mentioned in the Introduction, we may view any nonadaptive testing algorithm $T$ as consisting of two phases: an initial “query generation phase” $T_1$ in which the query strings are selected (at the end of this phase the queries are performed), and a subsequent “computation” phase $T_2$ in which some computation is performed on the query-answer pairs and the algorithm either accepts or rejects. Throughout the paper we will describe and analyze testing algorithms in these terms.

The classes we consider.
Our first main result deals with classes of Boolean functions that are closed under Noisy-Neg minors; we give the relevant definitions below.

Definition 1 (Noise Sensitivity of $f$) Let $f : \{0,1\}^n \to \{0,1\}$, let $\epsilon \in [0, 1/2]$, and let $(x, y)$ be a pair of $(1 - 2\epsilon)$-correlated random inputs (i.e. $x$ is uniform from $\{0,1\}^n$ and $y$ is formed by independently setting each $y_i$ to equal $x_i$ with probability $1 - 2\epsilon$, and to be uniform random otherwise). The noise sensitivity of $f$ at noise rate $\epsilon$ is defined to be $NS_\epsilon(f) := \Pr[f(x) \neq f(y)]$.

(Noise Sensitivity of a class $C$) Let $C = \cup_{n \geq 1} C_n$ be a class of Boolean functions. We define $NS_\epsilon(C) := \max_n \max_{f \in C_n} \{NS_\epsilon(f)\}$, the noise sensitivity of $C$ at noise rate $\epsilon$, to be the maximum noise sensitivity of any $f \in C$.

($C$ is closed under Noisy-Neg minors) Let $C = \cup_{n \geq 1} C_n$ be a class of Boolean functions. We say that $C$ is closed under Noisy-Neg Minors if $C$ is closed under negating input variables and there is a function $g(\epsilon)$ (not depending on $n$) such that $\lim_{\epsilon \to 0^+} g(\epsilon) = 0$ and $NS_\epsilon(C) \leq g(\epsilon)$.

Our second main result deals with classes $C$ that are closed under ID-Neg Minors.

Definition 2 (ID-Neg Minors) Let $f \in F_n$ and let $f' \in F_{n'}$. We say that $f'$ is an ID-Neg Minor of $f$ if $f'$ can be produced from $f$ by a (possibly empty) sequence of the following operations: (i) Adding/Removing irrelevant variables (recall that variable $x_i$ is irrelevant if there is no input string where flipping $x_i$ changes the value of $f$); (ii) Identifying input variables (e.g. the function $f(x_1, x_1, x_3)$ is obtained by identifying variable $x_2$ with $x_1$); and (iii) Negating input variables.

($C$ is closed under ID-Neg Minors) Let $C = \cup_{n \geq 1} C_n$ be a class of Boolean functions, let $f \in F_n$, and let $f' \in F_{n'}$. We say that $C$ is closed under ID-Neg Minors if the following holds: If $f \in C_n$ and $f'$ is an ID-Neg Minor of $f$, then $f' \in C$.

The class of $GF(2)$ degree-$d$ polynomials is an example of a class closed under ID-Neg minors.
The class of halfspaces is an example of a class closed under Noisy-Neg minors. For more examples and discussion, see Section A.

We close this preliminaries section with two definitions that will be useful:

Definition 3 Let $f$ be a function in $F_n$ and let $F_+, F_-$ be two disjoint subsets of $[n]$. We define $\mathrm{Noisy}(f, F_+, F_-) \in F_n$ to be the function $\mathrm{Noisy}(f, F_+, F_-)(x_1, \dots, x_n) = f(t_1, \dots, t_n)$, where $t_i := 1$ if $i \in F_+$; $t_i := 0$ if $i \in F_-$; and $t_i := x_i$ otherwise.

Intuitively, given a target function $f$, our canonical tester for classes $C$ that are closed under Noisy-Neg minors will choose $F_+, F_-$ according to some distribution (defined later) and will instead test the target function $f' = \mathrm{Noisy}(f, F_+, F_-)$.

Definition 4 Let $f$ be a function in $F_n$, let $F_+$ and $F_-$ be two disjoint subsets of $[n]$, and let $id$ be an element of $F_+$. For $n' = n - |F_+| - |F_-| + 1$, we define the function $\text{ID-Neg}(f, F_+, F_-, id) \in F_{n'}$ to be the function $\text{ID-Neg}(f, F_+, F_-, id)(x_1, \dots, x_{n'}) = f(t_1, \dots, t_n)$, where $t_i := x_{id}$ if $i \in F_+$; $t_i := \overline{x_{id}}$ if $i \in F_-$; and $t_i := x_i$ otherwise.

Similarly to the case above, given a target function $f$ our canonical tester for classes $C$ that are closed under ID-Neg minors will choose $F_+, F_-, id$ according to some distribution (defined later) and will instead test the target function $f' = \text{ID-Neg}(f, F_+, F_-, id)$.

2.1 The testing algorithms we can canonicalize: Independent Testers

Definition 5 A $q(\epsilon)$-query independent tester for class $C$ is a probabilistic oracle machine $T = (T_1, T_2)$ which takes as input a distance parameter $\epsilon$ and is given access to a black-box oracle for an arbitrary function $f : \{0,1\}^n \to \{0,1\}$.

(First Stage) The query generation algorithm $T_1$ chooses $q(\epsilon)$ query strings in the following way: To choose the $i$-th string, the algorithm partitions the set $[n]$ into $2^{i-1}$ blocks. The block $B_{b_1,\dots,b_{i-1}}$ contains those indices that were set to $b_j$ in the $j$-th query string $x^j$ for all $j = 1, \dots, i-1$.
For each block $B_{b_1,\dots,b_{i-1}}$, for each $m \in B_{b_1,\dots,b_{i-1}}$, the algorithm sets $x^i_m$ to 1 with probability $p_{b_1,\dots,b_i}$ and to 0 with probability $1 - p_{b_1,\dots,b_i}$. The resulting string $x^i$ is the $i$-th query string. After choosing all the strings, $T_1$ queries all $q(\epsilon)$ strings $x^1, \dots, x^{q(\epsilon)}$ and gets back responses $f(x^1), \dots, f(x^{q(\epsilon)})$.

(Second Stage) The computation stage $T_2$ gets as input the $q(\epsilon)$ query-answer pairs $(x^1, f(x^1)), \dots, (x^{q(\epsilon)}, f(x^{q(\epsilon)}))$, does some deterministic computation on this input, and outputs either “accept” or “reject.”

In an independent tester the query generation algorithm $T_1$ must satisfy the following conditions:

• For each string $b = (b_1, \dots, b_t)$ the probability $p_b = p_b(\epsilon)$ is a value $0 \leq p_b \leq 1$ (which may depend on $\epsilon$ but is independent of $n$).
• For each $t$, the $2^t$ values $p_{b_1,\dots,b_t}$ (as $b$ ranges over $\{0,1\}^t$) are all rational numbers, and (over all $t$) the denominator of each of these rational numbers is at most $c = c(\epsilon)$ ($c$ may depend on $\epsilon$ but is independent of $n$).

We say that $c(\epsilon)$ is the granularity of the independent tester $T$.

If $T$ is a one-sided tester then for any $f : \{0,1\}^n \to \{0,1\}$, if $f$ belongs to $C$ then $\Pr[T^f = \text{“accept”}] = 1$, and if $f$ is $\epsilon$-far from $C$ then $\Pr[T^f = \text{“reject”}] \geq r(\epsilon)$, where $r(\epsilon) > 0$ is a positive-valued function of $\epsilon$ only. We say that $r(\epsilon)$ is the rejection parameter of the tester.

If $T$ is a two-sided tester then for any $f : \{0,1\}^n \to \{0,1\}$, if $f$ belongs to $C$ then $\Pr[T^f = \text{“accept”}] = 1 - a(\epsilon)$, and if $f$ is $\epsilon$-far from $C$ then $\Pr[T^f = \text{“reject”}] \geq r(\epsilon)$, where $a$ and $r$ are functions of $\epsilon$ only and for $0 < \epsilon < 1/2$, $a(\epsilon) < r(\epsilon)$. We say that $a(\epsilon)$ and $r(\epsilon)$ are the acceptance and rejection parameters of the tester respectively.

Given an independent tester $T$ as described above, we let $\mathrm{Prod}(\epsilon)$ denote the product of the denominators of all probabilities $p_{b_1,\dots,b_t}(\epsilon)$ where $t$ ranges over all possible values $1, 2, \dots, q(\epsilon)$ and $b = (b_1, \dots, b_t)$ ranges over all $t$-bit strings.
If the tester $T$ is $c(\epsilon)$-granular, it is easy to see that $\mathrm{Prod}(\epsilon)$ is at most $c(\epsilon)^{2^{q(\epsilon)+1}}$. It is clear that each subset $B_{b_1,\dots,b_t}$ of $[n]$ described above has size binomially distributed according to $B(n, \ell/\mathrm{Prod}(\epsilon))$ for some integer $\ell$.

3 A canonical form for testers, and our main results

Before stating our main results precisely, we give a precise description of the canonical form mentioned in Section 1.1.

Definition 6 Let $q' : [0, 1) \to \mathbb{N}$. A $q'$-canonical tester for class $C$ is a probabilistic oracle machine $T = (T_1, T_2)$ which takes as input a distance parameter $\epsilon$ and is given access to a black-box oracle for an arbitrary function $f : \{0,1\}^n \to \{0,1\}$, and performs as follows. Given input parameter $\epsilon$, the query generation algorithm $T_1$ works as follows.

1. $z^1, \dots, z^{q'(\epsilon)}$ are selected to be independent uniformly random $n$-bit strings. These strings define a partition $\mathcal{B}$ of $[n]$ into $2^{q'(\epsilon)}$ blocks: an element $i \in [n]$ lies in block $B_{b_1,\dots,b_{q'(\epsilon)}}$ if the $i$-th bit of string $z^j$ equals $b_j$ for all $j = 1, \dots, q'(\epsilon)$.
2. Let $Q_{\mathcal{B}} \subseteq \{0,1\}^n$ be the set of all strings $x$ such that the following condition holds: $\forall i, j \in [n]$, if $i$ and $j$ are in the same partition subset $B_{b_1,\dots,b_{q'(\epsilon)}} \in \mathcal{B}$ then $x_i = x_j$.
3. Using the oracle for $f$, $T_1$ queries all $2^{2^{q'(\epsilon)}}$ strings $x \in Q_{\mathcal{B}}$.

The computation stage $T_2$ gets as input the $2^{2^{q'(\epsilon)}}$ query-answer pairs $[(x, f(x))]_{x \in Q_{\mathcal{B}}}$, does some deterministic computation, and outputs either “accept” or “reject.”

The success criteria for one-sided (two-sided, respectively) canonical testers are entirely similar to the criteria defined above for independent testers. We note that a $q'$-canonical tester makes $2^{2^{q'(\epsilon)}}$ queries when run with input parameter $\epsilon$.
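The query generation stage of Definition 6 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `canonical_queries` and the 0-based indexing are our own choices, and `k` plays the role of $q'(\epsilon)$. The sketch emits one query per assignment of a bit to each of the $2^k$ block labels, so it always returns $2^{2^k}$ strings; when some block happens to be empty (likely only for small $n$), some of these strings coincide.

```python
import itertools
import random


def canonical_queries(n, k, rng):
    """One run of the canonical query generation stage.

    Returns (blocks, queries). blocks maps each label b in {0,1}^k to the
    list of indices i whose bits in z^1, ..., z^k spell out b. queries holds
    one n-bit tuple per assignment of a bit to each of the 2^k labels.
    """
    # k independent uniform n-bit strings z^1, ..., z^k
    z = [[rng.randint(0, 1) for _ in range(n)] for _ in range(k)]
    labels = list(itertools.product((0, 1), repeat=k))
    blocks = {b: [] for b in labels}
    for i in range(n):
        blocks[tuple(z[j][i] for j in range(k))].append(i)

    queries = []
    for values in itertools.product((0, 1), repeat=len(labels)):
        x = [0] * n
        for b, v in zip(labels, values):
            for i in blocks[b]:
                x[i] = v  # every index in block b receives the same bit
        queries.append(tuple(x))
    return blocks, queries


if __name__ == "__main__":
    blocks, queries = canonical_queries(n=12, k=2, rng=random.Random(0))
    print(len(blocks), len(queries))
```

The second stage $T_2$ would then receive the pairs `(x, f(x))` for every `x` in `queries`; it is left out here because it depends on the property being tested.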
3.1 Main results

As our main results, we show that (i) any class that is closed under Noisy-Neg minors and is constant-query testable by a (two-sided) independent tester is also constant-query testable by a (two-sided) canonical tester; and (ii) any class that is closed under ID-Neg minors and is constant-query testable by a one-sided independent tester is also constant-query testable by a one-sided canonical tester. More precisely, we prove the following:

Theorem 3 Let $C$ be any class of functions closed under Noisy-Neg Minors and let $g(\epsilon)$ be as in Definition 1. Let $T$ be a $q(\epsilon)$-query independent tester for property $C$ with acceptance and rejection parameters $a(\epsilon), r(\epsilon)$. Let $q_2'(\epsilon)$ be the smallest integer value that satisfies the following bound:

$$NS_{\mathrm{Prod}(\epsilon) \cdot \frac{1}{2^{q_2'(\epsilon)}}}(C) \leq \frac{r(\epsilon) - a(\epsilon)}{16\, q(\epsilon)},$$

where Prod is as defined in Section 2.1. Let

$$\eta' = \frac{2^{q_2'(\epsilon)} \bmod \mathrm{Prod}(\epsilon)}{2^{q_2'(\epsilon)}}$$

and let

$$q_1'(\epsilon) = \left\lceil \frac{32}{NS_{\eta'}(C)} \cdot \ln \frac{8}{r(\epsilon) - a(\epsilon)} \right\rceil.$$

Let $q'(\epsilon) = q_2'(\epsilon) \cdot (q_1'(\epsilon) + 1)$. Then there is a $q'$-canonical tester $\mathrm{Canon}(T)$ for $C$ with acceptance and rejection parameters $a'(\epsilon), r'(\epsilon)$, where $a'(\epsilon) = \frac{3}{4} a(\epsilon) + \frac{1}{4} r(\epsilon)$ and $r'(\epsilon) = \frac{1}{4} a(\epsilon) + \frac{3}{4} r(\epsilon)$.

Theorem 4 Let $C$ be any class of functions closed under ID-Neg Minors. Let $T$ be a one-sided independent tester for property $C$ that has query complexity $q(\epsilon)$, granularity $c(\epsilon)$, and rejection parameter $r(\epsilon)$. Let $\epsilon_1 = \frac{r(\epsilon)}{4 q(\epsilon)}$ and let $q'(\epsilon)$ be defined as $q'(\epsilon) = \lceil \log(\mathrm{Prod}(\epsilon) \cdot \mathrm{Prod}(\epsilon_1)) \rceil$, where Prod is as described in Section 2.1. Then there is a one-sided $q'$-canonical tester $\mathrm{Canon}(T)$ for property $C$ which, on input parameter $\epsilon$, has rejection parameter $\left( \frac{3 r(\epsilon)/4}{1 - r(\epsilon)/4} \right) \cdot r(\epsilon_1)$.

Throughout the rest of the paper whenever we write “$T$” or “$C$” without any other specification, we are referring to the tester and property from Theorem 3 or Theorem 4 (which one will be clear from context).

Applications of our main results.
We can canonicalize known testers for many constant-query testable Boolean function classes found in the literature by applying either our first or second result. Because of space constraints we describe these applications in Appendix A.

4 Overview of the proofs of Theorems 3 and 4

In this section we give a high-level explanation of our arguments and of the constructions of our canonical testers. Full details and complete proofs of Theorems 3 and 4 are given in the Appendix.

We first note that an execution of the Independent Tester $T = (T_1, T_2)$ (see Definition 5) with input parameter $\epsilon$ creates a $2^{q(\epsilon)}$-way partition of the $n$ variables by independently assigning each variable to a randomly chosen subset in the partition with the appropriate probability (of course these probabilities need not all be equal). All queries made by the independent tester then respect this partition.

Consider the following first attempt at constructing a canonical tester $\mathrm{Canon}(T) = (\mathrm{Canon}(T)_1, \mathrm{Canon}(T)_2)$ from an independent tester $T$. In the first stage, $\mathrm{Canon}(T)_1$ partitions the $n$ variables into $2^{q'}$ subsets of expected size $n/2^{q'}$, as specified in Definition 6, and makes all corresponding queries; it then passes both the queries and the responses to the second stage, $\mathrm{Canon}(T)_2$. The value $q'$ will be such that $2^{q'}$ equals $\mathrm{Prod}(\epsilon) \cdot k + rem$, where $0 \leq rem < \mathrm{Prod}(\epsilon)$, and $k$ is a positive integer. In the second stage, $\mathrm{Canon}(T)_2$ chooses the first $\mathrm{Prod}(\epsilon) \cdot k$ subsets of $\mathrm{Canon}(T)_1$’s partition (let us say these subsets collectively contain $n'$ variables) and ignores the variables in the last $rem$ subsets. For the $n'$ variables contained in these first $\mathrm{Prod}(\epsilon) \cdot k$ subsets, $\mathrm{Canon}(T)_2$ can perfectly simulate a partition created by an execution of the independent tester $T$ on these $n'$ variables with parameter $\epsilon$, by “coalescing” these $\mathrm{Prod}(\epsilon) \cdot k$ subsets into $2^{q(\epsilon)}$ subsets of the appropriate expected sizes.
(To create a subset whose size is binomially distributed according to $B(n', \ell/\mathrm{Prod}(\epsilon))$, $\mathrm{Canon}(T)_2$ “coalesces” a collection of $k\ell$ of the $\mathrm{Prod}(\epsilon)$ subsets.) To simulate each of the $q = q(\epsilon)$ queries that $T$ makes, $\mathrm{Canon}(T)_2$ sets the $n'$ variables as $T_1$ would set them given this partition.

Obviously, the problem with the above simulation is how to set the extra variables in the remaining $rem$ subsets in each of the $q$ queries. The $n'$ variables described above are faithfully simulating the distribution over query strings that $T$ would make if it were run on an $n'$-variable function with input parameter $\epsilon$, but of course the actual queries that $\mathrm{Canon}(T)$ makes have the additional $rem$ variables, and the corresponding responses are according to the $n$-variable function $f$. Thus, we have no guarantee that $T_2$ will answer correctly w.h.p. when executed on the query-response strings generated by the simulator $\mathrm{Canon}(T)_2$ as described above.

Nevertheless, the simulation described above is a good starting point for our actual construction. The underlying idea of our canonical testers is that instead of (imperfectly) simulating an execution of the independent tester on the actual $n$-variable target function $f$, the canonical tester perfectly simulates an execution of the independent tester on a related function $f'$. Our analysis shows that due to the special properties of the independent tester and of the classes we consider, the response of the independent tester on target function $f'$ is also a legitimate response for $f$.

Below we describe the construction of a canonical tester for two different types of independent testers and classes. The first construction shows how to transform $T$, where $T$ is a two-sided independent tester for a class $C$ that is closed under Noisy-Neg Minors, into $\mathrm{Canon}(T)$, a two-sided canonical tester for class $C$.
The second construction shows how to transform T, where T is a one-sided independent tester for a class C closed under ID-Neg minors, into Canon(T), a one-sided canonical tester for C.

4.1 Construction for two-sided independent testers and classes closed under Noisy-Neg Minors

We first note that it is easy to construct an algorithm that approximates $\mathrm{NS}_{\eta}(f)$ of a target function f by non-adaptively drawing pairs of points (x, y) where x is chosen uniformly at random and y is $(1 - 2\eta)$-correlated with x. It is also easy to see that if η is a rational number $c_1/c_2$ where $c_2$ is a power of 2, then the distribution over queries made by such an algorithm can be simulated using a canonical tester.

For ease of understanding we view this canonical tester as having two parts (it will be clear that these two parts can be straightforwardly combined to obtain a canonical tester that follows the template of Definition 6). The first part is an algorithm that approximates $\mathrm{NS}_{\eta'}(f)$ and rejects any f for which $\mathrm{NS}_{\eta'}(f)$ is noticeably higher than $\mathrm{NS}_{\eta'}(C)$ (here $\eta'$ is a parameter of the form (integer)/(power of 2) that will be specified later).

The second part of the tester simulates the partition generated by the independent tester T as described at the start of Section 4. Let $F_+$ contain the variables assigned to the first rem/2 subsets from the rem “remaining” subsets, and let $F_-$ contain the variables assigned to the last rem/2 of those subsets. As a thought experiment, we may imagine that the variables in $F_+ \cup F_-$ are each independently assigned to a randomly selected one of the $\mathrm{Prod}(\epsilon)$ partition subsets with the appropriate probability. In this thought experiment, we have perfectly simulated a partition generated by running $T_1$ over an n-variable function. We now define $f'$ based on the subsets $F_+$ and $F_-$. The function $f'$ is simply the restriction of f under which all variables in $F_+$ are fixed to 1 and all variables in $F_-$ are fixed to 0.
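The restriction that defines $f'$ above can be sketched in a few lines. This is a minimal illustration, assuming that f is given as a Python callable on 0/1 tuples and that variables are indexed from 0; the name `restrict` is hypothetical.

```python
def restrict(f, f_plus, f_minus):
    """Return f', the restriction of f with F+ fixed to 1 and F- fixed to 0.

    f is assumed to be a callable on tuples of 0/1 values; f_plus and
    f_minus are disjoint collections of 0-based variable indices.
    """
    f_plus, f_minus = frozenset(f_plus), frozenset(f_minus)
    if f_plus & f_minus:
        raise ValueError("F+ and F- must be disjoint")

    def f_prime(x):
        y = list(x)
        for i in f_plus:
            y[i] = 1  # variables in F+ are fixed to 1
        for i in f_minus:
            y[i] = 0  # variables in F- are fixed to 0
        return f(tuple(y))

    return f_prime


def parity(x):
    return sum(x) % 2


f_prime = restrict(parity, f_plus={0}, f_minus={1})
```

Evaluating `f_prime` at y costs a single evaluation of f at the modified point, which mirrors why the tester can answer queries to $f'$ using responses from f.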
Now, we would like Canon(T) to generate the q query-answer pairs for f that $T_1$ would make given the partition from the thought experiment described above. While Canon(T) cannot do this, a crucial observation is that Canon(T) can perfectly simulate q query-answer pairs that $T_1$ would make given the above-described partition where the answers are generated according to $f'$. Moreover, our analysis will show (using the fact that C is closed under negation) that we may assume w.l.o.g. that each of these q queries is individually uniformly distributed over $\{0,1\}^n$. Thus for each individual (uniform random) query string x, we have that $f'(x)$ is equivalent to $f(y)$ where y is a random string that is $(1 - 2\eta')$-correlated with x, where $\eta' = \mathrm{rem}/2^{q'}$. Now since $\mathrm{NS}_{\eta'}(C)$ depends only on $\eta'$, by choosing $\eta'$ small enough (and $q'$ large enough), we have by a union bound that with high probability $f(x)$ equals $f'(x)$ for all the queries x that were generated. In that case $T_2$ must with high probability generate the same output on target functions f and $f'$. So since T is (by hypothesis) an effective tester for f, it must be the case that T's responses on $f'$ are also “good” for f.

4.2 Construction for one-sided independent testers and classes closed under ID-Neg minors

Our second canonical tester construction also begins by simulating a partition of the independent tester T over $n'$ variables as described above. However, now we will think of the parameters as being set somewhat differently: we view the canonical tester as partitioning the n variables into $2^{q'(\epsilon)}$ subsets where now $q' = q'(\epsilon)$ is such that $2^{q'}$ equals $\mathrm{Prod}(\epsilon_1) \cdot k + \mathrm{rem}$, where $\epsilon_1 \ll \epsilon$ (more precisely $\epsilon_1 = \frac{r(\epsilon)}{4q(\epsilon)}$, though this exact expression is not important for now), $0 \le \mathrm{rem} < \mathrm{Prod}(\epsilon_1)$, and k is a positive integer.
The canonical tester then defines a new function $f'$ over $n'$ variables by applying an operator $F^{\epsilon_1}$ to the set X of $2^{2^{q'(\epsilon)}}$ query strings that $\mathrm{Canon}(T)_1$ generates; we now describe how this operator acts. Let $F_+$ contain the variables assigned to the first rem/2 subsets from the rem “remaining” subsets, and let $F_-$ contain the variables assigned to the last rem/2 of the rem subsets. Given f, the function $f'$ that is obtained by applying $F^{\epsilon_1}$ to X is chosen in the following way: $f'$ is the same as f except that

• A variable $x_{\mathrm{id}}$ is chosen by taking the lexicographically first element of $F_+$;
• All variables in $F_+$ are identified with $x_{\mathrm{id}}$, and all variables in $F_-$ are identified with $\overline{x_{\mathrm{id}}}$.

Canon(T) places id at random in one of the remaining partition subsets and then selects the appropriate set of query strings that $T_1$ would make given the simulated partition over $n'$ variables described above, and constructs query-answer pairs for these strings in which the answers are the corresponding values of $f'$ on these strings (note that, similar to the crucial observation in Section 4.1, it is indeed possible for Canon(T) to do this). Finally, Canon(T) passes these queries and responses to $T_2$ and responds as $T_2$ does.

The proof of correctness of this canonical tester proceeds in two parts. First, we show that with high probability Canon(T) successfully tests target function $f'$. (This is an easy consequence of the fact, mentioned above, that Canon(T) perfectly simulates T's partition over the $n'$ variables.) Second, we show that (1) if f ∈ C and C is closed under ID-Neg minors then $f'$ ∈ C; and (2) if f is ε-far from C then w.h.p. $f'$ is $\epsilon_1$-far from C, where the value of $\epsilon_1$ depends only on ε. We note that (2) does not hold in general for f, $f'$ where $f'$ is an arbitrary ID-Neg minor of f. However, our analysis shows that assuming that there exists a one-sided independent tester for class C, (2) holds for $f'$ chosen in the way described above.
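The identification step that produces $f'$ in this construction can be sketched as follows. This is an illustrative sketch, not the paper's formal ID-Neg definition (Definition 4 is outside this section): it assumes f is a callable on 0/1 tuples with 0-based indices, takes "lexicographically first" to mean the smallest index, and keeps $f'$ as a function of all n coordinates whose value simply ignores the identified coordinates. The name `id_neg` is hypothetical.

```python
def id_neg(f, f_plus, f_minus):
    """Identify F+ with x_id and F- with the negation of x_id.

    Returns (id_var, f_prime). f_prime still takes n coordinates, but the
    coordinates in F+ and F- other than id_var have no influence on it.
    """
    f_plus, f_minus = frozenset(f_plus), frozenset(f_minus)
    if not f_plus:
        raise ValueError("F+ must be non-empty to choose id")
    if f_plus & f_minus:
        raise ValueError("F+ and F- must be disjoint")
    id_var = min(f_plus)  # assumed reading of "lexicographically first"

    def f_prime(x):
        y = list(x)
        for i in f_plus:
            y[i] = x[id_var]      # identified with x_id
        for i in f_minus:
            y[i] = 1 - x[id_var]  # identified with the negation of x_id
        return f(tuple(y))

    return id_var, f_prime


def majority3(x):
    return int(sum(x) >= 2)


id_var, f_prime = id_neg(majority3, f_plus={0, 1}, f_minus={2})
```

In this toy case the minor of the 3-variable majority collapses to the dictator function on `x[0]`, since two of the three inputs always agree with `x[0]`.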
Organization of the rest of the paper. Appendix A presents applications of our main results. In Appendix B we describe how to normalize a tester, which is a useful preliminary stage employed in both of our constructions. Then in Appendix C and Appendix D we present the construction for classes closed under Noisy-Neg minors and the analysis. Finally, in Appendix E and Appendix F we present the construction for classes closed under ID-Neg minors and the analysis.

Conclusions. Our work is the first attempt we know of to establish a canonical form for Boolean function property testers. Our results show that a wide range of efficient testing algorithms for many well-studied Boolean function classes can be transformed into a natural canonical form. Building on our work, it would be nice to have more general results that do not require bounded noise sensitivity or closure under ID-Neg minors, or alternately to have examples showing that some such conditions are necessary for canonicalization. Another natural goal is to improve the quantitative bounds that we obtain.

References

[AFNS06] N. Alon, E. Fischer, I. Newman, and A. Shapira. A combinatorial characterization of the testable graph properties: It’s all about regularity. In Proc. STOC, 2006.
[AKK+03] N. Alon, T. Kaufman, M. Krivelevich, S. Litsyn, and D. Ron. Testing low-degree polynomials over GF(2). In Proc. RANDOM, pages 188–199, 2003.
[AS05a] Noga Alon and Asaf Shapira. A characterization of the (natural) graph properties testable with one-sided error. In Proc. FOCS, pages 429–438, 2005.
[AS05b] Noga Alon and Asaf Shapira. Every monotone graph property is testable. In Proc. STOC, pages 128–137, 2005.
[AT08] Tim Austin and Terry Tao. On the testability and repair of hereditary hypergraph properties. Submitted to Random Structures and Algorithms, 2008.
[BCH+96] M. Bellare, D. Coppersmith, J. Hastad, M. Kiwi, and M. Sudan. Linearity testing in characteristic two. IEEE Trans. on Information Theory, 42(6):1781–1795, 1996.
[BGS98] M. Bellare, O. Goldreich, and M. Sudan. Free bits, PCPs and non-approximability: towards tight results. SIAM J. Comput., 27(3):804–915, 1998.
[Bla08] Eric Blais. Improved bounds for testing juntas. In Proc. RANDOM, pages 317–330, 2008.
[Bla09] Eric Blais. Testing juntas nearly optimally. In Proc. 41st Annual ACM Symposium on Theory of Computing (STOC), pages 151–158, 2009.
[Bla10] Eric Blais. Personal communication. 2010.
[BLR93] M. Blum, M. Luby, and R. Rubinfeld. Self-testing/correcting with applications to numerical problems. J. Comp. Sys. Sci., 47:549–595, 1993. Earlier version in STOC’90.
[DLM+07] I. Diakonikolas, H. Lee, K. Matulef, K. Onak, R. Rubinfeld, R. Servedio, and A. Wan. Testing for concise representations. In Proc. 48th Ann. Symposium on Foundations of Computer Science (FOCS), pages 549–558, 2007.
[FKR+04] E. Fischer, G. Kindler, D. Ron, S. Safra, and A. Samorodnitsky. Testing juntas. J. Computer & System Sciences, 68(4):753–787, 2004.
[GOS+09] P. Gopalan, R. O’Donnell, R. Servedio, A. Shpilka, and K. Wimmer. Testing Fourier dimensionality and sparsity. In Proc. 36th International Colloquium on Automata, Languages and Programming (ICALP), pages 500–512, 2009.
[GT03] Oded Goldreich and Luca Trevisan. Three theorems regarding testing graph properties. Random Structures and Algorithms, 23(1):23–57, August 2003.
[HR05] Lisa Hellerstein and Vijay Raghavan. Exact learning of DNF formulas using DNF hypotheses. Journal of Computer & System Sciences, 70(4):435–470, 2005.
[MORS10] K. Matulef, R. O’Donnell, R. Rubinfeld, and R. Servedio. Testing halfspaces. SIAM J. Comp., 39(5):2004–2047, 2010.
[PRS02] M. Parnas, D. Ron, and A. Samorodnitsky. Testing basic boolean formulae. SIAM J. Disc. Math., 16:20–46, 2002.
[Sam07] A. Samorodnitsky. Low-degree tests at large distances. In Proc. 39th ACM Symposium on the Theory of Computing (STOC’07), pages 506–515, 2007.
[Val08] P. Valiant. Testing Symmetric Properties of Distributions. PhD thesis, M.I.T., 2008.

A Applications of our main results

In this section, we discuss some of the applications of our two main results (see also Table 1). Recall that Theorem 4 applies to classes closed under ID-Neg minors and to one-sided independent testers, while Theorem 3 applies to classes closed under Noisy-Neg minors and to (two-sided) independent testers.

Many natural classes of Boolean functions that have been studied in the property testing literature are closed under either ID-Neg minors or Noisy-Neg minors. Classes closed under ID-Neg minors include J-juntas [FKR+04, Bla08, Bla09] (for J independent of n), s-term DNFs [DLM+07] (for s independent of n), halfspaces [MORS10], GF(2)-linear functions (i.e. parities and their negations) [BLR93], and degree-d GF(2) polynomials [AKK+03]. Classes closed under Noisy-Neg minors include all the classes of Boolean functions for which testing algorithms were given in [DLM+07]; these include decision lists, size-s decision trees, size-s branching programs, s-term DNFs, size-s Boolean formulas, s-sparse polynomials over GF(2), size-s Boolean circuits and functions with Fourier degree at most s, where throughout s is independent of n. For each of these classes, using the fact that functions in the class are well-approximated by juntas (which is the key to the testing results of [DLM+07]) it is not difficult to verify that the class has low noise sensitivity.

We further note that most of the Boolean function property testers found in the literature can be characterized as independent testers.
These include the 1-sided “size test” for testing J-juntas of [FKR+04], (the nonadaptive version of) the junta tester of [Bla09], the parity tester of [BLR93], the testers of [PRS02] for Boolean literals, conjunctions and s-term monotone DNFs, the general “testing by implicit learning” algorithm of [DLM+07] for the classes listed above, and the tester of [MORS10] for halfspaces.

There are several classes that are both closed under Noisy-Neg minors and additionally are known to have an independent tester, and so our Theorem 3 can be applied to “canonicalize” these algorithms; these include the testers of [DLM+07] and [MORS10] for the classes listed above.

Since our ID-Neg result (Theorem 4) requires independent testers that are also one-sided, the setting is more restrictive, but there are still several testers and classes found in the literature that satisfy our requirements. These include the tester for singletons of [PRS02], the GF(2)-linear function tester of [BLR93], the [AKK+03] tester for degree-d GF(2) polynomials, and the one-sided junta testers given by [FKR+04].

Thus, we can canonicalize known testers for many constant-query testable Boolean function classes found in the literature by applying either our first or second result. One exception comes from [GOS+09]; that work gives testing algorithms for the class of Boolean functions with Fourier dimension d (i.e. for “d-juntas of parities”) and for the class of Boolean functions with s-sparse Fourier spectra. It is easy to see that these classes are not closed under Noisy-Neg minors, since each class contains all parity functions, and the testers of [GOS+09] do not have 1-sided error. (However, we note that inspection of the testers provided in [GOS+09] shows that they can be straightforwardly “canonicalized” just like the [AKK+03] tester discussed in Section 1.1.)
Function class literals (dictators) Reference [PRS02] GF (2)-linear functions GF (2)-deg-d functions J-juntas decision lists; size-s decision trees; size-s branching programs; s-term DNFs; size-s Boolean formulas; s-sparse GF (2) polynomials; size-s Boolean circuits; functions with Fourier degree d halfspaces functions with Fourier dimension d; functions with s-sparse Fourier spectra 1-sided YES ID-Neg YES Noisy-Neg YES [BLR93] YES YES NO Theorem Thm. Thm. 4 Thm. 4 [AKK+ 03] YES YES NO Thm. 4 (some) YES YES NO YES YES Thm. Thm. 4 Thm. 4 NO NO YES NO YES NO [FKR+ 04, Bla09] [DLM+ 07] Bla08, [MORS10] [GOS+ 09] 3, 3, Thm. 4

Table 1: An overview of some Boolean function property testing results in the literature and how they relate to our results. All of the testing algorithms listed in the table are independent testers in the sense of Definition 5. The parameters J, s, d are always viewed as independent of n. The “Theorem” column indicates which of our theorems yields a canonical tester. The testers of [GOS+09] are the only algorithms that cannot be canonicalized using either Theorem 4 or Theorem 3; these testers can be verified to already essentially be in canonical form.

B Normalizing a tester

In this section we explain how any independent tester T can be “normalized”; having independent testers that satisfy this normalization condition will be useful in both of the constructions that follow. We will use the following lemma:

Lemma 1 Let T and C be as described in Theorem 3 (resp. Theorem 4). Let S ⊆ [n] be any subset of [n]. Then the following algorithm $T(S) = (T(S)_1, T(S)_2)$ (with input parameter ε) is also a two-sided (resp. one-sided) non-adaptive tester for C with the same query and success parameters as T.

$T(S)_1$ works as follows:
1. Run $T_1$ with input parameter ε to generate the set of query strings Q.
2. For each query string x ∈ Q and each i ∈ S, flip the bit $x_i$. Let $Q'$ be the resulting set of query strings.
3.
Submit the query strings in $Q'$ to the oracle for f and receive responses R.

$T(S)_2$ works by running $T_2$ with the set of queries Q and the responses R.

Proof: Let f be the function that is being tested. Define the following function $f^*$: $f^*(x_1, \ldots, x_n) = f(t_1, \ldots, t_n)$ where $t_i = \overline{x_i}$ for i ∈ S and $t_i = x_i$ for i ∉ S. Thus the responses R correspond to the output of $f^*$ on the strings in Q. If f belongs to C, then $f^*$ also belongs to C since C is closed under negating input variables. Since the pair (Q, R) is consistent with $f^*$, we have $\Pr[(T(S))^f = \text{“accept”}] = \Pr[T^{f^*} = \text{“accept”}]$ as required. On the other hand, if f is ε-far from C, then $f^*$ is ε-far from C. (If this were not the case, then $f^*$ would be ε-close to some function $g^*$ ∈ C. Then the function $g = (g^*)^*$ would belong to C, and would be such that f is ε-close to g.) Thus, $\Pr[(T(S))^f = \text{“reject”}] = \Pr[T^{f^*} = \text{“reject”}]$ as required.

Given T, C as in Theorem 3 (resp. Theorem 4), we consider the following tester $T'$. (Intuitively, in each of its executions $T'$ randomly chooses the set S and runs T(S) with the chosen S.)

The first phase $(T')_1$ works as follows:
1. Choose a subset S ⊆ [n] uniformly at random (i.e. each i ∈ [n] has independent probability 1/2 of being in S).
2. Run $T(S)_1$ with input parameter ε to generate the set of queries $Q'$.
3. Submit the queries in $Q'$ to the oracle and receive responses R.

The second phase $(T')_2$ works as follows:
1. Run $T(S)_2$ with the set of queries $Q'$ and the responses R.

Note that by Lemma 1, $T'$ is a two-sided (resp. one-sided) non-adaptive tester for class C. In fact, $T'$ is a two-sided (resp. one-sided) independent tester for C since it can be viewed in the following way: $T'$ chooses a first query string according to probability $p_\emptyset = 1/2$. Thus $T'$ starts off with an initial partition of [n] into two subsets $B_0$, $B_1$. Now, $T'$ executes T on the set $B_0$ and executes the “negation” of T (i.e.
all probability values $p_b$ are replaced with $1 - p_b$) on the set $B_1$. So henceforth it will be convenient to view $T'$ as a two-sided (resp. one-sided) independent tester for C, with a first query string whose probability $p_\emptyset$ is set to 1/2.

C Construction of Canon(T) for classes closed under Noisy-Neg Minors

In this section we describe how the canonical tester Canon(T) for C is constructed from T (or more precisely, from $T'$). Before going into details we give the idea behind the construction. We describe the canonical tester as two separate testers that are run sequentially. It should be obvious how the two testers can be combined into one tester that satisfies the canonical tester definition given in Section 3. The first part of Canon(T) is denoted by NoiseTest(T) and the second part of Canon(T) is denoted by Simulator(T). Canon(T) outputs “reject” iff either NoiseTest(T) outputs “reject” or Simulator(T) outputs “reject”, and outputs “accept” iff both NoiseTest(T) and Simulator(T) output “accept.”

We now describe $\mathrm{NoiseTest}(T) = (\mathrm{NoiseTest}(T)_1, \mathrm{NoiseTest}(T)_2)$:

Description of $\mathrm{NoiseTest}(T)_1$ for class C running on input function f with input parameter ε:
1. Let $\mathrm{Prod}(\epsilon)$ be as defined in Section 2.1 for the tester $T'$ with input parameter ε.
2. We define $q'(\epsilon)$ to be the smallest integer value that satisfies the following bound:
\[ 2\,\mathrm{NS}_{\frac{\mathrm{Prod}(\epsilon)}{2 \cdot 2^{q'(\epsilon)}}}(C) \cdot q(\epsilon) \le \frac{r(\epsilon) - a(\epsilon)}{8}. \]
(Simulator(T) will subsequently make $2^{2^{q'(\epsilon)}}$ queries.)
3. Define $\eta' = \frac{\mathrm{rem}}{2 \cdot 2^{q'(\epsilon)}}$, where $\mathrm{rem} = 2^{q'(\epsilon)} \bmod \mathrm{Prod}(\epsilon)$. (We note that $\eta' < \frac{\mathrm{Prod}(\epsilon)}{2 \cdot 2^{q'(\epsilon)}}$, since $\mathrm{rem} < \mathrm{Prod}(\epsilon)$. We further recall that $\mathrm{NS}_\delta(f)$ decreases as δ decreases for every non-constant function f; this is well known and is easy to verify from the Fourier expression for noise sensitivity.)
4. Let $m = \left\lceil \frac{32}{\mathrm{NS}_{\eta'}(C)} \ln \frac{8}{r(\epsilon) - a(\epsilon)} \right\rceil$. (This choice of m will be used in a Chernoff bound later in the analysis; note that it gives $\exp((-1/16) \cdot \mathrm{NS}_{\eta'}(C) \cdot m/2) \le \frac{r(\epsilon) - a(\epsilon)}{8}$.)
5.
Let Q be a set of pairs $(x^1, y^1), \ldots, (x^m, y^m)$, where each $x^i$ is chosen uniformly at random, and $y^i$ is $(1 - 2\eta')$-correlated with $x^i$.
6. Submit these m pairs of queries and receive the set of responses R: $(f(x^1), f(y^1)), \ldots, (f(x^m), f(y^m))$.

Description of $\mathrm{NoiseTest}(T)_2$ for class C running on input function f with input parameter ε, queries Q and responses R: Output “reject” if the number of pairs such that $f(x^i) \ne f(y^i)$ is at least $(3/2)\,\mathrm{NS}_{\eta'}(C) \cdot m$, and output “accept” otherwise.

This concludes our description of NoiseTest(T). Before defining Simulator(T), we define two useful distributions:

• Simulator’s Queries: $S_n^\epsilon$ is the distribution over sets X of $2^{2^{q'(\epsilon)}}$ query strings to n-variable functions that a $q'$-canonical tester makes with parameter ε.
• Independent Tester’s Queries: $I_n^\epsilon$ is the distribution over sets of queries Y to n-variable functions that the independent tester $T'$ makes with input parameter ε.

We are now ready to define $\mathrm{Simulator}(T) = (\mathrm{Simulator}(T)_1, \mathrm{Simulator}(T)_2)$:

Description of $\mathrm{Simulator}(T)_1$ running on input function f with parameter ε:
1. Draw $X \sim S_n^\epsilon$. As described in Step 1 of Definition 6, this draw of X induces a partition of [n] into $2^{q'(\epsilon)}$ subsets: $P = \{P_1, \ldots, P_{2^{q'(\epsilon)}}\}$. Note that for each fixed j, each variable is independently placed in subset $P_j$ with probability $\frac{1}{2^{q'(\epsilon)}}$.
2. Ask all $2^{2^{q'(\epsilon)}}$ queries x ∈ X to the oracle and receive responses $[f(x)]_{x \in X}$.

Description of $\mathrm{Simulator}(T)_2$ running on input parameter ε with queries X and responses $[f(x)]_{x \in X}$:
1. Given X as above, draw $Y \sim \mathrm{AltI}_n^\epsilon(X)$ as follows. We will later show that for $X \sim S_n^\epsilon$, a set of queries Y generated by this draw is distributed identically to a set of queries Y generated by a draw from $I_n^\epsilon$. Moreover, Simulator(T) can always respond correctly to queries y ∈ Y with the value $f'(y)$ by returning $f(x)$ for some x ∈ X (as described below).
• Let $\mathrm{Prod}(\epsilon)$ be as described in Section 4 and let rem denote $2^{q'(\epsilon)} \bmod \mathrm{Prod}(\epsilon)$ (note that rem is always divisible by 2 due to the way $T'$ chooses the set S).
• Choose the first subsets $P_1, \ldots, P_{\mathrm{rem}}$ of the partition P defined by X. Let $F_+ = \bigcup_{i=1}^{\mathrm{rem}/2} P_i$ and let $F_- = \bigcup_{i=\mathrm{rem}/2+1}^{\mathrm{rem}} P_i$.
• Partition $F_+$ multinomially into $\mathrm{Prod}(\epsilon)$ subsets $F_+ = \{F_+^1, \ldots, F_+^{\mathrm{Prod}(\epsilon)}\}$, where each variable in $F_+$ is assigned to subset $F_+^i$, for $1 \le i \le \mathrm{Prod}(\epsilon)$, independently with probability $1/\mathrm{Prod}(\epsilon)$.
• Partition $F_-$ multinomially into $\mathrm{Prod}(\epsilon)$ subsets $F_- = \{F_-^1, \ldots, F_-^{\mathrm{Prod}(\epsilon)}\}$, where each variable in $F_-$ is assigned to subset $F_-^i$, for $1 \le i \le \mathrm{Prod}(\epsilon)$, independently with probability $1/\mathrm{Prod}(\epsilon)$.
• Recall that a run of $T'$ with parameter ε induces a partition of [n] into $2^{q(\epsilon)}$ subsets. For each $1 \le i \le 2^{q(\epsilon)}$, let $k_i$ be such that $k_i \cdot n/\mathrm{Prod}(\epsilon)$ is the expected size of the i-th subset in this partition. Let K equal $\lfloor 2^{q'(\epsilon)} / \mathrm{Prod}(\epsilon) \rfloor$.
• We now describe how to construct a partition $R = \{R_1, \ldots, R_{2^{q(\epsilon)}}\}$. To construct subset $R_1$, remove the first $K k_1$ subsets $P_{\mathrm{rem}+1}, \ldots, P_{\mathrm{rem}+K k_1}$ remaining in the partition P, the first $k_1$ subsets $F_+^1, \ldots, F_+^{k_1}$ in the partition $F_+$, and the first $k_1$ subsets $F_-^1, \ldots, F_-^{k_1}$ in the partition $F_-$, and place the elements of each of these sets in $R_1$. To construct subset $R_i$, for $2 \le i \le 2^{q(\epsilon)}$, remove the first $K k_i$ remaining subsets in P, the first $k_i$ remaining subsets in $F_+$, and the first $k_i$ remaining subsets in $F_-$ and place the elements of each of these sets in $R_i$.
• Let Y be the set of queries asked by $T'$ running with input parameter ε given the partition R.
• Simulator(T) answers queries in Y of the form $(x_1, \ldots, x_n)$ with $f'(x_1, \ldots, x_n)$ where the function $f'(x_1, \ldots, x_n)$ equals $\mathrm{Noisy}(f, F_+, F_-)(x_1, \ldots, x_n)$ (recall Definition 3). We denote by $F^\epsilon$ the (randomized) operator that takes input X and returns the triple $(f', F_+, F_-)$.
2.
$\mathrm{Simulator}(T)_2$ hands the query strings and responses to $T'_2$ and outputs whatever $T'_2$ does.

The reader may have noticed that according to the above description the procedure $\mathrm{Simulator}(T)_2$ is a randomized algorithm, whereas our definition of a canonical tester requires that the “computation stage” be a deterministic algorithm. However, it is easy to derandomize the computation stage by directly applying the argument of Section 4.2.3 of [GT03].

D Analysis of Canon(T) for classes closed under Noisy-Neg Minors

Theorem 5 Canon(T) is a canonical property tester for C which makes $2^{2^{q'(\epsilon)}}$ queries and satisfies the following:
• if f belongs to C then $\Pr[\mathrm{Canon}(T)^f = \text{“accept”}] \ge 1 - a'(\epsilon)$, and
• if f is ε-far from C then $\Pr[\mathrm{Canon}(T)^f = \text{“reject”}] \ge r'(\epsilon)$,
where $a'(\epsilon) = 3a(\epsilon)/4 + r(\epsilon)/4$ and $r'(\epsilon) = 3r(\epsilon)/4 + a(\epsilon)/4$.

Theorem 5 is proved via the following three intermediate lemmas:

Lemma 2 If $\mathrm{NS}_{\eta'}(f) \ge 2\,\mathrm{NS}_{\eta'}(C)$ then NoiseTest(T) outputs “accept” with probability at most $\frac{r(\epsilon) - a(\epsilon)}{8}$. If $\mathrm{NS}_{\eta'}(f) \le \mathrm{NS}_{\eta'}(C)$ then NoiseTest(T) outputs “reject” with probability at most $\frac{r(\epsilon) - a(\epsilon)}{8}$.

This is a straightforward consequence of the monotonicity of $\mathrm{NS}_\delta$ as a function of δ, standard Chernoff bounds, and the choice of m in Step 4 of $\mathrm{NoiseTest}(T)_1$.

Lemma 3 The distributions $I_n^\epsilon$ and $(X \sim S_n^\epsilon, \mathrm{AltI}_n^\epsilon(X))$ are identical.

This lemma follows directly from inspection of the procedure described in Step 1 of $\mathrm{Simulator}(T)_2$ to obtain a draw from $\mathrm{AltI}_n^\epsilon(X)$. The i-th subset in a partition induced by a set of queries Y drawn from $I_n^\epsilon$ is a random subset of [n] obtained by independently including each variable with probability $k_i/\mathrm{Prod}(\epsilon)$, and it is straightforward to check that the same is true for Y drawn from $\mathrm{AltI}_n^\epsilon(X)$; similar equivalences are easily seen to be true for all of the other subsets, and indeed for all collections of subsets.
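The behaviour of NoiseTest(T), whose guarantee is stated in Lemma 2 above, can be sketched as follows. This is a hedged illustration rather than the paper's procedure verbatim: it assumes that a $(1 - 2\eta')$-correlated copy of x is obtained by flipping each bit of x independently with probability $\eta'$ (the usual definition, which is not restated in this section), that the class-wide quantity $\mathrm{NS}_{\eta'}(C)$ is supplied as a known number `ns_class`, and that f is a callable on 0/1 tuples. All function and parameter names are hypothetical.

```python
import math
import random


def num_pairs(ns_class, r, a):
    """Sample size m = ceil((32 / NS_eta'(C)) * ln(8 / (r - a))) from Step 4."""
    return math.ceil((32.0 / ns_class) * math.log(8.0 / (r - a)))


def noisy_copy(x, eta, rng):
    """Flip each bit of x independently with probability eta (assumed model)."""
    return tuple(bit ^ int(rng.random() < eta) for bit in x)


def noise_test(f, n, eta, ns_class, r, a, rng):
    """Reject iff at least (3/2) * NS_eta'(C) * m sampled pairs disagree."""
    m = num_pairs(ns_class, r, a)
    disagreements = 0
    for _ in range(m):
        x = tuple(rng.randrange(2) for _ in range(n))
        y = noisy_copy(x, eta, rng)
        if f(x) != f(y):
            disagreements += 1
    return "reject" if disagreements >= 1.5 * ns_class * m else "accept"
```

For instance, a constant function never produces a disagreeing pair and is always accepted, while a parity on 20 bits with `eta = 0.25` disagrees on roughly half of the pairs and is rejected whenever `ns_class` is small.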
To state the third lemma we need the following definition and claim: $\mathrm{Indep}(n, p_1, p_2)$ is a distribution over triples $(f', F_+, F_-)$, where $f'$ is an n-variable function and $F_+, F_- \subseteq [n]$ are disjoint sets. A draw from $\mathrm{Indep}(n, p_1, p_2)$ is obtained in the following way:
• Each index i ∈ [n] is independently placed in $F_+$ with probability $p_1$, placed in $F_-$ with probability $p_2$, and placed in neither set with probability $1 - p_1 - p_2$.
• $f'$ is set to be $\mathrm{Noisy}(f, F_+, F_-)(x_1, \ldots, x_n)$.

Claim 4 The following two distributions are identical:
• Draw $Y \sim (X \sim S_n^\epsilon, \mathrm{AltI}_n^\epsilon(X))$. Draw $(f', F_+, F_-) \sim \mathrm{Indep}\left(n,\ \eta' = \frac{\mathrm{rem}}{2^{q'(\epsilon)+1}},\ \eta' = \frac{\mathrm{rem}}{2^{q'(\epsilon)+1}}\right)$ and output $(Y, f', F_+, F_-)$; and
• Draw $X \sim S_n^\epsilon$. Draw $Y \sim \mathrm{AltI}_n^\epsilon(X)$ and output $(Y, F^\epsilon(X))$.

This claim follows from inspection of the specification of the operator $F^\epsilon(X)$ given in Step 1 of the description of $\mathrm{Simulator}(T)_2$. Now we give the third lemma needed to prove Theorem 5:

Lemma 5 If $\mathrm{NS}_{\eta'}(f) \le 2\,\mathrm{NS}_{\eta'}(C)$ then
\[ \Pr_{X \sim S_n^\epsilon,\ (f', F_+, F_-) = F^\epsilon(X),\ Y \sim \mathrm{AltI}_n^\epsilon(X)} [f(y) \ne f'(y) \text{ for some } y \in Y] \le \frac{r(\epsilon) - a(\epsilon)}{8}. \]

Proof: By Claim 4 we have that
\[ \Pr_{X \sim S_n^\epsilon,\ (f', F_+, F_-) = F^\epsilon(X),\ Y \sim \mathrm{AltI}_n^\epsilon(X)} [f(y) \ne f'(y) \text{ for some } y \in Y] = \Pr_{Y \sim (X \sim S_n^\epsilon, \mathrm{AltI}_n^\epsilon(X)),\ (f', F_+, F_-) \sim \mathrm{Indep}(n, \eta', \eta')} [f(y) \ne f'(y) \text{ for some } y \in Y]. \]
Thus, it is sufficient to consider the probability on the right-hand side. We first observe that each individual query y ∈ Y above is uniformly distributed. Now given an individual query y ∈ Y, we define $y' = (y'_1, \ldots, y'_n)$ where $y'_i = 1$ if $i \in F_+$, $y'_i = -1$ if $i \in F_-$ and $y'_i = y_i$ otherwise. Note that by definition of $f'$ we have $f'(y) = f(y')$. Moreover, since variables are placed in $F_+$ (resp. $F_-$) and set to 1 (resp. −1) independently with probability $\frac{\mathrm{rem}}{2^{q'(\epsilon)+1}} = \eta'$, we have that y and $y'$ are $(1 - 2\eta')$-correlated. Thus, since by assumption $\mathrm{NS}_{\eta'}(f) \le 2\,\mathrm{NS}_{\eta'}(C) \le \frac{r(\epsilon) - a(\epsilon)}{8q(\epsilon)}$, we have that for each individual query y, the probability that $f(y) \ne f(y') = f'(y)$ is at most $\frac{r(\epsilon) - a(\epsilon)}{8q(\epsilon)}$. By a union bound over all y ∈ Y, we have that the probability that $f(y) \ne f'(y)$ for some y ∈ Y is at most $\frac{r(\epsilon) - a(\epsilon)}{8}$.

Having established Lemmas 2, 3, and 5, we now show that they imply Theorem 5.

Proof of Theorem 5: First, suppose that f belongs to C. Thus we have that $\mathrm{NS}_{\eta'}(f) \le \mathrm{NS}_{\eta'}(C)$. Since by Lemma 3 the distributions $I_n^\epsilon$ and $(X \sim S_n^\epsilon, \mathrm{AltI}_n^\epsilon(X))$ are identical, the output of Canon(T) can only differ from the output of T in two cases: (1) NoiseTest(T) outputs “reject”; (2) $f(y) \ne f'(y)$ for some y ∈ Y. Since f ∈ C, we have $\mathrm{NS}_{\eta'}(f) \le \mathrm{NS}_{\eta'}(C)$, so by Lemma 2, (1) occurs with probability at most $\frac{r(\epsilon) - a(\epsilon)}{8}$. Moreover, since $\mathrm{NS}_{\eta'}(f) \le \mathrm{NS}_{\eta'}(C)$, we have by Lemma 5 that (2) occurs with probability at most $\frac{r(\epsilon) - a(\epsilon)}{8}$. Thus, Canon(T) outputs “accept” with probability at least $1 - (3a(\epsilon)/4 + r(\epsilon)/4) = 1 - a'(\epsilon)$.

Next, suppose that f is ε-far from C. There are two cases to consider. The first case is that $\mathrm{NS}_{\eta'}(f) \ge 2\,\mathrm{NS}_{\eta'}(C)$. In this case, NoiseTest(T) outputs “reject” with probability at least $1 - \frac{r(\epsilon) - a(\epsilon)}{8}$ by Lemma 2. The second case is that $\mathrm{NS}_{\eta'}(f) \le 2\,\mathrm{NS}_{\eta'}(C)$. Since Canon(T) always outputs “reject” if NoiseTest(T) outputs “reject,” the probability that Canon(T) outputs “reject” is at least the probability that Canon(T) outputs “reject” given that NoiseTest(T) outputs “accept”. Since by Lemma 3 the distributions $I_n^\epsilon$ and $(X \sim S_n^\epsilon, \mathrm{AltI}_n^\epsilon(X))$ are identical, we have that if NoiseTest(T) outputs “accept”, the output of Canon(T) can only differ from the output of T if $f(y) \ne f'(y)$ for some y ∈ Y. By Lemma 5, this occurs with probability at most $\frac{r(\epsilon) - a(\epsilon)}{8}$. Thus, Canon(T) outputs “reject” with probability at least $3r(\epsilon)/4 + a(\epsilon)/4 = r'(\epsilon)$, and the theorem is proved.
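For readers who want the arithmetic behind $a'(\epsilon)$ and $r'(\epsilon)$ spelled out, one way to read the two bounds is the following sketch. It assumes $r(\epsilon) \ge a(\epsilon)$ (needed for the logarithm in the choice of m to make sense) and, in the rejection case, it conservatively subtracts both error terms even though the proof above only needs one of them in each sub-case.

```latex
\begin{align*}
\Pr[\mathrm{Canon}(T)^f = \text{accept} \mid f \in C]
  &\ge 1 - a(\epsilon)
     - \underbrace{\tfrac{r(\epsilon)-a(\epsilon)}{8}}_{\text{event (1)}}
     - \underbrace{\tfrac{r(\epsilon)-a(\epsilon)}{8}}_{\text{event (2)}} \\
  &= 1 - a(\epsilon) - \tfrac{r(\epsilon)-a(\epsilon)}{4}
   = 1 - \left(\tfrac{3a(\epsilon)}{4} + \tfrac{r(\epsilon)}{4}\right)
   = 1 - a'(\epsilon), \\[4pt]
\Pr[\mathrm{Canon}(T)^f = \text{reject} \mid f \text{ is } \epsilon\text{-far from } C]
  &\ge r(\epsilon) - 2 \cdot \tfrac{r(\epsilon)-a(\epsilon)}{8}
   = \tfrac{3r(\epsilon)}{4} + \tfrac{a(\epsilon)}{4}
   = r'(\epsilon).
\end{align*}
```

In the first rejection case the bound $1 - \frac{r(\epsilon)-a(\epsilon)}{8}$ is at least $r'(\epsilon)$ because $r(\epsilon) \le 1$, so both cases are covered by the same final expression.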
E Construction of Canon(T) for classes closed under ID-Neg minors

In this section we describe how the canonical tester Canon(T) for C is constructed from T (or more precisely, from $T'$). Throughout this section $q(\epsilon)$ denotes the query complexity of $T'$ on input parameter ε, and $c(\epsilon)$ denotes the granularity of $T'$ as defined in Definition 5. We begin by defining two useful distributions (the same definitions that were used in Appendix C):

• Canonical Tester’s Queries: $S_n^\epsilon$ is the distribution over sets of $2^{2^{q'(\epsilon)}}$ queries X to n-variable functions that a $q'$-canonical tester makes when it is run with parameter ε.
• Independent Tester’s Queries: $I_n^\epsilon$ is the distribution over sets of $q(\epsilon)$ queries Y to n-variable functions that the independent tester $T'$ makes with input parameter ε.

We are now ready to define the canonical tester $\mathrm{Canon}(T) = (\mathrm{Canon}(T)_1, \mathrm{Canon}(T)_2)$:

Description of $\mathrm{Canon}(T)_1$ running on input function f with parameter ε:
1. Draw $X \sim S_n^\epsilon$. As described in Step 1 of Definition 6, this draw of X induces a partition of [n] into $2^{q'(\epsilon)}$ subsets: $P = \{P_1, \ldots, P_{2^{q'(\epsilon)}}\}$. Note that for each fixed j, each variable is independently placed in subset $P_j$ with probability $\frac{1}{2^{q'(\epsilon)}}$.
2. Ask all $2^{2^{q'(\epsilon)}}$ queries x ∈ X to the oracle and receive responses $[f(x)]_{x \in X}$.

Description of $\mathrm{Canon}(T)_2$ running on input parameter ε with queries X and responses $[f(x)]_{x \in X}$:
1. Given input X as above, the 4-tuple $(f', F_+, F_-, \mathrm{id})$ is obtained by applying $F^{\epsilon_1}$ to X as follows (recall from the statement of Theorem 4 that $\epsilon_1 = \frac{r(\epsilon)}{4q(\epsilon)}$):
• Let $\mathrm{Prod}(\epsilon_1)$ be as described in Section 4. (Recall that as noted in Section 4 we have $\mathrm{Prod}(\epsilon_1) \le c(\epsilon_1)^{2^{q(\epsilon_1)}+1}$.) Let rem denote $2^{q'(\epsilon)} \bmod \mathrm{Prod}(\epsilon_1)$ (as before, rem is divisible by 2 due to the way $T'$ chooses the set S).
• Choose the first subsets $P_1, \ldots, P_{\mathrm{rem}}$ of the partition P defined by X. Let $F_+ = \bigcup_{i=1}^{\mathrm{rem}/2} P_i$ and let $F_- = \bigcup_{i=\mathrm{rem}/2+1}^{\mathrm{rem}} P_i$. Let id be the lexicographically first element of $F_+$.
• Let $n'$ be the value such that $n' - 1 = n - |F_+| - |F_-|$. (Intuitively, $n'$ is the number of elements in [n] after the elements of $F_+$ and $F_-$ are removed and then id is added back in.)
• Let $f' = \text{ID-Neg}(f, F_+, F_-, \mathrm{id})(x_1, \ldots, x_n)$. We shall view $f'$ as a function over $n'$ variables.
2. Given X and $(f', F_+, F_-, \mathrm{id})$ generated as above, draw $Y \sim \mathrm{AltI}_{n'}^{\epsilon_1}(X, \mathrm{id})$ as follows. We will later show that a set of $n'$-variable queries Y generated by this draw is distributed identically to a set of queries Y generated by a draw from $I_{n'}^{\epsilon_1}$. Moreover, Canon(T) can always respond correctly to queries y ∈ Y with the value $f'(y)$ by returning $f(x)$ for some x ∈ X as described below.
• Let $2^{q(\epsilon_1)}$ be the number of subsets in the partition $R = \{R_1, \ldots, R_{2^{q(\epsilon_1)}}\}$ induced by $T'$ running with parameter $\epsilon_1$. Let $k_i$ be the nonnegative integer such that the expected size of subset $R_i$ equals $k_i \cdot n/\mathrm{Prod}(\epsilon_1)$. Let $K = \lfloor 2^{q'(\epsilon)} / \mathrm{Prod}(\epsilon_1) \rfloor$.
• To construct subset $R_1$, remove the first $K k_1$ subsets $P_{\mathrm{rem}+1}, \ldots, P_{\mathrm{rem}+K k_1}$ remaining in the partition P and place the elements of each of the sets in $R_1$. To construct subset $R_i$, for $2 \le i \le 2^{q(\epsilon_1)}$, remove the first $K k_i$ remaining subsets in P and place the elements of each of the sets in $R_i$.
• Randomly choose a subset $R_j$, where each subset $R_i$ has probability $k_i/\mathrm{Prod}(\epsilon_1)$ of being chosen. Place the variable $x_{\mathrm{id}}$ in the subset $R_j$.
• Let Y be the set of queries asked by $T'$ running with input parameter $\epsilon_1$ given the partition R.
• Canon(T) answers queries in Y of the form $(x_1, \ldots, x_{n'})$ with $f'(x_1, \ldots, x_n)$ where the function $f'(x_1, \ldots, x_n)$ equals $\text{ID-Neg}(f, F_+, F_-, \mathrm{id})$ (recall Definition 4).
3. Canon(T) hands the query strings and responses to $T'_2$ and outputs whatever $T'_2$ does.

As at the end of Appendix C, the randomness in the computation stage can be eliminated using the [GT03] approach.
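The partition bookkeeping in the description of $\mathrm{Canon}(T)_2$ above can be sketched at the level of individual variables. This is an illustrative sketch only: the name `simulate_partition` and its arguments are hypothetical, variables are 0-based, `weights[i]` plays the role of $k_i$ (so the weights sum to `prod`, standing in for $\mathrm{Prod}(\epsilon_1)$), and the case of an empty $F_+$, which the description above does not discuss, is handled here by simply returning `None` for id.

```python
import random


def simulate_partition(n, q_prime, prod, weights, rng):
    """Draw the canonical partition and coalesce it into the subsets R_i.

    Returns (f_plus, f_minus, id_var, parts), where parts[i] lists the
    variables placed in R_i (including id_var once it has been re-inserted).
    """
    if sum(weights) != prod:
        raise ValueError("weights must sum to prod")
    num_cells = 2 ** q_prime
    big_k, rem = divmod(num_cells, prod)  # K and rem from the description
    if rem % 2:
        raise ValueError("rem is assumed to be even")
    # Each variable independently falls into one of the 2^{q'} cells.
    cell_of = [rng.randrange(num_cells) for _ in range(n)]
    f_plus = [v for v in range(n) if cell_of[v] < rem // 2]
    f_minus = [v for v in range(n) if rem // 2 <= cell_of[v] < rem]
    id_var = min(f_plus) if f_plus else None
    # R_i absorbs the next K * k_i cells after the first rem cells.
    bounds, upper = [], rem
    for k_i in weights:
        upper += big_k * k_i
        bounds.append(upper)
    parts = [[] for _ in weights]
    for v in range(n):
        if cell_of[v] >= rem:
            i = next(j for j, b in enumerate(bounds) if cell_of[v] < b)
            parts[i].append(v)
    # Place x_id into R_j, chosen with probability k_j / prod.
    if id_var is not None:
        j = rng.choices(range(len(weights)), weights=weights)[0]
        parts[j].append(id_var)
    return f_plus, f_minus, id_var, parts


f_plus, f_minus, id_var, parts = simulate_partition(
    n=200, q_prime=4, prod=6, weights=[1, 2, 3], rng=random.Random(0)
)
```

Every variable outside $F_+ \cup F_-$ ends up in exactly one $R_i$, and id is the only variable of $F_+ \cup F_-$ that is re-inserted, which matches the count $n' = n - |F_+| - |F_-| + 1$.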
F Analysis of Canon(T) for classes closed under ID-Neg Minors

In this section we prove the following result, which implies Theorem 4:

Theorem 6 Canon(T) is a canonical property tester for C which makes $2^{2^{q'(\epsilon)}}$ queries and satisfies the following:

• if f belongs to C then $\Pr[\mathrm{Canon}(T)^f = \text{“accept”}] = 1$; and

• if f is $\epsilon$-far from C then $\Pr[\mathrm{Canon}(T)^f = \text{“reject”}] \ge \left(1 - \frac{1-r(\epsilon)}{1-r(\epsilon)/4}\right) \cdot r(\epsilon_1)$.

We prove Theorem 6 using the following two intermediate lemmas. In Lemma 6 we write “id(X)” to denote the element id that is determined by X (see Step 1 of the description of Canon(T)$_2$).

Lemma 6 Let T be a one-sided independent tester for C. Fix any possible outcome $\tilde{X}$ of draws from $S_n^{\epsilon}$ and let $(\tilde{f}', \tilde{F}_+, \tilde{F}_-, \widetilde{\mathrm{id}})$ be the 4-tuple that is obtained from $\tilde{X}$ by applying the operator $F^{\epsilon_1}$ to $\tilde{X}$ as described in Section 4.2. Then we have

$$\Pr_{X \sim S_n^{\epsilon},\ Y \sim \mathrm{AltI}_{n'}^{\epsilon_1}(X, \mathrm{id}(X))}\left[\mathrm{Canon}(T) \text{ outputs “accept”} \;\middle|\; F^{\epsilon_1}(X) = (\tilde{f}', \tilde{F}_+, \tilde{F}_-, \widetilde{\mathrm{id}}) \text{ and } \tilde{f}' \in C\right] = 1$$

and

$$\Pr_{X \sim S_n^{\epsilon},\ Y \sim \mathrm{AltI}_{n'}^{\epsilon_1}(X, \mathrm{id}(X))}\left[\mathrm{Canon}(T) \text{ outputs “reject”} \;\middle|\; F^{\epsilon_1}(X) = (\tilde{f}', \tilde{F}_+, \tilde{F}_-, \widetilde{\mathrm{id}}) \text{ and } \tilde{f}' \text{ is } \epsilon_1\text{-far from } C\right] \ge r(\epsilon_1).$$

Lemma 7 Suppose f is $\epsilon$-far from C. Let X be drawn from $S_n^{\epsilon}$ and let $(f', F_+, F_-, \mathrm{id})$ be obtained by applying $F^{\epsilon_1}$ to X. Then with probability at least $1 - \frac{1-r(\epsilon)}{1-r(\epsilon)/4}$, the function $f'$ (over $n'$ variables) is $\epsilon_1$-far from C.

We first show that Lemmas 6 and 7 imply Theorem 6.

Proof: First, assume f belongs to C. Then since C is closed under ID-Neg minors, we have that for any sequence $\tilde{X}$ of draws, the function $\tilde{f}'$ resulting from $F^{\epsilon_1}(\tilde{X})$ is in C. Thus, by Lemma 6, Canon(T) outputs “accept” with probability 1.

Next, assume f is $\epsilon$-far from C. Then by Lemma 7, with probability at least $1 - \frac{1-r(\epsilon)}{1-r(\epsilon)/4}$, $f'$ is $\epsilon_1$-far from C. If $f'$ is $\epsilon_1$-far from C then by Lemma 6 Canon(T) outputs “reject” with probability at least $r(\epsilon_1)$. Thus, Canon(T) outputs “reject” on f with overall probability at least $\left(1 - \frac{1-r(\epsilon)}{1-r(\epsilon)/4}\right) \cdot r(\epsilon_1)$.
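As a reading aid, the first factor in the rejection bound of Theorem 6 can be simplified. The short calculation below is ours rather than the paper's, and it assumes only that $0 < r(\epsilon) \le 1$, so that $1 - r(\epsilon)/4 > 0$:

```latex
\[
1 - \frac{1 - r(\epsilon)}{1 - r(\epsilon)/4}
  = \frac{\bigl(1 - r(\epsilon)/4\bigr) - \bigl(1 - r(\epsilon)\bigr)}{1 - r(\epsilon)/4}
  = \frac{\tfrac{3}{4}\, r(\epsilon)}{1 - r(\epsilon)/4}
  = \frac{3\, r(\epsilon)}{4 - r(\epsilon)}
  \;\ge\; \frac{3}{4}\, r(\epsilon).
\]
```

Under that assumption, the guarantee for functions that are $\epsilon$-far from C is at least $\frac{3}{4}\, r(\epsilon) \cdot r(\epsilon_1)$, so the canonical tester's rejection probability is positive whenever both $r(\epsilon)$ and $r(\epsilon_1)$ are.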
We now prove Lemmas 6 and 7. To prove Lemma 6, we will use the following claim, the correctness of which follows from inspection of the specification of $\mathrm{AltI}_{\tilde{n}'}^{\epsilon_1}(X)$ as given in Step 2 of the description of Canon(T)$_2$ in Appendix E.

Claim 8 Fix any possible outcome $\tilde{X}$ of draws from $S_n^{\epsilon}$ and let $(\tilde{f}', \tilde{F}_+, \tilde{F}_-, \widetilde{\mathrm{id}})$ be the 4-tuple that is obtained from $\tilde{X}$ by applying the operator $F^{\epsilon_1}$ to $\tilde{X}$ as described in Section 4.2 (note that once $(\tilde{f}', \tilde{F}_+, \tilde{F}_-, \widetilde{\mathrm{id}})$ has been determined this fixes the value of $\tilde{n}'$, the number of variables that $\tilde{f}'$ is defined over). Then the two distributions

• Draw $Y \sim I_{\tilde{n}'}^{\epsilon_1}$ and output Y; and

• Draw $X \sim S_n^{\epsilon}$, conditioned on $F^{\epsilon_1}(X) = (\tilde{f}', \tilde{F}_+, \tilde{F}_-, \widetilde{\mathrm{id}})$. Draw $Y \sim \mathrm{AltI}_{\tilde{n}'}^{\epsilon_1}(X)$ and output Y

are identical.

We are now ready to prove Lemma 6.

Proof: By Claim 8, the distributions $I_{n'}^{\epsilon_1}$ and $\mathrm{AltI}_{n'}^{\epsilon_1}(X, \mathrm{id}(X))$ are identical. Additionally, the canonical tester is able to answer correctly with respect to $f'$ every query in Y, where $Y \sim \mathrm{AltI}_{n'}^{\epsilon_1}(X)$. Since $T'$ is a one-sided tester for the class C, it must be the case that if $f' \in C$ then Canon(T) outputs “accept”, and if $f'$ is $\epsilon_1$-far from C then Canon(T) outputs “reject” with probability at least $r(\epsilon_1)$.

Before proving Lemma 7, we define the following distribution:

Alternative view of distribution over functions $f'$: $\mathrm{AltF}^{\epsilon_1}(Y)$ is a distribution over 4-tuples $(f', F_+, F_-, \mathrm{id})$, where $f'$ is an $n'$-variable function (for $n' = n - |F_+| - |F_-| + 1$), $F_+$ and $F_-$ are disjoint subsets of [n], id is the lexicographically first element of $F_+$, and Y is drawn from $I_n^{\epsilon}$. Given Y drawn from $I_n^{\epsilon}$, a draw from $\mathrm{AltF}^{\epsilon_1}(Y)$ is obtained in the following way:

• Let $\mathrm{Prod}(\epsilon)$ and $\mathrm{Prod}(\epsilon_1)$ be defined according to the tester $T'$ as described in Section 2.1.

• Let $R_+, R_-$ be a fixed pair of subsets of the partition defined by Y that have the following property: for every query string $y \in Y$ and for all pairs (i, j) such that $i \in R_+$, $j \in R_-$, we have $y_i = \overline{y_j}$.
  (Note that such subsets are guaranteed to exist by the way that $T'$ performs its first query.) Let $k_+$ be such that the expected size of $R_+$ is $k_+ \cdot n$, and similarly let $k_-$ be such that the expected size of $R_-$ is $k_- \cdot n$.

• Choose the subsets $F_+$ ($F_-$) from $R_+$ ($R_-$) by placing each variable from $R_+$ ($R_-$) in $F_+$ ($F_-$) independently with probability $p_{R_+} = \frac{\mathrm{rem}}{2 \cdot k_+ \cdot 2^{q'(\epsilon)}}$ (respectively $p_{R_-} = \frac{\mathrm{rem}}{2 \cdot k_- \cdot 2^{q'(\epsilon)}}$). Note that due to the definition of $q'(\epsilon)$ and the facts that $\mathrm{rem} \le \mathrm{Prod}(\epsilon_1)$ and $k_+ \ge 1/\mathrm{Prod}(\epsilon)$ ($k_- \ge 1/\mathrm{Prod}(\epsilon)$), we must have that $p_{R_+} \le 1$ ($p_{R_-} \le 1$).

• Choose the element id to be the lexicographically first element of $F_+$.

• Set $f' = \text{ID-Neg}(f, F_+, F_-, \mathrm{id})$.

The following claim shows that the distribution $Y \sim I_n^{\epsilon}$, $\mathrm{AltF}^{\epsilon_1}(Y)$ is identical to the distribution $X \sim S_n^{\epsilon}$, $F^{\epsilon_1}(X)$ used by Canon(T).

Claim 9 The following two distributions are identical:

• $F^{\epsilon_1}(X)$ (where X is drawn from $S_n^{\epsilon}$) and

• $\mathrm{AltF}^{\epsilon_1}(Y)$ (where Y is drawn from $I_n^{\epsilon}$).

We require two more definitions:

• Restricting independent Tester's queries to $n'$ variables: $\mathrm{Restrict}(Y, f', F_+, F_-, \mathrm{id})$ is an operator over sets of queries Z, where $Y \sim I_n^{\epsilon}$ and $(f', F_+, F_-, \mathrm{id}) \sim \mathrm{AltF}^{\epsilon_1}(Y)$. A draw from $\mathrm{Restrict}(Y, f', F_+, F_-, \mathrm{id})$ is obtained in the following way:

  – Let Z be the restriction of Y to the variables in $([n] \setminus (F_+ \cup F_-)) \cup \{\mathrm{id}\}$.

• Alternate view of distribution over independent Tester's queries over n variables restricted to $n'$ variables: $\mathrm{AltRestrict}^{\epsilon}$ is a distribution over 5-tuples $(f', F_+, F_-, \mathrm{id}, Z)$. A draw from $\mathrm{AltRestrict}^{\epsilon}$ is obtained in the following way:

  – Choose the subsets $F_+$, $F_-$, $[n] \setminus (F_+ \cup F_-)$ by independently placing each variable in $F_+$ with probability $\frac{\mathrm{rem}}{2 \cdot 2^{q'(\epsilon)}}$, in $F_-$ with probability $\frac{\mathrm{rem}}{2 \cdot 2^{q'(\epsilon)}}$, and otherwise in $[n] \setminus (F_+ \cup F_-)$.

  – Fix id to be the lexicographically first element of $F_+$, and set $f' = \text{ID-Neg}(f, F_+, F_-, \mathrm{id})$.

  – Let $T_n^{\epsilon}$ be the distribution over sets of $q(\epsilon)$ queries Y to n-variable functions that T makes with input parameter $\epsilon$.
    Note that for a draw of $Y \sim T_n^{\epsilon}$ there is a subset R that corresponds to $R_+ \cup R_-$, where $R_-, R_+$ are the subsets described in the “Alternative view of distribution over functions $f'$” given above. Draw $Y \sim T_n^{\epsilon}$ conditioned on the indices in $F_+, F_-$ being placed in this partition subset R. Set $Y'$ to be the restriction of Y to the variables in $[n] \setminus (F_+ \cup F_-)$.

  – Choose a subset $S' \subseteq ([n] \setminus (F_+ \cup F_-)) \cup \{\mathrm{id}\}$ uniformly at random (i.e. each $i \in ([n] \setminus (F_+ \cup F_-)) \cup \{\mathrm{id}\}$ has independent probability 1/2 of being in $S'$, as in Section B).

  – For each query in $Y'$, flip the value of $x_i$ for all $i \in S'$ and add the query to Z.

We have the following claim:

Claim 10 The following two distributions over sets of queries Z are identical:

• Draw Y from $I_n^{\epsilon}$, draw $(f', F_+, F_-, \mathrm{id})$ from $\mathrm{AltF}^{\epsilon_1}(Y)$, and output $Z = \mathrm{Restrict}(Y, f', F_+, F_-, \mathrm{id})$.

• Draw $(f', F_+, F_-, \mathrm{id}, Z)$ from $\mathrm{AltRestrict}^{\epsilon}$ and output Z.

Finally, we define one more operator:

Alternate view of a single query made by the independent Tester restricted to $n'$ variables: For $1 \le i \le q(\epsilon)$ and Z generated from $\mathrm{AltRestrict}^{\epsilon}$ as described above, we define $\mathrm{Query}^i(Z)$ as the single query string obtained by returning the i-th query string from Z.

The following claim shows that a draw from $\mathrm{Query}^i(Z)$ is uniformly distributed.

Claim 11 Fix any possible outcome $(\tilde{f}', \tilde{F}_+, \tilde{F}_-, \widetilde{\mathrm{id}}, \tilde{Z})$ from $\mathrm{AltRestrict}^{\epsilon}$. Fix any $1 \le i \le q(\epsilon)$. The following two distributions are identical:

• Output $(\tilde{f}', U_{n'})$ (i.e. the second coordinate is a uniform random $n'$-bit string); and

• Draw $(f', F_+, F_-, \mathrm{id}, Z)$ from $\mathrm{AltRestrict}^{\epsilon}$, conditioned on $(f', F_+, F_-, \mathrm{id})$ being identical to $(\tilde{f}', \tilde{F}_+, \tilde{F}_-, \widetilde{\mathrm{id}})$. Output $(\tilde{f}', \mathrm{Query}^i(Z))$.

Proof: By the definition of $\mathrm{AltRestrict}^{\epsilon}$, no matter what the outcome of $(f', F_+, F_-, \mathrm{id})$ is, the final choice of $S'$ makes the i-th query string in Z uniformly random.
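The mechanism behind the proof of Claim 11 is that flipping a fixed string on a uniformly random set of coordinates yields a uniformly random string. The sketch below checks this exhaustively for a short string; the function name `flipped_distribution` is hypothetical and the check is an illustration, not part of the paper's argument.

```python
from collections import Counter
from itertools import product


def flipped_distribution(y):
    """Count how often each string arises when y is flipped on every possible
    subset S' of its coordinates (each subset encoded as a 0/1 mask)."""
    counts = Counter()
    for mask in product((0, 1), repeat=len(y)):
        flipped = tuple(bit ^ m for bit, m in zip(y, mask))
        counts[flipped] += 1
    return counts
```

Every output string is produced by exactly one mask, so a uniform choice of $S'$ gives a uniform output regardless of the starting string. Note that the same $S'$ is applied to every query in $Y'$, so this gives uniformity of each query individually, not independence across queries, which is why the proof of Lemma 7 uses a union bound.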
Using the defined distributions and corresponding claims, we are now ready to prove Lemma 7:

Proof of Lemma 7: Assume that f is $\epsilon$-far from C. Let X be drawn from $S_n^{\epsilon}$, and let $(f', F_+, F_-, \mathrm{id})$ be $F^{\epsilon_1}(X)$. Let $p_1$ denote the probability that $f'$ (a function over $n'$ variables) is $\epsilon_1$-close to C. We will upper bound $p_1$ and thus prove the lemma.

Let Y be drawn from $I_n^{\epsilon}$, and let $(f', F_+, F_-, \mathrm{id})$ be drawn from $\mathrm{AltF}^{\epsilon_1}(Y)$. By Claim 9 the two distributions $X \sim S_n^{\epsilon}$, $F^{\epsilon_1}(X)$ and $Y \sim I_n^{\epsilon}$, $\mathrm{AltF}^{\epsilon_1}(Y)$ are identical, so we view $f'$ as chosen by a draw from $\mathrm{AltF}^{\epsilon_1}(Y)$.

Let us now additionally consider a draw Z from the distribution $Y \sim I_n^{\epsilon}$, $(f', F_+, F_-, \mathrm{id}) \sim \mathrm{AltF}^{\epsilon_1}(Y)$, $\mathrm{Restrict}(Y, f', F_+, F_-, \mathrm{id})$. By Claim 10, the two distributions over Z

$Y \sim I_n^{\epsilon}$, $(f', F_+, F_-, \mathrm{id}) \sim \mathrm{AltF}^{\epsilon_1}(Y)$, $Z \sim \mathrm{Restrict}(Y, f', F_+, F_-, \mathrm{id})$

and

$Z \sim \mathrm{AltRestrict}^{\epsilon}$

are identical, so we can alternatively view Z as a draw from $\mathrm{AltRestrict}^{\epsilon}$. By Claim 11, for every $1 \le i \le q(\epsilon)$ we have $(f', \mathrm{Query}^i(Z)) \equiv (f', U_{n'})$ when $Z \sim \mathrm{AltRestrict}^{\epsilon}$, so each query in Z is individually uniformly distributed.

We note that if $f'$ over $n'$ variables is $\epsilon_1$-close to $C_{n'}$, then there is a function $g' \in C_{n'}$ such that $\Pr_{z \sim U_{n'}}[f'(z) \ne g'(z)] < \epsilon_1$. Since the queries in Z are individually uniformly distributed, a union bound shows that the probability that some query z in Z has $f'(z) \ne g'(z)$ is at most $q(\epsilon) \cdot \epsilon_1$.

Alternatively, by viewing Z as a draw from the distribution $Y \sim I_n^{\epsilon}$, $(f', F_+, F_-, \mathrm{id}) \sim \mathrm{AltF}^{\epsilon_1}(Y)$, $\mathrm{Restrict}(Y, f', F_+, F_-, \mathrm{id})$, we have that the probability that all the queries y in $Y \sim I_n^{\epsilon}$ satisfy $f'(y) = g(y)$ is at least $1 - q(\epsilon) \cdot \epsilon_1$. Here $g \in C_n$ is obtained from $g'$ by adding back the irrelevant variables corresponding to the indices in $(F_+ \cup F_-) \setminus \{\mathrm{id}\}$, and we view $f'$ as an n-variable function.
Since $T'$ is one-sided, and since for any $Y \sim I_n^{\epsilon}$, $(f', F_+, F_-, \mathrm{id}) \sim \mathrm{AltF}^{\epsilon_1}(Y)$ it must be the case that $f(y) = f'(y)$ for all $y \in Y$, this means that (with every probability after the first taken over $Y \sim I_n^{\epsilon}$, $(f', F_+, F_-, \mathrm{id}) \sim \mathrm{AltF}^{\epsilon_1}(Y)$):

$$
\begin{aligned}
\Pr_{Y \sim I_n^{\epsilon}}[T_2'(Y) \text{ accepts}]
 &= \Pr[T_2'(Y) \text{ accepts}] \\
 &\ge \Pr[T_2'(Y) \text{ accepts} \wedge f' \text{ is } \epsilon_1\text{-close to } C_{n'}] \\
 &\ge \Pr[f' \text{ is } \epsilon_1\text{-close to } C_{n'} \wedge f(y) = g(y) \text{ for all } y \in Y] \\
 &= \Pr[f' \text{ is } \epsilon_1\text{-close to } C_{n'} \wedge f'(y) = g(y) \text{ for all } y \in Y] \\
 &= \Pr[f' \text{ is } \epsilon_1\text{-close to } C_{n'}] \cdot \Pr[f'(y) = g(y) \text{ for all } y \in Y \mid f' \text{ is } \epsilon_1\text{-close to } C_{n'}].
\end{aligned}
$$

So $\Pr_{Y \sim I_n^{\epsilon}}[T_2'(Y) \text{ accepts}] \ge p_1 \cdot (1 - q(\epsilon) \cdot \epsilon_1)$. However, since f is $\epsilon$-far from C, $T'$ running with input parameter $\epsilon$ must accept with probability at most $1 - r(\epsilon)$. Thus, $p_1 \cdot (1 - q(\epsilon) \cdot \epsilon_1) \le 1 - r(\epsilon)$. Since $\epsilon_1 = \frac{r(\epsilon)}{4q(\epsilon)}$ gives $q(\epsilon) \cdot \epsilon_1 = r(\epsilon)/4$, we conclude that $p_1 \le \frac{1-r(\epsilon)}{1-r(\epsilon)/4}$.
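The final arithmetic step can be checked with exact rationals. The sketch below is illustrative only: the function names `eps1` and `p1_upper_bound` are hypothetical, and the sample values of $r(\epsilon)$ and $q(\epsilon)$ are made up for the check rather than taken from the paper. It assumes $0 < r(\epsilon) \le 1$ and $q(\epsilon) \ge 1$.

```python
from fractions import Fraction


def eps1(r, q):
    """epsilon_1 = r(eps) / (4 * q(eps)), as recalled from Theorem 4."""
    return Fraction(r) / (4 * q)


def p1_upper_bound(r, q):
    """Solve p1 * (1 - q * eps1) <= 1 - r for p1, using exact arithmetic."""
    r = Fraction(r)
    return (1 - r) / (1 - q * eps1(r, q))
```

For example, with $r(\epsilon) = 1/3$ and $q(\epsilon) = 5$ we get $\epsilon_1 = 1/60$ and $q(\epsilon) \cdot \epsilon_1 = 1/12$, so the bound on $p_1$ is $(2/3)/(11/12) = 8/11$, which matches $\frac{1-r(\epsilon)}{1-r(\epsilon)/4}$. The bound does not depend on $q(\epsilon)$, because the factor of $q(\epsilon)$ cancels against the definition of $\epsilon_1$.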