Evaluating spectral norms for constant depth circuits with symmetric gates

1995, Computational Complexity

Meera Sitharam

Abstract. The Fourier spectrum and its norms are given as explicit arithmetic expressions and evaluated, for Boolean functions computed by classes of constant depth, read-once circuits consisting of an arbitrary set of symmetric gates. Previous results of this nature estimate the spectral $L_1$ norm of functions computed by certain types of decision trees [20], [7], and in some cases give randomized procedures that evaluate the spectrum by clever rounding [20]. One corollary of our results provides a large class of $AC^0$ functions whose spectral $L_1$ norm is exponential, thus generalizing the single example of such a function given in [9]. This shows that almost every read-once $AC^0$ function does not belong to the class $PL_1$ of functions with polynomially bounded spectral norms. Implications of our results and technique are discussed for estimating the spectral norms of any function in a constant depth circuit class, using the coding theoretic concept of weight distributions. Evaluating the spectral norms for any such function reduces to estimating certain non-trivial weight distributions of simple, linear codes.

Key words. Circuit complexity; Lower bounds; Fourier transforms.

Subject classifications. 68Q15, 68Q99.

1. Introduction

Complexity bounds for classes of constant depth circuits consisting of specific sets of gates have been the subject of extensive study. The gates are often chosen to be a particular set of symmetric Boolean functions that form a complete basis. For example, the set $\{\wedge, \vee, \neg\}$ yielding $AC^0$ is studied in [13], [30], [36], [17], [23], [12] (see the survey [8]); the sets $\{\wedge, \mathrm{mod}_p\}$, $\{\vee, \neg, \mathrm{mod}_p\}$, $\{\vee, \neg, \mathrm{mod}_q\}$, for $p$ prime and $q$ composite, are studied in [2], [31], [33], [4].
In some cases, the chosen gates are non-symmetric and less constrained, for example, threshold gates, the most powerful of which is $\mathrm{sign}(p(x))$, for a sparse multilinear real polynomial $p$ ([18], [1], [37], [6], [38], [9], [15], [29], [34], [21], [22], [?], [3]).

Spectral analysis is one of the techniques employed recently to study some of these classes, and sometimes Boolean functions in general. The Fourier spectrum of a Boolean function $f$ is so named since it is obtained by viewing $f$ as a function from the group $\mathbb{Z}_2^n$ to the field of complex numbers. However, there are several natural transformations, one of which is the Hadamard transform of a Boolean function, that result in the Fourier spectrum. Another example: if a Boolean function $f$ is viewed as a function from $\{1,-1\}^n$ to $\{0,1\}$, the Fourier coefficients of $f$ are exactly the coefficients of the unique multilinear polynomial $\tilde f$ over $\mathbb{R}^n$ that represents $f$, i.e., interpolates the values of $f$ on its domain. This versatility of the Fourier spectrum of a Boolean function enables its wide applicability and, furthermore, transports some of the classical spectral analytic techniques to the study of Boolean function complexity.

Estimating the spectral norms of a Boolean function ($L_1(\hat f)$, $L_\infty(\hat f)$, $L_2(\hat f)$, etc.) is essential for answering the basic questions of approximability and nonapproximability of Boolean functions. Therefore such estimations have predictably found several applications. For example, the $L_1$ norm and the inverse of the $L_\infty$ norm of $\hat f$ respectively provide upper and lower bounds on the number of terms in a polynomial whose sign represents $f$, and are used to determine the threshold complexity of $f$ [9]. Estimations of the size of the support of $\hat f$ have also found applications. This corresponds to the degree and sparsity of the multilinear polynomial $\tilde f$ that interpolates $f$ on $\{1,-1\}^n$. The sparsity of the function that approximates $\hat f$ with respect to the $L_1$ or $L_2$ norms is related to the time complexity of learning $f$, as shown in [20], [23], and [32].
In addition, the multiparty communication complexity of Boolean functions can also be bounded via their spectral $L_1$ norm [16]. (See [9] for a brief survey of some of these applications.)

Although some of these spectral analytic studies have concerned relatively general Boolean functions [19], [20], [7], [27], [28], [12], [4], many others have been specific to Boolean functions computed either by (constant depth) circuits consisting of very specific sets of (symmetric) gates [23], [5], [3], or by depth 1 or depth 2 circuits consisting of a less constrained set of gates [6], [9], [34], [?].

We consider Boolean functions computed by constant depth circuits consisting of arbitrary sets of symmetric gates. However, we restrict ourselves primarily to functions computed by read-once circuits, and give explicit arithmetic expressions that evaluate to the spectral coefficients of such functions, and thereby also to their norms. In addition, we directly evaluate the spectral $L_1$ norm, without evaluating the individual spectral coefficients. Previous results of this nature give upper bounds on the spectral $L_1$ norm of functions computed by certain types of decision trees [20], [7], and in some cases give randomized procedures that evaluate the spectrum by clever rounding [20].

One corollary of our results provides a large class of read-once $AC^0$ functions whose spectral $L_1$ norm is exponential, thus generalizing the single example of such a function given in [9]. We show that almost every read-once $AC^0$ function is not contained in the class $PL_1$ of functions with polynomially bounded spectral $L_1$ norms. This shows the limitations of methods that bound complexity in terms of the spectral $L_1$ norm, such as the method in [20], which establishes that the (randomized) complexity of learning Boolean functions is polynomial in their spectral $L_1$ norms.
While functions computed by read-once circuits are simple and easy to deal with in a certain sense, any non-read-once Boolean function $f$ of $n$ variables, computed by any constant depth, polynomial size circuit of a set of symmetric gates, can be expressed in terms of read-once functions $f_r$ of $m \le \mathrm{poly}(n)$ variables, in the same class, using projections. It therefore follows that a Fourier coefficient of $f$ can be expressed as a sum of our explicitly evaluated Fourier coefficients of $f_r$ over a translate of a simple, linear subspace of $\mathbb{F}_2^m$. This observation, the properties of $\hat f_r$ that follow from our evaluation, and an application of the coding theory concept of weight distributions, altogether imply a new method for the estimation of the spectral norms of $f$. The usefulness of this method is yet to be investigated and depends on whether, for simple, linear codes, the weight distribution with respect to certain non-trivial weights can be estimated with reasonable accuracy.

Our technique for evaluating spectral norms is based on the simple idea that, for a function $f$ computed by any read-once circuit $C$, the Fourier coefficient $\hat f(x)$ can be expressed in terms of the expected values of the functions computed by the maximal subcircuits of $C$ (marked $\triangle$ in the figure) whose inputs do not include any of the bits $x_i$ that equal `1' (also marked in the figure).

For the case of homogeneous circuits with symmetric gates, this idea yields a technique for the explicit expression of $\hat f(x)$ in terms of $d$ natural parameters, which we call the type parameters of $x$, where $d$ is the depth of the homogeneous, read-once circuit computing $f$. By the same idea, $L_1(\hat f)$ can also be expressed recursively using the expected values of the functions computed by the subcircuits of $C$, thereby permitting the evaluation of $L_1(\hat f)$ by evaluating at most as many Fourier coefficients of $f$ as the number of subcircuits of $C$.

The paper is organized as follows. Section 2 clarifies notational conventions, and provides some basic background.
Section 3 defines the type parameters, and gives explicit arithmetic expressions, in terms of the type parameters, for evaluating the spectra and their $L_1$ norm, for functions computed by homogeneous read-once circuits with arbitrary sets of symmetric gates. The specific example of read-once $AC^0$ functions is worked through, their spectral $L_1$ norm is evaluated, and shown to be exponentially large. Section 4 gives a method for the estimation of the spectral norms of arbitrary functions computed by constant depth circuits of symmetric gates, by observing some properties of type parameters, and applying the concept of weight distributions.

2. Background and conventions

Unless otherwise specified, all $n$-tuples $x$ are elements of either $\mathbb{R}^n$ or the finite vector space $\mathbb{F}_2^n$. The number of non-zero entries in $x$ is denoted $|x|$, the $n$-tuple $(a, \ldots, a)$ is denoted $(a^n)$, and for any $n$, the $n$-tuple of all zeroes is denoted $\vec 0$. When $x \in \mathbb{F}_2^n$, $x$ is identified with the set of co-ordinates $1 \le i \le n$ where $x_i = 1$. Thus, given vectors $x$ and $y$, we will refer to the vectors $x \cup y$, $x \cap y$, $x \setminus y$, $\bar x$ (for the bitwise complement of $x$), and expressions such as $i \in x$ (meaning $x_i = 1$). The inner product $\langle x, y\rangle$ for $x, y \in \mathbb{F}_2^n$ is `1' if the parity of $|x \cap y|$ is odd.

Boolean functions of $n$ variables map from $\mathbb{F}_2^n$ to $\{0,1\}$ or $\{1,-1\}$ to facilitate the use of the Fourier transform (see [10]) for functions from the group $\mathbb{Z}_2^n$ to the field of complex numbers. The Fourier transform of $f$ is denoted $\hat f$ and is given by

$$\hat f(x) = \frac{1}{2^n} \sum_{u \in \mathbb{F}_2^n} f(u)\,(-1)^{\langle x, u\rangle};$$

thereby $f(x)$ can be written as $\sum_{u \in \mathbb{F}_2^n} \hat f(u)\,(-1)^{\langle x, u\rangle}$. The domains of Boolean functions $f$ and their transforms $\hat f$ are always considered to be $\mathbb{F}_2^n$.
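As an illustration of the transform just defined, the following Python sketch evaluates $\hat f$ by direct summation and checks the inversion formula on a two-variable AND. It is not part of the original paper: the function names (`fourier_transform`, `spectral_l1`, `invert`) are hypothetical, and the brute-force enumeration is only practical for small $n$.

```python
from itertools import product

def inner(x, u):
    """Inner product over F_2: parity of |x ∩ u|."""
    return sum(a & b for a, b in zip(x, u)) % 2

def fourier_transform(f, n):
    """Return {x: f_hat(x)} with f_hat(x) = 2^-n * sum_u f(u) (-1)^<x,u>."""
    points = list(product((0, 1), repeat=n))
    scale = 1.0 / 2 ** n
    return {x: scale * sum(f(u) * (-1) ** inner(x, u) for u in points)
            for x in points}

def spectral_l1(f_hat):
    """Spectral L1 norm: sum of |f_hat(x)| over all x."""
    return sum(abs(c) for c in f_hat.values())

def invert(f_hat, x):
    """Recover f(x) = sum_u f_hat(u) (-1)^<x,u>."""
    return sum(c * (-1) ** inner(x, u) for u, c in f_hat.items())

# Illustrative input (an assumption, not from the paper): AND of two bits,
# with range {0, 1}.
and2 = lambda u: u[0] & u[1]
and2_hat = fourier_transform(and2, 2)
```

For this AND function the four coefficients all have absolute value 1/4, so `spectral_l1(and2_hat)` is 1, and `invert` reproduces the original truth table.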
All other functions of $n$ variables are multilinear polynomials that map from $\mathbb{R}^n$ to $\mathbb{R}$. For a Boolean function $f$, $\tilde f$ will be used to denote the unique multilinear polynomial over $\mathbb{R}^n$ that interpolates $f$ at $\{0,1\}^n$ or at $\{1,-1\}^n$, depending on what the real domain of $f$, denoted $\mathrm{Rdomain}(f)$, and the range of $f$ are chosen to be. In other words, a real representation of $\mathbb{F}_2^n$ is chosen, either with 0 and 1 (in $\mathbb{F}_2^n$) mapping to the real values 0 and 1, or to 1 and $-1$ respectively. While a Boolean function $f$ and its Fourier transform are independent of this choice of $\mathrm{Rdomain}(f)$, the polynomial $\tilde f$ does depend on this choice.

Note that when $\mathrm{Rdomain}(f) = \{1,-1\}^n$ and $\mathrm{range}(f) = \{1,-1\}$, then for $x \in \{1,-1\}^n$,

$$f(x) = \tilde f(x) = \sum_{y \in \mathbb{F}_2^n} \hat f(y) \prod_{i \in y} x_i.$$

In other words, the coefficient of $\prod_{i \in y} x_i$ in the multilinear polynomial $\tilde f$ over $\mathbb{R}^n$ that represents $f$ on the Rdomain $\{1,-1\}^n$ is nothing but the $y$th Fourier coefficient of $f$. Notice that given $\tilde f$ that represents $f$ on the Rdomain $\{0,1\}^n$, the Fourier coefficients of $f$ can thus be obtained by applying the change of variable $x_i \to \frac{1-x_i}{2}$, and $\bar x_i = (1 - x_i) \to \frac{1+x_i}{2}$, to $\tilde f$, and finding the coefficients of the resulting polynomial in standard power form. Finally, for $y \in \mathbb{F}_2^n$ we denote the $y$th partial derivative of order $|y|$ over $\mathbb{R}^n$, i.e., $D_{\prod_{i \in y} x_i}$, by $D_y$.

Finally, the norms used are the following:

$$L_1(f) =_{\mathrm{def}} \sum_{x \in \mathbb{F}_2^n} |f(x)|; \qquad L_2(f) =_{\mathrm{def}} \sum_{x \in \mathbb{F}_2^n} f^2(x); \qquad L_\infty(f) =_{\mathrm{def}} \max_{x \in \mathbb{F}_2^n} |f(x)|.$$

The following are basic properties of the Fourier spectra of Boolean functions.

Fact 2.1. For functions $f$ and $g$ over $\mathbb{F}_2^n$ the following hold.
(i) Parseval's identity:

$$(1/2^n)\,L_2(f) =_{\mathrm{def}} (1/2^n) \sum_{x \in \mathbb{F}_2^n} f^2(x) = \sum_{x \in \mathbb{F}_2^n} \hat f^2(x) =_{\mathrm{def}} L_2(\hat f).$$

Notice that if $\mathrm{range}(f) = \{0,1\}$, then

$$L_1(f) =_{\mathrm{def}} \sum_{x \in \mathbb{F}_2^n} |f(x)| = \sum_{x \in \mathbb{F}_2^n} f^2(x) =_{\mathrm{def}} L_2(f).$$

Thus, for Boolean $f$, bounds on the $L_2$ norm of $\hat f$ provide bounds on the $L_1$ norm of $f$, and furthermore, the $L_1$ norm of $\hat f$ provides an upper bound on the $L_2$ norm of $\hat f$, since $|\hat f(x)| \le 1$ for all $x$. In addition, the $L_1$ norm of $\hat f$ provides a lower bound on the size of the support of $\hat f$, and an upper bound on the sparsity of the polynomial approximating $\tilde f$ when $\mathrm{Rdomain} = \{1,-1\}^n$. Moreover, the $L_\infty$ norm of $\hat f$ gives a lower bound on the $L_1$ norm of $\hat f$. These facts are crucial in the development of several of the learning algorithms [23], [20], [12], [7], [32], and are also used in some of the results about threshold circuits, for example [6], [9].

(ii) The value of the transform at $\vec 0$ is the expected value of the function:

$$\hat f(\vec 0) = (1/2^n) \sum_{u} f(u).$$

(iii) If $\mathrm{range}(f) = \{0,1\}$, then $\hat f(\vec 0) = (1/2^n)\,|\mathrm{support}(f)|$, and if $\mathrm{range}(f) = \{1,-1\}$ and $\mathrm{Rdomain}(f) = \{1,-1\}^n$, then $\hat f(\vec 0) = \tilde f(\vec 0)$.

(iv) If $\mathrm{range}(f) = \{0,1\}$, $\mathrm{range}(g) = \{1,-1\}$ and $f = (1 - g)/2$, then $\hat f(\vec 0) = (1/2)(1 - \hat g(\vec 0))$ and, for all $u$ with $|u| > 0$, $\hat f(u) = -(1/2)\,\hat g(u)$.

The following fact gives a simple property of the multilinear polynomial that represents a Boolean function.

Fact 2.2. Let $f$ be a Boolean function of $n$ variables.

(i) For any $y \in \mathbb{F}_2^n$ with $y \ne \vec 0$, the multilinear polynomial $\tilde f$ over $\mathbb{R}^n$ can be expressed as follows:

$$\tilde f(x) = f'_y(x \setminus y) + \tilde f_y(x \setminus y) \prod_{i \in y} x_i,$$

where $f'_y$ and $\tilde f_y$ are unique multilinear polynomials over $\mathbb{R}^{n-|y|}$. Moreover, $\tilde f_y = D_y \tilde f$, where $D_y$ denotes the $y$th partial derivative (see the first paragraph of this section).
(ii) If $\mathrm{Rdomain}(f) = \{0,1\}^n$, then $\hat f(y) = (1/2^{|y|})\,\tilde f_y((\tfrac12)^{n-|y|})$, and if $\mathrm{Rdomain}(f) = \{1,-1\}^n$, then $\hat f(y) = \tilde f_y(\vec 0)$.

Proof. The proof of (i) is straightforward using basic properties of multilinear polynomials. (Note that $\tilde f_{\vec 0} = \tilde f$.) For (ii), if $\mathrm{Rdomain}(f) = \{1,-1\}^n$, then by the definition of $\hat f$, $\hat f(y)$ is simply the coefficient of $\prod_{i \in y} x_i$ in the standard power form of $\tilde f$, which is clearly $\tilde f_y(\vec 0)$. If $\mathrm{Rdomain}(f) = \{0,1\}^n$, then $\hat f(y)$ is given by the coefficient of $\prod_{i \in y} x_i$ in the standard power form of $\tilde f$, after the change of variables $x_i \to \frac{1-x_i}{2}$ and $(1 - x_i) \to \frac{1+x_i}{2}$, which is clearly $(1/2^{|y|})\,\tilde f_y((\tfrac12)^{n-|y|})$. $\Box$

3. Spectral norms for functions computed by read once circuits.

The main result of this section, Theorem 3.5, gives explicit expressions for the spectral values and their $L_1$ norm for functions $f$ computed by (resp. expressible as) constant depth, homogeneous, read-once circuits (resp. formulae) of an arbitrary pair of symmetric gates. These expressions are given in terms of $d$ natural parameters, called the type parameters, of the argument to $\hat f$. We begin by setting up the machinery to state and prove the main theorem.

The first theorem we prove is a direct application of Fact 2.2 to functions $f$ computed by any read-once circuit $C$ consisting of possibly nonsymmetric gates. The theorem gives recursive expressions for the spectrum of $f$ and its $L_1$ norm, in terms of the spectra of the functions computed by the subcircuits of $C$.

Theorem 3.1. Let $f$ be a Boolean function computable by a read-once circuit. Without loss of generality, let $f$ be defined as

$$f(x) = g\big(h_1(\iota_1(x)), \ldots, h_k(\iota_k(x))\big),$$

where the tuples of arguments $\iota_i(x)$ to the $h_i$'s form a partition of the arguments to $f$, i.e., $\iota_i(x) \cap \iota_j(x) = \emptyset$ when $i \ne j$, and $\bigcup_{i=1}^{k} \iota_i(1^n) = (1^n)$. The functions $g$ and $h_i$ are computed by read-once Boolean circuits over $\mathbb{F}_2^k$ and $\mathbb{F}_2^{|\iota_i(1^n)|}$ respectively. For any $y \in \mathbb{F}_2^k$, let $\tilde g$ be expressed as in Fact 2.2, as follows:

$$\tilde g(z) = g'_y(z \setminus y) + \tilde g_y(z \setminus y) \prod_{i \in y} z_i.$$

Furthermore, for any $w \in \mathbb{F}_2^n$, let $y_w \in \mathbb{F}_2^k$ be the characteristic vector of the set $\{i : \iota_i(w) \ne \vec 0\}$. Then

(i) $$\hat f(w) = \tilde g_{y_w}\big((\hat h_1(\vec 0), \ldots, \hat h_k(\vec 0)) \setminus y_w\big) \prod_{i \in y_w} \hat h_i(\iota_i(w)),$$

and

(ii) $$L_1(\hat f) = \sum_{y \in \mathbb{F}_2^k} \big|\tilde g_y\big((\hat h_1(\vec 0), \ldots, \hat h_k(\vec 0)) \setminus y\big)\big| \prod_{i \in y} \big(L_1(\hat h_i) - \hat h_i(\vec 0)\big).$$

Proof. By applying the chain rule and noticing that the $h_i$'s are representable by multilinear polynomials $\tilde h_i$ with disjoint sets of variables, it follows that

$$D_w \tilde f = D_{y_w} \tilde g \prod_{i \in y_w} D_{\iota_i(w)} \tilde h_i.$$

Now, if $\mathrm{Rdomain}(f) = \{0,1\}^n$, it follows from Fact 2.2 that

$$\hat f(w) = (1/2^{|w|})\,\tilde f_w((\tfrac12)^{n-|w|}) = (1/2^{|w|})\,D_w \tilde f((\tfrac12)^{n-|w|}),$$

and substituting the above expression for $D_w \tilde f$, we get

$$\hat f(w) = \prod_{i \in y_w} (1/2^{|\iota_i(w)|})\,D_{\iota_i(w)} \tilde h_i\big((\tfrac12)^{|\iota_i(1^n)| - |\iota_i(w)|}\big)\ \tilde g_{y_w}\Big(\big(\tilde h_1((\tfrac12)^{|\iota_1(1^n)|}), \ldots, \tilde h_k((\tfrac12)^{|\iota_k(1^n)|})\big) \setminus y_w\Big),$$

and applying Fact 2.2 again to the $\tilde h_i$'s, the above quantity becomes

$$\prod_{i \in y_w} \hat h_i(\iota_i(w))\ \tilde g_{y_w}\big((\hat h_1(\vec 0), \ldots, \hat h_k(\vec 0)) \setminus y_w\big).$$

On the other hand, if $\mathrm{Rdomain}(f) = \{1,-1\}^n$, it follows from Fact 2.2 that $\hat f(w) = \tilde f_w(\vec 0) = D_w \tilde f(\vec 0)$, and substituting the above expression for $D_w \tilde f$, we get

$$\hat f(w) = \prod_{i \in y_w} D_{\iota_i(w)} \tilde h_i(\vec 0)\ \tilde g_{y_w}\big((\tilde h_1(\vec 0), \ldots, \tilde h_k(\vec 0)) \setminus y_w\big),$$

and just as in the earlier case, applying Fact 2.2 to the $\tilde h_i$'s, the above quantity becomes

$$\prod_{i \in y_w} \hat h_i(\iota_i(w))\ \tilde g_{y_w}\big((\hat h_1(\vec 0), \ldots, \hat h_k(\vec 0)) \setminus y_w\big).$$

To show (ii), we observe

$$L_1(\hat f) = \sum_{w \in \mathbb{F}_2^n} \big|\tilde g_{y_w}\big((\hat h_1(\vec 0), \ldots, \hat h_k(\vec 0)) \setminus y_w\big)\big| \prod_{i \in y_w} |\hat h_i(\iota_i(w))|,$$

which can be written as

$$\sum_{y \in \mathbb{F}_2^k} \big|\tilde g_y\big((\hat h_1(\vec 0), \ldots, \hat h_k(\vec 0)) \setminus y\big)\big| \sum_{\{w \,:\, y_w = y\}} \prod_{i \in y} |\hat h_i(\iota_i(w))|.$$

But

$$\sum_{\{w \,:\, y_w = y\}} \prod_{i \in y} |\hat h_i(\iota_i(w))|$$

can be rewritten as

$$\prod_{i \in y}\ \sum_{\{\iota_i(w) \,:\, w \in \mathbb{F}_2^n \ \&\ \iota_i(w) \ne \vec 0\}} |\hat h_i(\iota_i(w))|,$$

from which the result follows since

$$\sum_{\{\iota_i(w) \,:\, w \in \mathbb{F}_2^n \ \&\ \iota_i(w) \ne \vec 0\}} |\hat h_i(\iota_i(w))| = L_1(\hat h_i) - \hat h_i(\vec 0). \qquad \Box$$

Next we give a uniform definition of symmetric Boolean functions.

Definition 3.2. For $u \subseteq \{0, 1, \ldots, k\}$ the symmetric Boolean function $s_u$ is defined as $s_u(x) = 1$ if and only if $|x| \in u$. When $\mathrm{Rdomain}(s_u) = \{0,1\}^k$ and $\mathrm{range}(s_u) = \{0,1\}$, for example, it is clear that

$$\tilde s_u(x) = \sum_{i \in u}\ \sum_{y \in \mathbb{F}_2^k,\ |y| = i}\ \prod_{j \in y} x_j \prod_{j \notin y} (1 - x_j).$$

The following is a direct application of Fact 2.2 to symmetric Boolean functions: for any $y \in \mathbb{F}_2^k$ with $|y| = j$, the multilinear polynomial $\tilde s_u$ can be expressed as

$$\tilde s_u(x) = s'_{u,j}(x \setminus y) + \tilde s_{u,j}(x \setminus y) \prod_{i \in y} x_i,$$

where $s'_{u,j}$ and $\tilde s_{u,j}$ are unique multilinear polynomials over $\mathbb{R}^{k-j}$.

Next, we formally define the class of homogeneous, read-once circuits of pairs of symmetric gates.

Definition 3.3. The class $RO[k, d, u, v]$ is the class of Boolean functions $f$ over $\mathbb{F}_2^{k^d}$ computable by homogeneous read-once circuits of depth $d$ and uniform fan-in $k$, consisting of alternating levels of $s_u$ and $s_v$ gates of $k$ variables. The arguments to $f$ are the inputs to the circuit, which may be individually negated. For convenience, we will often assume that these are the only negations that appear in the circuit, and that when $d$ is even, the topmost gate computes $s_u$, and when $d$ is odd, the topmost gate computes $s_v$.

Next we formally define the type parameters and other quantities that are used in the statement of the main theorem.

Definition 3.4.

(i) For $x \in \mathbb{F}_2^{k^d}$, represented either as $\{0,1\}^{k^d}$ or as $\{1,-1\}^{k^d}$, the parameters $\mathrm{tier}_{k,i}(x) \in \{0, \ldots, k\}^{k^{d-i}}$, for $0 \le i \le d$, are defined recursively. Let $\mathrm{tier}_{k,i,j}(x)$ denote the $j$th entry of the vector $\mathrm{tier}_{k,i}(x)$. Then

$$\mathrm{tier}_{k,0,j}(x) = 1 \iff x_j = 1, \qquad \mathrm{tier}_{k,i+1,j}(x) =_{\mathrm{def}} \big|\{\,l : (j-1)k < l \le jk,\ \mathrm{tier}_{k,i,l}(x) \ne 0\,\}\big|.$$

For example, for $x = (0,1,1,0,1,0,1,1)$, $\mathrm{tier}_{2,0}(x) = x$, $\mathrm{tier}_{2,1}(x) = (1,1,1,2)$, $\mathrm{tier}_{2,2}(x) = (2,2)$, and $\mathrm{tier}_{2,3}(x) = (2)$.
(ii) For $x \in \mathbb{F}_2^{k^d}$ the parameters $\mathrm{type}_{k,i}(x) \in \{0, \ldots, k^{d-i}\}^{k}$, for $0 \le i \le d$, are defined as follows: $\mathrm{type}_{k,i,j}(x) =_{\mathrm{def}} |\{l : \mathrm{tier}_{k,i,l}(x) = j\}|$; i.e., the $j$th entry in the vector $\mathrm{type}_{k,i}(x)$ is the number of entries in $\mathrm{tier}_{k,i}(x)$ that equal $j$. Notice that

$$\sum_{j=1}^{k} j\,\mathrm{type}_{k,i+1,j}(x) = \sum_{j=1}^{k} \mathrm{type}_{k,i,j}(x), \quad \text{that is,} \quad \sum_{j=1}^{k} j\,\mathrm{type}_{k,i,j}(x) = \sum_{j=1}^{k} \mathrm{type}_{k,i-1,j}(x);$$

and $\sum_{j=1}^{k} \mathrm{type}_{k,d,j}(x) = 1$ if $x \ne \vec 0$ and 0 otherwise. Furthermore, $\mathrm{type}_{k,0,1}(x) = |x|$, and for $j \ge 2$, $\mathrm{type}_{k,0,j}(x) = 0$. For example, for $x = (0,1,1,0,1,0,1,1)$, $\mathrm{type}_{2,0}(x) = (5,0)$, $\mathrm{type}_{2,1}(x) = (3,1)$, $\mathrm{type}_{2,2}(x) = (0,2)$, and $\mathrm{type}_{2,3}(x) = (0,1)$.

(iii) The following quantities, depending on $k, d \in \mathbb{N}$ and $u, v \subseteq \{0, \ldots, k\}$, are also used extensively in the next theorem. We will omit the subscripts $u$ and $v$ on these quantities when the context is clear. For $2 \le i \le d$ and $0 \le j \le k$, the quantities $T_{k,i,j} \in \mathbb{R}$ are defined as follows:

$$T_{k,i+1,j} =_{\mathrm{def}} \tilde s_{u,j}\big((T_{k,i,0})^{k-j}\big)$$

when $i$ is odd, and when $i$ is even, $u$ is replaced by $v$. Recall the definition (3.2) of the symmetric Boolean functions $s_u$ and $s_v$ and the related multilinear polynomials $\tilde s_{u,j}$ and $\tilde s_{v,j}$ over $\mathbb{R}^{k-j}$. The definition of $T_{k,1,j}$ depends on whether the Rdomain of $s_u$ and $s_v$ is viewed as $\{0,1\}^k$ or as $\{1,-1\}^k$. In the latter case, $T_{k,1,j} =_{\mathrm{def}} \tilde s_{v,j}(\vec 0)$ and $T_{k,0,0} = 0$; in the former, $T_{k,1,j} =_{\mathrm{def}} \tilde s_{v,j}((\tfrac12)^{k-j})$ and $T_{k,0,0} = 1/2$.

We have now set up all the machinery required to state and prove the main theorem.

Theorem 3.5. Let $f$ be computed by an $RO[k, d, u, v]$ circuit, and let $\nu \in \mathbb{F}_2^{k^d}$ be the characteristic vector of those arguments to $f$ that are negated inputs to the circuit.
(i) If $\mathrm{Rdomain}(f)$ is $\{0,1\}^{k^d}$, then if $x \ne \vec 0$,

$$\hat f(x) = (-1)^{|x \cap \nu|}\,(1/2)^{|x|} \prod_{i=1}^{d} \prod_{j=1}^{k} T_{k,i,j}^{\mathrm{type}_{k,i,j}(x)};$$

if $\mathrm{Rdomain}(f)$ is $\{1,-1\}^{k^d}$, then if $x \ne \vec 1$,

$$\hat f(x) = (-1)^{|x \cap \nu|} \prod_{i=1}^{d} \prod_{j=1}^{k} T_{k,i,j}^{\mathrm{type}_{k,i,j}(x)};$$

and in both cases, $\hat f(\vec 0) = T_{k,d,0}$. Recall from Definition 3.4 that the quantities $T_{k,i,j}$ depend on the choice of $s_u$, $s_v$, and $\mathrm{Rdomain}(f)$.

(ii) $L_1(\hat f) = L_{k,d}$, where $L_{k,0} = 1$ and in general,

$$L_{k,i} =_{\mathrm{def}} \sum_{j=0}^{k} (L_{k,i-1} - T_{k,i-1,0})^{j}\,|T_{k,i,j}| \binom{k}{j}.$$

Proof. The proofs are by induction on $d$. For (i), the induction basis is straightforward: when $d = 1$, then $f$ is computed by a single $s_v$ gate, thus, from Fact 2.2,

$$\hat f(x) = (-1)^{|x \cap \nu|}\,(1/2^{|x|})\,\tilde s_{v,|x|}\big((\tfrac12)^{k-|x|}\big). \qquad (3.1)$$

If $x \in \mathbb{F}_2^k$ and $x = \vec 0$, then $\hat f(\vec 0) = \tilde s_{v,0}((\tfrac12)^{k})$, which is defined in 3.4 as $T_{k,1,0}$. From the general definition of $T_{k,i,j}$ in 3.4, and since, when $d = 1$, $x \in \mathbb{F}_2^k$, and $x \ne \vec 0$, there is exactly one $j$, namely $|x|$, where $\mathrm{type}_{k,1,j}(x)$ is non-zero, it follows that

$$\tilde s_{v,|x|}\big((\tfrac12)^{k-|x|}\big) = T_{k,1,|x|} = \prod_{j=1}^{k} T_{k,1,j}^{\mathrm{type}_{k,1,j}(x)}.$$

Substituting the above in (3.1) completes the proof of the induction basis.

For the induction step, we apply Theorem 3.1 with $g = s_u$, and $h_1, \ldots, h_k \in RO[k, d-1, u, v]$, with $|\iota_l(1^{k^d})|$ (the size of the set of arguments to $h_l$) being $k^{d-1}$ for all $1 \le l \le k$. This yields

$$\hat f(w) = \prod_{l \in y_w} \hat h_l(\iota_l(w))\ \tilde g_{y_w}\big((\hat h_1(\vec 0), \ldots, \hat h_k(\vec 0)) \setminus y_w\big).$$

Assuming the induction hypothesis that the theorem holds for the functions $h_1, \ldots, h_k \in RO[k, d-1, u, v]$, the above equation becomes

$$\hat f(w) = \tilde s_{u,|y_w|}\big((T_{k,d-1,0})^{k-|y_w|}\big) \prod_{l \in y_w} (-1)^{|\iota_l(w) \cap \nu|}\,(1/2^{|\iota_l(w)|}) \prod_{i=1}^{d-1} \prod_{j=1}^{k} T_{k,i,j}^{\mathrm{type}_{k,i,j}(\iota_l(w))}. \qquad (3.2)$$

Now, by definition of the type parameters, it follows that

$$\prod_{l \in y_w} T_{k,i,j}^{\mathrm{type}_{k,i,j}(\iota_l(w))} = T_{k,i,j}^{\mathrm{type}_{k,i,j}(w)}, \qquad (3.3)$$

and

$$\prod_{l \in y_w} (-1)^{|\iota_l(w) \cap \nu|}\,(1/2^{|\iota_l(w)|}) = (-1)^{|w \cap \nu|}\,(1/2^{|w|}), \qquad (3.4)$$

for every $1 \le i \le d-1$ and $1 \le j \le k$.
Furthermore, by the definition of $T_{k,i,j}$, we get $\tilde s_{u,|y_w|}\big((T_{k,d-1,0})^{k-|y_w|}\big) = T_{k,d,|y_w|}$, and since, for $w \in \mathbb{F}_2^{k^d}$ with $w \ne \vec 0$, by the definition of the type parameters there is exactly one $j$, namely $j = |y_w|$, where $\mathrm{type}_{k,d,j}(w)$ is non-zero, it follows that

$$\tilde s_{u,|y_w|}\big((T_{k,d-1,0})^{k-|y_w|}\big) = \prod_{j=1}^{k} T_{k,d,j}^{\mathrm{type}_{k,d,j}(w)}. \qquad (3.5)$$

Now substituting (3.3), (3.4), and (3.5) in (3.2), it follows that when $w \ne \vec 0$,

$$\hat f(w) = (-1)^{|w \cap \nu|}\,(1/2^{|w|}) \prod_{i=1}^{d} \prod_{j=1}^{k} T_{k,i,j}^{\mathrm{type}_{k,i,j}(w)}.$$

Furthermore, when $w = \vec 0$, it follows that $y_w = \vec 0$ as well, and thus (3.2) reduces to $\hat f(\vec 0) = \tilde s_{u,0}\big((T_{k,d-1,0})^{k}\big) = T_{k,d,0}$. This proves (i).

We show (ii) again by induction on $d$. The induction basis, for $d = 0$, is direct. For the induction step, we apply Theorem 3.1 again with $g = s_u$, and $h_1, \ldots, h_k \in RO[k, d-1, u, v]$, to get

$$L_1(\hat f) = \sum_{y \in \mathbb{F}_2^k} \prod_{i \in y} \big(L_1(\hat h_i) - \hat h_i(\vec 0)\big)\ \big|\tilde s_{u,|y|}\big((\hat h_1(\vec 0), \ldots, \hat h_k(\vec 0)) \setminus y\big)\big|.$$

Assuming the induction hypothesis that the theorem holds for the functions $h_1, \ldots, h_k \in RO[k, d-1, u, v]$, the above equation becomes

$$L_1(\hat f) = \sum_{y \in \mathbb{F}_2^k} (L_{k,d-1} - T_{k,d-1,0})^{|y|}\ \big|\tilde s_{u,|y|}\big((T_{k,d-1,0})^{k-|y|}\big)\big| = \sum_{y \in \mathbb{F}_2^k} (L_{k,d-1} - T_{k,d-1,0})^{|y|}\ |T_{k,d,|y|}|.$$

The summands of the last quantity depend only on $|y|$, allowing it to be rewritten as

$$\sum_{j=0}^{k} \binom{k}{j} (L_{k,d-1} - T_{k,d-1,0})^{j}\ |T_{k,d,j}|,$$

thus proving (ii). $\Box$

Remark 3.6. The above proof extends to read-once circuits of arbitrary structure, constructed from arbitrary, but fixed sets of symmetric gates.

We apply Definition 3.4 and Theorem 3.5 to obtain explicit expressions for the Fourier coefficients and their $L_1$ norm for the special case of homogeneous, read-once $AC^0$ functions $f$. In this case, we will show that the values $\hat f(x)$ do not depend on the vector-valued parameters $\mathrm{type}_{k,i}(x)$ but rather on the $d$ scalars $\sum_{j=1}^{k} \mathrm{type}_{k,i,j}(x)$ for $1 \le i \le d$.
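The tier and type parameters of Definition 3.4, and the scalars $\sum_j \mathrm{type}_{k,i,j}(x)$ just mentioned, can be computed mechanically. The following Python sketch is an illustration only and is not taken from the paper; the function names `tiers`, `types`, and `scalars` are hypothetical, and the input `x` is assumed to be a 0/1 tuple whose length is a power of `k`.

```python
def tiers(x, k):
    """Return [tier_{k,0}(x), ..., tier_{k,d}(x)] for x of length k**d.

    tier_{k,0} is x itself; each entry of tier_{k,i+1} counts the non-zero
    entries in the corresponding block of k entries of tier_{k,i}.
    """
    levels = [tuple(x)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append(tuple(
            sum(1 for t in prev[j:j + k] if t != 0)
            for j in range(0, len(prev), k)
        ))
    return levels

def types(x, k):
    """Return [type_{k,0}(x), ..., type_{k,d}(x)].

    The j-th entry of type_{k,i} (j = 1..k) is the number of entries of
    tier_{k,i} that equal j.
    """
    return [tuple(sum(1 for t in tier if t == j) for j in range(1, k + 1))
            for tier in tiers(x, k)]

def scalars(x, k):
    """Return the sums over j of type_{k,i,j}(x), for i = 0..d."""
    return [sum(t) for t in types(x, k)]

# The example vector used in Definition 3.4.
example = (0, 1, 1, 0, 1, 0, 1, 1)
example_tiers = tiers(example, 2)
example_types = types(example, 2)
```

On the vector of Definition 3.4 this reproduces the tiers $(1,1,1,2)$, $(2,2)$, $(2)$ and the types $(5,0)$, $(3,1)$, $(0,2)$, $(0,1)$ stated there.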
Noticing that $\vee = s_{\{1,\ldots,k\}}$ and $\wedge = s_{\{k\}}$, we study the class $RO[k, d, u = \{1, \ldots, k\}, v = \{k\}]$. Again, we assume that the topmost gate is an $\vee$ ($\wedge$) gate if the depth is even (odd).

Corollary 3.7. Let $f$ be computed by an $RO[k, d, u = \{1, \ldots, k\}, v = \{k\}]$ circuit and let $\nu$ be the vector of its negated inputs. Let $\mathrm{Rdomain}(f) = \{0,1\}^{k^d}$ and $\mathrm{range}(f) = \{0,1\}$. Then

(i) $\hat f(\vec 0) = T_{k,d,0}$, which, for the chosen $u$ and $v$, becomes:

$$T_{k,2i+1,0} = (T_{k,2i,0})^{k} \quad \text{and} \quad T_{k,2i,0} = 1 - (1 - T_{k,2i-1,0})^{k}; \quad \text{and} \quad T_{k,0,0} = 1/2.$$

(ii) If $x \ne \vec 0$, and $b_i =_{\mathrm{def}} \sum_{j=1}^{k} \mathrm{type}_{k,i,j}(x)$ for $0 \le i \le d$, then

$$|\hat f(x)| = (1/2^{b_1 k}) \prod_{\substack{i=1 \\ i \text{ even}}}^{d-1} (T_{k,i,0})^{k b_{i+1} - b_i} \prod_{\substack{i=1 \\ i \text{ odd}}}^{d-1} (1 - T_{k,i,0})^{k b_{i+1} - b_i},$$

and

$$\mathrm{sign}(\hat f(x)) = (-1)^{|x \cap \nu| + \sum_{i=1}^{d} b_i} \ \text{if } d \text{ is even, and}\ (-1)^{|x \cap \nu| + \sum_{i=1}^{d-1} b_i} \ \text{if } d \text{ is odd}.$$

(iii) $L_1(\hat f) = L_{k,d}$, where $L_{k,0} =_{\mathrm{def}} 1$, $L_{k,1} =_{\mathrm{def}} 1$, and

$$L_{k,i} =_{\mathrm{def}} (L_{k,i-1})^{k} \ \text{if } i \text{ is odd, and}\quad L_{k,i} =_{\mathrm{def}} (1 - 2T_{k,i-1,0} + L_{k,i-1})^{k} \ \text{if } i \text{ is even}.$$

Proof. We first define the quantities $T_{k,i,j}$ as in Definition 3.4, for the special case where $u = \{1, \ldots, k\}$ and $v = \{k\}$. Noticing that

$$\tilde s_u(x) = 1 - \prod_{i=1}^{k} (1 - x_i) \quad \text{and} \quad \tilde s_v(x) = \prod_{i=1}^{k} x_i,$$

it is not hard to see that Definition 3.4, for $T_{k,i,0}$, matches that given in statement (i) of the corollary. Thus (i) is straightforward from Theorem 3.5. For general $1 \le j \le k$, Definition 3.4 gives

$$T_{k,1,j} =_{\mathrm{def}} 1/2^{k-j}; \qquad T_{k,2i+1,j} = (T_{k,2i,0})^{k-j}; \qquad T_{k,2i,j} =_{\mathrm{def}} (-1)^{j+1} (1 - T_{k,2i-1,0})^{k-j}.$$

Now, applying Theorem 3.5, and substituting the new, specific expressions for $T_{k,i,j}$, we get

$$\hat f(x) = (-1)^{|x \cap \nu|}\,(1/2^{|x|}) \prod_{\substack{i=0 \\ i \text{ even}}}^{d-1} \prod_{j=1}^{k} (T_{k,i,0})^{(k-j)\,\mathrm{type}_{k,i+1,j}(x)} \prod_{\substack{i=1 \\ i \text{ odd}}}^{d-1} \prod_{j=1}^{k} (1 - T_{k,i,0})^{(k-j)\,\mathrm{type}_{k,i+1,j}(x)} (-1)^{(j+1)\,\mathrm{type}_{k,i+1,j}(x)}. \qquad (3.6)$$

Assembling the exponents of the two products over $j$ in (3.6), noticing, by definition of the type parameters, that $\sum_{j=1}^{k} j\,\mathrm{type}_{k,i+1,j}(x) = \sum_{j=1}^{k} \mathrm{type}_{k,i,j}(x)$, and by the definition of the symbols $b_i$, (3.6) reduces to

$$\hat f(x) = (-1)^{|x \cap \nu|}\,(1/2^{|x|})\,(T_{k,0,0})^{k b_1 - b_0} \prod_{\substack{i=1 \\ i \text{ even}}}^{d-1} (T_{k,i,0})^{k b_{i+1} - b_i} \prod_{\substack{i=1 \\ i \text{ odd}}}^{d-1} (1 - T_{k,i,0})^{k b_{i+1} - b_i} (-1)^{b_{i+1} + b_i}. \qquad (3.7)$$

Now, noticing that $T_{k,0,0} = 1/2$, and, by definition of the type parameters, that $b_0 = \sum_{j=1}^{k} \mathrm{type}_{k,0,j}(x) = |x|$, (3.7) gives the values of $|\hat f(x)|$ and $\mathrm{sign}(\hat f(x))$ required to prove (ii).

To prove (iii), we apply Theorem 3.5(ii), with the new definitions of $T_{k,i,j}$ and the fact that $L_{k,0} = 1$, to get $L_{k,1} = \sum_{j=0}^{k} (1 - \tfrac12)^{j} (\tfrac12)^{k-j} \binom{k}{j} = 1$. In general, if $i$ is odd, then $T_{k,i,j} = (T_{k,i-1,0})^{k-j}$, thus

$$L_{k,i} = \sum_{j=0}^{k} (L_{k,i-1} - T_{k,i-1,0})^{j}\ \big|(T_{k,i-1,0})^{k-j}\big| \binom{k}{j} = (L_{k,i-1})^{k},$$

and if $i$ is even, then

$$L_{k,i} = \sum_{j=0}^{k} (L_{k,i-1} - T_{k,i-1,0})^{j}\ \big|(-1)^{j+1} (1 - T_{k,i-1,0})^{k-j}\big| \binom{k}{j} = (1 - 2T_{k,i-1,0} + L_{k,i-1})^{k},$$

thus proving (iii). $\Box$

We illustrate the above corollary with an example.

Example 3.8. Consider $f \in RO[k = 2, d = 3, \{1, \ldots, k\}, \{k\}]$, with $x = (1,0,1,0,0,0,0,0)$. If the vector of negated indices is $\nu = (0,1,1,1,1,1,0,1)$, then defining $b_i =_{\mathrm{def}} \sum_{j=1}^{2} \mathrm{type}_{2,i,j}(x)$, we get $b_0 = 2$, $b_1 = 2$, $b_2 = 1$, $b_3 = 1$, and $|x \cap \nu| = 1$. In addition, it is clear that

$$T_{2,1,0} = \frac{1}{2^2}; \qquad T_{2,2,0} = 1 - \Big(1 - \frac{1}{2^2}\Big)^{2}; \qquad T_{2,3,0} = \Big(1 - \Big(1 - \frac{1}{2^2}\Big)^{2}\Big)^{2}.$$

In general,

$$T_{k,i,0} = \overbrace{\Big(1 - \big(1 - (1 - \cdots (1 - \tfrac{1}{2^k})^{k} \cdots)^{k}\big)^{k}\Big)^{k}}^{i \text{ times}}.$$

Thus

$$\hat f(x) = (-1)^{b_1 + b_2 + |x \cap \nu|}\ \frac{1}{2^{b_1 k}} \Big(1 - \frac{1}{2^k}\Big)^{b_2 k - b_1} \Big(1 - \Big(1 - \frac{1}{2^k}\Big)^{k}\Big)^{b_3 k - b_2} = (-1)^{4}\ \frac{1}{2^4} \Big(1 - \Big(1 - \frac{1}{2^2}\Big)^{2}\Big).$$

Furthermore, it is clear that

$$L_{2,1} = 1; \qquad L_{2,2} = 2^2 \Big(1 - \frac{1}{2^2}\Big)^{2}; \qquad L_1(\hat f) = L_{2,3} = 2^4 \Big(1 - \frac{1}{2^2}\Big)^{4}.$$

It follows from the recursive expression for $L_{k,i}$ that the spectral $L_1$ norm of functions in $RO[k, d, \{1, \ldots, k\}, \{k\}]$ is roughly exponential in $k^d$, since the quantities $T_{k,i,0}$ are inverse exponentials in $k$. This generalizes the result in [9], and shows that $RO[k, d, \{1, \ldots, k\}, \{k\}] \subseteq AC^0 \setminus PL_1$.

4. General constant depth circuits with symmetric gates.

The main results of this section, Theorem 4.3 and Proposition 4.5, respectively give the weight distribution of $\mathbb{F}_2^{k^d}$ with respect to the $\mathrm{type}_{k,i}$ parameters for $1 \le i \le d$, and show that the spectrum and its norms for arbitrary, non-read-once functions computed by constant depth circuits with symmetric gates can be estimated by determining the weight distributions of simple subspaces with respect to the type parameters.

The following basic fact expresses the spectrum of a non-read-once function in terms of the spectra of read-once functions. This fact is used to prove Proposition 4.5 from Theorem 4.3.

Fact 4.1. Let a Boolean function $f$ over $\mathbb{F}_2^n$ be computed by a depth $d$ circuit of size $M$, with the symmetric gates $s_u$ and $s_v$. Then there is a function $f_r$, computed by an $RO[k, d, u, v]$ circuit, for $k^d$ no larger than a polynomial in $M$ (but possibly exponential in $d$), such that

(i) $$f(x) = f_r\Big(\sum_{i=1}^{n} x_i b_i\Big),$$

where the $b_i \in \mathbb{F}_2^{k^d}$ are fixed, mutually disjoint vectors;

(ii) $$\hat f(x) = \sum_{y \in S_x} \hat f_r(y),$$

where $S_x$ is a translate (coset) of a linear subspace of $\mathbb{F}_2^{k^d}$:

$$S_x = \{\,y \in \mathbb{F}_2^{k^d} : \langle y, b_i\rangle = 1\ \forall i : x_i = 1, \text{ and } \langle y, b_i\rangle = 0 \text{ otherwise}\,\}$$

(recall that $\langle \cdot, \cdot \rangle$ is the inner product over $\mathbb{F}_2^{k^d}$; $\langle x, y\rangle$ has been referred to earlier in our discussion as the parity of the number $|x \cap y|$); and

(iii) for any subspace $T \subseteq \mathbb{F}_2^n$,

$$\sum_{x \in T} \hat f(x) = \sum_{y \in S_T} \hat f_r(y),$$

where $S_T$ is the linear subspace defined as $S_T = \bigcup_{x \in T} S_x$.

Proof. (i) is straightforward.
We show (ii), and (iii) follows directly from (ii). Since
\[ \hat{f}(x) = \frac{1}{2^n} \sum_{u \in \mathbb{F}_2^n} (-1)^{\langle u, x \rangle} f_r\Big(\sum_{i=1}^{n} u_i b_i\Big), \]
expressing $f_r$ in terms of its Fourier coefficients, $\hat{f}(x)$ becomes
\[
\begin{aligned}
\hat{f}(x) &= \frac{1}{2^n} \sum_{u \in \mathbb{F}_2^n} \sum_{y \in \mathbb{F}_2^{k^d}} (-1)^{u_1 \langle y, b_1 \rangle} \cdots (-1)^{u_n \langle y, b_n \rangle} \, \hat{f}_r(y) \, (-1)^{\langle u, x \rangle} \\
&= \frac{1}{2^n} \sum_{u \in \mathbb{F}_2^n} \sum_{y \in \mathbb{F}_2^{k^d}} (-1)^{\langle u, (\langle y, b_1 \rangle, \ldots, \langle y, b_n \rangle) \rangle} \, \hat{f}_r(y) \, (-1)^{\langle u, x \rangle} \\
&= \sum_{y \in \mathbb{F}_2^{k^d}} \frac{1}{2^n} \sum_{u \in \mathbb{F}_2^n} (-1)^{\langle u, x \rangle} (-1)^{\langle u, (\langle y, b_1 \rangle, \ldots, \langle y, b_n \rangle) \rangle} \, \hat{f}_r(y).
\end{aligned}
\]
Noticing that the inner sum is 0 whenever $x \ne (\langle y, b_1 \rangle, \ldots, \langle y, b_n \rangle)$, i.e., whenever $y \notin S_x$, and that it equals $\hat{f}_r(y)$ otherwise, we get
\[ \hat{f}(x) = \sum_{y \in S_x} \hat{f}_r(y). \]

Before we state the main theorem of the section, we formally define weights and weight distributions.

Definition 4.2. For $x \in \mathbb{F}_2^n$, a set of weights or parameters of $x$, $p_1(x), \ldots, p_d(x)$, where the $p_i$ typically take vector values in $\mathbb{N}^{l_i}$ for some $l_i < n$, must satisfy
(i) the sets $\Omega(a_1, \ldots, a_d)$ defined as $\{x \in \mathbb{F}_2^n : p_1(x) = a_1, \ldots, p_d(x) = a_d\}$ for distinct $(a_1, \ldots, a_d)$ form a partition of $\mathbb{F}_2^n$, and
(ii) $|\Omega(a_1, \ldots, a_d)|$ depends only on $a_1, \ldots, a_d$.
The weight distribution of a subset $S \subseteq \mathbb{F}_2^n$ with respect to the weights $p_i$ is given by specifying the quantities $|\Omega(a_1, \ldots, a_d) \cap S| / |S|$ for all relevant $a_1, \ldots, a_d$.

Theorem 4.3. Let $a_i \in \{0, \ldots, k^{d-i}\}^k$, for $1 \le i \le d$, and let the set $\Omega_{k,d}(a_1, \ldots, a_d)$ be defined as:
\[ \{x \in \mathbb{F}_2^{k^d} : \mathrm{type}_{k,1}(x) = a_1, \ldots, \mathrm{type}_{k,d}(x) = a_d\}. \]
In other words, the sets $\Omega_{k,d}(a_1, \ldots, a_d)$ define a partition of $\mathbb{F}_2^{k^d}$. Denoting the $j$th entry of $a_i$ as $a_{i,j}$, observe that by the definition of the type parameters, $\Omega_{k,d}(a_1, \ldots, a_d) = \emptyset$ if, for some $1 \le i < d$, $\sum_j j \, a_{i+1,j} \ne \sum_j a_{i,j}$. Otherwise,

(i) \[ |\Omega_{k,d}(a_1, \ldots, a_d)| = \prod_{i=1}^{d} \left( \frac{\big(\sum_{j=1}^{k} a_{i,j}\big)!}{\prod_{j=1}^{k} a_{i,j}!} \prod_{j=1}^{k} \binom{k}{j}^{a_{i,j}} \right). \]

(ii) If $f$ is computed by a $RO[k, d, u, v]$ circuit, for some $u, v \subseteq \{0, \ldots, k\}$, with the vector $\nu \in \mathbb{F}_2^{k^d}$ representing the negated inputs, and if $x, y$ both belong in the same set $\Omega_{k,d}(a_1, \ldots, a_d)$, then $|\hat{f}(x)| = |\hat{f}(y)|$; and $\mathrm{sign}(\hat{f}(y)) \ne \mathrm{sign}(\hat{f}(x))$ exactly when $|x \cap \nu| \ne |y \cap \nu|$.

Proof. For (i), we consider a $k$-ary tree of depth $d$, such that each $x \in \mathbb{F}_2^{k^d}$ marks a unique set of $|x|$ leaves. Now $\Omega_{k,d}(a_1, \ldots, a_d)$ is the set of all $x$'s such that at level $i$ of the tree there are $\sum_{j=1}^{k} a_{i,j}$ "marked nodes" whose descendants have at least one marked leaf. Furthermore, for any $1 \le j \le k$, there are $a_{i,j}$ nodes that have exactly $j$ "marked children." Therefore, assuming that the $a_{i,j}$ nodes, with $j$ marked leaves, at level $i$, are indistinguishable,
\[ \Big(\sum_{j=1}^{k} a_{i,j}\Big)! \Big/ \prod_{j=1}^{k} a_{i,j}! \]
is simply the number of distinct permutations of marked nodes at the $i$th level. Furthermore, $\prod_{j=1}^{k} \binom{k}{j}^{a_{i,j}}$ is the number of ways of choosing the marked children of the marked nodes at the $i$th level. It is not hard to see that the product of these two quantities with $i$ ranging from 1 to $d$ gives the size of the set $\Omega_{k,d}(a_1, \ldots, a_d)$, thus showing (i). The proof of (ii) is a direct consequence of Theorem 3.5 and the definition of the sets $\Omega_{k,d}(a_1, \ldots, a_d)$. $\Box$

Notice that the above theorem and Fact 4.1(iii) directly enable us to estimate the sum of the Fourier transform of functions $f$ in any constant depth circuit class over certain subspaces $T$. This applies to those subspaces $T$ for which the distribution of the corresponding subspace $S_T \subseteq \mathbb{F}_2^{k^d}$ (from Fact 4.1(iii)) with respect to the type parameters is close to that of $\mathbb{F}_2^{k^d}$ itself. In particular, if the distribution of $S_T$ is identical to that of $\mathbb{F}_2^{k^d}$, then the required sum of Fourier coefficients is exactly $|S_T| / 2^{k^d}$ or zero, depending on the value of the corresponding read-once function $f_r$ at $\vec{0}$. The following corollary applies specifically to read-once $AC^0$ functions.
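The counting formula of Theorem 4.3(i) can be checked by brute force for small $k$ and $d$. The Python sketch below is not part of the original paper. It assumes, following the proof of the theorem, that the $j$th entry of $\mathrm{type}_{k,i}(x)$ counts the level-$i$ nodes of the $k$-ary tree that have exactly $j$ marked children, with leaves at level 0 and a node counted as marked when some leaf below it is marked. The function names `type_vectors`, `predicted_size` and `formula_holds` are illustrative choices.

```python
from collections import Counter
from itertools import product
from math import comb, factorial


def type_vectors(x, k, d):
    """Return (a_1, ..., a_d) for a leaf marking x, a 0/1 tuple of length k**d.

    Assumed reading of the type parameters: a_i[j-1] is the number of
    level-i nodes with exactly j marked children (j = 1..k).
    """
    marks = list(x)  # level 0: the leaves themselves
    types = []
    for _ in range(d):
        # Count marked children for every node one level up.
        counts = [sum(marks[p:p + k]) for p in range(0, len(marks), k)]
        types.append(tuple(counts.count(j) for j in range(1, k + 1)))
        # A node is marked if at least one of its children is marked.
        marks = [1 if c > 0 else 0 for c in counts]
    return tuple(types)


def predicted_size(types, k):
    """Evaluate the product formula of Theorem 4.3(i) for one class."""
    size = 1
    for a in types:
        # Multinomial: arrangements of the marked nodes by child count.
        arrangements = factorial(sum(a))
        for a_ij in a:
            arrangements //= factorial(a_ij)
        # Ways of choosing which children of each marked node are marked.
        choices = 1
        for j, a_ij in enumerate(a, start=1):
            choices *= comb(k, j) ** a_ij
        size *= arrangements * choices
    return size


def formula_holds(k, d):
    """Compare the formula with an exhaustive count over all of F_2^(k^d)."""
    classes = Counter(
        type_vectors(x, k, d) for x in product((0, 1), repeat=k ** d)
    )
    return all(predicted_size(t, k) == n for t, n in classes.items())


if __name__ == "__main__":
    for k, d in [(2, 2), (2, 3), (3, 2)]:
        print(k, d, formula_holds(k, d))
```

For $k = 2$, $d = 2$, the 16 vectors fall into six classes of sizes 1, 4, 2, 4, 4 and 1, and the formula reproduces each of them. The exhaustive loop visits $2^{k^d}$ vectors, so the check is only practical for very small parameters.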
Corollary 4.4. Let $a_i \in \{0, \ldots, k^{d-i}\}$, for $0 \le i \le d-1$, and let the set $\Omega_{k,d}(a_0, \ldots, a_{d-1})$ be defined as:
\[ \Big\{ x \in \mathbb{F}_2^{k^d} : \sum_{j=1}^{k} \mathrm{type}_{k,0,j}(x) = |x| = a_0, \ \ldots, \ \sum_{j=1}^{k} \mathrm{type}_{k,d-1,j}(x) = a_{d-1} \Big\}. \]
Then

(i) \[ |\Omega_{k,d}(a_0, \ldots, a_{d-1})| = \prod_{j=1}^{d} \Lambda_{k,j}(a_0, \ldots, a_{d-1}), \]
where for all $1 \le j < d$,
\[ \Lambda_{k,j}(a_0, \ldots, a_{d-1}) \stackrel{\mathrm{def}}{=} \sum_{l} (-1)^l \binom{a_j}{l} \binom{(a_j - l) k}{a_{j-1}}, \qquad \Lambda_{k,d}(a_0, \ldots, a_{d-1}) \stackrel{\mathrm{def}}{=} \binom{k}{a_{d-1}}. \]

(ii) If $f$ is computed by a $RO[k, d, \{1, \ldots, k\}, \{k\}]$ circuit, with the vector $\nu \in \mathbb{F}_2^{k^d}$ representing the negated inputs, and if $x, y$ both belong in the same set $\Omega_{k,d}(a_0, \ldots, a_{d-1})$, then $|\hat{f}(x)| = |\hat{f}(y)|$; and $\mathrm{sign}(\hat{f}(x))$ and $\mathrm{sign}(\hat{f}(y))$ differ exactly when $|x \cap \nu| \ne |y \cap \nu|$.

Proof. The proof of (i) could be derived from Theorem 4.3(i); however, a direct proof is easier. As in Theorem 4.3(i), if $x \in \mathbb{F}_2^{k^d}$ marks the leaves of a $k$-ary tree of depth $d$, then $\Omega_{k,d}(a_0, \ldots, a_{d-1})$ is the set of all $x$'s that have $a_i$ marked nodes at the $i$th level, i.e., nodes whose descendants include at least one marked leaf. Now the proof of (i) follows from observing that for $1 \le j < d$, $\Lambda_{k,j}(a_0, \ldots, a_{d-1})$ is the number of ways in which $a_{j-1}$ distinct children at the $(j-1)$st level can be chosen from $a_j$ distinct nodes at the $j$th level, each of which contains $k$ distinct children (every one of the $a_j$ nodes must receive at least one chosen child, which accounts for the inclusion-exclusion sum); and $\Lambda_{k,d}(a_0, \ldots, a_{d-1})$ is the number of ways in which $a_{d-1}$ children can be chosen from a single root node with $k$ children. The proof of (ii) is straightforward from the definition of the sets $\Omega_{k,d}$. $\Box$

The next proposition points out that it would be useful to develop a technique to determine the weight distribution of simple subspaces of $\mathbb{F}_2^{k^d}$ with respect to the type parameters.

Proposition 4.5.
For a Boolean function $f$ over $\mathbb{F}_2^n$ computed by a depth $d$ circuit with symmetric gates $s_u$ and $s_v$, let $f_r$ be the corresponding function computed by an $RO[k, d, u, v]$ circuit, as in Fact 4.1, and furthermore let $S_x$ be the translate such that
\[ \hat{f}(x) = \sum_{y \in S_x} \hat{f}_r(y), \]
and for any subspace $T \subseteq \mathbb{F}_2^n$, let $S_T$ be the subspace $\bigcup_{x \in T} S_x$. Then the weight distributions of each $S_x$ and $S_T$ with respect to the weights $\mathrm{type}_{k,1}(z), \ldots, \mathrm{type}_{k,d}(z)$, and $|z \cap \nu|$ determine the quantities: $\hat{f}(x)$ for any $x \in \mathbb{F}_2^n$; $L_1(\hat{f})$; and $\sum_{x \in T} \hat{f}(x)$. In particular, if $f \in AC^0[d]$, then these quantities are determined by obtaining the distribution of $S_x$ and $S_T$ with respect to the $d$ weights $\sum_j \mathrm{type}_{k,i,j}(x)$ for $1 \le i \le d$.

Proof. The proof follows directly from Fact 4.1, Definition 4.2, Theorem 3.5, Corollary 3.7 and Theorem 4.3(ii). $\Box$

Coding theory provides some tools, called the MacWilliams identities [25], for determining weight distributions with respect to various weights. However, these weights usually satisfy properties that the $\mathrm{type}_{k,i}(x)$ parameters are not known to satisfy. For instance, a "valid" parameter $p$ for which a MacWilliams identity exists usually satisfies the property of linearity: there must exist "valid" parameters $p_1$ and $p_2$ of vectors over $\mathbb{F}_2^{n_1}$ and $\mathbb{F}_2^{n_2}$ respectively, such that if $x \in \mathbb{F}_2^n$ is expressed as the direct sum of two vectors $x_1 \in \mathbb{F}_2^{n_1}$ and $x_2 \in \mathbb{F}_2^{n_2}$, where $n_1 + n_2 = n$, i.e., $x = x_1 \oplus x_2$, then $p(x) = p_1(x_1) + p_2(x_2)$. The parameter $|x \cap \nu|$, however, does have a MacWilliams identity for any fixed vector $\nu$.

Acknowledgements

I thank Jehoshua Bruck and Andrew Odlyzko for interesting discussions related to this paper.

References

[1] E. Allender, A note on the power of threshold circuits. In Proc. 30th Ann. IEEE Symp. Foundations of Computer Science, 1989, 580-584.
[2] D.A. Barrington, Bounded width, polynomial size branching programs recognize exactly those languages in $NC^1$. In Proc. 18th Ann. ACM Symp. Theory of Computing, 1986, 1-5.
[3] R. Beigel, "Why do extra majority gates help?" In Proc. 24th Ann. ACM Symp. Theory of Computing, 1992, 450-454.
[4] D.A. Barrington, R. Beigel, S. Rudich, Representing Boolean functions as polynomials modulo composite numbers. In Proc. 24th Ann. ACM Symp. Theory of Computing, 1992, 455-461.
[5] Y. Brandman, A. Orlitsky, J. Hennessy, A spectral lower bound technique for the size of decision trees and two-level and-or circuits. IEEE Trans. on Computers, 39 (2), 1990, 282-287.
[6] J. Bruck, Harmonic analysis of polynomial threshold functions. SIAM Journal of Discrete Mathematics, 3 (2), 1990, 168-177.
[7] M. Bellare, A technique for upper bounding the spectral norm with applications to learning. In Proc. 5th Ann. IEEE Symp. Computational Learning Theory (COLT), 1992, 62-70.
[8] R. Boppana, M. Sipser, The complexity of finite functions. Technical Report, Massachusetts Institute of Technology, Laboratory for Computer Sciences, MIT/LCS/TM-405, 1989.
[9] J. Bruck, R. Smolensky, Polynomial threshold functions, $AC^0$ functions and spectral norms. SIAM J. Computing, 21 (1), 1992, 33-42.
[10] H. Dym, H.P. McKean, Fourier series and integrals. Probability and Mathematical Statistics series, Academic Press, 1972.
[11] M. Dowd, M. Sitharam, Shannon bounds for functions over $\mathbb{Z}_2^n$. Technical Report 90-12-1, Kent State University, Department of Mathematics and Computer Science, 1990.
[12] M. Furst, J. Jackson, S. Smith, Improved learning of $AC^0$ functions. In Proc. 5th Ann. IEEE Symp. on Computational Learning Theory (COLT), 1992, 317-325.
[13] M. Furst, J. Saxe, M. Sipser, Parity, circuits and the polynomial time hierarchy. Mathematical Systems Theory, 17, 1984, 17-27.
[14] M. Goldmann, J. Håstad, A.A. Razborov, Majority gates vs. general weighted threshold gates. In J. Computational Complexity, 2, 1992, 277-300.
[15] M. Goldmann, J. Håstad, On the power of small depth threshold circuits. In Proc. 31st Ann. IEEE Symp. Foundations of Computer Science, 1990, 610-618.
[16] V. Grolmusz, Harmonic analysis, real approximation and communication complexity of Boolean functions. Manuscript, 1994.
[17] J. Håstad, Computational limitations of small depth circuits. Ph.D. thesis, Massachusetts Institute of Technology Press, 1986.
[18] A. Hajnal, W. Maass, P. Pudlák, M. Szegedy, G. Turán, Threshold circuits of bounded depth. In Proc. 28th Ann. IEEE Symp. Foundations of Computer Science, 1987, 99-110.
[19] J. Kahn, G. Kalai, N. Linial, The influence of variables on Boolean functions. In Proc. 29th Ann. IEEE Symp. Foundations of Computing (FOCS), 1988, 68-80.
[20] E. Kushilevitz, Y. Mansour, Learning decision trees using the Fourier transform. SIAM Journal on Computing, 22 (6), 1993, 1331-1348.
[21] M. Krause, Geometric arguments yield better bounds for threshold circuits and distributed computing. In Proc. 6th Ann. IEEE Symp. in Structure in Complexity Theory, 1991, 314-322.
[22] M. Krause, S. Waack, Variation ranks of communication matrices and lower bounds for depth two circuits having symmetric gates and unbounded fan-in. In Proc. 32nd Ann. IEEE Symp. in Foundations of Computer Science, 1991, 777-782.
[23] N. Linial, Y. Mansour, N. Nisan, Constant depth circuits, Fourier transforms, and learnability. To appear in JACM; In Proc. 30th Ann. IEEE Symp. on Foundations of Computing (FOCS), 1989, 574-579.
[24] N. Linial, N. Nisan, Approximate inclusion-exclusion. Combinatorica, 10 (4), 1990, 349-365.
[25] F.J. MacWilliams, N.J.A. Sloane, The theory of error-correcting codes. North Holland, 1977.
[26] N. Nisan, A. Wigderson, Hardness vs. randomness. To appear in JCSS; In Proc. 29th Ann. IEEE Symp. on Foundations of Computing, 1988, 2-11.
[27] N. Nisan, M. Szegedy, On the degree of Boolean functions as real polynomials. In Proc. 24th Ann. ACM Symp. Theory of Computing, 1992, 462-467.
[28] R. Paturi, On the degree of polynomials that approximate symmetric Boolean functions. In Proc. 24th Ann. ACM Symp. Theory of Computing, 1992, 468-474.
[29] R. Paturi, M. Saks, Threshold circuits for parity. In Proc. 31st Ann. IEEE Symp. on Foundations of Computing, 1990, 397-404.
[30] A.A. Razborov, Lower bounds on the monotone complexity of some Boolean functions. Soviet Mathematics Doklady, 31, 1985, 354-357.
[31] A.A. Razborov, Lower bounds on the size of circuits of bounded depth with basis $\{\wedge, \oplus\}$. In Math. notes of the Aca. of Science of the USSR, 41 (4), 1987, 333-338.
[32] M. Sitharam, Pseudorandom generators and learning algorithms for $AC^0$. In Proc. 26th Ann. ACM Symp. Theory of Computing, 1994, 478-488; to appear in Computational Complexity.
[33] R. Smolensky, Algebraic methods in the theory of lower bounds for Boolean circuit complexity. In Proc. 19th Ann. ACM Symp. Theory of Computing, 1987, 77-82.
[34] K.-Y. Siu, J. Bruck, On the power of threshold circuits with small weights. SIAM Journal of Discrete Mathematics, 4 (3), 1991, 423-435.
[35] A.C. Yao, Lower bounds by probabilistic arguments. In Proc. 24th Ann. IEEE Symp. Foundations of Computing, 1983, 420-428.
[36] A.C. Yao, Separating the polynomial time hierarchy by oracles. In Proc. 26th Ann. IEEE Symp. on Foundations of Computing, 1985, 1-10.
[37] A.C. Yao, Circuits and local computation. In Proc. 21st Ann. ACM Symp. Theory of Computing, 1989, 186-196.
[38] A.C. Yao, On ACC and threshold circuits. In Proc. 31st Ann. IEEE Symp. on Foundations of Computing, 1990, 619-627.

Manuscript received 10 January 1994

Meera Sitharam
Department of Mathematics and Computer Sciences
Kent State University
Kent, OH 44240, USA
sitharam@mcs.kent.edu
