Our main result is a formulation and proof of the reverse hypercontractive inequality in the sum-of-squares (SOS) proof system. As a consequence we show that for any constant $0 < \gamma \leq 1/4$, the SOS/Lasserre SDP hierarchy at degree $4\lceil \frac{1}{4\gamma}\rceil$ certifies the statement "the maximum independent set in the Frankl--R\"odl graph $\mathrm{FR}^{n}_{\gamma}$ has fractional size~$o(1)$". Here $\mathrm{FR}^{n}_{\gamma} = (V,E)$ is the graph with $V = \{0,1\}^n$ and $(x,y) \in E$ whenever $\Delta(x,y) = (1-\gamma)n$ (an even integer). In particular, we show the degree-$4$ SOS algorithm certifies the chromatic number lower bound "$\chi(\mathrm{FR}^{n}_{1/4}) = \omega(1)$", even though $\mathrm{FR}^{n}_{1/4}$ is the canonical integrality gap instance for which standard SDP relaxations cannot even certify "$\chi(\mathrm{FR}^{n}_{1/4}) > 3$". Finally, we also give an SOS proof of (a generalization of) the sharp $(2,q)$-hypercontractive inequality for any even integer $q$.
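The Frankl--R\"odl graph in the abstract above is simple to instantiate directly from its definition. The following Python sketch (an illustration of the graph construction only, not of the SOS certification; the function name is our own) builds $\mathrm{FR}^{n}_{\gamma}$ for a small instance where $(1-\gamma)n$ is an even integer:

```python
from itertools import combinations, product

def frankl_rodl_edges(n, gamma):
    """Edge list of FR^n_gamma: pairs x, y in {0,1}^n whose Hamming
    distance is exactly (1-gamma)*n, which must be an even integer."""
    d = (1 - gamma) * n
    assert d == int(d) and int(d) % 2 == 0, "(1-gamma)*n must be an even integer"
    d = int(d)
    vertices = list(product([0, 1], repeat=n))
    return [(x, y) for x, y in combinations(vertices, 2)
            if sum(a != b for a, b in zip(x, y)) == d]

# Small instance: n = 8, gamma = 1/4, so edges join vertices at distance 6.
# Each of the 2^8 = 256 vertices has C(8, 6) = 28 neighbors.
edges = frankl_rodl_edges(8, 0.25)
```

For these parameters the graph is 28-regular on 256 vertices, so it has $256 \cdot 28 / 2 = 3584$ edges; the hardness phenomena discussed in the abstract, of course, only emerge asymptotically in $n$.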
Testing (Subclasses of) Halfspaces. Kevin Matulef, Ryan O'Donnell, Ronitt Rubinfeld, and Rocco Servedio.
We consider the problem of learning mixtures of product distributions over discrete domains in the distribution learning framework introduced by Kearns et al. [18]. We give a poly(n/ε)-time algorithm for learning a mixture of k arbitrary product distributions over the n-dimensional Boolean cube {0,1}^n to accuracy ε, for any constant k. Previous polynomial-time algorithms could only achieve this for k = 2 product distributions; our result answers an open question stated independently in [8] and [14]. We further give evidence that no polynomial-time algorithm can succeed when k is superconstant, by reduction from a notorious open problem in PAC learning. Finally, we generalize our poly(n/ε)-time algorithm to learn any mixture of k = O(1) product distributions over {0, 1, ..., b}^n, for any b = O(1).
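The generative model in this abstract is easy to make concrete. The sketch below (our own illustrative names; it only shows how samples are *drawn* from a mixture of product distributions over {0,1}^n, whereas the paper's contribution is recovering the hidden parameters from such samples) may help fix the setting:

```python
import random

def sample_mixture(weights, bit_probs, rng):
    """Draw one sample from a mixture of k product distributions over {0,1}^n.

    weights:   k mixing weights, summing to 1.
    bit_probs: k x n matrix; bit_probs[j][i] = Pr[bit i = 1 | component j].
    """
    j = rng.choices(range(len(weights)), weights=weights)[0]  # pick a component
    return tuple(int(rng.random() < p) for p in bit_probs[j])  # independent bits

# A k = 2 mixture over {0,1}^4: one component biased toward 1s, one toward 0s.
rng = random.Random(0)
samples = [sample_mixture([0.5, 0.5], [[0.9] * 4, [0.1] * 4], rng)
           for _ in range(100)]
```

A learner in this framework sees only `samples` and must output a distribution close to the mixture in total variation; the difficulty grows sharply with k, as the abstract's hardness evidence indicates.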
2013 IEEE 54th Annual Symposium on Foundations of Computer Science, 2013
Let S = X1 + ··· + Xn be a sum of n independent integer random variables Xi, where each Xi is supported on {0, 1, ..., k-1} but otherwise may have an arbitrary distribution (in particular the Xi's need not be identically distributed). How many samples are required to learn the distribution of S to high accuracy? In this paper we show that the answer is completely independent of n, and moreover we give a computationally efficient algorithm which achieves this low sample complexity. More precisely, our algorithm learns any such S to ε-accuracy (with respect to the total variation distance between distributions) using poly(k, 1/ε) samples, independent of n. Its running time is poly(k, 1/ε) in the standard word RAM model. Thus we give a broad generalization of the main result of [DDS12b], which gave a similar learning result for the special case k = 2 (when the distribution S is a Poisson Binomial Distribution). Prior to this work, no nontrivial results were known for learning these distributions even in the case k = 3. A key difficulty is that, in contrast to the case of k = 2, sums of independent {0, 1, 2}-valued random variables may behave very differently from (discretized) normal distributions, and in fact may be rather complicated: they are not log-concave, they can be Θ(n)-modal, and there is no relationship between Kolmogorov distance and total variation distance for the class. Nevertheless, the heart of our learning result is a new limit theorem which characterizes what the sum of an arbitrary number of arbitrary independent {0, 1, ..., k-1}-valued random variables may look like. Previous limit theorems in this setting made strong assumptions on the "shift invariance" of the random variables Xi in order to force a discretized normal limit.
We believe that our new limit theorem, as the first result for truly arbitrary sums of independent {0, 1, ..., k-1}-valued random variables, is of independent interest.
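The abstract's point that such sums need not resemble discretized normals can be seen with a tiny exact computation. The sketch below (an illustrative example of ours, not the paper's algorithm) convolves n copies of a {0,2}-valued variable: the resulting sum is supported only on even integers, so its pmf vanishes at every odd point and is therefore far in total variation from any discretized normal, even though its cumulative distribution function tracks one closely:

```python
def pmf_of_sum(pmfs):
    """Exact pmf of a sum of independent small-support integer variables,
    computed by iterated convolution. pmfs[i][v] = Pr[X_i = v]."""
    dist = {0: 1.0}
    for p in pmfs:
        new = {}
        for s, ps in dist.items():
            for v, pv in p.items():
                new[s + v] = new.get(s + v, 0.0) + ps * pv
        dist = new
    return dist

# n = 20 i.i.d. copies of a variable uniform on {0, 2}: the sum is
# 2 * Binomial(20, 1/2), supported on even integers 0, 2, ..., 40 only.
n = 20
dist = pmf_of_sum([{0: 0.5, 2: 0.5}] * n)
```

Here the pmf alternates between positive and zero values across its range, illustrating (in the simplest case k = 3 with one support point unused) why earlier limit theorems needed "shift invariance" assumptions to force a discretized normal limit.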