Estimating Differential Entropy using Recursive Copula Splitting
Figure 1. A schematic sketch of the proposed method. (a) A sample of 1000 points from a 2D Gaussian distribution. The blue lines depict the empirical density (obtained using uniform bins). (b) Following the rank transform (numbering the sorted data in each dimension), the same data provide samples of the copula in [0,1]^2. Splitting the data according to the median in one of the axes (always at 0.5) yields (c) (left half) and (d) (right half). The blue lines depict the empirical density in each half. The splits continue recursively.

Figure 2. Estimating the entropy for analytically computable examples (dashed red line) with compact distributions (support [0,1]^D). Black: the recursive copula splitting method; blue: kDP; green: kNN; magenta: lossless compression. (Left) The estimated entropy as a function of dimension. (Right) Running times (on a log-log scale), showing only the relevant methods. The number of samples is N = 10,000 D^2. See also Table 1 and Table 2 for detailed numerical results with D = 10 and 20.

Figure 3. Estimating the entropy for analytically computable examples (dashed red line) with non-compact distributions. Black: the recursive copula splitting method; blue: kDP; green: kNN; magenta: lossless compression. (Left) The estimated entropy as a function of dimension. (Right) Running times (on a log-log scale), showing only the relevant methods. The number of samples is N = 10,000 D^2. The inaccuracy of our method and of kNN is primarily due to the relatively small number of samples. See also Table 1 and Table 2 for detailed numerical results with D = 10 and 20.

Figure 4. Convergence rates of CADEE: the average absolute value of the error as a function of N. (Left) D = 2. (Right) D = 5.
Figure A1. Numerical evaluation of the cumulative distribution function of the estimated entropy for two scalar, independent, uniformly distributed random variables. After scaling with the sample size, we find that P(H < -0.75 N^(-0.62)) is approximately 0.05. Hence, this can be used as a statistic for accepting the hypothesis that the random variables are independent.
Abstract
1. Introduction
2. CADEE Method
- C1: A uniform distribution on [0,1]^D;
- C2: Dependent pairs. The dimensions are divided into pairs; within each pair the two coordinates are dependent, with a prescribed density supported on [0,1]^2. Different pairs are independent;
- C3: Independent boxes. Uniform density on a set consisting of D small hypercubes contained in [0,1]^D.
- UB1: Gaussian distribution. The covariance is chosen to be a randomly rotated diagonal matrix with prescribed eigenvalues. Then, the samples are rotated to a random orthonormal basis in ℝ^D. The support of the distribution is ℝ^D;
- UB2: Power-law distribution. Each dimension k is sampled independently from a prescribed power-law density. Then, the samples are rotated to a random orthonormal basis in ℝ^D. The support of the distribution is a fraction of ℝ^D that is not aligned with the principal axes.
- A lossless compression approach [6,7]. Following [6], samples are binned into 256 equal bins in each dimension, and the data are converted into a matrix of 8-bit unsigned integers. The matrix is compressed using the Lempel–Ziv–Welch (LZW) algorithm (implemented by writing a gif file with Matlab's imwrite function). To estimate the entropy, the compressed file size is interpolated linearly between that of a constant matrix (minimal entropy) and that of a random matrix with independent, uniformly distributed values (maximal entropy), both of the same size.
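A minimal Matlab sketch of this compression baseline, following the description above. The helper gif_bytes and the use of a temporary gif file are illustrative choices; the function returns the position of the compressed size between the two reference sizes (0 for the constant matrix, 1 for the random one) and leaves the final rescaling to an entropy value unspecified.

```matlab
function eta = compression_entropy_fraction(X)
% Normalized compression-based entropy proxy: bin each dimension into 256
% equal bins, write the resulting uint8 matrix as a gif (LZW), and place
% its size between that of a constant matrix (minimal entropy) and that of
% a uniformly random matrix (maximal entropy).
[N, D] = size(X);

B = zeros(N, D, 'uint8');
for k = 1:D
    x = X(:, k);
    b = floor(256 * (x - min(x)) / (max(x) - min(x) + eps));
    B(:, k) = uint8(min(b, 255));
end

sData  = gif_bytes(B);
sConst = gif_bytes(zeros(N, D, 'uint8'));          % minimal entropy
sRand  = gif_bytes(uint8(randi([0 255], N, D)));   % maximal entropy

eta = (sData - sConst) / (sRand - sConst);         % linear interpolation
end

function s = gif_bytes(B)
% Size in bytes of B written as an indexed gif (LZW compression).
fname = [tempname, '.gif'];
imwrite(B, gray(256), fname);
info = dir(fname);
s = info.bytes;
delete(fname);
end
```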
3. Convergence Analysis
3.1. Analytical Example
3.2. Analytical Bound
3.3. Numerical Examples
4. Implementation Details
Algorithm 1 Recursive entropy estimator
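A simplified Matlab sketch in the spirit of Algorithm 1: rank-transform the data, add the marginal entropies, and estimate the copula entropy by recursively splitting at the median of one coordinate, terminating with a 2D histogram. The independence tests and block decomposition of the full algorithm are omitted, and the rank rescaling (rank/(N+1)), the cycling of the split coordinate, and the minimal sample size are illustrative choices rather than the exact ones used in the paper. The helpers entropy1d_spacing, entropy1d_hist, and entropy2d_hist are sketched in Appendix A below.

```matlab
function h = rcs_entropy_sketch(X)
% Simplified recursive copula-splitting entropy estimate.
% X is an N-by-D sample matrix.
[~, D] = size(X);
h = 0;
for k = 1:D
    h = h + entropy1d_spacing(X(:, k));   % marginal entropies (m-spacings)
end
h = h + copula_entropy(rank01(X), 1);     % plus the copula entropy
end

function h = copula_entropy(U, j)
% Entropy of a copula sample U in [0,1]^D (uniform marginals), estimated by
% splitting along coordinate j at its median (0.5) and recursing on halves.
[N, D] = size(U);
if N < 50 || D == 1
    h = 0;                                % too few samples: treat as uniform
elseif D == 2
    h = entropy2d_hist(U(:, 1), U(:, 2)); % terminal 2D histogram estimate
else
    left = U(:, j) < 0.5;
    A = U(left,  :);  A(:, j) = 2 * A(:, j);       % rescale each half back
    B = U(~left, :);  B(:, j) = 2 * B(:, j) - 1;   % to [0,1] along axis j
    jn = mod(j, D) + 1;                   % cycle the split coordinate
    h = 0.5 * (half_entropy(A, j, jn) + half_entropy(B, j, jn));
end
end

function h = half_entropy(V, j, jn)
% One rescaled half: 1D entropies of its marginals (coordinate j is uniform
% by construction and contributes zero) plus the entropy of its own copula.
h = 0;
for k = setdiff(1:size(V, 2), j)
    h = h + entropy1d_hist(V(:, k));
end
h = h + copula_entropy(rank01(V), jn);
end

function U = rank01(X)
% Rank transform: map each column to (0,1) via rank/(N+1), which keeps the
% extreme samples away from the endpoints 0 and 1.
[N, D] = size(X);
U = zeros(N, D);
for k = 1:D
    [~, order] = sort(X(:, k));
    U(order, k) = (1:N)' / (N + 1);
end
end
```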
- The rank of an entry of an array x is its position when the array is sorted. Since the support of all marginals of the copula is [0,1], the ranks are rescaled so that the minimal and maximal samples are not mapped onto the endpoints 0 and 1, which would artificially change the support of the distribution. The rank transformation is easily done using sorting;
- 1D entropy: The one-dimensional entropy of compact distributions (whose support is [0,1]) is estimated using a histogram with uniformly spaced bins. The number of bins can be taken to depend on N, and order sqrt(N) is typically used, with different prefactors for the spacing-based and bin-based estimators. For additional considerations and methods for choosing the number of bins, see [34]. At the first iteration, the distribution may not be compact, and the entropy is estimated using m-spacings (see [8], Equation (16));
- Finding blocks in the adjacency matrix A: Let A be a matrix whose entries are 0 and 1, where A_ij = 0 indicates that marginals i and j were found to be independent. By construction, A is symmetric. Let D_A denote the diagonal matrix whose diagonal elements are the sums of the rows of A. Then, L = D_A − A is the Laplacian associated with the graph described by A. In particular, every row of L sums to zero. We seek a rational basis for the kernel of L, that is, an orthogonal basis for ker(L) in which all coordinates are either 0 or 1 and the number of 1's is minimal. In each vector of the basis, the components with 1's form a cluster (or block), which is pair-wise independent of all other marginals. In Matlab, this basis can be obtained using the command null(L,'r'); a small illustrative example is given at the end of this section;
- Calculate the Spearman correlation matrix of the samples, denoted R. Note that this is the same as the Pearson correlation matrix of the ranked data;
- Assuming normality and independence (which does not hold exactly), the statistic t = r·sqrt((N − 2)/(1 − r^2)) computed from an off-diagonal element r of R asymptotically follows the t-distribution with N − 2 degrees of freedom. Denoting the CDF of the t-distribution with n degrees of freedom by T_n, two marginals are considered uncorrelated if 2(1 − T_{N−2}(|t|)) > α, where α is the acceptance threshold. We take the standard α = 0.05. Note that because we perform D(D − 1)/2 tests, the probability of observing independent vectors by chance grows with D. This can be corrected by looking at the statistics of the maximal element of R (in absolute value), which tends to a Gumbel distribution [35]. This approach (using the Gumbel statistics) is not used because, below, we also consider independence between blocks;
- Pairwise independence using mutual information: Two 1D RVs X and Y are independent if and only if their mutual information vanishes, I(X;Y) = 0 [10]. In our case, the marginals are uniform on [0,1], hence I(X;Y) = −H(X,Y). This suggests the following statistical test for the hypothesis that X and Y are independent. Suppose X and Y are independent. Draw N independent samples and compute the distribution of the estimated 2D entropy H(X,Y). For a given acceptance threshold α, find the cutoff value h_c(N) such that P(H(X,Y) < h_c) = α. Figure A1 shows the distribution for different values of N. With α = 0.05, the cutoff can be approximated by h_c(N) ≈ −0.75 N^(−0.62). Accordingly, any pair of marginals that was found to be statistically uncorrelated is also tested for independence using the mutual information (see below);
- 2D entropy: The two-dimensional entropy (which, in our case, always corresponds to a compact distribution with support [0,1]^2) is estimated using a 2D histogram with uniformly spaced bins in each dimension.
- Sorting of 1D samples: At the first level, the samples may be unbounded and sorting costs O(N log N) per dimension. At subsequent levels, the samples are approximately uniformly distributed in [0,1] and bucket sort works with an average cost of O(N). This is multiplied by the number of levels, which is O(log N). As all D dimensions need to be sorted, the overall cost of sorting is O(D N log N);
- Calculating 1D entropies: Since the data are already sorted, calculating the entropy using either binning or spacings has an O(N) cost per dimension, per level. Overall, O(D N log N);
- Pairwise correlations: There are O(D^2) pre-sorted pairs, each costing O(N) per level. Overall, O(D^2 N log N);
- Pairwise entropy: The worst case is that all pairs are uncorrelated but dependent, which implies that all pairwise mutual informations need to be calculated at all levels. However, pre-sorting again reduces the cost of calculating the histograms to O(D^2 N) per level. With O(log N) levels, the cost is O(D^2 N log N).
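To illustrate the block-finding step described above, here is a minimal Matlab example; the 5 × 5 adjacency matrix is arbitrary and chosen only for demonstration. The rational kernel basis returned by null(L,'r') has one column per block, and the nonzero entries of each column mark the marginals belonging to that block.

```matlab
% Finding independent blocks from a 0/1 adjacency matrix A, where
% A(i,j) = 1 marks a pair of marginals that was NOT found to be independent.
% The matrix below is an arbitrary illustration: dimensions {1,2,5} form one
% block and dimensions {3,4} another.
A = [0 1 0 0 0;
     1 0 0 0 1;
     0 0 0 1 0;
     0 0 1 0 0;
     0 1 0 0 0];

L = diag(sum(A, 2)) - A;   % graph Laplacian; its kernel is spanned by the
B = null(L, 'r');          % indicator vectors of the connected components

for b = 1:size(B, 2)       % report each block
    fprintf('Block %d: dimensions %s\n', b, mat2str(find(B(:, b))'));
end
```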
5. Summary
Author Contributions
Funding
Conflicts of Interest
Appendix A. Additional Pseudo-Code Used for Numerical Examples
Algorithm A1 Estimation of 1D entropy
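A Matlab sketch in the spirit of Algorithm A1, with illustrative parameter choices (on the order of sqrt(N) bins or spacing order). The histogram branch applies to the compact case with support [0,1]; the m-spacing branch (Vasicek's estimator, in the spirit of [8], Equation (16)) is used at the first iteration, where the data may be unbounded.

```matlab
function h = entropy1d_hist(u)
% Histogram estimate of the differential entropy of a sample u in [0,1].
n  = numel(u);
nb = max(2, round(sqrt(n)));                  % illustrative bin count
p  = histcounts(u, linspace(0, 1, nb + 1), 'Normalization', 'probability');
p  = p(p > 0);
h  = -sum(p .* log(p)) - log(nb);             % correct for the bin width 1/nb
end

function h = entropy1d_spacing(x)
% Vasicek m-spacing estimate for a possibly unbounded 1D sample.
n  = numel(x);
m  = max(1, round(sqrt(n)));                  % illustrative spacing order
s  = sort(x(:));
lo = max((1:n)' - m, 1);                      % clamp indices at the ends
hi = min((1:n)' + m, n);
h  = mean(log((n / (2 * m)) * (s(hi) - s(lo)) + eps));
end
```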
Algorithm A2 Check for pairwise independence
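A Matlab sketch in the spirit of Algorithm A2 for a single pair of copula marginals u, v in [0,1]. The Spearman step follows the two-step description in Section 4 (Pearson correlation of the ranks and a t-test with N − 2 degrees of freedom; tiedrank and tcdf require the Statistics and Machine Learning Toolbox). The mutual-information cutoff −0.75 N^(−0.62) is the α = 0.05 approximation read off Figure A1, and entropy2d_hist is the 2D estimator sketched after Algorithm A3 below.

```matlab
function indep = pairwise_independent(u, v, alpha)
% Two-step independence check for a pair of copula marginals u, v in [0,1].
if nargin < 3, alpha = 0.05; end
n = numel(u);

% Step 1: Spearman correlation (Pearson correlation of the ranks), tested
% with the t-statistic on n - 2 degrees of freedom.
ru = tiedrank(u);  rv = tiedrank(v);
C  = corrcoef(ru, rv);
r  = C(1, 2);
t  = r * sqrt((n - 2) / max(1 - r^2, eps));
p  = 2 * (1 - tcdf(abs(t), n - 2));           % two-sided p-value
if p < alpha
    indep = false;                            % significantly correlated
    return
end

% Step 2: mutual information test.  For uniform marginals, I(u,v) equals
% minus the joint entropy, so a strongly negative 2D entropy indicates
% dependence.
H2 = entropy2d_hist(u, v);                    % see the sketch below
hc = -0.75 * n^(-0.62);                       % alpha = 0.05 cutoff (Figure A1)
indep = (H2 >= hc);
end
```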
Algorithm A3 Estimation of 2D entropy
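A Matlab sketch in the spirit of Algorithm A3; the number of bins per axis (on the order of N^(1/4)) is an illustrative choice.

```matlab
function h = entropy2d_hist(u, v)
% 2D histogram estimate of the differential entropy of (u, v) in [0,1]^2.
n     = numel(u);
nb    = max(2, round(n^(1/4)));               % illustrative bins per axis
edges = linspace(0, 1, nb + 1);
p     = histcounts2(u(:), v(:), edges, edges, 'Normalization', 'probability');
p     = p(p > 0);
h     = -sum(p .* log(p)) - 2 * log(nb);      % correct for the bin area
end
```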
References
1. Kwak, N.; Choi, C.-H. Input feature selection by mutual information based on Parzen window. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 1667–1671.
2. Kerroum, M.A.; Hammouch, A.; Aboutajdine, D. Textural feature selection by joint mutual information based on Gaussian mixture model for multispectral image classification. Pattern Recognit. Lett. 2010, 31, 1168–1174.
3. Zhu, S.; Wang, D.; Yu, K.; Li, T.; Gong, Y. Feature selection for gene expression using model-based entropy. IEEE/ACM Trans. Comput. Biol. Bioinform. 2008, 7, 25–36.
4. Faivishevsky, L.; Goldberger, J. ICA based on a smooth estimation of the differential entropy. In Proceedings of the Advances in Neural Information Processing Systems 21 (NIPS 2008), Vancouver, BC, Canada, 8–10 December 2008; pp. 433–440.
5. Calsaverini, R.S.; Vicente, R. An information-theoretic approach to statistical dependence: Copula information. Europhys. Lett. 2009, 88, 68003.
6. Avinery, R.; Kornreich, M.; Beck, R. Universal and accessible entropy estimation using a compression algorithm. Phys. Rev. Lett. 2019, 123, 178102.
7. Martiniani, S.; Chaikin, P.M.; Levine, D. Quantifying hidden order out of equilibrium. Phys. Rev. X 2019, 9, 011031.
8. Beirlant, J.; Dudewicz, E.J.; Györfi, L.; Van der Meulen, E.C. Nonparametric entropy estimation: An overview. Int. J. Math. Stat. Sci. 1997, 6, 17–39.
9. Paninski, L. Estimation of entropy and mutual information. Neural Comput. 2003, 15, 1191–1253.
10. Granger, C.; Lin, J.L. Using the mutual information coefficient to identify lags in nonlinear models. J. Time Ser. Anal. 1994, 15, 371–384.
11. Sricharan, K.; Raich, R.; Hero, A.O., III. Empirical estimation of entropy functionals with confidence. arXiv 2010, arXiv:1012.4188.
12. Darbellay, G.A.; Vajda, I. Estimation of the information by an adaptive partitioning of the observation space. IEEE Trans. Inf. Theory 1999, 45, 1315–1321.
13. Stowell, D.; Plumbley, M.D. Fast multidimensional entropy estimation by k-d partitioning. IEEE Signal Process. Lett. 2009, 16, 537–540.
14. Kozachenko, L.; Leonenko, N.N. Sample estimate of the entropy of a random vector. Probl. Peredachi Informatsii 1987, 23, 9–16.
15. Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E 2004, 69, 066138.
16. Gao, W.; Oh, S.; Viswanath, P. Density functional estimators with k-nearest neighbor bandwidths. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 1351–1355.
17. Lord, W.M.; Sun, J.; Bollt, E.M. Geometric k-nearest neighbor estimation of entropy and mutual information. Chaos 2018, 28, 033114.
18. Joe, H. Estimation of entropy and other functionals of a multivariate density. Ann. Inst. Stat. Math. 1989, 41, 683.
19. Singh, H.; Misra, N.; Hnizdo, V.; Fedorowicz, A.; Demchuk, E. Nearest neighbor estimates of entropy. Am. J. Math. Manag. Sci. 2003, 23, 301–321.
20. Shwartz, S.; Zibulevsky, M.; Schechner, Y.Y. Fast kernel entropy estimation and optimization. Signal Process. 2005, 85, 1045–1058.
21. Ozertem, U.; Uysal, I.; Erdogmus, D. Continuously differentiable sample-spacing entropy estimates. IEEE Trans. Neural Netw. 2008, 19, 1978–1984.
22. Gao, W.; Oh, S.; Viswanath, P. Breaking the bandwidth barrier: Geometrical adaptive entropy estimation. In Proceedings of the Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain, 5–10 December 2016; pp. 2460–2468.
23. Indyk, P.; Kleinberg, R.; Mahabadi, S.; Yuan, Y. Simultaneous nearest neighbor search. arXiv 2016, arXiv:1604.02188.
24. Miller, E.G. A new class of entropy estimators for multi-dimensional densities. In Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, China, 6–10 April 2003.
25. Sricharan, K.; Wei, D.; Hero, A.O. Ensemble estimators for multivariate entropy estimation. IEEE Trans. Inf. Theory 2013, 59, 4374–4388.
26. Jaworski, P.; Durante, F.; Hardle, W.K.; Rychlik, T. Copula Theory and Its Applications; Springer: New York, NY, USA, 2010.
27. Durante, F.; Sempi, C. Copula theory: An introduction. In Copula Theory and Its Applications; Springer: New York, NY, USA, 2010; pp. 3–33.
28. Giraudo, M.T.; Sacerdote, L.; Sirovich, R. Non-parametric estimation of mutual information through the entropy of the linkage. Entropy 2013, 15, 5154–5177.
29. Hao, Z.; Singh, V.P. Integrating entropy and copula theories for hydrologic modeling and analysis. Entropy 2015, 17, 2253–2280.
30. Xue, T. Transfer entropy estimation via copula. Adv. Eng. Res. 2017, 138, 887.
31. Embrechts, P.; Hofert, M. Statistical inference for copulas in high dimensions: A simulation study. Astin Bull. J. IAA 2013, 43, 81–95.
32. Stowell, D. k-d Partitioning Entropy Estimator: A Fast Estimator for the Entropy of Multidimensional Data Distributions. Available online: https://github.com/danstowell/kdpee (accessed on 16 February 2020).
33. Rutanen, K. TIM, a C++ Library for Efficient Estimation of Information-Theoretic Measures from Time-Series in Arbitrary Dimensions. Available online: https://kaba.hilvi.org/homepage/main.htm (accessed on 16 February 2020).
34. Knuth, K.H. Optimal data-based binning for histograms. arXiv 2006, arXiv:physics/0605197.
35. Han, F.; Chen, S.; Liu, H. Distribution-free tests of independence in high dimensions. Biometrika 2017, 104, 813–828.
Table 1. Detailed numerical results for D = 10.

| Example | Exact | CADEE | kDP | kNN | Compression |
|---|---|---|---|---|---|
| C1—uniform | 0 | 0.81 | | | |
| C2—pairs | 0.30 | | | | |
| C3—boxes | | | | | |
| UB1—Gauss | 9.1 | 5.1 | | | |
| UB2—power-law | 12.6 | 15.7 | 92.3 | 14.7 | 67.2 |
Table 2. Detailed numerical results for D = 20.

| Example | Exact | CADEE | kDP | kNN | Compression |
|---|---|---|---|---|---|
| C1—uniform | 0 | 3.3 | | | |
| C2—pairs | 2.3 | | | | |
| C3—boxes | | | | | |
| UB1—Gauss | 18.6 | 5.0 | | | |
| UB2—power-law | 30.2 | 47.2 | 296.6 | 40.3 | 131.6 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).