Quantifying Synergistic Information Using Intermediate Stochastic Variables
Figure 1. The values of the two MSRVs $S_1$ and $S_2$, which are mutually independent but each highly synergistic about two 3-valued variables $X_1$ and $X_2$. $X_1$ and $X_2$ are uniformly distributed and independent.

Figure 2. Two independent input bits with an XOR gate as output. (a) The relation $Y_\oplus = X_1 \oplus X_2$. (b) An additional input bit $X_3$ is added which copies the XOR output, adding individual (unique) information $I(X_3 : Y) = 1$.

Figure 3. Effectiveness of the numerical implementation at finding a single SRV. The input consists of two variables with 2, 3, 4, or 5 possible values each (x-axis). Red line with dots: probability that an SRV could be found with at most 10% relative error in 50 randomly generated $\Pr(X_1, X_2, Y)$ distributions. That this probability is lowest for binary variables is consistent with the observation that perfect orthogonal decomposition is impossible in that case under at least one known condition (Appendix A.6). That it converges to 1 is consistent with our suggestion that orthogonal decomposition could be possible for continuous variables (Section 7.1). Blue box plot: expected relative error of the entropy of a single SRV, once successfully found.

Figure 4. Synergistic entropy of a single SRV normalized by the theoretical upper bound. The input consists of two randomly generated stochastic variables with 2, 3, 4, or 5 possible values per variable (x-axis). The SRV is constrained to have the same number of possible values. The initial downward trend shows that individual SRVs become less efficient at storing synergistic information as the state space per variable grows. The apparent settling to a non-zero constant suggests that estimating synergistic information does not require a diverging number of SRVs for any number of values per variable.

Figure 5. Left: the median relative change of the mutual information $I(X_1, X_2 : Y)$ after perturbing a single input variable's marginal distribution $P(X_1)$ ("local" perturbation). Error bars indicate the 25th and 75th percentiles. A perturbation is implemented by adding a random vector with norm 0.1 to the point in the unit hypercube that defines the marginal distribution $P(X_1)$ (see the sketch after these captions). Each bar is based on 100 randomly generated joint distributions $P(X_1, X_2, Y)$, where in the synergistic case $Y$ is constrained to be an SRV of $X_1, X_2$. Right: the same as left, except that the perturbation is "non-local" in the sense that it is applied to $P(X_2 \mid X_1)$ while keeping $P(X_1)$ and $P(X_2)$ unchanged.

Figure 6. The conditional probabilities of an SRV conditioned on two independent binary inputs. Here, e.g., $\sum_i a_i = 1$, and $a_i$ denotes the probability that $S$ equals state $i$ in case $X_1 = X_2 = 0$.
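To make the perturbation procedure of Figure 5 concrete, the following is a minimal Python sketch of our own (the function name `perturb_marginal` is not from the paper's code). It simplifies the paper's scheme by perturbing the probability vector directly rather than the point in the unit hypercube that parameterizes $P(X_1)$, which is not identical but conveys the idea of a "local" perturbation of fixed size 0.1:

```python
import numpy as np

def perturb_marginal(p, norm=0.1, rng=None):
    """Perturb a marginal distribution p by adding a random vector of the
    given norm, then project back onto the probability simplex.

    Simplified illustration of the Figure 5 procedure; the paper instead
    perturbs the unit-hypercube parameterization of P(X1).
    """
    rng = np.random.default_rng() if rng is None else rng
    direction = rng.normal(size=len(p))
    direction /= np.linalg.norm(direction)  # random unit vector
    q = np.asarray(p, dtype=float) + norm * direction
    q = np.clip(q, 0.0, None)               # keep probabilities non-negative
    return q / q.sum()                      # renormalize to sum to 1

# Example: perturb a uniform marginal over 3 states.
p_x1 = np.array([1/3, 1/3, 1/3])
print(perturb_marginal(p_x1, norm=0.1))
```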
Abstract
1. Introduction
2. Definitions
2.1. Preliminaries
Definition 1: Orthogonal Decomposition
2.2. Proposed Framework
2.2.1. Synergistic Random Variable
2.2.2. Maximally Synergistic Random Variables
2.2.3. Synergistic Entropy of $X$
2.2.4. Orthogonalized SRVs
2.2.5. Total Synergistic Information
Outline of Intuition of the Proposed Definition
3. Basic Properties
3.1. Non-Negativity
3.2. Upper-Bounded by Mutual Information
3.3. Equivalence Class of Reordering in Arguments
3.4. Zero Synergy about a Single Variable
3.5. Zero Synergy in a Single Variable
3.6. Identity Maximizes Synergistic Information
4. Consequential Properties
4.1. Upper Bound on the Mutual Information of an SRV
4.2. Non-Equivalence of SRVs
4.3. Synergy among MSRVs
4.4. XOR-Gates of Random Binary Inputs Always Form an MSRV
5. Examples
5.1. Two Independent Bits and XOR
5.2. XOR-Gate and Redundant Input
5.3. AND-Gate
6. Numerical Implementation
6.1. Success Rate and Accuracy of Finding SRVs
6.2. Efficiency of a Single SRV
6.3. Resilience Implication of Synergy
7. Limitations
7.1. Orthogonal Decomposition
7.1.1. Related Literature on Decomposing Correlated Variables
7.1.2. Sufficiency of Decomposition
7.1.3. Satisfiability of Decomposition
8. Discussion
9. Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
Appendix A
A.1. Upper Bound of Possible Entropy of an SRV by Induction
A.1.1. Base Case
A.1.2. Induction Step
A.2. The Synergy Measure Does Not “Overcount” Any Synergistic Information
A.2.1. Also Includes Non-Synergistic Information
A.3. Synergy Measure Correctly Handles Synergy-of-Synergies among SRVs
A.3.1. Synergy among SRVs Forms a Clique
A.3.2. Generalize to Partial Synergy among SRVs
A.4. SRVs of Two Independent Binary Variables Are Always XOR Gates
A.5. Independence of the Two Decomposed Parts
A.6. Impossibility of Decomposition for Binary Variables
A.7. Wyner’s Common Variable Satisfies Orthogonal Decomposition if
A.8. Use-Case of Estimating Synergy Using the Provided Code
```python
from jointpdf import JointProbabilityMatrix

# Randomly generated joint probability mass function p(A,B)
# of 2 discrete stochastic variables, each having 3 possible values.
p_AB = JointProbabilityMatrix(2, 3)

# Append a third variable C which is deterministically computed from
# A and B, i.e., such that I(A,B:C) = H(C).
p_AB.append_redundant_variables(1)
p_ABC = p_AB  # rename for clarity

# Compute the synergistic information that C contains about A and B
# (variable indices: A = 0, B = 1, C = 2).
syn = p_ABC.synergistic_information([2], [0, 1])
print(syn)
```
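By the properties in Sections 3.1 and 3.2, the returned value should be non-negative and should not exceed the mutual information $I(A,B:C)$. A quick sanity check is sketched below; it assumes the accompanying jointpdf version exposes a `mutual_information` method, which is an assumption on our part and not stated in this appendix:

```python
# Sanity check: 0 <= synergy <= I(A,B:C).
# Assumes jointpdf provides mutual_information(); if not, the bound
# can be computed from entropies instead.
upper = p_ABC.mutual_information([0, 1], [2])
assert 0.0 <= syn <= upper + 1e-6
```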
Truth table of the XOR gate $Y_\oplus = X_1 \oplus X_2$ (cf. Figure 2):

$X_1$ | $X_2$ | $Y_\oplus$
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
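The truth table makes the XOR gate's pure synergy easy to check numerically: each input alone is independent of the output, while the pair determines it exactly (cf. Section 5.1). Below is a minimal self-contained verification, our own sketch independent of the jointpdf package:

```python
import math
from collections import Counter

# Joint distribution of (X1, X2, Y) for the XOR gate with uniform inputs.
states = [(x1, x2, x1 ^ x2) for x1 in (0, 1) for x2 in (0, 1)]
p = {s: 0.25 for s in states}

def entropy(indices):
    """Entropy (in bits) of the marginal over the given coordinate indices."""
    marg = Counter()
    for s, pr in p.items():
        marg[tuple(s[i] for i in indices)] += pr
    return -sum(pr * math.log2(pr) for pr in marg.values() if pr > 0)

def mutual_information(a, b):
    return entropy(a) + entropy(b) - entropy(a + b)

print(mutual_information((0,), (2,)))    # I(X1:Y)    = 0 bits
print(mutual_information((1,), (2,)))    # I(X2:Y)    = 0 bits
print(mutual_information((0, 1), (2,)))  # I(X1,X2:Y) = 1 bit
```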
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).