Probabilistic Modeling with Matrix Product States
Figure 1. A bird's-eye view of the training dynamics of exact single-site DMRG on the unit sphere. (a) The initial vector $\psi_0$ and the vector $\psi_{\hat{\pi}}$ lie in the unit sphere of $\mathcal{H}$. (b) The vector $\psi_0$ is used to define the subspace $\mathcal{H}_1$. The unit vectors in $\mathcal{H}_1$ define a lower-dimensional sphere in $\mathcal{H}$ (in blue). The vector $\psi_1$ is the vector in that sphere that is closest to $\psi_{\hat{\pi}}$. (c) The vector $\psi_1$ is used to define the subspace $\mathcal{H}_2$. The unit sphere in $\mathcal{H}_2$ (in blue) contains $\psi_1$ but does not contain $\psi_0$. The vector $\psi_2$ is the unit vector in $\mathcal{H}_2$ closest to $\psi_{\hat{\pi}}$. (d) The vector $\psi_2$ is used to define the subspace $\mathcal{H}_3$. The vector $\psi_3$ is the unit vector in $\mathcal{H}_3$ closest to $\psi_{\hat{\pi}}$. And so on.
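The geometric update in each panel, finding the unit vector in a given subspace that is closest to the target, amounts to an orthogonal projection followed by renormalization. The following minimal NumPy sketch is not from the paper; it assumes the subspace $\mathcal{H}_{t+1}$ is presented by an isometry `A` with orthonormal columns and uses random placeholder data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder target state psi_hat_pi: a random unit vector in a 64-dimensional space.
dim, subdim = 64, 8
psi_target = rng.normal(size=dim)
psi_target /= np.linalg.norm(psi_target)

# Placeholder isometry A: orthonormal columns spanning the subspace H_{t+1}.
A, _ = np.linalg.qr(rng.normal(size=(dim, subdim)))

# Closest unit vector in the subspace: orthogonally project the target, then renormalize.
projection = A @ (A.T @ psi_target)
psi_next = projection / np.linalg.norm(projection)

print("distance to target:", np.linalg.norm(psi_next - psi_target))
```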
Figure 2. A representative bias-variance tradeoff curve showing negative log-likelihood (base 2) as a function of bond dimension for exact single-site DMRG on the $P_{20}$ dataset. For bond dimension 3, the generalization gap is approximately $\epsilon = 0.0237$. For reference, the uniform distribution on bitstrings has an NLL of 20. Memorizing the training data would yield an NLL of approximately 13.356.
Figure 3. A representative bias-variance tradeoff curve showing negative log-likelihood (base 2) as a function of bond dimension for exact single-site DMRG on the div7 dataset. For bond dimension 8, the generalization gap is approximately $\epsilon = 0.032$. For reference, the uniform distribution on bitstrings has an NLL of 20, the target distribution has an NLL of 17.192, and memorizing the training data would yield an NLL of approximately 13.87.
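Both captions report negative log-likelihood in bits, i.e., the average of $-\log_2 p(x)$ over held-out samples, and the generalization gap $\epsilon$ is the difference between test and train NLL. A minimal sketch, using placeholder probabilities rather than the paper's models or datasets, shows why the uniform distribution over 20-bit strings scores exactly 20 bits.

```python
import numpy as np

def nll_bits(probs):
    """Average negative log-likelihood, in bits, of samples with the given model probabilities."""
    return float(np.mean(-np.log2(probs)))

# Uniform model over 20-bit strings: every sample has probability 2**-20, so the NLL is 20 bits.
uniform_probs = np.full(1000, 2.0 ** -20)
print(nll_bits(uniform_probs))  # 20.0

# Generalization gap epsilon = NLL(test) - NLL(train), here with made-up placeholder probabilities.
rng = np.random.default_rng(0)
train_probs = rng.uniform(1e-5, 1e-3, size=5000)
test_probs = rng.uniform(1e-5, 1e-3, size=5000)
print("epsilon:", nll_bits(test_probs) - nll_bits(train_probs))
```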
Figure A1. The shaded region represents the model class $\mathcal{M}$. The red points all lie in $\mathcal{H}_{t+1}$. The vector $\tilde{\psi}_{t+1}$ is defined to be the unit vector in $\mathcal{H}_{t+1}$ closest to the target $\psi_{\hat{\pi}}$. Note that $\tilde{\psi}_{t+1}$ does not lie in $\mathcal{M}$. The vector $\psi^{\mathrm{SVD}}_{t+1}$ is defined to be the vector in $\mathcal{M} \cap \mathcal{H}_{t+1}$ closest to $\tilde{\psi}_{t+1}$. In this picture, $\|\psi^{\mathrm{SVD}}_{t+1} - \psi_{\hat{\pi}}\| > \|\psi_t - \psi_{\hat{\pi}}\|$. There may be a point, such as the one labelled $\psi^{\mathrm{better}}_{t+1}$, which lies in $\mathcal{M} \cap \mathcal{H}_{t+1}$ and is closer to $\psi_{\hat{\pi}}$ than $\psi^{\mathrm{SVD}}_{t+1}$, notwithstanding the fact that it is further from $\tilde{\psi}_{t+1}$. This figure, to scale, depicts a scenario in which $\|\psi_t - \psi_{\hat{\pi}}\| = 0.09$, $\|\psi^{\mathrm{SVD}}_{t+1} - \psi_{\hat{\pi}}\| = 0.10$, $\|\psi^{\mathrm{better}}_{t+1} - \psi_{\hat{\pi}}\| = 0.07$, $\|\tilde{\psi}_{t+1} - \psi_{\hat{\pi}}\| = 0.06$, $\|\psi^{\mathrm{SVD}}_{t+1} - \tilde{\psi}_{t+1}\| = 0.07$, and $\|\psi^{\mathrm{better}}_{t+1} - \tilde{\psi}_{t+1}\| = 0.08$.
Abstract
1. Introduction
2. The Problem Formulation
3. Outline of Our Approach to Solving the Problem
4. Effective Versions of the Problem
5. The Exact Single-Site DMRG Algorithm
6. Experiments
7. Discussion
8. Conclusions and Outlook
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Appendix A. Multi-Site DMRG
- Use $\psi_t$ to define an isometric embedding of $\mathcal{H}_{t+1}$ into $\mathcal{H}$, with $\psi_t \in \mathcal{H}_{t+1}$.
- Let $\tilde{\psi}_{t+1}$ be the unit vector in $\mathcal{H}_{t+1}$ closest to $\psi_{\hat{\pi}}$.
- Perform a model repair of $\tilde{\psi}_{t+1}$ to obtain a vector $\psi_{t+1}$ in the model class $\mathcal{M}$. There are multiple ways to do the model repair (one option is sketched below).
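A common choice of model repair, and the one suggested by the label $\psi^{\mathrm{SVD}}_{t+1}$ in Figure A1, is a truncated SVD across the bond being repaired. The sketch below is illustrative only, using random placeholder data and an arbitrary target bond dimension; it is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder two-site block (left bond x physical x physical x right bond), standing in
# for the merged tensor of psi_tilde_{t+1}; the dimensions are arbitrary.
chi_left, d, chi_right = 4, 2, 4
theta = rng.normal(size=(chi_left, d, d, chi_right))
theta /= np.linalg.norm(theta)

# Model repair by truncated SVD across the central bond, down to bond dimension chi_max.
chi_max = 3
matrix = theta.reshape(chi_left * d, d * chi_right)
u, s, vt = np.linalg.svd(matrix, full_matrices=False)
u, s, vt = u[:, :chi_max], s[:chi_max], vt[:chi_max, :]

# By Eckart-Young, (u * s) @ vt is the rank-chi_max matrix closest to `matrix`; after
# renormalization it plays the role of psi^SVD_{t+1}: closest to psi_tilde_{t+1}, though
# (as Figure A1 illustrates) not necessarily closest to the target psi_hat_pi.
repaired = (u * s) @ vt
repaired /= np.linalg.norm(repaired)
print("distance to psi_tilde:", np.linalg.norm(repaired - matrix))
```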
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).