Introduction

The simulation of quantum systems remains a persistent challenge, primarily due to the exponential growth of the Hilbert space, which makes it exceedingly difficult to parameterize the wave functions of large systems using exact methods. Since the seminal work of Carleo and Troyer1, the idea of using neural networks to simulate quantum systems1,2,3,4,5 has been applied successfully to a large number of systems, leveraging various neural network architectures. These include restricted Boltzmann machines6, convolutional neural networks (CNNs)7, group CNNs8, autoencoders9, as well as autoregressive architectures such as recurrent neural networks (RNNs)5,10,11,12,13,14, with neural network representations of both the amplitude and the phase distribution of the quantum state under consideration. These neural quantum states (NQS) exploit the innate ability of neural networks to efficiently represent probability distributions. When applied to quantum systems, this ability can reduce the number of parameters required to encode the system.

Despite their representational power, NQS have been shown to face challenges during the training process, for example when they are trained to minimize the energy, i.e. to represent ground states. This results from the intricate nature of the loss landscape, characterized by numerous saddle points and local minima that complicate the search for the global minimum15. One promising avenue to overcome this problem is the use of many uncorrelated samples during the training. This strategy is facilitated by autoregressive neural networks16,17, which allow one to sample directly from the wave function's amplitudes. Autoregressive networks have already been applied in the physics context18,19, e.g. for the variational simulation of spin systems11,12,13,14.

Many works have so far focused on NQS representations of spin systems at half-filling, revealing that NQS can be used to study a variety of phenomena relevant to state-of-the-art research, as shown e.g. for RNN representations on various lattice geometries, including frustrated spin systems11,20 and systems with topological order21. For all of these systems, the physics becomes even richer when mobile impurities, e.g. holes, are introduced into the system, yielding a competition between the magnetic background and the kinetic energy of the impurity. Simulating such systems holds particular relevance for understanding high-temperature superconductivity, where the superconducting dome arises upon doping the antiferromagnetic half-filled state with holes22. The search for NQS capable of representing such spinful fermionic systems is still in its early stages. In recent years, the first NQS have been developed that obey fermionic statistics, simulating molecules23,24,25,26,27, spinless fermions17 and spinful fermions28,29,30,31. Among these architectures are FermiNet23,24, Slater-Jastrow ansätze17,28 and variants of Jordan-Wigner transformations25,29,32.

Here, we use an autoregressive neural network architecture, supplemented with a Jordan-Wigner transformation, to simulate ground and excited states of the strong-interaction limit of the Fermi-Hubbard model, believed to capture essential features of high-temperature cuprate superconductors. Specifically, we use a tensorized 2D version of an RNN wave function11 with gated recurrent units33,34,35, which have been shown to successfully model spin systems10,11,20,21,36,37. In the remainder of this paper, we discuss the system under investigation, namely the bosonic and fermionic t − J model, followed by the presentation of a scheme for the calculation of dispersion relations from any NQS architecture, tested on quasiparticle dispersions of 1D and 2D t − J systems. We find that the RNN accurately captures the features of the considered low-energy states. Lastly, we discuss the performance of the RNN ansatz and identify its strengths and bottlenecks.

Results

We apply the RNN to simulate ground and excited states of the fermionic (bosonic) t − J model, both in one and two dimensions. In its generalized form, known as the fermionic (bosonic) t − XXZ model, with anisotropic superexchange interactions Jz and J±, the Hamiltonian under consideration reads:

$$\begin{aligned} \mathcal{H}_{t\mathrm{XXZ}} = &-t\,\sum_{\langle \boldsymbol{i},\boldsymbol{j}\rangle,\sigma} \mathcal{P}_{G}\left(\hat{c}_{\boldsymbol{i},\sigma}^{\dagger}\hat{c}_{\boldsymbol{j},\sigma}+\mathrm{h.c.}\right)\mathcal{P}_{G}\\ &+J_{z}\sum_{\langle \boldsymbol{i},\boldsymbol{j}\rangle}\left(\hat{S}_{\boldsymbol{i}}^{z}\hat{S}_{\boldsymbol{j}}^{z}-\frac{1}{4}\hat{n}_{\boldsymbol{i}}\hat{n}_{\boldsymbol{j}}\right)\\ &+J_{\pm}\sum_{\langle \boldsymbol{i},\boldsymbol{j}\rangle}\frac{1}{2}\left(\hat{S}_{\boldsymbol{i}}^{+}\hat{S}_{\boldsymbol{j}}^{-}+\hat{S}_{\boldsymbol{i}}^{-}\hat{S}_{\boldsymbol{j}}^{+}\right), \end{aligned}$$
(1)

with the fermionic (bosonic) creation and annihilation operators \(\hat{c}_{\boldsymbol{i},\sigma}^{\dagger}\) and \(\hat{c}_{\boldsymbol{i},\sigma}\) for particles at site i with spin σ, spin (pseudospin) operators \(\hat{\boldsymbol{S}}_{\boldsymbol{i}}={\sum}_{\sigma,\sigma^{\prime}}\hat{c}_{\boldsymbol{i},\sigma}^{\dagger}\frac{1}{2}\boldsymbol{\sigma}_{\sigma\sigma^{\prime}}\hat{c}_{\boldsymbol{i},\sigma^{\prime}}\) and density operators \(\hat{n}_{\boldsymbol{i}}\)38. Furthermore, \(\mathcal{P}_{G}\) projects out states with more than one particle per site. Note that for a single hole, this single-occupancy constraint leads to the same statistics for the bosonic and fermionic models. For Jz = J±, Eq. (1) reduces to the t − J model, and for J± = 0 to the t − Jz model. Despite its relevance for the study of unconventional superconductivity, whether there actually are superconducting phases in the t − J and Fermi-Hubbard models is still under debate39,40,41. While finding NQS representations of superconducting states, or – more broadly – pairing wave functions28,30,42,43,44, is still a topic of current research, here we focus on the capacity of the RNN ansatz to represent ground and excited states of the t − J model and not on their superconducting properties.
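
A minimal numerical sketch may help to make Eq. (1) concrete. The following snippet builds the bosonic t − XXZ Hamiltonian for a short illustrative chain (chain length, couplings and sector choice are ours) in the constrained three-state basis and diagonalizes it exactly; as noted above, for a single hole the bosonic and fermionic models have the same spectra, so no fermionic signs are required here.

```python
import numpy as np
from functools import reduce

L, t, Jz, Jpm = 6, 8.0, 4.0, 1.0      # illustrative short chain; couplings as in Fig. 3

# local basis per site: 0 = hole, 1 = spin up, 2 = spin down
def E(a, b):
    m = np.zeros((3, 3))
    m[a, b] = 1.0
    return m

a_up, a_dn = E(0, 1), E(0, 2)         # annihilate a particle, leaving a hole behind
Sz = np.diag([0.0, 0.5, -0.5])
Sp = E(1, 2)                          # S^+ flips a down spin to an up spin
n = np.diag([0.0, 1.0, 1.0])          # occupation number

def site(op, i):                      # embed a 3x3 operator at chain site i
    ops = [np.eye(3)] * L
    ops[i] = op
    return reduce(np.kron, ops)

H = np.zeros((3**L, 3**L))
for i in range(L - 1):                # nearest-neighbor bonds, open boundaries
    j = i + 1
    for a in (a_up, a_dn):            # Gutzwiller-projected hopping: particle <-> hole
        hop = site(a.T, i) @ site(a, j)
        H += -t * (hop + hop.T)
    H += Jz * (site(Sz, i) @ site(Sz, j) - 0.25 * site(n, i) @ site(n, j))
    H += 0.5 * Jpm * (site(Sp, i) @ site(Sp.T, j) + site(Sp.T, i) @ site(Sp, j))

# restrict to the one-hole, S_z = 1/2 sector before diagonalizing
states = (np.arange(3**L)[:, None] // 3**np.arange(L)[None, :]) % 3
mask = (np.sum(states == 0, axis=1) == 1) & (np.sum(states == 1, axis=1) == L // 2)
E0 = np.linalg.eigvalsh(H[np.ix_(mask, mask)])[0]   # one-hole ground state energy
```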

In the absence of doping (\(\hat{n}_{\boldsymbol{i}}=1\)), Eq. (1) reduces to the XXZ model or, in the case of Jz = J±, the Heisenberg model. Prior studies have already utilized RNNs to simulate these spin models20,45, with the possibility of rendering the model stoquastic by making use of the Marshall sign rule46. This is done by implementing the sign rule directly in the RNN architecture20, yielding a simplified optimization of the wave function's phase. In contrast, we do not implement any bias on the phase of the quantum state, in order to make our architecture applicable to any number of holes in the system.

When the ground state at \(\hat{n}_{\boldsymbol{i}}=1\) is doped with a single hole, the resulting mobile impurity gets dressed with a cloud of magnetic excitations. This yields the formation of a magnetic polaron, which has already been observed in ultracold atom experiments47. Its properties strongly depend on the spin background, see Fig. 1a and b. Upon further doping, the strong correlations in the model make the simulation of the Fermi-Hubbard or t − J models numerically challenging, despite impressive numerical advances in the past years39,48,49,50: Commonly used methods all come with their specific limitations; e.g. the density matrix renormalization group51,52 is limited by the area law of entanglement, making it challenging to apply this method in 2D or higher dimensions. Finally, the calculation of spectral functions or dispersion relations E(k)53, shown exemplarily in Fig. 1, is of great interest for many fields of physics, since they reveal the emergent physics of a system under investigation. In condensed matter physics, they are typically used to infer the dominating excitations in the ground state or higher energy states, e.g. upon doping the system. This information is contained in specific features of the spectra, e.g. the bandwidth of the quasiparticle dispersion E(k). However, calculating spectra or dispersions E(k) is in general computationally costly using conventional methods, e.g. density-matrix renormalization group (DMRG) simulations54,55: the former typically involves an in general expensive time evolution of the state, and the latter the evaluation of a global operator, the momentum k, which is typically very costly for matrix product states. Our RNN ansatz uses \(U(1)=U(1)_{\hat{N}}\times U(1)_{\hat{S}_{z}}\) symmetry, i.e. conserved total particle number and total magnetization10,11,20,25,45,56. Further details on the RNN architecture can be found in Methods.

Fig. 1: Results for the t − J and t − Jz square lattice with 10 × 4 sites, t/Jz = 3 and open boundaries in x, periodic boundaries in y direction.
figure 1

a Quasiparticle dispersion of a single hole for the t − J system obtained with the recurrent neural network (RNN) (blue markers), compared to the matrix product state (MPS) spectral function from ref. 53 with the spectral weight S indicated by the colormap and shown in the inset for k = (0.4π, 0.5π) (gray dashed lines). We average the energy over the last 100 training iterations, with the standard deviation denoted by the respective error bars in blue. b Dispersion of the t − Jz system obtained with the RNN, compared to the MPS spectral function and with the same error bars as in a. c Relative errors \(\Delta\epsilon=\frac{E_{\mathrm{RNN}}-E_{\mathrm{DMRG}}}{|E_{\mathrm{DMRG}}|}\) during the training, with hidden dimension dh = 300. The training was restarted after 20,000 steps (dashed lines), with the number of samples per iteration increased from 200 to 600 (t − J) or 1000 (t − Jz) in the last restart.

NQS dispersion relations

Here, we calculate the dispersion relations E(k) in t − XXZ systems of different dimensions and lattice geometries using NQS. The method that we propose is applicable to any NQS architecture, since the momentum is enforced via a momentum constraint in the cost function, forcing the system to a specific target momentum ktarget within the dispersion, see Fig. 2 and Methods. This is in contrast to previous works26,57,58, where the momentum is used in the definition of the wave function coefficients. Hence, the scheme only requires the ability to draw samples from the NQS and to calculate the respective probabilities, making the calculation of ENQS(kx, ky) computationally efficient, and it allows one to use a pretrained NQS ground state as a starting point for the momentum training. Furthermore, the scheme can be combined with spatial symmetries. This could help to improve the accuracy, e.g. when using an NQS with built-in translational invariance. Moreover, additional symmetries could be used to calculate e.g. m4 rotational resonances59 or to probe the competition between the s-, p- or d-wave ground state energies60. Here, we focus on the peak positions of the quasiparticle spectra. In principle, the spectral weights could also be accessed by calculating the overlap \(\langle\psi_{\boldsymbol{k}}^{1h}|\hat{c}_{\boldsymbol{k}}|\psi_{0}^{0h}\rangle\) of the momentum eigenstate \(\psi_{\boldsymbol{k}}^{1h}\) with the ground state \(\psi_{0}^{0h}\) upon removing a particle from the system. To also obtain the peak positions and spectral weights of higher energy states, usually the Green's function is computed. In the context of NQS, this has been done for a quantum Ising model using time-dependent variational Monte Carlo (t-VMC)61, for J1 − J2 and Heisenberg models using an extension of the stochastic reconfiguration algorithm62,63, and for the Hubbard model using dynamical variational Monte Carlo (VMC)64.

Fig. 2: Calculating dispersion relations from NQS.
figure 2

Adding the momentum constraint \(\mathcal{C}_{\boldsymbol{k}_{\mathrm{target}}}\) of Eq. (12) on top of the energy minimization \(\mathcal{C}\) of Eq. (15) in a changes the loss landscape as schematically shown in b and forces the neural quantum state into a higher energy state E(k) with the desired momentum ktarget.

t − XXZ model in 1D

Figure 3a shows the dispersion for a single hole in an antiferromagnetic t − XXZ chain with 20 sites and J± = 1, Jz = 4 and t = 8, obtained with a 1D RNN and exact diagonalization (ED). Note that in the case of a single hole, no holes can be exchanged and hence the bosonic and fermionic models coincide. The results for the ground state energy at kx = 0.5π, obtained during a training with 20,000 iterations, and the energies away from the ground state, shown in Fig. 3, are in relatively good agreement with the exact values from ED. However, at some values of kx ≠ 0.5π the RNN is trapped in local minima close to the ground state. Overall, the RNN succeeds in capturing physical properties like the bandwidth very accurately, revealing the underlying physical excitations:

Fig. 3: Results for the 1D t − XXZ system with 20 sites and J± = 1, Jz = 4 and t = 8.
figure 3

a Quasiparticle dispersion for a single hole obtained with the recurrent neural network (RNN, red markers), compared to exact energies from exact diagonalization (ED, light red lines) and the combined spinon and holon dispersions from Eq. (2) (gray). We average the RNN energy over the last 100 training iterations, each with 200 samples, with the standard deviation denoted by the error bars. We show the exact low-energy excited states as well. b Relative error \(\Delta\epsilon=\frac{E_{\mathrm{RNN}}-E_{\mathrm{ED}}}{|E_{\mathrm{ED}}|}\) during the ground state training. a and b are obtained using a 1D RNN architecture with a hidden dimension dh = 100.

For the system under consideration, the bandwidth and the shape of the dispersion in Fig. 3a are a result of spin-charge separation in 1D systems. Spin-charge separation denotes the fact that the motion of a hole in such an antiferromagnetic (AFM) spin chain with couplings J±, Jz ≪ t can be approximated by an almost free hole that is only weakly coupled to the spin chain. Hence, the dispersion in Fig. 3 can be approximated by two separate dispersions, i.e. holon and spinon dispersions. Here, the holon is the charge excitation, associated with the energy scale t, and the spinon is the spin excitation, associated with the energy scales J±, Jz. In ref. 59 it is shown that the combined dispersion is

$$E({k}_{x})=-2t\,\cos({k}_{h})+{J}_{\pm}\,\cos\left(2\Delta k\right)+{J}_{\pm}+{J}_{z},$$
(2)

where Δk = kx − kh, kh is the momentum of the holon and kx = kh + ks is the combined momentum of the holon and spinon. Eq. (2) is denoted by the gray line in Fig. 3. Again, the agreement with the RNN is relatively good.
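
The gray line can be reproduced directly from Eq. (2). A minimal sketch (assuming that the plotted curve corresponds to the lower edge of the resulting two-parton continuum) scans the holon momentum kh for each total momentum kx:

```python
import numpy as np

t, Jz, Jpm = 8.0, 4.0, 1.0              # parameters of Fig. 3

kx = np.linspace(0.0, np.pi, 101)       # total momentum k_x = k_h + k_s
kh = np.linspace(-np.pi, np.pi, 2001)   # holon momentum, scanned over

# Eq. (2) on a grid of (k_x, k_h) pairs, with Delta k = k_x - k_h
energy = (-2.0 * t * np.cos(kh)[None, :]
          + Jpm * np.cos(2.0 * (kx[:, None] - kh[None, :]))
          + Jpm + Jz)

E_lower = energy.min(axis=1)            # lower edge of the spinon-holon continuum
```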

t − J model on a square lattice

Due to the layered structure of high-Tc superconductors like cuprates22 or nickelates65,66, the physics of t − J systems upon doping is particularly interesting in 2D. In Figs. 1 and 4, the quasiparticle dispersions for a single hole on 10 × 4 and 4 × 4 t − J and t − Jz lattices are presented. In both cases, Figs. 1b and 4b show that the ground state convergence is better for the t − Jz model, with relative errors on the order of Δϵ ≈ 10−3 for both system sizes, yielding a good agreement with the reference energies from DMRG (10 × 4 system) and ED (4 × 4 system) for all considered energies E(kx, ky) away from the ground state. With a relative error of Δϵ ≈ 10−2, the error of the t − J ground states lies above that of the t − Jz systems, which is also reflected in the accuracy of the dispersion ERNN(kx, ky) in Figs. 1 and 4.

Fig. 4: Results for the t − J (blue) and t − Jz (red) square lattice with 4 × 4 sites, t/J = 3 and periodic boundaries.
figure 4

a Quasiparticle dispersion for a single hole obtained with the recurrent neural network (RNN, blue and red markers), compared to the exact energies from exact diagonalization (ED, blue and red lines). We average the energy over the last 100 training iterations, each with 200 samples, with the standard deviation denoted by the respective error bars shown in blue and red. We show the exact low-energy excited states as well. b Relative error Δϵ during the ground state training for t − J (light blue) and t − Jz (light red) square lattice ground states, with a hidden dimension dh = 100 and minimum-step stochastic reconfiguration (t − J) and dh = 70 and Adam (t − Jz). Thick lines are averages over 100 training iterations to guide the eye.

In contrast to the previous section, there is no spin-charge separation in the strict sense in two-dimensional systems. In the case t ≫ J± = Jz =: J that we consider here (t/J = 3), the mobile dopant can be described by fractionalized spinons and chargons that are confined by a string-like potential arising from the distortion of the spin background when the dopant moves through the system67,68. Based on this idea, Laughlin69 drew the analogy with the 1D Fermi-Hubbard or t − J systems and suggested that the dispersion in the respective 2D systems can be interpreted in terms of pointlike partons, spinons and chargons, that interact with each other. This parton picture explains that the quasiparticle dispersion for a single hole is dominated by the spinon, with a bandwidth on the order of J±, with corrections by the chargon on energy scales of t53. This mechanism also provides the explanation for the flat dispersion of the t − Jz model in contrast to the t − J model, as captured by the RNN, see Figs. 1 and 4. Despite the small deviations from the dispersions calculated with ED or DMRG, our RNN architecture succeeds in capturing the respective bandwidths of the t − Jz and t − J models very accurately, allowing us to gain valuable insights on the spinon and chargon physics from the RNN dispersions. Furthermore, the fact that node (π/2, π/2) and antinode (π, 0) are degenerate in the 4 × 4 system is correctly reproduced.

Lastly, we would like to mention that there is a small region of suppressed spectral weight near (π, π) in the DMRG results of the t − J system59. This suppression yields difficulties for our RNN scheme that are further discussed in Supplementary Note 3.

t − J model on a triangular lattice

On triangular lattices, the observed physical phenomena are distinctly different from the physics of bipartite lattices, due to frustration and the absence of particle-hole symmetry in non-bipartite lattices, including e.g. kinetic frustration70,71. In particular, the underlying constituents upon doping the triangular ladder are not known71, making the triangular lattice an intriguing system to study. Recent advancements have shown that these lattices can also be studied experimentally using optical triangular lattices72,73,74 and solid-state platforms based on Moiré heterostructures75,76,77.

Triangular spin systems have already been studied using RNNs20. Here, we consider a triangular t − J ladder with length Lx = 9, with the quasiparticle dispersion for a single hole and the learning curves with and without doping shown in Fig. 5.

Fig. 5: Results for the t − J model on a triangular lattice with 9 × 2 sites, t/J = 3 and periodic boundaries along x direction.
figure 5

a Quasiparticle dispersion for a single hole obtained with the recurrent neural network (RNN, blue markers), compared to the energies from exact diagonalization (ED, light blue lines). The large error for k = 0.998π appears together with a large violation of translational invariance ΔTransl., measured by the relative difference between the log-amplitudes \(\log|\psi(\sigma)|^{2}\) and \(\log|\psi(\hat{T}_{y}\sigma)|^{2}\) as defined in Eq. (3), shown in green. All values are averages over the last 100 training iterations, each with 200 samples, with the standard deviation denoted by the blue error bars. We show the exact low-energy excited states as well. b Relative error Δϵ during the ground state training without doping (orange) and with one hole (blue).

As suggested in ref. 20, we use variational annealing for the training on the triangular lattice, which was shown to improve the performance for frustrated systems like the triangular Heisenberg model20, see Methods. Figure 5 shows that this procedure yields relatively good results for the ground states, with errors of Δϵ ≈ 0.001 for both Nh = 0 and Nh = 1. For the dispersion shown in Fig. 5a, we consider the momentum k defined along the ladder, as shown in the inset figure. When enforcing k ≠ 0.444π away from the ground state, the exact energy gaps from ED to the first excited states strongly decrease, and the RNN gets trapped in these states in most cases, in particular for k > 0.444π. Furthermore, the error bars of the enforced momenta are much larger compared to the other lattice geometries studied in Figs. 1, 3 and 4, suggesting that the RNN states partly break the translational invariance and hence challenge the momentum optimization scheme. This is further supported by the relative difference between the wave function amplitudes \(\log|\psi_{\boldsymbol{\lambda}}(\sigma_{i})|^{2}=\log p_{\boldsymbol{\lambda}}(\sigma_{i})\) and those of the respective translated samples, \(\log|\psi_{\boldsymbol{\lambda}}(\hat{T}_{\mathbf{e}_{\mu}}\sigma_{i})|^{2}=\log p_{\boldsymbol{\lambda}}(\hat{T}_{\mathbf{e}_{\mu}}\sigma_{i})\),

$${\Delta}_{\mathrm{Transl.}}^{\mu}=\frac{1}{{N}_{s}}{\sum}_{i}\frac{{\left(\log {p}_{\boldsymbol{\lambda}}({\sigma}_{i})-\log {p}_{\boldsymbol{\lambda}}\left({\hat{T}}_{{\mathbf{e}}_{\mu}}{\sigma}_{i}\right)\right)}^{2}}{{\left(\log {p}_{\boldsymbol{\lambda}}({\sigma}_{i})+\log {p}_{\boldsymbol{\lambda}}\left({\hat{T}}_{{\mathbf{e}}_{\mu}}{\sigma}_{i}\right)\right)}^{2}},$$
(3)

which we take as a measure for the violation of translational invariance at the respective momenta. Figure 5 shows that the momenta with large errors coincide with a large \(\Delta_{\mathrm{Transl.}}^{y}\).
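
Since Eq. (3) only involves the log-probabilities of the samples and of their translated counterparts, it is cheap to evaluate alongside the training. A minimal sketch (function and argument names are ours):

```python
import numpy as np

def delta_transl(logp, logp_translated):
    """Translational-invariance violation of Eq. (3).

    logp:            log p_lambda(sigma_i) for N_s sampled configurations
    logp_translated: log p_lambda(T_{e_mu} sigma_i) for the translated samples
    """
    logp = np.asarray(logp)
    logp_translated = np.asarray(logp_translated)
    return np.mean((logp - logp_translated) ** 2 / (logp + logp_translated) ** 2)
```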

Performance of the RNN ansatz

The analysis of the quasiparticle dispersions indicates that our bosonic and fermionic RNN ansätze successfully learn the dominating physics in the considered regimes. In the following, we provide a more detailed discussion of the performance of the RNN ansatz for fermionic and bosonic spin systems upon hole doping, focusing on ground states on a 4 × 4 square lattice.

Figure 6 shows the relative error for the ground state energies of the t − J(z) model obtained with our RNN ansatz upon doping the half-filled system with Nh holes. Starting from Nh = 0, where the t − J model reduces to the Heisenberg model, our RNN reaches a relative ground state energy error of Δϵ ≈ 10−4 after 20,000 training steps compared to ED. Figure 6b shows that the respective phase and amplitude distributions are relatively simple in this case, with a low variance for the logarithmic amplitude and only two values for the phase, 0 and π. Note that compared to the literature on ground state representations of the Heisenberg model using RNNs11,45, the optimization problem in our setup is more challenging for the following reasons: (i) The RNN that we use has a local Hilbert space dimension of three states instead of two, allowing for all values of Nh in principle. (ii) Our RNN learns the sign structure without any bias, i.e. we do not implement the Marshall sign rule in the RNN, which would only work for Nh = 0. (iii) We do not include the knowledge of spatial symmetries yet, which can improve the performance as shown in Methods.

Fig. 6: Recurrent neural network (RNN) representation for ground states of the bosonic and fermionic t − J(z) model with t/J = 3, 0≤Nh≤12 for a 4 × 4 square lattice with open boundaries.
figure 6

a Relative energy error for bosons (blue) and fermions (orange) as well as the Hilbert space dimension for a fixed number of holes Nh and magnetization \(\hat{S}_{z}\) (gray). b Logarithmic amplitude \(\log(|\psi|^{2})\) and phase \(\mathrm{Im}\log(\psi)\) distributions from exact diagonalization (ED) for exemplary bosonic (blue) and fermionic (orange) hole numbers. c Relative error and Hilbert space dimension for the t − Jz model. We use a hidden dimension dh = 100. All values obtained from the RNN are averages over the last 100 training steps (each with 200 samples), and error bars denote the respective standard deviation.

Upon doping, the exact log-amplitude and phase distributions from ED can become more complicated than for the t − Jz model. For example, for Nh = 4, the variance of the exact amplitudes becomes very large, \(\sigma_{N_{h}=4}^{\mathrm{b}}(\log|\psi|^{2})=15.91\), see Fig. 6b. This yields larger ground state energy errors than for the t − Jz model, and the problem is further complicated when including the antisymmetry in the fermionic case. Again, we observe that for larger hole dopings, Nh ≥ 6 for bosons and Nh ≥ 10 for fermions, the distributions of phase and amplitude become less complicated than in the low to intermediate doping regime, yielding a higher accuracy of the RNN wave function, with errors Δϵ ≤ 10−4 for bosons and Δϵ ≤ 10−2 for fermions in the respective doping regimes.

Our results show that in the low doping regime of the t − J model, both fermionic and bosonic systems are difficult to learn, see Fig. 6. This suggests that not only the fermionic sign structure poses difficulties: First, the Hilbert space dimension in the finite doping regime, indicated in gray in Fig. 6a and c, becomes much larger than for spin systems, challenging both the RNN ansatz and its training. Second, the frustrated motion of holes in the AFM Heisenberg background can potentially cause problems. When these holes move through the system, the spin background is affected, giving rise to an effective J1 − J2 spin model with nearest and next-nearest neighbor spin exchange interactions, which is hence more difficult to learn78. For the t − Jz model, we observe that the relative errors are comparably low in the bosonic case, probably due to the lack of spin dynamics resulting from the absence of spin-flip terms. Furthermore, for all states with high \(\log|\psi|^{2}\) variance, there is a significant number of configurations σ with a large negative log-amplitude, i.e. \(|\psi(\sigma)|^{2}\approx 0\). This makes an accurate determination of expectation values extremely costly and can affect the training process. For example, in ref. 79 it was shown that this yields higher variances for the gradients determined by stochastic reconfiguration. Lastly, Fig. 6 shows that the performance decreases for fermions, in agreement with the fact that already on a mean-field level, fermionic Slater determinants are much more complicated than the bosonic states.

In the Methods section, we provide a detailed discussion of the challenges encountered during the training of our t − J RNN architecture, which yield the relatively large errors seen e.g. in Fig. 6. Besides the increased Hilbert space dimension and the small amplitudes for certain configurations discussed above, we discuss the learning plateau associated with a local minimum that is encountered for all considered optimization routines (including annealing20, minimum-step stochastic reconfiguration (minSR)80 and the recently proposed stochastic reconfiguration (SR) variant based on a linear algebra trick81), together with the fact that SR algorithms have problems with autoregressive architectures82; the complicated interplay between phase and amplitude optimization15; and the difficulty of implementing constraints on the symmetry sector under consideration, e.g. the particle number, magnetization and spatial symmetries, directly in the RNN architecture11,45. Many of these challenges are inherent to the simulation of both bosonic and fermionic systems. Our results indicate that the bottleneck for simulating fermionic spinful systems is the training and not only the expressivity of the ansatz, and they point the way to possible improvements concerning the ansatz and the training procedure.

Conclusions

To conclude, we present a neural network architecture, based on RNNs11, to simulate ground states of the fermionic and bosonic t − J model at finite hole doping. We show that, despite many challenges due to the increased complexity of the learning problem compared to spin systems, the RNN succeeds in capturing physical properties like the shape of the dispersion, indicating the dominating emergent excitations of the systems. In order to calculate the dispersion, we present a method that can be used with any NQS ansatz and any lattice geometry, and we map out quasiparticle dispersions using the RNN ansatz for several lattice geometries, including 1D and 2D systems. Moreover, the method enables an extremely efficient calculation of dispersion relations compared to conventional approaches like DMRG54, which usually require a time evolution of the state55. The dispersion scheme yields a good agreement with exact diagonalization and DMRG results, and it is expected to perform even better for a better ground state convergence. In principle, it can also be combined with a translationally symmetric NQS ansatz to improve the accuracy. Furthermore, the scheme could be combined with additional symmetries, e.g. rotational symmetries, enabling the calculation of m4 rotational spectra83.

Methods

The RNN ansatz and its bottlenecks

Given the relatively high errors on the ground state energies in some cases, we test potential bottlenecks of our approach in the following, namely: (i) difficulties in learning either the phase or the amplitude, by considering the partial learning problems separately; (ii) the optimization procedure; (iii) the optimization landscape; (iv) the expressivity of the RNN ansatz, compared to the complexity of the learning problem.

The partial learning problem

One potential bottleneck of our approach is the way the RNN wave function is split into amplitude and phase. In order to test if there are problems with the optimization of the phase or amplitude alone, we consider their learning problems separately as suggested e.g. in refs. 15,84.

  1. Phase training: We sample from the exact ground state distribution \(|\psi|^{2}\), calculated with ED, and optimize only the phase.

  2. Amplitude training: Given the correct phase distribution from ED, we optimize only the logarithmic amplitude to check if the ground-state probability amplitudes can be learned.

Figure 7 shows the results of the amplitude and phase trainings (dark and light blue), compared to the full training of both amplitude and phase (red). For all considered systems, the results of the partial trainings are closer to the exact ground state; e.g. for open boundaries and Nh = 1, the relative error decreases from Δϵ = 0.0147(37) to Δϵ = 0.0040(30) for the amplitude training and Δϵ = 0.0039(33) for the phase training. However, in all considered cases we observe the same problem as in the full training: the RNN gets stuck in a plateau that survives up to 20,000 training steps. Although the relative error of the plateau decreases when considering the partial learning problems, the improvement is surprisingly small given the amount of information added to the training. Furthermore, whether the amplitude or the phase training is more problematic remains unclear. Even for the phase training, for which the training samples are generated from the exact distribution \(|\psi|^{2}\) calculated with ED, the improvement is not significantly larger than for the amplitude training. This is in agreement with the results of Bukov et al.15.
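
As one possible realization of the phase training (an illustrative choice of ours, not necessarily the loss used here), the sampled infidelity below assumes that the model amplitudes are pinned to the exact ones, so that only the phase mismatch enters:

```python
import numpy as np

def phase_infidelity(phi_model, phi_exact):
    """1 - |<psi_exact|psi_model>| when both states share the exact amplitudes.

    phi_model, phi_exact: phases evaluated on configurations sigma_i drawn
    from the exact Born distribution |psi_exact(sigma)|^2 (e.g. from ED).
    """
    dphi = np.asarray(phi_model) - np.asarray(phi_exact)
    overlap = np.mean(np.exp(1j * dphi))   # <e^{i Delta phi}> under |psi_exact|^2
    return 1.0 - np.abs(overlap)
```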

Fig. 7: Partial training.
figure 7

Separate amplitude (dark blue) and phase (light blue) training for ground states of the t − J model on a 4 × 4 square lattice with t/J = 3, open boundaries and without holes Nh = 0 (a) and with a single hole Nh = 1 (b), compared to the full training in red. Thick lines and markers represent averages over 100 epochs (each with 200 samples) to guide the eye, with the respective standard deviation denoted by the error bars. We use a hidden dimension of dh = 70.

Comparison of optimizers

As a next test, we compare the results of different optimizers in Fig. 8a, namely stochastic gradient descent (SGD), adaptive methods like AdaBound85 and Adam86, and more advanced methods such as Adam+Annealing20 and the recently developed variant of stochastic reconfiguration (SR), minimum-step SR (minSR)80. We show the optimization results for the t − Jz model on the left and the t − J model on the right, both for Nh = 1.

Fig. 8: Testing different optimizers.
figure 8

a Optimization results for the t − Jz model (left) and the t − J model (right) on a 4 × 4 square lattice with t/Jz = 3, both for a single hole Nh = 1 and periodic boundaries, using stochastic gradient descent (SGD), AdaBound, Adam, Adam+Annealing and minimum-step stochastic reconfiguration (minSR), with 200 samples (1000 samples for minSR) in each variational Monte Carlo (VMC) step. All values are averages over the last 100 training steps; error bars denote the respective standard deviation. b Eigenvalues of the T-matrix (minSR algorithm80, solid lines) and of the \(X^{T}X\) matrix (stochastic reconfiguration variant of Rende et al.81, dotted lines) before the training, for the 4 × 4 t − J system with one hole and open boundaries and hidden dimensions dh = 30, 70, using 1000 samples.

Typically, Adam is used for the optimization of RNN wave functions11,20,45,87, adapting the learning rate in each VMC update. With 200 samples per optimization step, Adam yields relative errors on the order of Δϵ ≈ 10−3 for the t − Jz model and Δϵ ≈ 10−2 for the t − J model. AdaBound, which employs dynamic bounds on the learning rates, yielding a gradual transition from Adam to SGD during the training, gives similar results.

Another modification of the Adam training is the use of variational annealing, which was shown to improve the performance for frustrated systems20. The idea of annealing is to avoid getting stuck in local minima by including an artificial temperature T in the learning process. To this end, the variational free energy of the model,

$${F}_{\boldsymbol{\lambda}}=\langle {H}_{\boldsymbol{\lambda}}\rangle -T({n}_{\mathrm{step}})\cdot S$$
(4)

is minimized instead of the energy of Eq. (13). Here, the averaged Hamiltonian \(\langle H_{\boldsymbol{\lambda}}\rangle\) is given by \(\langle H_{\boldsymbol{\lambda}}\rangle={\sum}_{\sigma}|\psi_{\boldsymbol{\lambda}}(\sigma)|^{2}H_{\boldsymbol{\lambda}}(\sigma)\). Furthermore, S denotes the Shannon entropy

$$S=-{\sum}_{\sigma}|{\psi}_{\boldsymbol{\lambda}}(\sigma){|}^{2}\log\left[|{\psi}_{\boldsymbol{\lambda}}(\sigma){|}^{2}\right].$$
(5)

The minimization procedure that we use starts with a warmup phase at constant temperature T0, before decreasing the temperature linearly with the minimization step t, T(t) = T0(1 − (t − twarmup)/τ). Typically, we use τ = 5000 and stop the training after tfinal = 20,000 training iterations, but tests up to τ = 20,000 and tfinal = 40,000 did not yield any improvements. Figure 8a shows that for the square lattice, the use of annealing does not bring any advantage within the error bars.
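
A minimal sketch of the annealed objective, Eqs. (4) and (5), with the linear schedule described above (default values are illustrative):

```python
import numpy as np

def temperature(step, T0=1.0, t_warmup=1000, tau=5000):
    """Constant warmup at T0, then T(t) = T0 * (1 - (t - t_warmup) / tau)."""
    if step < t_warmup:
        return T0
    return max(T0 * (1.0 - (step - t_warmup) / tau), 0.0)

def free_energy(e_loc, log_psi2, step):
    """Sampled estimate of F = <H> - T(step) * S.

    e_loc:    local energies E^loc(sigma_i) for samples sigma_i ~ |psi|^2
    log_psi2: log |psi_lambda(sigma_i)|^2 for the same samples
    """
    energy = np.mean(np.real(e_loc))
    entropy = -np.mean(log_psi2)   # Monte Carlo estimate of the Shannon entropy, Eq. (5)
    return energy - temperature(step) * entropy
```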

Lastly, we apply two recently developed variants of stochastic reconfiguration (SR): minSR80 and the linear algebra trick by Rende et al.81. Both methods are introduced later in the text, see Eqs. (17) and (18). For a stable training, we ensure non-exploding gradients by adding an offset δ(t) to the diagonal of the T-matrix, with δ(t) decaying exponentially from 1 to 10−10. After determining the gradients using Eq. (17), we apply the Adam update rule, which we empirically find to perform better than the plain gradient descent update. Moreover, since it is crucial to use enough samples for a sufficiently good approximation of the gradients in SR, typically more samples are needed than for the other optimization routines. Here, we use 1000 samples in each minSR update and find that the errors of the one-hole t − J ground state improve below the values obtained with Adam, see Fig. 8a on the right. However, we show in Supplementary Note 2 that a comparison with Adam using the same number of samples does not conclusively determine which optimization routine is better, similar to the SR results in ref. 15.

The reason behind this can be understood by considering the spectrum of the T-matrix of the minSR algorithm: Similar to the results of ref. 82 for the S-matrix of the SR algorithm, Fig. 8b shows that the eigenvalues λi of T decrease extremely rapidly, in particular at the beginning of the training, indicating a very flat optimization landscape. This is a typical problem of autoregressive architectures82 and causes uncontrolled, large values of T−1 and consequently also of the gradients δλ, see Eq. (17). Furthermore, the shape of the spectrum does not have any feature indicating that it could be cut off at a specific eigenvalue, making a regularization very difficult. Hence, the diagonal offset δ(t) must be chosen relatively large, yielding parameter updates that are very similar to the plain Adam optimization as long as δ(t) is larger than many of the T-eigenvalues. The spectrum of the \(X^{T}X\) matrix of the SR variant by Rende et al.81, see Eq. (18), exhibits the same problem.

When comparing the results for different hidden dimensions, e.g. for minSR in Fig. 8a (right), one may suspect that a hidden dimension dh > 100 could in principle improve the results further. However, we will show below that for such a large number of parameters it is even possible to encode the wave function using exact methods, by restricting to a fixed number of holes and hence reducing the Hilbert space dimension to \(\ll 3^{N_{\mathrm{sites}}}\).

Symmetries

The RNN ansatz we use has a built-in \(U(1)=U(1)_{\hat{N}}\times U(1)_{\hat{S}_{z}}\) symmetry, i.e. a conserved total particle number and a total magnetization of \(\hat{S}_{z}=0\,(0.5)\) for even (odd) particle numbers11,25. This is done by calculating the current particle number Np(i) (magnetization Sz(i)) after the i-th RNN cell during the sampling process and, once Np(i) = Ntarget (Sz(i) = Sz,target), assigning a zero conditional probability to adding further particles (spins) on all subsequent sites j > i, see Supplementary Note 1C. For even (odd) particle numbers, we use Sz,target = 0 (Sz,target = 0.5). As a next test, we employ additional spatial symmetries: For a symmetry operation \(\mathcal{T}\) of the lattice symmetry group, we know that

$$|\psi(\sigma){|}^{2}=|\psi({\mathcal{T}}\sigma){|}^{2}$$
(6)

for the exact ground state. For rotational C4 symmetry of the square lattice, we employ this constraint (i) in the training, by implementing it in the cost function, or (ii) in the RNN ansatz as in ref. 11.

The constraint in the cost function used in (i) is implemented by rotating all samples drawn from \(|\psi_{\boldsymbol{\lambda}}|^{2}\) according to C4 in each VMC step, calculating \(p_{\boldsymbol{\lambda}}(\mathcal{T}_{i}\sigma)=|\psi_{\boldsymbol{\lambda}}(\mathcal{T}_{i}\sigma)|^{2}\) for all \(\{\mathcal{T}_{i}\}_{i}\) and adding the squared difference \(\gamma(t){\sum}_{\sigma}\left(|\psi_{\boldsymbol{\lambda}}(\sigma)|^{2}-|\psi_{\boldsymbol{\lambda}}(\mathcal{T}_{i}\sigma)|^{2}\right)^{2}\) with a prefactor \(\gamma(t)={\gamma}_{0}\log_{10}(1+9(t-t_{\mathrm{warmup}})/\tau)\) to the cost function. Typically, we use long ramp times on the order of τ = 5000 steps.

For (ii), we assign

$${p}_{\boldsymbol{\lambda}}(\sigma)=\frac{1}{|{\{{\mathcal{T}}_{i}\}}_{i}|}{\sum}_{{\mathcal{T}}={\mathbb{1}},{\{{\mathcal{T}}_{i}\}}_{i}}|{\psi}_{\boldsymbol{\lambda}}({\mathcal{T}}\sigma){|}^{2}$$
(7)

for all operations \(\mathcal{T}_{i}\) in the symmetry group, similar to ref. 11. Note that this symmetrization scheme preserves the autoregressive property of the RNN, as pointed out e.g. in ref. 88.
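
For a square lattice stored as a 2D array, the symmetrized probability of Eq. (7) reduces to an average over the four rotated copies of each configuration. A minimal sketch (the callable psi2_fn and the array layout are our illustrative assumptions):

```python
import numpy as np

def p_symmetrized(psi2_fn, sigma):
    """C4-symmetrized probability in the spirit of Eq. (7).

    psi2_fn: callable returning |psi_lambda(sigma)|^2 for one configuration
    sigma:   configuration as an L x L integer array (square lattice)
    """
    rotations = [np.rot90(sigma, k) for k in range(4)]  # identity + three C4 rotations
    return np.mean([psi2_fn(rot) for rot in rotations])
```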

The optimization results using (i) and (ii) are shown in Fig. 9 for the t − J and t − Jz model on a 4 × 4 square lattice. It can be seen that constraining the RNN wave function directly via (ii) is more successful than via the cost function (i): Using (ii), we obtain relative errors for the t − Jz model that are an order of magnitude lower than the results without spatial symmetries. This possibly results from the fact that the additional symmetry constraint in the cost function leads to barriers in the loss landscape in the regions where the symmetry is violated. Even when increasing the symmetry constraint gradually during the training, as described above, these barriers can prevent the optimization from getting close to the minimum.

Fig. 9: The effect of U(1) and spatial symmetries.
figure 9

Relative error for the t − J (dark blue) and t − Jz (light blue) models on a 4 × 4 square lattice with one hole, t/Jz = 3 and periodic boundaries, averaged over 100 training steps with 200 samples per step (error bars denote the respective standard deviation). We use a recurrent neural network (RNN) with built-in \(U(1)=U(1)_{\hat{N}}\times U(1)_{\hat{S}_{z}}\) symmetry, and with U(1) and C4 symmetry implemented via the cost function or the RNN ansatz, with a hidden dimension of dh = 70. For the t − Jz model, we provide the relative energy errors as numbers in light blue.

The t − J model results do not improve significantly for either symmetry implementation, (i) or (ii), with an error on the order of Δϵ ≈ 10−2 with and without spatial symmetries. Hence, we conclude that applying symmetries only helps to improve the accuracy if the ground state can already be learned sufficiently well, as for the t − Jz model.

For systems with sufficiently good convergence, rotational symmetries like s-, p- or d-wave symmetries could also be enforced to probe the competition between the ground state energies in the respective symmetry sectors60, which is highly relevant for the study of high-Tc superconductivity. In addition, low-energy excited states in these symmetry sectors could be calculated by making use of the dispersion scheme presented in this work, e.g. m4 rotational spectra89.

Complexity of the learning problem

Lastly, we consider the complexity of our learning problem and compare it to the expressivity of our RNN ansatz in terms of the number of parameters encoded in the RNN. In Fig. 10 on the left, we show the number of parameters used in the RNN ansatz for the 4 × 4 t − J square lattice for hidden dimensions 30 ≤ dh ≤ 100. The number of parameters encoded in the ansatz is slightly lower than the number of parameters that is actually used (gray circles on the left). This is due to the way we encode the U(1) symmetry in our approach, resulting in a small fraction of weights that are never updated, since the respective probabilities are set to zero to obey the U(1) symmetry, see Supplementary Note 1C. Furthermore, we show the dimension of the Hilbert space for the same system (with variable particle number), \(3^{16}\), in black. For the small system size considered in Fig. 10, this Hilbert space dimension is two orders of magnitude larger than the number of RNN parameters. For the 10 × 4 system in Fig. 1, however, our RNN representation has 13 orders of magnitude fewer parameters than the dimension \(3^{40}\) of the Hilbert space that is learned.

Fig. 10: Number of parameters for the exact wave function of a 4 × 4 system compared to the recurrent neural network (RNN) ansatz.
figure 10

a We compare the number of parameters of the exact wave function using \(U(1)_{\hat{N}}\times U(1)_{\hat{S}_{z}}\) symmetry for 0 ≤ Nh ≤ 4 holes (blue) to the Hilbert space dimension \(3^{16}\) that we want to learn with the RNN ansatz. The effective number of parameters of the RNN ansatz, i.e. the number of parameters that is kept when enforcing the U(1) symmetry, with hidden dimension 30 ≤ dh ≤ 100, is denoted by the gray markers. b Hilbert space dimension for a local dimension of 2 (Heisenberg model), 3 (t − J model) and 4 (Fermi-Hubbard model).

The Hilbert space dimension \(3^{N_{\mathrm{sites}}}\) considered so far in this section allows for three states per site (spin up, spin down and hole), i.e. for a variable number of holes in the system. For a fixed number of holes, the number of parameters needed to describe the exact state is given by the number D of combinations to distribute a fixed number of holes Nh and particles N↑ and N↓ on the Nsites sites, i.e.

$$D=\frac{{N}_{\mathrm{sites}}!}{{N}_{\downarrow}!\,({N}_{\mathrm{sites}}-{N}_{\downarrow})!}\cdot \frac{({N}_{\uparrow}+{N}_{h})!}{{N}_{\uparrow}!\,{N}_{h}!},$$
(8)

shown also in Fig. 6a and c (gray), with \(D\ll 3^{N_{\mathrm{sites}}}\), as shown by the blue lines in Fig. 10 for 1 ≤ Nh ≤ 4. In fact, for Nh = 1 our RNNs encode even more parameters than this exact parameterization when dh > 70. This reveals one main problem of our RNN ansatz, namely the way the U(1) symmetries are encoded: The Hilbert space dimension D at fixed particle number and magnetization is typically much smaller than the dimension accessible to the RNN, since setting the RNN conditionals to zero along the sampling path corresponds to a dimension \(3^{x}2^{y}1^{z}\) with x + y + z = Nsites and typically small y and z. For future studies, we envision an RNN ansatz for a fixed number of holes in the spirit of Eq. (8), reducing the dimension of the parameter space that needs to be learned and hence facilitating the learning problem.
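
The counting of Eq. (8) is easily checked numerically; e.g. for the 4 × 4 lattice with one hole in the Sz = 1/2 sector (our illustrative sector choice), the fixed-sector dimension is smaller than the full three-state Hilbert space by more than two orders of magnitude:

```python
from math import comb

def sector_dim(n_sites, n_holes, n_up, n_down):
    """Eq. (8): dimension of the sector with fixed (N_h, N_up, N_down)."""
    assert n_holes + n_up + n_down == n_sites
    return comb(n_sites, n_down) * comb(n_up + n_holes, n_holes)

n_sites = 16                             # 4 x 4 lattice
print(3**n_sites)                        # variable hole number: 43046721
print(sector_dim(n_sites, 1, 8, 7))      # one hole, S_z = 1/2: 102960
```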

Lastly, we would like to point out that the learning problem considered here is more complex than for the spin systems typically studied with this architecture11,36,37,45, as can be seen when comparing the Hilbert space dimensions for local dimension d = 2, as for spin systems, vs. d = 3, as for the t − J model, in Fig. 10 on the right. For larger systems, this difference increases; e.g. for the 10 × 4 system in Fig. 1, the Hilbert space dimension increases by seven orders of magnitude when going from a spin to a t − J system (with a flexible number of holes). This problem becomes even more pronounced if the Fermi-Hubbard model with local dimension d = 4 were considered.

NQS dispersion relations

Our dispersion scheme relies on an additional term in the cost function that penalizes momenta of the NQS away from the target momentum. The momentum kNQS of the NQS wave function is calculated from the translation operator \(\hat{T}_{\boldsymbol{R}}\), which translates a state ψ(r) by the respective vector R, i.e. \(\hat{T}_{\boldsymbol{R}}\psi(\boldsymbol{r})=\psi(\boldsymbol{r}-\boldsymbol{R})\). Furthermore, it can be written as90

$${\hat{T}}_{\boldsymbol{R}}=\exp\left(-i\boldsymbol{R}\cdot \hat{\boldsymbol{k}}\right),$$
(9)

with the momentum operator \(\hat{\boldsymbol{k}}\). To determine the expectation value kNQS = (kx, ky) using samples σ drawn from the NQS wave function, we calculate the expectation value of \(\hat{T}_{\boldsymbol{R}}\). For example, for a square lattice, this is done by translating all snapshots by R = ex and R = ey, with ∣eμ∣ = a for lattice spacing a and μ = x, y. Then, we calculate the respective translated states, \(\psi_{\boldsymbol{\lambda}}(\hat{T}_{\boldsymbol{e}_{\mu}}\sigma)\), to determine the expectation value

$$\left\langle {\psi}_{\boldsymbol{\lambda}}\right\vert {\hat{T}}_{{\boldsymbol{e}}_{\mu}}\left\vert {\psi}_{\boldsymbol{\lambda}}\right\rangle =\exp\left(-i{\boldsymbol{e}}_{\mu}\cdot {\boldsymbol{k}}_{\mathrm{NQS}}\right)\approx \frac{1}{{N}_{s}}{\sum}_{i}\frac{{\psi}_{\boldsymbol{\lambda}}({\hat{T}}_{{\boldsymbol{e}}_{\mu}}{\sigma}_{i})}{{\psi}_{\boldsymbol{\lambda}}({\sigma}_{i})},$$
(10)

with the first equality due to the translational invariance of the ground state of a square lattice, which we assume to be (approximately) present for our NQS ground states, see also Supplementary Note 3. Hence,

$${\boldsymbol{k}}_{\mathrm{NQS}}^{\mu}=\frac{i}{a}\log \left\langle {\psi}_{\boldsymbol{\lambda}}\right\vert {\hat{T}}_{{\boldsymbol{e}}_{\mu}}\left\vert {\psi}_{\boldsymbol{\lambda}}\right\rangle .$$
(11)

Using a sufficiently converged NQS ground state wave function as the initial state, we train using VMC with an additional term in the loss function,

$${\mathcal{C}}({\boldsymbol{k}}_{\mathrm{target}})=\gamma(t){\sum}_{\mu}{\left({\boldsymbol{k}}_{\mathrm{NQS}}^{\mu}-{\boldsymbol{k}}_{\mathrm{target}}^{\mu}\right)}^{2},$$
(12)

with the RNN momentum kNQS and the target momentum ktarget. We use a prefactor \(\gamma(t)={\gamma}_{0}\log_{10}(1+9(t-t_{\mathrm{warmup}})/\tau)\) that is turned on with typically τ = 100, …, 1000 and γ0 = 1, …, 10 and gradually lifts all areas in the loss landscape that correspond to an NQS wave function with momentum kNQS ≠ ktarget, forcing the NQS to a higher energy state at momentum kNQS = ktarget, see Fig. 2.

For ktarget far away from the ground state momentum, we observe empirically that the imaginary part of kNQS can become large, on the same order as the real part, in particular if the ground state accuracy was not sufficiently high. In these cases, the RNN ends up in states that are not eigenstates of the momentum operator. In order to prevent our RNN wave function from getting trapped in such states, we apply an additional constraint in the loss function, penalizing large imaginary parts of the momentum, \(\mathrm{Im}\,\boldsymbol{k}_{\mathrm{NQS}}\).
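
A minimal evaluation-only sketch of Eqs. (10)-(12), including the optional penalty on the imaginary part (names and the interface are our illustrative assumptions; during training, the same expression would be evaluated inside an automatic-differentiation framework so that gradients can flow):

```python
import numpy as np

def momentum_penalty(psi_fn, samples, translate_fn, k_target, a=1.0, gamma=1.0):
    """Momentum constraint of Eq. (12), built from Eqs. (10) and (11).

    psi_fn:       callable returning the complex amplitude psi_lambda(sigma)
    samples:      configurations sigma_i drawn from |psi_lambda|^2
    translate_fn: maps (sigma, mu) to sigma shifted by one lattice vector e_mu
    k_target:     target momentum, one entry per direction mu
    """
    penalty = 0.0
    for mu, kt in enumerate(k_target):
        # <T_{e_mu}> estimated as in Eq. (10)
        ratios = [psi_fn(translate_fn(s, mu)) / psi_fn(s) for s in samples]
        k_nqs = 1j / a * np.log(np.mean(ratios))   # Eq. (11), complex in general
        penalty += (k_nqs.real - kt) ** 2          # Eq. (12)
        penalty += k_nqs.imag ** 2                 # optional penalty on Im k_NQS (see text)
    return gamma * penalty
```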

Architecture and training

In the present paper we use a recurrent neural network (RNN)91 to represent a quantum state defined on a 2D lattice with Nsites = NxNy sites occupied by Np particles. RNNs and similar generative architectures combined with variational energy minimization have already been applied successfully to spin systems5,11,36,45. One of the advantages of these architectures is their autoregressive property, which allows extremely efficient, independent sampling from the RNN wave function19,92, which is important for the training procedure.

In order to represent fermionic wave functions, we start from the same approach as for bosonic spin systems and use an RNN architecture consisting of Nsites (tensorized) gated recurrent units (GRUs), each representing one site of the system. The information is passed from the first cell, corresponding to the first lattice site, to the last site in a recurrent fashion, see Supplementary Note 1A.

In order to find the ground state of the system under consideration, we use the variational Monte Carlo (VMC) minimization of the energy92,93. VMC has already been used in a wide range of machine learning applications (see e.g. refs. 6,18 for an overview). In VMC, the expectation value of the energy of the RNN trial wave function,

$$\langle {E}_{\boldsymbol{\lambda}}\rangle ={\sum}_{\sigma}|{\psi}_{\boldsymbol{\lambda}}(\sigma){|}^{2}\,{E}_{\boldsymbol{\lambda}}^{\mathrm{loc}}(\sigma)\approx \frac{1}{{N}_{s}}{\sum}_{i}{E}_{\boldsymbol{\lambda}}^{\mathrm{loc}}({\sigma}_{i}),$$
(13)

is minimized. Here, we have defined the local energy

$${E}_{\boldsymbol{\lambda}}^{\mathrm{loc}}(\sigma)=\frac{\langle \sigma|{\mathcal{H}}|{\psi}_{\boldsymbol{\lambda}}\rangle}{\langle \sigma|{\psi}_{\boldsymbol{\lambda}}\rangle}.$$
(14)

As shown e.g. in refs. 11,29 one can use the cost function

$${\mathcal{C}}=\frac{1}{{N}_{s}}{\sum}_{i}\underbrace{\left[{E}_{\boldsymbol{\lambda}}^{\mathrm{loc}}({\sigma}_{i})-\langle {E}_{\boldsymbol{\lambda}}^{\mathrm{loc}}\rangle\right]}_{=:-\sqrt{{N}_{s}}\,\bar{\epsilon}({\sigma}_{i})}$$
(15)

to minimize both the local energy and the variance of the local energy, making the training more stable. In Eq. (15), we have defined \(\bar{\epsilon}(\sigma_{i}):=-\frac{1}{\sqrt{N_{s}}}\left[E_{\boldsymbol{\lambda}}^{\mathrm{loc}}(\sigma_{i})-\langle E_{\boldsymbol{\lambda}}^{\mathrm{loc}}\rangle\right]\), where Ns denotes the number of samples.

One of the main difficulties of neural network quantum states is the optimization of Eq. (15), due to its typically rugged landscape with many local minima and saddle points15. If not stated differently, we use the Adam optimizer86 for the optimization of Eq. (15), following previous works on NQS using RNNs10,11,45. To improve the optimization, often stochastic reconfiguration (SR)94,95 is used. Here, we use two recently proposed SR variants, namely minimum-step stochastic reconfiguration (minSR) and the SR variant based on a linear algebra trick by Rende et al.81. In contrast to conventional SR, these variants enable the use of a large number of NQS parameters, see Supplementary Note 2. In minSR, the updates δλk of the neural network parameters λk are determined by the linear system

$${\bar{O}}_{{\sigma}_{i}k}\,\delta {\lambda}_{k}=\bar{\epsilon}({\sigma}_{i}),$$
(16)

where σi denote the sample configurations, k the parameter index, \(O_{\sigma_{i}k}=\frac{1}{\psi(\sigma_{i})}\frac{\partial \psi(\sigma_{i})}{\partial \lambda_{k}}\) and \(\bar{O}_{\sigma_{i}k}=(O_{\sigma_{i}k}-\langle O_{\sigma_{i}k}\rangle)/\sqrt{N_{s}}\), with summation over repeated indices implied. Eq. (16) is solved by

$$\delta {\lambda}_{k}={\bar{O}}_{k{\sigma}_{i}^{\prime}}^{\dagger}{({T}^{-1})}_{{\sigma}_{i}^{\prime}{\sigma}_{i}}\,\bar{\epsilon}({\sigma}_{i}),$$
(17)

with \(T=\bar{O}\bar{O}^{\dagger}\)80. In the version of Rende et al.,

$$\delta {\lambda}_{k}={X}_{k{\sigma}_{i}^{\prime}}{({X}^{T}X)}_{{\sigma}_{i}^{\prime}{\sigma}_{i}}^{-1}\,{f}_{{\sigma}_{i}},$$
(18)

with \(X=\mathrm{Concat}(\mathrm{Re}\,\bar{O},\mathrm{Im}\,\bar{O})\) and \(f_{\sigma_{i}}=\mathrm{Concat}(\mathrm{Re}\,\bar{\epsilon}(\sigma_{i}),-\mathrm{Im}\,\bar{\epsilon}(\sigma_{i}))\)81.
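
A minimal dense-linear-algebra sketch of the minSR update, Eqs. (16) and (17), using the definitions of \(\bar{O}\) and \(\bar{\epsilon}\) given above (the constant offset delta stands in for the schedule δ(t) discussed earlier; in practice, the returned update can then be passed to the Adam update rule as described above):

```python
import numpy as np

def minsr_update(O, e_loc, delta=1e-4):
    """Minimum-step SR update, Eqs. (16) and (17).

    O:     N_s x N_p matrix of log-derivatives O_{sigma_i, k}
    e_loc: local energies E^loc(sigma_i), length N_s
    delta: diagonal offset regularizing the N_s x N_s matrix T (see text)
    """
    Ns = O.shape[0]
    O_bar = (O - O.mean(axis=0)) / np.sqrt(Ns)
    eps_bar = -(e_loc - e_loc.mean()) / np.sqrt(Ns)      # as defined below Eq. (15)
    T = O_bar @ O_bar.conj().T + delta * np.eye(Ns)      # T = O_bar O_bar^dagger
    return O_bar.conj().T @ np.linalg.solve(T, eps_bar)  # Eq. (17), one entry per parameter
```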

Fermionic RNN Wave Functions

The architecture introduced above is per se bosonic. When considering fermionic systems, we need to take the antisymmetry of the wave function into account. This antisymmetry is included during the variational Monte Carlo steps when calculating the local energy introduced in Eq. (14), which we can expand as

$${E}^{\mathrm{loc}}({\sigma}_{i})={\sum}_{{\sigma}_{i}^{\prime}}\frac{\left\langle {\sigma}_{i}\right\vert {\mathcal{H}}\left\vert {\sigma}_{i}^{\prime}\right\rangle \langle {\sigma}_{i}^{\prime}|{\psi}_{\boldsymbol{\lambda}}\rangle}{\langle {\sigma}_{i}|{\psi}_{\boldsymbol{\lambda}}\rangle}.$$
(19)

In this sum, we multiply each term by a factor (−1)P if \(\sigma^{\prime}\) is connected to σ by P two-particle permutations, as suggested in ref. 29. In order to do so, we take the permutations along the sampling path into account. For the t − XXZ Hamiltonian under consideration, only the hopping term needs to be considered when calculating the antisymmetric signs. An example is shown in Fig. 11. This procedure is equivalent to the implementation of Jordan-Wigner strings as e.g. in ref. 25.
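
For the t − J model, where no site is doubly occupied, P reduces to the number of particles (of either spin) sitting between the two sites of the hop in the 1D ordering of the sampling path. A minimal sketch, assuming a single Jordan-Wigner ordering over both spin species:

```python
import numpy as np

def hop_sign(config, i, j):
    """Fermionic sign (-1)^P for a hop between sites i and j of the sampling path.

    config: occupations along the 1D-ordered path
            (0 = hole, 1 = spin up, 2 = spin down)
    """
    lo, hi = min(i, j), max(i, j)
    P = int(np.count_nonzero(np.asarray(config)[lo + 1:hi]))  # particles passed over
    return -1 if P % 2 else 1
```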

Fig. 11: Jordan-Wigner strings on the level of NQS expectation values.
figure 11

a A typical configuration σ for a 5 × 5 system with five holes and ten spin up (red) and ten spin down (blue) particles. Sites are labeled in a 1D manner, as denoted by the white numbers. An exemplary hopping process (arrow in a) to the nearest neighbor in the horizontal direction ends in the configuration \(\sigma^{\prime}\) (b) and effectively exchanges P particles, here P = 3. The respective sign of \(\sigma^{\prime}\) relative to σ is calculated using Eq. (19).