[go: up one dir, main page]

Skip to main content
Log in

Nonlinear Model Reduction for Slow–Fast Stochastic Systems Near Unknown Invariant Manifolds

  • Published:
Journal of Nonlinear Science Aims and scope Submit manuscript

This article has been updated

Abstract

We introduce a nonlinear stochastic model reduction technique for high-dimensional stochastic dynamical systems that have a low-dimensional invariant effective manifold with slow dynamics and high-dimensional, large fast modes. Given only access to a black-box simulator from which short bursts of simulation can be obtained, we design an algorithm that outputs an estimate of the invariant manifold, a process of the effective stochastic dynamics on it, which has averaged out the fast modes, and a simulator thereof. This simulator is efficient in that it exploits of the low dimension of the invariant manifold, and takes time-steps of size dependent on the regularity of the effective process, and therefore typically much larger than that of the original simulator, which had to resolve the fast modes. The algorithm and the estimation can be performed on the fly, leading to efficient exploration of the effective state space, without losing consistency with the underlying dynamics. This construction enables fast and efficient simulation of paths of the effective dynamics, together with estimation of crucial features and observables of such dynamics, including the stationary distribution, identification of metastable states, and residence times and transition rates between them.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Algorithm 1
Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

Data Availability

Data deposition: The software package implementing the proposed algorithms can be found on https://github.com/yexf308/ATLAS.

Change history

  • 22 December 2023

    Table 3 has been updated in this article.

Notes

  1. We note here that we tried other approaches toward estimating \({\hat{\Lambda }^{{l}}_d}\), for example by attempting to solve a least squares problem in the space of positive definite matrices of rank d directly, ideally with respect to a natural Riemannian metric on that space. While natural, this was significantly more computationally expensive, and it did not lead to significantly different results.

  2. Of course, we do not explicitly construct such cells; we only need a function mapping a state \(\textbf{z}\) to the index of cell it belongs to, and this is achieved by finding the nearest landmark in the \(\hat{\tilde{\rho }} \) “metric”.

References

  • Abourashchi, N., Veretennikov, A.Y.: On stochastic averaging and mixing. Theory Stoch. Process. 16(1), 111–129 (2010)

    MathSciNet  MATH  Google Scholar 

  • Alexander, R., Giannakis, D.: Operator-theoretic framework for forecasting nonlinear time series with kernel analog techniques. Physica D 409, 132520 (2020)

    MathSciNet  MATH  Google Scholar 

  • Bakhtin, V., Kifer, Y.: Diffusion approximation for slow motion in fully coupled averaging. Probab. Theory Relat. Fields 129(2), 157–181 (2004)

    MathSciNet  MATH  Google Scholar 

  • Berglund, N., Gentz, B.: Geometric singular perturbation theory for stochastic differential equations. J. Differ. Equ. 191(1), 1–54 (2003)

    ADS  MathSciNet  MATH  Google Scholar 

  • Berglund, N., Gentz, B.: Noise-Induced Phenomena in Slow-Fast Dynamical Systems. Probability and its Applications. Springer, Berlin (2006)

    MATH  Google Scholar 

  • Beygelzimer A., Kakade S., Langford J.: Cover trees for nearest neighbor. In: ICML, pp. 97–104 (2006)

  • Bicout, D.J., Szabo, A.: Entropic barriers, transition states, funnels, and exponential protein folding kinetics: a simple model. Protein Sci. 9(3), 452–465 (2000)

    PubMed  PubMed Central  CAS  MATH  Google Scholar 

  • Bittracher, A., Banisch, R., Schütte, C.: Data-driven computation of molecular reaction coordinates. J. Chem. Phys. 149(15), 154103 (2018)

    ADS  PubMed  MATH  Google Scholar 

  • Chen, M., Tang-Qing, Yu., Tuckerman, M.E.: Locating landmarks on high-dimensional free energy surfaces. Proc. Natl. Acad. Sci. USA 112(11), 3235–3240 (2015)

    ADS  PubMed  PubMed Central  CAS  MATH  Google Scholar 

  • Chiavazzo, E., Covino, R., Coifman, R.R., William Gear, C., Georgiou, A.S., Hummer, G., Kevrekidis, I.G.: Intrinsic map dynamics exploration for uncharted effective free-energy landscapes. Proc. Natl. Acad. Sci. USA 114(28), E5494–E5503 (2017)

    PubMed  PubMed Central  CAS  Google Scholar 

  • Coifman, R.R., Lafon, S., Lee, A.B., Mauro Maggioni, B., Nadler, F.W., Zucker, S.W.: Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proc. Natl. Acad. Sci. USA 102(21), 7426–7431 (2005)

    ADS  PubMed  PubMed Central  CAS  MATH  Google Scholar 

  • Coifman, R.R., Kevrekidis, I.G., Lafon, S., Maggioni, M., Nadler, B.: Diffusion maps, reduction coordinates, and low dimensional representation of stochastic systems. Multiscale Model. Simul. 7(2), 842–864 (2008)

    MathSciNet  MATH  Google Scholar 

  • Crosskey, M.C., Maggioni, M.: Atlas: a geometric approach to learning high-dimensional stochastic systems near manifolds. Multiscale Model. Simul. 15(1), 110–156 (2017)

    MathSciNet  MATH  Google Scholar 

  • Dietrich, F., Makeev, A., Kevrekidis, G., Evangelou, N., Bertalan, T., Reich, S., Kevrekidis, I.G.: Learning effective stochastic differential equations from microscopic simulations: combining stochastic numerics and deep learning (2021)

  • Dsilva, C.J., Ronen Talmon, C., Gear, W., Coifman, R.R., Kevrekidis, I.G.: Data-driven reduction for a class of multiscale fast-slow stochastic dynamical systems. SIAM J. Appl. Dyn. Syst. 15(3), 1327–1351 (2016)

    MathSciNet  MATH  Google Scholar 

  • Freidlin, M.I., Szucs, J., Wentzell, A.D.: Random Perturbations of Dynamical Systems. Grundlehren der mathematischen Wissenschaften. Springer, Berlin (2012)

    MATH  Google Scholar 

  • Frewen, T.A., Hummer, G., Kevrekidis, I.G.: Exploration of effective potential landscapes using coarse reverse integration. J. Chem. Phys. 131(13), 134104 (2009)

    ADS  PubMed  PubMed Central  MATH  Google Scholar 

  • Frewen, T.A., Hummer, G., Kevrekidis, I.G.: Exploration of effective potential landscapes using coarse reverse integration. J. Chem. Phys. 131(13), 134104 (2009)

    ADS  PubMed  PubMed Central  MATH  Google Scholar 

  • Gardiner, C.: Stochastic Methods: A Handbook for the Natural and Social Sciences. Springer Series in Synergetics, Springer, Berlin (2009)

    MATH  Google Scholar 

  • Ge, H., Qian, H.: Landscapes of non-gradient dynamics without detailed balance: stable limit cycles and multiple attractors. Chaos An Interdiscip. J. Nonlinear Sci. 22(2), 023140 (2012)

    MathSciNet  MATH  Google Scholar 

  • Givon, D.: Strong convergence rate for two-time-scale jump-diffusion stochastic differential systems. Multiscale Model. Simul. 6(2), 577–594 (2007)

    MathSciNet  MATH  Google Scholar 

  • Givon, D., Kupferman, R., Stuart, A.: Extracting macroscopic dynamics: model problems and algorithms. Nonlinearity 17(6), R55–R127 (2004)

    ADS  MathSciNet  MATH  Google Scholar 

  • Givon, D., Kevrekidis, I.G., Kupferman, R.: Strong convergence of projective integration schemes for singularly perturbed stochastic differential systems. Commun. Math. Sci. 4(4), 707–729 (2006)

    MathSciNet  MATH  Google Scholar 

  • Hartmann, C., Neureither, L., Sharma, U.: Coarse graining of nonreversible stochastic differential equations: quantitative results and connections to averaging. SIAM J. Math. Anal. 52(3), 2689–2733 (2020)

    MathSciNet  MATH  Google Scholar 

  • Has’minskii, R.Z.: On stochastic processes defined by differential equations with a small parameter. Theory Probab. Appl. 11(2), 211–228 (1966)

    MathSciNet  MATH  Google Scholar 

  • Holmes, P., Lumley, J.L., Berkooz, G., Rowley, C.W.: Turbulence, Coherent Structures, Dynamical Systems and Symmetry. Cambridge Monographs on Mechanics, 2nd edn. Cambridge University Press, Cambridge (2012)

    MATH  Google Scholar 

  • Husic, B.E., Pande, V.S.: Markov state models: from an art to a science. J. Am. Chem. Soc. 140(7), 2386–2396 (2018). (PMID: 29323881)

    PubMed  CAS  MATH  Google Scholar 

  • Jiang, D.-Q., Qian, M., Qian, M.-P.: Mathematical Theory of Nonequilibrium Steady States, Volume 1833 of Lecture Notes in Mathematics. Springer, Berlin, (2004). On the frontier of probability and dynamical systems

  • Jones, C.K.R.T.: Geometric singular perturbation theory. In: Dynamical systems (Montecatini Terme, 1994), Volume 1609 of Lecture Notes in Mathematics, pp. 44–118. Springer, Berlin (1995)

  • Khas’minskii, R.Z.: A limit theorem for the solutions of differential equations with random right-hand sides. Theory Probab. Appl. 11(3), 390–406 (1966)

    MATH  Google Scholar 

  • Khasminskii, R.Z., Yin, G.: On averaging principles: an asymptotic expansion approach. SIAM J. Math. Anal. 35(6), 1534–1560 (2004)

    MathSciNet  MATH  Google Scholar 

  • Kifer, Y.: Another proof of the averaging principle for fully coupled dynamical systems with hyperbolic fast motions. Discrete Contin. Dyn. Syst. A 13(5), 1187–1201 (2005)

    MathSciNet  MATH  Google Scholar 

  • Kim, S.B., Dsilva, C.J., Kevrekidis, I.G., Debenedetti, P.G.: Systematic characterization of protein folding pathways using diffusion maps: application to trp-cage miniprotein. J. Chem. Phys. 142(8), 085101 (2015)

    ADS  PubMed  Google Scholar 

  • Klus, S., Nüske, F., Koltai, P., Hao, W., Kevrekidis, I., Schütte, C., Noé, F.: Data-driven model reduction and transfer operator approximation. J. Nonlinear Sci. 28(3), 985–1010 (2018)

    ADS  MathSciNet  MATH  Google Scholar 

  • Kuehn, C.: Multiple time Scale Dynamics, Volume 191 of Applied Mathematical Sciences. Springer, Berlin (2015)

    MATH  Google Scholar 

  • Kutz, J.N., Brunton, S.L., Brunton, B.W., Proctor, J.L.: Dynamic Mode Decomposition. SIAM (2016)

  • Legoll, F., Lelièvre, T.: Effective dynamics using conditional expectations. Nonlinearity 23(9), 2131–2163 (2010)

    ADS  MathSciNet  MATH  Google Scholar 

  • Legoll, F., Lelièvre, T.: Some remarks on free energy and coarse-graining. In Numerical analysis of multiscale computations, Volume 82 of Lecture Notes Computing Science Engineering, pp. 279–329. Springer (2012)

  • Leimkuhler, B., Matthews, C.: Molecular Dynamics: With Deterministic and Stochastic Numerical Methods. Interdisciplinary Applied Mathematics. Springer, Berlin (2015)

    MATH  Google Scholar 

  • Li, X.-M.: An averaging principle for a completely integrable stochastic Hamiltonian system. Nonlinearity 21(4), 803–822 (2008)

    ADS  MathSciNet  MATH  Google Scholar 

  • Liberty, E., Woolfe, F., Martinsson, P.-G., Rokhlin, V., Tygert, M.: Randomized algorithms for the low-rank approximation of matrices. Proc. Natl. Acad. Sci. USA 104(51), 20167–20172 (2007)

    ADS  MathSciNet  PubMed  PubMed Central  CAS  MATH  Google Scholar 

  • Little, A.V., Maggioni, M., Rosasco, L.: Multiscale geometric methods for data sets i: multiscale svd, noise and curvature. Appl. Comput. Harmon. Anal. 43(3), 504–567 (2017)

    MathSciNet  MATH  Google Scholar 

  • Liu, D.: Strong convergence of principle of averaging for multiscale stochastic dynamical systems. Commun. Math. Sci. 8(4), 999–1020 (2010)

    MathSciNet  MATH  Google Scholar 

  • Liu, P., Siettos, C.I., Gear, C.W., Kevrekidis, I.G.: Equation-free model reduction in agent-based computations: coarse-grained bifurcation and variable-free rare event analysis. Math. Model. Nat. Phenom. 10(3), 71–90 (2015)

    MathSciNet  MATH  Google Scholar 

  • Maria Bruna, S., Chapman, J., Smith, M.J.: Model reduction for slow-fast stochastic systems with metastable behaviour. J. Chem. Phys. 140(17), 174107 (2014)

    ADS  PubMed  MATH  Google Scholar 

  • Molgedey, L., Schuster, H.G.: Separation of a mixture of independent signals using time delayed correlations. Phys. Rev. Lett. 72, 3634–3637 (1994)

    ADS  PubMed  CAS  MATH  Google Scholar 

  • Pavliotis, G.A., Stuart, A.: Multiscale Methods: Averaging and Homogenization. Texts in Applied Mathematics, Springer, Berlin (2008)

    MATH  Google Scholar 

  • Pérez-Hernández, G., Paul, F., Giorgino, T., De Fabritiis, G., Noé, F.: Identification of slow molecular order parameters for Markov model construction. J. Chem. Phys. 139(1), 015102 (2013)

    ADS  PubMed  Google Scholar 

  • Röckner, M., Sun, X., Xie, L.: Strong and weak convergence in the averaging principle for sdes with hölder coefficients (2019)

  • Rohrdanz, M.A., Zheng, W., Maggioni, M., Clementi, C.: Determination of reaction coordinates via locally scaled diffusion map. J. Chem. Phys. 134(12), 124116 (2011)

    ADS  PubMed  Google Scholar 

  • Rohrdanz, M.A., Zheng, W., Clementi, C.: Discovering mountain passes via torchlight: methods for the definition of reaction coordinates and pathways in complex macromolecular reactions. Annu. Rev. Phys. Chem. 64(1), 295–316 (2013)

    ADS  PubMed  CAS  Google Scholar 

  • Rowley, C.W., Mezič, I., Bagheri, S., Schlatter, P., Henningson, D.S.: Spectral analysis of nonlinear flows. J. Fluid Mech. 641, 115–127 (2009)

    ADS  MathSciNet  MATH  Google Scholar 

  • Schappals, M., Mecklenfeld, A., Kröger, L., Botan, V., Köster, A., Stephan, S., García, E.J., Rutkai, G., Raabe, G., Klein, P., Leonhard, K., Glass, C.W., Lenhard, J., Vrabec, J., Hasse, H.: Round robin study: molecular simulation of thermodynamic properties from models with internal degrees of freedom. J. Chem. Theory Comput. 13(9), 4270–4280 (2017)

    PubMed  CAS  Google Scholar 

  • Singer, A., Erban, R., Kevrekidis, I.G., Coifman, R.R.: Detecting intrinsic slow variables in stochastic dynamical systems by anisotropic diffusion maps. Proc. Natl. Acad. Sci. USA 106(38), 16090–16095 (2009)

    ADS  PubMed  PubMed Central  CAS  MATH  Google Scholar 

  • Tribello, G.A., Bonomi, M., Branduardi, D., Camilloni, C., Bussi, G.: Plumed 2: new feathers for an old bird. Comput. Phys. Commun. 185(2), 604–613 (2014)

    ADS  CAS  Google Scholar 

  • van Kampen, N.G.: Stochastic Processes in Physics and Chemistry, Volume 888 of Lecture Notes in Mathematics. North-Holland Publishing Co. (1981)

  • Vanden-Eijnden, E.: Numerical techniques for multi-scale dynamical systems with stochastic effects. Commun. Math. Sci. 1 (2003)

  • Vershynin, R.: Deviations of Random Matrices and Geometric Consequences, pp. 216–231. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge (2018)

  • Wechselberger, M.: Geometric Singular Perturbation Theory Beyond the Standard form, Volume 6 of Frontiers in Applied Dynamical Systems: Reviews and Tutorials. Springer, Cham (2020)

  • Weinan, E., Vanden-Eijnden, E.: Metastability, conformation dynamics, and transition pathways in complex systems. In: Multiscale Modelling and Simulation, vol. 39 of Lecture Notes Computing Science Engineering, pp. 35–68. Springer (2004)

  • Weinan, E., Liu, D., Vanden-Eijnden, E.: Analysis of multiscale methods for stochastic differential equations. Commun. Pure Appl. Math. 58(11), 1544–1585 (2005)

    MathSciNet  MATH  Google Scholar 

  • Yu, A., Veretennikov: On the averaging principle for systems of stochastic differential equations. Math. USSR Sb. 69(1), 271–284 (1991)

    MathSciNet  MATH  Google Scholar 

  • Zhang, B., Hongbo, F., Wan, L., Liu, J.: Weak order in averaging principle for stochastic differential equations with jumps. Adv. Differ. Equ. 1, 2018 (2018)

    MathSciNet  MATH  Google Scholar 

  • Zheng, W., Rohrdanz, M.A., Clementi, C.: Rapid exploration of configuration space with diffusion-map-directed molecular dynamics. J. Phys. Chem. B 117(42), 12769–12776 (2013)

    PubMed  CAS  MATH  Google Scholar 

Download references

Acknowledgements

We thank Y. Kevrekidis and F. Lu for helpful discussions related to this work. MM is grateful for partial support from DOE-255223, FA9550-20-1-0288, NSF-1837991, NSF-1913243, and the Simons Fellowship. FY is grateful for partial support from AMS-Simons travel grant, Travel Support for Mathematicians from Simons Foundation. Prisma Analytics, Inc. provided computing equipment and support.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Felix X.-F. Ye.

Additional information

Communicated by Alain Goriely.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A: Assumption, Linear Approximation, and Averaging

We briefly review here the definitions of slow and invariant manifolds, some of the very basic expansions in geometric singular perturbation theory that motivate our key linearized model in Eq. (3.1), and the assumptions underlying them.

As a matter of notation, \(E_{i\cdot }\) denotes the i-th row of a matrix E, and \(E_{\cdot j}\) denotes the j-th column of a matrix E.

1.1 A.1: Assumption

The following assumptions 1-4 ensure that for the original latent stochastic system Eq. (2.1) there exists a uniformly asymptotically stable invariant manifold \( {\mathcal {M}}^{\textbf{x}}_{{\epsilon }}\) (Berglund and Gentz 2006, 2003; Kuehn 2015).

Assumption 1

Domain and differentiability: \(f\in {\mathcal {C}}^2({\mathcal {D}},{\mathbb {R}}^{D-d}), g\in {\mathcal {C}}^2({\mathcal {D}}, {\mathbb {R}}^d)\) and \(F\in {\mathcal {C}}^1({\mathcal {D}}, {\mathbb {R}}^{(D-d)\times (D-d)}), G\in {\mathcal {C}}^1({\mathcal {D}}, {\mathbb {R}}^{d \times d })\), where \({\mathcal {D}}\) is an open subset of \({\mathbb {R}}^d \times {\mathbb {R}}^{D-d}\). We further assume that fgFG are bounded in sup-norm by a constant M within \({\mathcal {D}}\).

Assumption 2

Slow manifold: there is a connected open subset \({\mathcal {D}}_0\subset {\mathbb {R}}^d\) and a continuous function \(\textbf{y}^\star : {\mathcal {D}}_0\rightarrow {\mathbb {R}}^{D-d}\) such that

$$\begin{aligned} {\mathcal {M}}^{\textbf{x}}_{0} =\{(\textbf{x},\textbf{y}) \in {\mathcal {D}}: \textbf{y}=\textbf{y}^\star (\textbf{x}), \textbf{x}\in {\mathcal {D}}_0\} \end{aligned}$$

is a slow manifold of the deterministic system, that is, \((\textbf{x},\textbf{y}^\star (\textbf{x}))\in {\mathcal {D}}\) and \(f(\textbf{x}, \textbf{y}^\star (\textbf{x}))=0\) for all \(\textbf{x}\in {{\mathcal {D}}}_0\).

Assumption 3

Stability: the slow manifold is uniformly asymptotically stable, that is, all eigenvalues of the Jacobian matrix

$$\begin{aligned} A^{\star }(\textbf{x})=\partial _\textbf{y}f( \textbf{x},\textbf{y}^\star (\textbf{x})) \end{aligned}$$

have negative real parts, uniformly bounded away from 0 for all \(\textbf{x}\in {\mathcal {D}}_0\).

Assumption 4

Non-degeneracy: the diffusivity matrix \(F(\textbf{x},\textbf{y})F(\textbf{x}, \textbf{y})^T\) is positive definite.

Under these assumptions, Fenichel’s theorem guarantees the existence of an invariant manifold (also called adiabatic manifold) (Berglund and Gentz 2006, 2003; Jones 1995)

$$\begin{aligned} {\mathcal {M}}^{\textbf{x}}_{{\epsilon }} =\{(\textbf{x},\textbf{y})\in {\mathcal {D}}: \textbf{y}=\bar{\textbf{y}}(\textbf{x}, \epsilon ), \textbf{x}\in {\mathcal {D}}_0\}, \end{aligned}$$

in a neighborhood of which trajectories concentrate for an extended time w.h.p. Also, \( {\mathcal {M}}^{\textbf{x}}_{{\epsilon }}\) is close to the slow manifold \( {\mathcal {M}}^{\textbf{x}}_{0}\) in the sense that \(\bar{\textbf{y}}(\textbf{x}, \epsilon )=\textbf{y}^\star (\textbf{x})+{{O}}(\epsilon )\).

The next assumption 5 imposes that the effect of the drift term is small relative to the effect of the diffusion term.

Assumption 5

Diffusion-dominated dynamics: for any \(1\le l\le L\), \(\sqrt{\sigma _d(\Lambda ^{{l}})} \gg \Vert \textbf{b}^{{l}}\Vert \sqrt{\tau }\).

This assumption allows us to simplify the construction of the diffusion-adapted, Mahalanobis-like metric. Indeed, the \(\sqrt{\tau }\)-neighborhood of the landmark \(B(\textbf{z}^l, \sqrt{\tau })\) that we use is not exactly the same as the estimated \(p\%\) confidence region of finding the effective reduced stochastic system at time \(\sqrt{\tau }\), started at \(\textbf{z}^l\): that would be better approximated by

$$\begin{aligned} \tilde{B}(\textbf{z}^l, \sqrt{\tau }):=\{\textbf{v}:\frac{1}{{\mathcal {X}}^2_d(p)}(\textbf{v}-\textbf{z}^{l,\text {slw}}_0 -\textbf{b}^{{l}}\tau )^T(\Lambda ^{{l}})^\dag (\textbf{v}-\textbf{z}^{l,\text {slw}}_0 -\textbf{b}^{{l}}\tau )<\tau \}. \end{aligned}$$

However, with the assumption of diffusion-dominated dynamics, the approximation we use is satisfactory. Indeed, the boundary of \(B(\textbf{z}^l, \sqrt{\tau })\) is a hyperellipsoid of dimension d embedded in \(\textbf{R}^D\). Columns of \( U^{{l,\textrm{slw}}}_{d} \), which are the eigenvectors of \(\Lambda ^{{l}}\), define the principle axes of the hyperellipsoid. \(\sigma _{1}(\Lambda ^{{l}}), \sigma _{2}(\Lambda ^{{l}}), \cdots , \sigma _{d}(\Lambda ^{{l}})\) are proportional to squares of the lengths of the semi-axes. The vertices of the hyperellipsoid at time t are \(\textbf{z}^{l,\text {slw}}_0\pm \sqrt{\chi _d^2(p)\sigma _i(\Lambda ^{{l}}) t} \left( U^{{l,\textrm{slw}}}_{d} \right) _{\cdot i}\) for \(i=1,2,\cdots , d\), so the minimum length of semi-axes at \(t=\tau \) for the hyperellipsoid is \(\smash {\sqrt{\chi _d^2(p)\sigma _d(\Lambda ^{{l}}) \tau }}\). In the mean time, the center of the hyperellipsoid is moved by \(\Vert \textbf{b}^{{l}}\Vert \tau \) if we use \(\tilde{B}(\textbf{z}^l, \sqrt{\tau })\) instead of \(B(\textbf{z}^l, \sqrt{\tau })\). Assumption 5 guarantees that the movement of the center is negligible relative to the length of the semi-axes of the hyperellipsoid.

1.2 A.2: Linear Approximation

In Eq. (3.1) we assumed linear approximations of the time-dependent expectation \(\textbf{m}^{{l}}_t\) and covariance \(C(\textbf{z}^{{l}}_t|\textbf{z}^{{l}}_0)\). The slopes of these quantities, as a function of time, are \(\smash {\textbf{b}^{{l}}}\) and \(\smash {\Lambda ^{{l}}}\), and the intercepts of these quantities are \(\smash {\textbf{z}_0^{{l}}}\) and \(\smash {\Gamma ^{{l}}}\). In this section, we provide some mathematical intuitions for this assumption, following the exposition of Berglund and Gentz (2006), to which we refer the reader for further details. We start by considering the system in the latent space, and its linear approximation (2.1) near the invariant manifold. First, we define the deviation of sample paths from the invariant manifold: \(\mathbf{\zeta }_t:=\textbf{y}_t-\bar{\textbf{y}}(\textbf{x}_t, \epsilon )\). An application of Itô’s formula implies that the fast dynamics part \(\mathbf{\zeta }_t\) satisfies

(A.1)

where , and \(W_t = [U_t; V_t]\), (Here “; ” denotes concatenation of column vectors). Always following Berglund and Gentz (2006), if we ignore the Itô-term and use the fact that the drift term vanishes when \(\mathbf{\zeta }=0\), we derive that the invariant manifold \(\bar{\textbf{y}}(\textbf{x}_t, \epsilon )\) should satisfy the following PDE:

$$\begin{aligned} f(\textbf{x}_t,\bar{\textbf{y}}(\textbf{x}_t, \epsilon ))= \epsilon \partial _\textbf{x}\bar{\textbf{y}}(\textbf{x}_t, \epsilon ) g(\textbf{x}_t,\bar{\textbf{y}}(\textbf{x}_t, \epsilon )). \end{aligned}$$
(A.2)

Therefore, by Taylor expansion and Eq. (A.2), we have the following linear approximation \(\tilde{\mathbf{\zeta }}_t\) of the fast dynamics \(\mathbf{\zeta }_t\) in Eq. (A.1):

$$\begin{aligned} \textrm{d}\tilde{\mathbf{\zeta }}_t= \frac{1}{\epsilon } A(\textbf{x}_t, \epsilon )\tilde{\mathbf{\zeta }}_t\textrm{d}t + \frac{1}{\sqrt{\epsilon }} F_0(\textbf{x}_t, \epsilon )\textrm{d}W_t\,, \end{aligned}$$
(A.3)

where

$$\begin{aligned}&A(\textbf{x},\epsilon ):= \partial _\textbf{y}f(\textbf{x},\bar{\textbf{y}}(\textbf{x}, \epsilon ) )-\epsilon \partial _\textbf{x}\bar{\textbf{y}}(\textbf{x}, \epsilon ) \partial _\textbf{y}g(\textbf{x},\bar{\textbf{y}}(\textbf{x}, \epsilon ))\quad ,\\&F_0(\textbf{x},\epsilon ):=\frac{1}{\sqrt{\epsilon }} \begin{bmatrix} -\sqrt{{\epsilon }}\partial _\textbf{x}\bar{\textbf{y}}(\textbf{x}, \epsilon ) G(\textbf{x},\bar{\textbf{y}}(\textbf{x}, \epsilon ))&F(\textbf{x},\bar{\textbf{y}}(\textbf{x}, \epsilon )) \end{bmatrix}\,. \end{aligned}$$

As for the slow dynamics \([\textbf{x}_t; \bar{\textbf{y}}(\textbf{x}_t, \epsilon )]\), we have

$$\begin{aligned} \textrm{d}\begin{bmatrix} \textbf{x}_t \\ \bar{\textbf{y}}(\textbf{x}_t, \epsilon ) \end{bmatrix} = g^{\text {slw}}(\textbf{x}_t)\textrm{d}t + G^{\text {slw}}(\textbf{x}_t)\textrm{d}U_t, \end{aligned}$$
(A.4)

where , , and here , the term . Here of course \(\bar{\textbf{y}}(\textbf{x}_t, \epsilon )\) is slaved to \(\textbf{x}_t\), and we call the dynamics of \(\textbf{x}_t\) only the slow reduced dynamics happening in \(\mathbb {R}^d\).

The dynamics of \(\tilde{\mathbf{\zeta }}_t\) shown in Eq. (A.3) is a high dimensional Ornstein–Uhlenbeck process, and thus its expectation \(\mathbb {E}\left[ \tilde{\mathbf{\zeta }}^l_t\right| \textbf{z}_0^l]\) and covariance \(\textrm{cov}(\tilde{\mathbf{\zeta }}^l_t|\textbf{z}_0^l)\) at landmark l stabilize exponentially fast to \(\textbf{0}\) and some matrix \(\Theta (\textbf{x}_0^l)\). In particular, the error is of order \(O({\epsilon })\) once t reaches the timescale of separation \(\tau \gg \epsilon \), where here we have utilized assumption 3. At the same time, at the timescale of separation \(\tau \), the slow dynamics does not change significantly. In particular, in a neighborhood of a landmark \(\textbf{z}^{l}\), the drift and diffusion coefficients \(g^{\text {slw}}(\textbf{x}^l_t), G^{\text {slw}}(\textbf{x}^l_t)\) in Eq. (A.4) do not change too much, and we may treat the slow dynamics as having nearly constant drift and diffusion coefficients. These equations and reasoning justify the form of reduced equations obtain by averaging, in the limit \({\epsilon }\rightarrow 0\), of the form of Eq. (2.2).

Now, notice that we have the following decomposition of the latent coordinate \(\textbf{w}_t^l\) into coordinates for fast dynamics \(\mathbf{\zeta }^l_t\) and coordinates for slow dynamics \([\textbf{x}^l_t;\bar{\textbf{y}}(\textbf{x}^l_t, \epsilon )]\):

$$\begin{aligned} \textbf{w}_t^l = \begin{bmatrix} {\displaystyle O}_{d\times (D-d)}\\ I_{(D-d)\times (D-d)} \end{bmatrix} \mathbf{\zeta }^l_t + \begin{bmatrix} \textbf{x}^l_t\\ \bar{\textbf{y}}(\textbf{x}^l_t, \epsilon ) \end{bmatrix}. \end{aligned}$$
(A.5)

Then, according to previous analysis regarding timescale of separation as well as Eq. (A.5), we have the following linear approximations of the latent coordinate at times t comparable to \(\tau \):

$$\begin{aligned} \begin{aligned} {\mathbb {E}}\left[ \textbf{w}^{{l}}_t \right| \textbf{w}^{{l}}_0]&= \begin{bmatrix} \textbf{x}^l_0\\ \bar{\textbf{y}}(\textbf{x}^l_0, \epsilon ) \end{bmatrix} +g^{\text {slw}}(\textbf{x}^l_0)t +{{O}}({\epsilon }), \\ {\text {cov}}(\textbf{w}^{{l}}_t|\textbf{w}^{{l}}_0)&= \begin{bmatrix} {\displaystyle O}_{d\times (D-d)}\\ I_{(D-d)\times (D-d)} \end{bmatrix} \Theta (\textbf{x}_0^l)\begin{bmatrix} {\displaystyle O}_{d\times (D-d)}\\ I_{(D-d)\times (D-d)} \end{bmatrix}^T \\&\quad + G^{\text {slw}}(\textbf{x}^l_0)\left[ G^{\text {slw}}(\textbf{x}^l_0)\right] ^Tt + {{O}}({\epsilon }). \end{aligned} \end{aligned}$$
(A.6)

These equations motivate Eq. (3.1), but in the latent space.

We now proceed to consider the situation in the observed variables, locally around a fixed point \(\textbf{z}_t^l\); in particular we consider the behavior of the time-dependent expectation \(\textbf{m}^{{l}}_t\) and covariance \(C(\textbf{z}^{{l}}_t|\textbf{z}^{{l}}_0)\). Assume the local dynamics around landmark \(\textbf{z}^l\) is within the chart \((\mathcal {U}_{\alpha (l)},\varphi _{\alpha (l)})\), and let \(\varphi _{\alpha (l)}(\textbf{z}_t^l)=\textbf{w}_t^l=[\textbf{x}_t^l; \textbf{y}_t^l]\). We assume that the 1-st order Taylor approximation of \(\varphi ^{-1}_{\alpha (l)}\) is accurate enough, which corresponds to the map \(\varphi \) having sufficiently small Hessian, or restricting the size of the neighborhood under consideration to be small enough. We can then assume that for t comparable to \(\tau \), we have:

$$\begin{aligned} \varphi ^{-1}_{\alpha (l)}(\textbf{x}^l_t,\textbf{y}^l_t)=&\,\varphi ^{-1}_{\alpha (l)}(\textbf{x}^l_0,\bar{\textbf{y}}(\textbf{x}^l_0, \epsilon ))\\&+J^T(\varphi ^{-1}_{\alpha (l)})\mid _{(\textbf{x}^l_0 , \bar{\textbf{y}}(\textbf{x}^l_0, \epsilon ))} \left( \begin{bmatrix}\textbf{x}^l_t \\ \textbf{y}^l_t \end{bmatrix}-\begin{bmatrix}\textbf{x}^l_0 \\ \bar{\textbf{y}}(\textbf{x}^l_0, \epsilon ) \end{bmatrix}\right) +{{O}}({\epsilon })\,. \end{aligned}$$

Then, according to Eq. (A.6), we have the linear approximations in the observation space of the form

$$\begin{aligned} \textbf{z}^{l,\text {slw}}_0&= \varphi ^{-1}_{\alpha (l)}(\textbf{x}^l_0,\bar{\textbf{y}}(\textbf{x}^l_0, \epsilon ))\,,\\ \textbf{b}^{{l}}&= J^T(\varphi ^{-1}_{\alpha (l)})\mid _{(\textbf{x}^l_0 , \bar{\textbf{y}}(\textbf{x}^l_0, \epsilon ))}g^{\text {slw}}(\textbf{x}^l_0)\,, \\ \Gamma ^{{l}}&= J^T(\varphi ^{-1}_{\alpha (l)})\mid _{(\textbf{x}^l_0 , \bar{\textbf{y}}(\textbf{x}^l_0, \epsilon ))}\begin{bmatrix} I_{(D-d)\times (D-d)} \\ {\displaystyle O}_{d\times (D-d)} \end{bmatrix}\\&\quad \Theta (\textbf{x}_0^l)\begin{bmatrix} I_{(D-d)\times (D-d)} \\ {\displaystyle O}_{d\times (D-d)} \end{bmatrix}^T J(\varphi ^{-1}_{\alpha (l)})\mid _{(\textbf{x}^l_0 , \bar{\textbf{y}}(\textbf{x}^l_0, \epsilon ))}\,, \\ \Lambda ^{{l}}&= J^T(\varphi ^{-1}_{\alpha (l)})\mid _{(\textbf{x}^l_0 , \bar{\textbf{y}}(\textbf{x}^l_0, \epsilon ))}G^{\text {slw}}(\textbf{x}^l_0)\left[ G^{\text {slw}}(\textbf{x}^l_0)\right] ^T J(\varphi ^{-1}_{\alpha (l)})\mid _{(\textbf{x}^l_0 , \bar{\textbf{y}}(\textbf{x}^l_0, \epsilon ))}\,, \end{aligned}$$

which justify the crucial approximation in Eq. (3.1), which motivates all our local estimators.

1.3 A.3: Averaging

In this section, we briefly review the idea of stochastic averaging, which is a classical method to analyze fast/slow systems, see for example Freidlin et al. (2012) and Pavliotis and Stuart (2008) for a comprehensive review of this subject. See in particular, theorem 2.1 in chapter 7 of Freidlin et al. (2012), known as the averaging principle, Givon et al. (2004) for a survey of several approaches to the problem of extracting effective dynamics including averaging, where similarities and differences between these approaches are highlighted.

From the theory perspective, there exists huge body of literature on stochastic averaging, where strong and weak convergence results are provided under different regularity assumptions on slow–fast SDE coefficients, see, e.g., Givon et al. (2006), Khas’minskii (1966), Bakhtin and Kifer (2004), Kifer (2005), Li (2008) and Yu and Veretennikov (1991). Besides these, many works also study the rate of convergence of the original process to the reduced one (e.g., as a function of \({\epsilon }\)); see for example Pavliotis and Stuart (2008), Givon (2007), Khasminskii and Yin (2004), Liu (2010), Vanden-Eijnden (2003), Zhang et al. (2018), Röckner et al. (2019), Abourashchi and Veretennikov (2010), Has’minskii (1966) and Weinan et al. (2005). In particular, the recent work (Röckner et al. 2019) provides a very general, robust and unified method for establishing the averaging principle, involving both strong and weak convergence, for slow–fast SDEs with irregular coefficients and under the fully coupled case (i.e., the diffusion coefficient in the slow equation can depend on the fast term) for weak convergence (but not for strong convergence). This leads to simplifications and extensions of previously established results. It also shows that the strong and weak convergence rates depend only on the regularity of all the coefficients with respect to the slow variable.

In addition, a very recent work (Hartmann et al. 2020) provides quantitative results on the connection of averaging to coarse-graining and effective dynamics in multiscale studies. It also presents a detailed comparison of the averaging and the conditional expectation approach in the case of (non-reversible) Ornstein–Uhlenbeck (O-U) processes and isolate sufficient conditions under which the two approaches agree.

Now, we briefly review the formulation of averaging, and we start from considering in the latent space. From Eq. (2.1), we define the averaged approximation around the invariant manifold, as in Eq. (2.2), with the averaged coefficients given by Röckner et al. (2019):

$$\begin{aligned} \bar{g}(\textbf{x}) := \int _{\mathbb {R}^{D-d}} g(\textbf{x}, \textbf{y})\nu (\textrm{d}\textbf{y}|\textbf{x}) \quad ,\quad \bar{G}(\textbf{x}) := \sqrt{\int _{\mathbb {R}^{D-d}} G(\textbf{x},\textbf{y})G^{T}(\textbf{x},\textbf{y})\nu (\textrm{d}\textbf{y}|\textbf{x})}\,. \end{aligned}$$
(A.7)

Here the conditional invariant measure \(\nu (\textbf{y}|\textbf{x})\), as mentioned in Eq. (2.2), is the unique invariant measure of the process \(\textbf{Y}_t^\textbf{x}\) (Röckner et al. 2019), governed by the equations “frozen” at \(\textbf{x}\):

$$\begin{aligned} \left\{ \begin{aligned}&\textrm{d}{\textbf{Y}_t^\textbf{x}}= f(\textbf{x},\textbf{Y}_t^\textbf{x})\textrm{d}t+F(\textbf{x},\textbf{Y}_t^\textbf{x})\textrm{d}V_t \\&\textbf{Y}_0^\textbf{x}=\textbf{y}\end{aligned} \right. . \end{aligned}$$

Now we move to the observation space. In the local coordinates discussed in Sect. 2, split between the local tangent plane to the invariant manifold and the affine subspace containing the fast variables, we can decouple slow and fast variables. We then have the following local SDEs written in the variables \(\textbf{z}^{\text {slw}}\) and \({\varvec{\xi }}\), derived from Eq. (2.1) by applying Itô’s formula:

$$\begin{aligned} \left\{ \begin{aligned}&\textrm{d}\textbf{z}^{\text {slw}}_t = \tilde{g}(\textbf{z}^{\text {slw}}_t, {\varvec{\xi }}_t) \textrm{d}t + \tilde{G}(\textbf{z}^{\text {slw}}_t, {\varvec{\xi }}_t) \textrm{d}U_t \\&\textrm{d}{\varvec{\xi }}_t = \frac{1}{{\epsilon }} \tilde{f}(\textbf{z}^{\text {slw}}_t, {\varvec{\xi }}_t) \textrm{d}t + \frac{1}{\sqrt{{\epsilon }}}\tilde{F}(\textbf{z}^{\text {slw}}_t, {\varvec{\xi }}_t) \textrm{d}V_t \end{aligned} \right. . \end{aligned}$$

From these SDEs in local coordinates, we can define the reduced SDEs, as in Eq. (2.3), with the averaged coefficients given by

$$\begin{aligned} b(\textbf{z}^{\text {slw}}) =&\int _{\mathbb {R}^{D}} \tilde{g}(\textbf{z}^{\text {slw}}, {\varvec{\xi }})\nu (\textrm{d}{\varvec{\xi }}|\textbf{z}^{\text {slw}}) \,\,,\nonumber \\ H(\textbf{z}^{\text {slw}}) =&\, \sqrt{\int _{\mathbb {R}^{D}} \tilde{G}(\textbf{z}^{\text {slw}},{\varvec{\xi }})\tilde{G}^{T}(\textbf{z}^{\text {slw}}, {\varvec{\xi }})\nu (\textrm{d}{\varvec{\xi }}|\textbf{z}^{\text {slw}})}. \end{aligned}$$
(A.8)

Here again, the conditioned invariant measure \(\nu ({\varvec{\xi }}|\textbf{z}^{\text {slw}})\), as mentioned in Eq. (2.3), is the unique invariant measure of the process \({\varvec{\xi }}_t^{\textbf{z}^{\text {slw}}}\), governed by the equations “frozen” at \(\textbf{z}^{\text {slw}}\):

$$\begin{aligned} \left\{ \begin{aligned}&\textrm{d}{{\varvec{\xi }}_t^{\textbf{z}^{\text {slw}}}}= \tilde{f}(\textbf{z}^{\text {slw}},{\varvec{\xi }}_t^{\textbf{z}^{\text {slw}}})\textrm{d}t+\tilde{F}(\textbf{z}^{\text {slw}},{\varvec{\xi }}_t^{\textbf{z}^{\text {slw}}})\textrm{d}V_t \\&{\varvec{\xi }}_0^{\textbf{z}^{\text {slw}}}={\varvec{\xi }}\end{aligned} \right. . \end{aligned}$$

As mentioned in Sect. 2, these reduced SDEs as in Eq. (2.3) may be viewed in intrinsic coordinates, or in Cartesian coordinates in the ambient space \(\mathbb {R}^D\), with \(\textbf{z}^{\text {slw}}_t\in \mathbb {R}^D\) but on \( {\mathcal {M}}_{{\epsilon }}\), \(b\in \mathbb {R}^D\) a vector field on \( {\mathcal {M}}_{{\epsilon }}\), and \(H\in \mathbb {R}^{D\times d}\) acting on a Wiener process \(U_t\) in \(\mathbb {R}^d\).

B: Algorithms

We provide here detailed pseudo-code for the construction of ATLAS, see Algorithms 2, 3, 4, 5, 6, 7. The algorithm follows closely the theory, with minor caveats, having to do with constants that are assumed known, while in practice they would either need to be estimated, or set by the user using external information.

We also discuss several minor modifications to the algorithms.

  1. 1.

    In algorithm 6, the neighbor landmarks of the current point \(\textbf{z}_t\), \(\mathcal {N}_\tau ^{\mathcal {A}}(\textbf{z}_t)\) are approximated by the nearest landmark and its neighbors, \(\{k_t, \mathcal {N}(k_t)\}\). Then we need an efficient method to search the nearest landmark \(k_{t+1}\) for the next point \(\textbf{z}_{t+1}\) in the “metric” \(\hat{\tilde{\rho }} \). we update the nearest landmark by only calculating the distance of \(\textbf{z}_{t+1}\) to the current nearest landmark and its neighbors, and repeat this procedure until the nearest landmark remains the same. This procedure avoids the global search which could be very expensive. Then when simulating ATLAS process, there is no need to check whether \(\Vert \textbf{z}-\hat{\textbf{z}}^{{l}}\Vert \le \hat{R}_{\max }\) when calculating \(\hat{\tilde{\rho }} (\textbf{z}_t, \hat{\textbf{z}}^{{l}})\), since the current point \(\textbf{z}_t\) is always close enough to the neighbor landmarks.

  2. 2.

    In algorithm 5 and algorithm 7 we say that when \(\hat{\rho } (\hat{\textbf{z}}^{{l}}, \hat{\textbf{z}}^{{k}})<d_{\text {con}} \), we add k to \(\mathcal {N}(l)\) and add l to \(\mathcal {N}(k)\): in practice, we relax this condition to \(\min (\hat{\tilde{\rho }} (\hat{\textbf{z}}^{{l}}, \hat{\textbf{z}}^{{k}}), \hat{\tilde{\rho }} (\hat{\textbf{z}}^{{k}}, \hat{\textbf{z}}^{{l}}))<d_{\text {con}} \), in order to include more landmarks. This slight modification is particularly useful when the condition that the process is diffusion-dominated (see assumption 5 in Appendix A) does not hold.

  3. 3.

    In algorithm 3 and algorithm 6, there is no need to explicitly calculate \(\hat{\Lambda }^{\mathcal {A}}(\textbf{z})\) since we can use the iterative algorithm to output top d singular values and corresponding singular vectors to estimate the diffusion coefficient \({\hat{H}^{\mathcal {A}}} _d(\textbf{z})\) with the cost of \(O(C^dDd^2)\) where \(C^d\) is the number of landmarks in \(\mathcal {N}_\tau ^{\mathcal {A}}(\textbf{z})\). In the \(d=1\) scenarios (i.e., oscillating half-moons and butane), the average number of landmarks in \(\mathcal {N}_\tau ^{\mathcal {A}}(\textbf{z})\) is approximated 4 and for the \(d=2\) example (Pinched Spheres), this average increases to about 8. However, if d is not relatively small, the number of landmarks in \(\mathcal {N}_\tau ^{\mathcal {A}}(\textbf{z})\) may grow exponentially as d, so it might be challenging to simulate trajectories with ATLAS simulator. When the ambient dimension D is very large, one could use randomized SVD in Eqs. (3.5, 3.13, 3.14) to further significantly lower the computational complexity in projection to rank d (Liberty et al. 2007). In all three models, we didn’t use this approach since the biggest D that we tested is 20 and the most time consuming part is not this projection step.

  4. 4.

    When refinement of the landmark is necessary in algorithm 4, we perform this only in the last round of refinement. The reason that we don’t use this correction at each round of refinement is it will consistently move the landmark position toward the direction of the effective drift and the region like around the saddle point will be not covered by the neighborhoods of landmarks.

Algorithm 2
figure c

Pseudo-code for ATLAS construction

Algorithm 3
figure d

Estimation of local parameters of effective dynamics [Sect.3.3]

Algorithm 4
figure e

Estimation of local geometric parameters of invariant manifold \( {\mathcal {M}}_{{\epsilon }}\) [Sect.3.4]

Algorithm 5
figure f

Construct sketch of the invariant manifold \( {\mathcal {M}}_{{\epsilon }}\), \(\{\hat{\textbf{z}}^{{l}}\}_{l=1}^{L'} \leftarrow \text {ConstructIM}(\{\hat{\textbf{z}}^{{l}}\}_{l=1}^{L}, \tau , d_{\text {con}})\) [Sect.3.4]

Algorithm 6
figure g

Simulate one time-step with ATLAS simulator, \([\textbf{z}_{t+1}, k_{t+1}] \leftarrow \text {ATLAStime-step}(\textbf{z}_t, k_t, \{\hat{\textbf{z}}^{{l}}\}_{l=1}^{L'}, \lambda )\) [Sect. 3.5]

Algorithm 7
figure h

Construct ATLAS simulator in exploration mode [Sect. 4]

Algorithm 8
figure i

Construct Markov state models from ATLAS [Sect. 6]

C: Examples

In this section we discuss in depth details and results of ATLAS in the numerical examples in Sect. 7. In Table 5 we report the parameters of the models and for the construction of ATLAS.

In our numerical tests on the accuracy of ATLAS for the approximation of invariant distribution, we proceed as follows. We will generate two sufficiently long trajectories with both the original and ATLAS simulator, and we take samples from each of the two trajectories. Then, we apply the projection \(\hat{P}^{{ l}} \) to the samples obtained from the original simulator. Here we only use the oblique projection which is consistent with the ATLAS simulation method, and does not require any knowledge of the latent space. For plotting purposes only, we visualize the smoothed histograms by binning the projected samples according to latent slow variables, or other specified coordinates, for both the original and ATLAS simulator. The \(L^1\)- and \(L^2\)-norm of the difference of their approximate probability densities are calculated directly from the histograms. The bin widths should be of order \(\sqrt{\tau }\), consistently with the spirit of ATLAS simulator, which averages information below timescale \(\tau \), with a constant coefficient depending on the scaling of the diffusion coefficient for the slow variable. If the latent slow variable is unknown, we could automatically cluster the samples by its nearest landmark.

In our numerical tests on the accuracy of ATLAS for the local statistics, we proceed as follows. We will generate a long trajectory with the ATLAS simulator and compare the estimated invariant manifold, estimated tangent space, estimated effective drift and diffusion terms at each point with the analytically derived reduced dynamics on the slow manifold.

In our numerical tests on the accuracy of ATLAS for the approximation of medium- and large-time observables, we proceed as follows. We will sample initial conditions with respect to the invariant measure restricted to the specified initial regions of state space, and obtain an estimate of the residence time in the target regions, with both simulators. To be specific, first, we generate a single sufficiently long trajectory with either the original or ATLAS simulator and uniformly sample \(N_{\text {IC}}\) initial conditions that are restricted to the corresponding initial regions (e.g., specified metastable states). Second, we run in parallel the dynamics with both simulators giving to each the same initial conditions, sampled above. We check whether a trajectory reaches the boundary of region at each ATLAS time-step for both simulators and record the residence time once they leave. This is to ensure the consistency of time-steps for both simulators in recording residence times. At last, we compute the mean residence time and its confidence interval for both simulators. To ensure the robustness of the ATLAS algorithm, we repeat the construction of ATLAS (which is random with its input, the observed bursts) and sampling of residence time ten times, and calculate the confidence interval of the relative error of the mean resident time.

It is nontrivial to identify the metastable regions in fast-slow stochastic systems when D is very large. With ATLAS one can easily build up Markov state models(MSMs) and construct the transition matrix for the effective reduced process. First, the number of landmarks naturally become associated to the states of MSMs; more precisely state l of the MSMs is the set of points on the invariant manifold whose nearest landmark, in the \(\hat{\tilde{\rho }} \), is \(\hat{\textbf{z}}^{{l}}\). Starting from the node of the same i-th landmark we use ATLAS simulator to simulate in parallel \(N_{\text {msm}}\) short trajectories of time \(\tau \), which is one ATLAS time-step. Let \(N_{ij}\) be the number of trajectories whose end positions land in the same state of the j-th landmark. The probability transition from i-th landmark to j-th landmark is estimated as \(M_{ij}:=N_{ij}/N\), see algorithm 8. The eigenvalues and eigenvectors are approximations to the spectrum and eigenfunctions of the transfer operator of the reduced stochastic system, \(\exp (\tau \mathcal {L})\), where \(\mathcal {L}\) is the generator of the reduced effective process. Since we don’t assume the reversibility, the spectrum and eigenvectors could be imaginary. The top left eigenvector of the transition matrix is proportional to the invariant distribution of reduced stochastic process and is always real. The spectral gap, which is the distance between the next dominant eigenvalue and the eigenvalue 1, indicates the decay rate of correlations and the reciprocal of the spectrum gap shows the order of ATLAS time-steps to reach the equilibrium. The number of the dominant eigenvalues are the number of metastable states. Performing some clustering or connectivity detection algorithms on landmarks with high equilibrium density can yield metastable regions. At least in the setting of real eigenvectors, the positive or negative regions of successive dominant eigenvectors can be used to identify metastable regions.

Table 5 Parameters of three examples

1.1 C. 1: Pinched sphere model

The governing slow–fast SDE in Cartesian coordinates becomes

$$\begin{aligned} \begin{aligned} \textrm{d}\textbf{z}&=\left( J(\textbf{z}) \cdot \textbf{b}_1(\textbf{z}) + \frac{1}{2}A(\textbf{z})\cdot \textbf{b}_2(\textbf{z})\right) \textrm{d}t + J(\textbf{z})\cdot \sigma (\textbf{z}) {\left[ \begin{array}{ccccccccccccccc}\textrm{d}W_1 \\ \textrm{d}W_2 \\ \textrm{d}W_3 \end{array}\right] }\end{aligned} \end{aligned}$$
(C.1)

where

$$\begin{aligned} \begin{aligned}&J= {\left[ \begin{array}{ccccccccccccccc}\frac{z_1}{\Vert \textbf{z}\Vert } &{} \frac{z_3z_1}{\sqrt{z_1^2 +z_2^2}} &{} -z_2 \\ \frac{z_2}{\Vert \textbf{z}\Vert } &{} \frac{ z_3z_2}{\sqrt{z_1^2 +z_2^2}} &{} z_1 \\ \frac{z_3}{\Vert \textbf{z}\Vert } &{} -\sqrt{z_1^2 +z_2^2 } &{} 0 \end{array}\right] },\\&\textbf{b}_1 = {\left[ \begin{array}{ccccccccccccccc}-\frac{c_1}{{\epsilon }} \frac{1-\sqrt{a_1 + a_2z_3^2/\Vert \textbf{z}\Vert ^2}}{\Vert \textbf{z}\Vert } \\ c_3\frac{4z_3^3/\Vert \textbf{z}\Vert ^3 -3z_3/\Vert \textbf{z}\Vert }{\sqrt{z_1^2+z_2^2}} \\ c_5 \left( \frac{z_2z_3}{\sqrt{z_1^2+z_2^2}\Vert \textbf{z}\Vert ^2}+\frac{z_1}{\Vert \textbf{z}\Vert ^2} \right) \end{array}\right] }, A = {\left[ \begin{array}{ccccccccccccccc}-z_1 &{} -z_1 \\ -z_2 &{} -z_2 \\ -z_3 &{} 0 \end{array}\right] }, \\&\textbf{b}_2 = {\left[ \begin{array}{ccccccccccccccc}\frac{c_4^2 (z_1^2+z_2^2)}{\Vert \textbf{z}\Vert ^4} \\ c_6^2/\Vert \textbf{z}\Vert ^2 \end{array}\right] }, \sigma = \text {diag}{\left[ \begin{array}{ccccccccccccccc}c_2/(\sqrt{{\epsilon }}\Vert \textbf{z}\Vert ) \\ \frac{c_4\sqrt{z_1^2+z_2^2}}{\Vert \textbf{z}\Vert ^2} \\ c_6/\Vert \textbf{z}\Vert \end{array}\right] }\end{aligned} \end{aligned}$$
(C.2)

Without the noise term, the deterministic counterpart of this system has two stable fixed points, \((\theta ^*, \phi ^*)=(\pi /6, 5\pi /6), (5\pi /6, \pi /6)\), which are marked in Fig. 4. Note under the current parameter settings, \(c_6\gg c_5\) and \(c_4\gg c_3\), the assumption of diffusion-dominated dynamics is satisfied. We claimed in the main text that this system is not reversible, and we demonstrate it here. The corresponding Fokker Planck equation of the governing equation in Cartesian coordinates is

$$\begin{aligned}&\partial _t p(t, \textbf{z}) = -\nabla \cdot \left[ (J(\textbf{z}) \textbf{b}_1(\textbf{z}) +\frac{1}{2}A(\textbf{z})\textbf{b}_2(\textbf{z})) p(t, \textbf{z})\right] +\nabla ^2 \left[ D(\textbf{z}) p(t, \textbf{z})\right] \,. \end{aligned}$$
(C.3)

where \(D(\textbf{z})= \frac{1}{2}(J(\textbf{z})\sigma ^2(\textbf{z})J^T(\textbf{z}))\). We transform the equation into the symmetric form

$$\begin{aligned}&\partial _t p(t, \textbf{z}) = -\nabla \cdot \left[ (J(\textbf{z}) \textbf{b}_1(\textbf{z}) +\frac{1}{2}A(\textbf{z})\textbf{b}_2(\textbf{z})-\nabla D(\textbf{z})) p(t, \textbf{z})\right] \nonumber \\ {}&\quad +\nabla \cdot \left[ D(\textbf{z}) \nabla p(t, \textbf{z})\right] \,. \end{aligned}$$
(C.4)

where \(\nabla D(\textbf{z}) = \left[ \sum _{j=1}^3\frac{\partial }{\partial z_j}D_{1j}(\textbf{z}), \sum _{j=1}^3\frac{\partial }{\partial z_j}D_{2j}(\textbf{z}), \sum _{j=1}^3\frac{\partial }{\partial z_j}D_{3j}(\textbf{z}) \right] ^T\).

The theorem 3.3.7 in Jiang et al. (2004) indicates that the system is reversible if and only if the force \(\textbf{b}_3(\textbf{z})=D^{-1}(\textbf{z})(J(\textbf{z}) b_1(\textbf{z}) +\frac{1}{2}A(\textbf{z})b_2(\textbf{z})-\nabla D(\textbf{z}))\) is a conservative vector field. We calculate the curl of \(\textbf{b}_3(\textbf{z})\) is

$$\begin{aligned}{} & {} \nabla \times \textbf{b}_3(\textbf{z}) \nonumber \\ {}{} & {} = {\left[ \begin{array}{ccccccccccccccc}z_1f_1(z_1,z_2)+z_2f_2(z_1,z_2,z_3),&z_2f_1(z_1,z_2)-z_1f_2(z_1,z_2,z_3),&\frac{2c_5z_1}{c_6^2 (z_1^2+z_2^2)} \end{array}\right] }^T\nonumber \\ \end{aligned}$$
(C.5)

where

$$\begin{aligned}{} & {} f_1(z_1,z_2) = -\frac{2c_5z_2}{c_6^2(z_1^2+z_2^2)^{3/2}}, \qquad f_2(z_1, z_2, z_3)= \frac{2c_3z_3(3(z_1^2+z_2^2)-z_3^2)}{c_4^2(z_1^2+z_2^2)^2 \Vert \textbf{z}\Vert }\nonumber \\ {}{} & {} - \frac{2a_2c_1z_3}{c_2^2\Vert \textbf{z}\Vert ^2 \sqrt{a_1+\frac{a_2z_3^2}{\Vert \textbf{z}\Vert ^2}}} \end{aligned}$$
(C.6)

The curl is not zero if all coefficients are positive. So this system is not reversible.

Identifying the slow manifold and the effective stochastic dynamics. The slow manifold \( {\mathcal {M}}^{\textbf{x}}_{0} \) in spherical coordinate is \(r^\star (\theta )=R(\theta )\) in the limit of \({\epsilon }\rightarrow 0\). Since Cartesian coordinates \(z_1, z_2, z_3\) are linear with the radius r, the slow manifold \( {\mathcal {M}}_{0} \) in Cartesian coordinates is \([R(\theta )\sin (\theta )\cos (\phi ), R(\theta )\sin (\theta )\sin (\phi ), R(\theta )\cos (\theta )]^T\) in the limit of \({\epsilon }\rightarrow 0\). In this example, \( {\mathcal {M}}_{0} \) is the image of \( {\mathcal {M}}^{\textbf{x}}_{0} \) under the coordinate transformation. It is uniformly asymptotically stable and it has negative and positive curvature at different points (with the chosen set of parameter \(a_1, a_2\)). The reduced dynamics (in the limit \({\epsilon }\rightarrow 0\)) on the slow manifold \( {\mathcal {M}}_{0}\), in Cartesian coordinates, is given as

$$\begin{aligned} \textrm{d}\textbf{z}= \textbf{b}^{\text {eff}}\textrm{d}t + H^{\text {eff}}{\left[ \begin{array}{ccccccccccccccc}\textrm{d}W_2 \\ \textrm{d}W_3 \end{array}\right] }\end{aligned}$$
(C.7)

where

$$\begin{aligned} \begin{aligned} \textbf{b}^{\text {eff}}&= {\left[ \begin{array}{ccccccccccccccc}\frac{\textrm{d}}{\textrm{d}\theta }(R(\theta )\sin (\theta )) \cos (\phi ) &{} -R(\theta )\sin (\theta )\sin (\phi ) \\ \frac{\textrm{d}}{\textrm{d}\theta }(R(\theta )\sin (\theta )) \sin (\phi ) &{} R(\theta )\sin (\theta )\cos (\phi ) \\ \frac{\textrm{d}}{\textrm{d}\theta }(R(\theta )\cos (\theta )) &{} 0 \end{array}\right] }\cdot {\left[ \begin{array}{ccccccccccccccc}\frac{c_3\cos (3\theta )}{R(\theta )\sin (\theta )} \\ \frac{c_5\sin (\phi +\theta )}{R(\theta )} \end{array}\right] }\\&\quad +\frac{1}{2} {\left[ \begin{array}{ccccccccccccccc}\frac{\textrm{d}^2 }{\textrm{d}\theta ^2}(R(\theta )\sin (\theta )) \frac{c_4^2\sin ^2(\theta )\cos (\phi )}{R^2(\theta )} - \frac{c^2_6 \sin (\theta )\cos (\phi ) }{R(\theta )} \\ \frac{\textrm{d}^2 }{\textrm{d}\theta ^2}(R(\theta )\sin (\theta )) \frac{c_4^2\sin ^2(\theta )\sin (\phi )}{R^2(\theta )} - \frac{c^2_6 \sin (\theta )\sin (\phi ) }{R(\theta )} \\ \frac{\textrm{d}^2 }{\textrm{d}\theta ^2}(R(\theta )\cos (\theta )) \frac{c_4^2\sin ^2(\theta )}{R^2(\theta )} \end{array}\right] }, \\ H^{\text {eff}}&= {\left[ \begin{array}{ccccccccccccccc}\frac{\textrm{d}}{\textrm{d}\theta }(R(\theta )\sin (\theta )) \cos (\phi ) &{} -R(\theta )\sin (\theta )\sin (\phi ) \\ \frac{\textrm{d}}{\textrm{d}\theta }(R(\theta )\sin (\theta )) \sin (\phi ) &{} R(\theta )\sin (\theta )\cos (\phi ) \\ \frac{\textrm{d}}{\textrm{d}\theta }(R(\theta )\cos (\theta )) &{} 0 \end{array}\right] }\cdot {\left[ \begin{array}{ccccccccccccccc}\frac{c_4\sin (\theta )}{R(\theta )} &{} 0 \\ 0 &{} \frac{c_6}{R(\theta )} \end{array}\right] }\end{aligned}\nonumber \\ \end{aligned}$$
(C.8)

and \(\theta = {{\,\textrm{arctan2}\,}}(\sqrt{z_1^2 +z_2^2}, z_3), \phi =\text {mod}({{\,\textrm{arctan2}\,}}(z_2, z_1),2\pi )\). This result is not exactly the effective dynamics (averaged at the finite timescale \(\tau \)) on the invariant manifold \( {\mathcal {M}}_{{\epsilon }}\), but it provides us some good reference. Here, and in the other numerical experiments, we will in fact consider this as ground truth, for measuring the accuracy of the ATLAS estimators of geometric objects (e.g., \( {\mathcal {M}}_{{\epsilon }}\)) and dynamics quantities (e.g., \(\textbf{b}\) and \(\Lambda \)). Note that when we measure the accuracy of other quantities such as mean residence times and accuracy of the stationary distributions, these approximations are not used, and the ATLAS estimates are compared directly with those obtained from the original simulator.

Estimating the relevant timescale \(\tau \). In Fig. 7 we visualize the behavior of \({\textrm{tr}}(\hat{C}_t^l)\) and \(\Vert \hat{\textbf{m}}_t^l\Vert \) as a function of time: we observe that they initially behave nonlinearly, but then transit to a linear regime, at time about 0.025, consistent with the approximations in Eq. (3.1). In this example, we choose \((\tau _{\min }, \tau _{\max })=(0.05, 0.10)\) and the timescale of separation \(\tau =0.10\).

Estimating dimension and tangent spaces of \( {\mathcal {M}}_{0}\), and direction of the fast modes. We all report in Fig. 7 the behavior of the singular values of \(\hat{C}_\tau ^l\), \(\hat{\Gamma }^l_{D-d}\) and \({\hat{\Lambda }^{{l}}_d}\) at one landmark in descending order. The dominant singular values are visualized with cross markers. Local Principal Component Analysis (PCA) corresponds to the analysis of \(\hat{C}_N(\tau )\): it exhibits 1 dominant singular value, with the corresponding singular vector close to the direction of fast variables, because the large fluctuations of the fast modes dominates. The other two singular vectors of \(\hat{C}_N(\tau )\), which are necessarily, in PCA, orthogonal to the leading mode, are not the directions of slow variables at most regions. The number of the dominant singular values of \(\hat{\Gamma }^l_{D-d}\) is 1 and its corresponding singular vector correctly estimates the direction of the fast variable. Finally, the number of the dominant singular values of \({\hat{\Lambda }^{{l}}_d}\) is 2, equal to the correct dimension of the slow manifold, and the span of their corresponding singular vectors correctly estimates the tangent space of the slow manifold. On average, it requires 1585 charts to fully describe the invariant manifold.

Identifying metastable states, and estimation of large-time properties of the process. We assume we do not know the metastable states, nor the number of such states. The top six eigenvalues of the Markov transition matrix (see Fig. 7) are real and they exhibit a clear spectral gap after the top two eigenvalues, which indicates, correctly, that this system has two metastable states. We simulate trajectories of time length \(10^6\) with the original simulator and with the ATLAS simulator, and from those we estimate the invariant distribution by using smoothed histograms with bins constructed in the latent coordinates \((\phi , \theta )\), which parametrize the invariant manifold. We visualize them in Fig. 7: from the contour plot of both distributions they appear very close, and indeed in the \(L^1\)-norm of the difference of two densities (i.e., the total variation distance between the distributions) is \( 0.107\pm 0.009\), and the \(L^2\)-norm of the difference of densities (a more robust but less stringent distance) is \(0.0034\pm 0.0004\).

Fig. 7
figure 7

Pinched sphere: Top right: plot of trace of empirical covariance and norm of empirical mean, \({\textrm{tr}}(\hat{C}_t^l)\) and \(\Vert \hat{\textbf{m}}_t^l\Vert \) verse time at one landmark. Top center: singular values of \(\hat{C}_\tau ^l, \hat{\Gamma }^{l}_{D-d}\) and \({\hat{\Lambda }^{{l}}_d}\) at one landmark, the dominant singular values are marked as cross labels. Top left: top six eigenvalues of the Markov transition matrix. Bottom left: the contour plot of kernel-fitted invariant distribution in the coordinate \((\phi , \theta )\) from the trajectory of time length \(10^6\) generated by the original simulator. Bottom center: the contour plot of kernel-fitted invariant distribution in the coordinate \((\phi , \theta )\) from the trajectory of time length \(10^6\) generated by the ATLAS simulator. Bottom right: A trajectory of time length 2.5\(\tau \) starting from the marked landmark is simulated with the original simulator is shown. We depict a portion of the key estimated objects: the invariant manifold \( {\mathcal {M}}_{{\epsilon }}\), drift at \(\textbf{z}_0\), fast direction, and normal direction. Note that the fast direction is far from the normal one, and the (Itô) drift is far from being tangent to \( {\mathcal {M}}_{{\epsilon }}\), and also that \( {\mathcal {M}}_{{\epsilon }}\) is locally a graph over the tangent plane at \(\textbf{z}_0\)

Estimation of residence times. In the stage of estimating residence times, we generate a single long trajectory of time length \(5 \times 10^{5}\) with the ATLAS simulator (this time length is much larger than the residence times to be estimated), from which we uniformly sample \(N_{\text {IC}}\) initial conditions that are in \(S_{\text {cyan}}=\{\varphi _2>0.05\}\) and \(S_{\text {red}}=\{\varphi _2<-0.05\}\). Here one can use either original simulator or ATLAS simulator, however, the ATLAS simulator is much faster. In this example, we test the medium and large residence time by defining the boundary of the residence set as \(\{\varphi _2>0.02\}\) and \(\{\varphi _2<-0.02\}\) in the first experiments, and, the metastable states \(M_1\) and \(M_2\) in the second experiment. We say that a point on invariant manifold is in a set if the closest landmark of the point is in the corresponding set (with this definition, the set is slightly different from the corresponding sublevel set or superlevel set of an eigenfunction).

1.2 C.2: Oscillating Half-Moons

Identifying the slow manifold and the effective stochastic dynamics. With the setting of parameter, our model is reversible since \(a_1=0\) (Ge and Qian 2012). In the latent variable space, the fast variables are \(r_1\) and \(u_i\) and the slow variable is \(\theta \). In the limit \({\epsilon }\rightarrow 0\), the fast variables relax to the equilibrium at \(r_1=1\) and \(u_i=0\) so the slow manifold \( {\mathcal {M}}^{\textbf{x}}_{0} \) in the latent variable is \(r_1=1\), which is the unit circle. The "local" invariant distribution of the fast variable \(r_1\) and \(u_i\) are

$$\begin{aligned}&\nu (r_1|\theta )=\sqrt{\frac{b_1}{b_2^2\pi }}\exp \left[ -\frac{b_1}{b_2^2}\left( r_1-1\right) ^2\right] ,\qquad \nu (u_i|\theta )=\sqrt{\frac{b_3}{b_4^2\pi }}\exp \left[ -\frac{b_3}{b_4^2}\left( u_i\right) ^2\right] . \end{aligned}$$
(C.9)

With the current parameter setup, there is very small probability that \(r_1\) can go to the negative side but we will ignore the possibility in our analytical calculations.

In the example, the fast variables are nonlinearly coupled in the observed Cartesian coordinates, so the slow manifold \( {\mathcal {M}}_{0} \) in Cartesian coordinates is not the image of \( {\mathcal {M}}^{\textbf{x}}_{0} \) under the coordinate transformation. In the limit \({\epsilon }\rightarrow 0\), the landmark position in Cartesian coordinate for given \(\theta \) is

$$\begin{aligned}&\bar{z}_1(\theta ):= \mathbb {E}(z_1|\theta )=\int _{-\infty }^{+\infty } r_1\cos (r_1+\theta -1)\nu (r_1|\theta ) \textrm{d}r_1 \nonumber \\ {}&\quad = \exp \left( -\frac{b_2^2}{4b_1}\right) \sqrt{1+\left( \frac{b_2^2}{2b_1}\right) ^2} \cos (\theta + \theta _s), \end{aligned}$$
(C.10)
$$\begin{aligned}&\bar{z}_2(\theta ):= \mathbb {E}(z_2|\theta )=\int _{-\infty }^{+\infty } r_1\sin (r_1+\theta -1) \nu (r_1|\theta ) \textrm{d}r_1 \nonumber \\ {}&\quad = \exp \left( -\frac{b_2^2}{4b_1}\right) \sqrt{1+\left( \frac{b_2^2}{2b_1}\right) ^2} \sin (\theta +\theta _s), \end{aligned}$$
(C.11)
$$\begin{aligned}&\bar{z}_i(\theta ) := \mathbb {E}(z_i|\theta ) = \int _{-\infty }^{+\infty }\left( \int _{-\infty }^{+\infty } \left( r_1 + u_i\right) \nu (r_1|\theta ) \textrm{d}r_1 \right) \nu (u_i|\theta )\textrm{d}u_i =1. \end{aligned}$$
(C.12)

where the angle shift \(\theta _s\) is \(\theta _s= \arctan \left( \frac{b_2^2}{2b_1}\right) \). Then the slow manifold \( {\mathcal {M}}_{0} \) is the circle embedded in \((\bar{z}_1, \bar{z}_2, 1, \dots , 1)\) with the radius \( \bar{r}=\exp \left( -\frac{b_2^2}{4b_1}\right) \sqrt{1+\left( \frac{b_2^2}{2b_1}\right) ^2}\) and the angle is shifted by \(\theta _s\) compared to the standard angle in Cartesian coordinates. This shift is due to the nonlinearity of the fast modes, in particularly their curvature. With current parameter setup, the radius is approximately \(\bar{r}=0.9925\) and \(\theta _s\) is approximately 0.0153.

The distance of the point from the slow manifold \( {\mathcal {M}}_{0} \), \(\text {dist}(\textbf{z}, {\mathcal {M}}_{0}) = \sqrt{(\sqrt{z_1^2+z_2^2}-\bar{r})^2+\sum _{i=3}^{20}(z_i-1)^2}\). In Cartesian coordinate, the effective dynamics of the first and second coordinate, \(\bar{z}_1, \bar{z}_2\) are

$$\begin{aligned} \textrm{d}{\left[ \begin{array}{ccccccccccccccc}\bar{z}_1(\theta ) \\ \bar{z}_2(\theta ) \end{array}\right] }&= \left( \left( a_1 +a_2 \sin (2\theta )+a_3\cos (\theta ) \right) {\left[ \begin{array}{ccccccccccccccc}-\bar{z}_2(\theta ) \\ \bar{z}_1(\theta ) \end{array}\right] }-\frac{a_4^2 }{2} {\left[ \begin{array}{ccccccccccccccc}\bar{z}_1(\theta ) \\ \bar{z}_2(\theta ) \end{array}\right] }\right) \textrm{d}t \nonumber \\ {}&\quad + a_4 {\left[ \begin{array}{ccccccccccccccc}-\bar{z}_2(\theta ) \\ \bar{z}_1(\theta ) \end{array}\right] }\textrm{d}W_t \end{aligned}$$
(C.13)

The effective dynamics on the other 18 coordinates has zero drift term and zero diffusion term.

Estimating the relevant timescale \(\tau \). In Fig. 8, \({\textrm{tr}}(\hat{C}_t^l)\) reaches the linear regime at \(t=0.5\), however, the norm of the empirical mean \(\Vert \hat{\textbf{m}}_t^l\Vert \) behaves linearly only after \(t=1.0\) (this is true also of the first and the second coordinates which are also TICA coordinates, \(\hat{m}_1^l, \hat{m}_2^l\)). This is consistent with the large fast modes, and high curvature of the slow manifold; the training interval that we choose is \([\tau _{\min }, \tau _{\max }]= [1.0, 1.25]\), and \(\tau =1.0\).

Estimating dimension and tangent spaces of \( {\mathcal {M}}_{0}\), and direction of the fast modes. We report in Fig. 8 the behavior of the singular values of \(\hat{C}_\tau ^l\), \(\hat{\Gamma }^l_{D-d}\) and \({\hat{\Lambda }^{{l}}_d}\) at one landmark in descending order. The dominant singular values are visualized with cross markers. Although the number of the dominant singular value of covariance matrix \(\hat{C}_\tau ^l\) and the diffusivity matrix \(\hat{\Lambda }^l\) are 1, the corresponding singular vector of \(\hat{C}_\tau ^l\) is not the slow direction, but close to fast direction, dooming a naïve approach based on local PCA. On the other hand, the dominant singular vector of \({\hat{\Lambda }^{{l}}_d}\) correctly estimates the tangent direction of the slow manifold. The number of the dominant singular values of covariance matrix \(\hat{\Gamma }^l_{D-d}\) is 1, which is the correct number of fast variables. The scatter plot of the samples from paths of a burst, at \(t=0.5, 1.0, 1.25\) in the global TICA coordinates \((z_1, z_2)\), visualizes how the data cloud is stretched out nonlinearly away from the slow manifold. Due to the relatively big curvature, the part of slow manifold that these data clouds covers at different time clearly show the nonlinear effects. A local linear approximation on dynamics and geometry might not be accurate at these scales, making the refinement of the landmark positions necessary, for one round of refinement as discussed in Sect. 3. This procedure significantly reduces the bias introduced by the local linear approximation. To further reduce the effect of the nonlinearity, we choose the effective timescale \(\tau \) as lower bound of the training interval. Here the scale of separation is large enough, multiple rounds of refinement are not necessary. On average, it requires 65 charts to fully describe the invariant manifold.

Identifying metastable states, and estimation of large-time properties of the process. From the top six eigenvalues of the transition matrix of a Markov state model constructed from ATLAS, we note the significant gap between the second and third eigenvalues, which correctly indicates the system has two metastable states. In this example, we explicitly provide the regions of the two metastable states, \(M_{\text {Left}}=\{\theta \in \left( -2\tan ^{-1}(4+\sqrt{15}),-2\tan ^{-1}(4-\sqrt{15}) \right) \}\) and \(M_{\text {Right}}=\{\theta \in \left( -2\tan ^{-1}(4-\sqrt{15}),-2\tan ^{-1}(4+\sqrt{15}) +2\pi \right) \}\) in the latent variable \(\theta \), that parametrizes the slow manifold.

We simulate trajectories of time \(8 \times 10^{6}\) with the original simulator and with the ATLAS simulator, and from those we plot the invariant distributions in the latent variable \(\theta \). The standard bin width is \(a_4\sqrt{\tau }=0.06\) and we use it for the plot and calculation of the difference of the two distributions, which are very close (see Fig. 5). The \(L^1\)-norm of the difference of two approximated densities is \(0.098\pm 0.006\) and \(L^2\)-norm of the difference is \(0.015\pm 0.001\). Another observation is that albeit ATLAS is only expected to be accurate for the standard bin width, we also attempted other smaller bin widths, all the way down to 0.0019, which is much smaller than \(a_4\sqrt{\tau }\), and observed no increase in the estimation error, thanks to the regularity of such distributions.

Estimation of residence times. In the stage of simulating residence time, we generate a single long trajectory of time \(2.4 \times 10^{6}\) by the original simulator and uniformly sample \(N_{\text {IC}}\) initial conditions in each metastable state. The boundary of the residence set is the same as the region of the metastable state. In this example, we say the point is in the residence set if its latent variable \(\theta \) is in the corresponding metastable state.

Fig. 8
figure 8

Oscillating half-moons: Top left: plot of trace of empirical covariance and norm of empirical mean, \({\textrm{tr}}(\hat{C}_t^l), \Vert \hat{\textbf{m}}_t^l\Vert \) verse time at one landmark. Top center: plot the first and second coordinates of the empirical mean \(\hat{\textbf{m}}_t^l\) verse time at one landmark. Top right: singular values in log scale of \(\hat{C}_\tau ^l, \hat{\Gamma }^{l}_{D-d}\) and \({\hat{\Lambda }^{{l}}_d}\) at one landmark, the dominant singular values are marked as cross labels. Bottom left: scatter of the burst in the \((z_1, z_2)\) coordinate at \(t=0.5, 1.0, 1.25\), together with the slow manifold, \(\sqrt{\tau }\)-neighborhood and dominant singular vectors of \(\hat{C}_\tau ^l, \hat{\Gamma }^{l}_{D-d}\). Bottom right: top six eigenvalues of the Markov transition matrix

1.3 C.3: Butane

Governing equations for butane dynamics. We consider the overdamped Langevin dynamics of the butane molecule. The positions are denoted as \(q^{i}\in \mathbb {R}^3\) for \(1\le i \le 4\). To remove the rigid body motion invariant, we set

$$\begin{aligned} q^1={\left[ \begin{array}{ccccccccccccccc}x_1 \\ y_1 \\ 0 \end{array}\right] }, q^2 ={\left[ \begin{array}{ccccccccccccccc}0 \\ 0\\ 0\end{array}\right] }, q^3 = {\left[ \begin{array}{ccccccccccccccc}0 \\ y_3 \\ 0 \end{array}\right] }, q^4 = {\left[ \begin{array}{ccccccccccccccc}x_4\\ y_4\\ z_4 \end{array}\right] }. \end{aligned}$$

The potential energy is given as

$$\begin{aligned} V = \sum _{i=1}^3 V_{\text {bond}}\left( \Vert q^{i+1}-q^i\Vert \right) + \sum _{i=1}^2 V_{\text {angle}}\left( \theta _i\right) + V_{\text {torsion}}(\phi ). \end{aligned}$$

where \(\theta _1, \theta _2\) are the angles formed by the three first atoms and the three last atoms respectively. \(\phi \) is the dihedral angle, i.e, the angle between the plane on which the first three atoms lay and the plane on which the three last atoms lay. These potential functions are \(V_{\text {bond}}( l)=\frac{k_2}{2}\left( l - l_{eq}\right) ^2, V_{\text {angle}}(\theta ) =\frac{k_3}{2}\left( \theta -\theta _{eq}\right) ^2\) and \(V_{\text {torsion}}(\phi ) = c_1\cos \phi + c_2\cos ^2\phi +c_3\cos ^3\phi \). The numerical values for the constants are those in Schappals et al. (2017). The overdamped Langevin dynamics on the state space \({\mathbb {R}}^6\) is

$$\begin{aligned} \textrm{d}\textbf{z}= -\nabla V(\textbf{z}) \textrm{d}t + \sigma \textrm{d}W_t \end{aligned}$$
(C.14)

where \(\textbf{z}= {\left[ \begin{array}{ccccccccccccccc}x_1&y_1&y_3&x_4&y_4&z_4 \end{array}\right] }^T\) and the diffusion coefficient is \(\sigma =\sqrt{2\beta ^{-1}}\). The dihedral angle \(\phi \) has the explicit form when \(x_1<0\),

$$\begin{aligned} \cos (\phi ) = \frac{(\textbf{v}_{43}\times \textbf{v}_{32}) \cdot (\textbf{v}_{12}\times \textbf{v}_{23})}{|\textbf{v}_{43}\times \textbf{v}_{32}| \cdot |\textbf{v}_{12}\times \textbf{v}_{23}|}=\frac{x_4}{\sqrt{x_4^2+z_4^2}}, \end{aligned}$$
(C.15)

where \(\textbf{v}_{ij}= q^j-q^i\). If we define the counterclockwise rotation as positive, then \(x_4= l\sin (\theta _{eq})\cos (\phi )\) and \(z_4= l\sin (\theta _{eq})\sin (\phi )\). If \(x_1<0, y_3>0\), the explicit form of the potential is

$$\begin{aligned}&V(x_1,y_1,y_3, x_4, y_4, z_4) = c_1\frac{x_4}{(x_4^2+z_4^2)^{1/2}} + c_2\frac{x_4^2}{x_4^2+z_4^2} +c_3\frac{x_4^3}{\left( x_4^2+z_4^2\right) ^{3/2}} \nonumber \\&+ \frac{1}{2}k_2 \left( \left( \sqrt{x_1^2 +y_1^2}- l\right) ^2 + \left( \sqrt{y_3^2}- l\right) ^2 +\left( \sqrt{x_4^2+z_4^2 + \left( y_3-y_4\right) ^2}- l\right) ^2\right) \nonumber \\&+\frac{1}{2}k_3\left( \left( \theta _{eq} - \arccos \left( \frac{y_1}{\sqrt{x_1^2+y_1^2}}\right) \right) ^2 \right. \nonumber \\ {}&\left. + \left( \theta _{eq}-\arccos \left( \frac{(y_3-y_4)}{\sqrt{x_4^2+(y_3-y_4)^2+z_4^2}}\right) \right) ^2\right) . \end{aligned}$$
(C.16)

Identifying the slow manifold and the effective stochastic dynamics. The dihedral angle \(\phi \) is usually chosen as the slow variable of the butane dynamics and we will have the slow manifold \( {\mathcal {M}}_{0} \), which is a circle embedded in \(\mathbb {R}^6\), if \(x_1>0, y_3<0\),

$$\begin{aligned}&{\mathcal {M}}_{0} =\left\{ {\left[ \begin{array}{ccccccccccccccc}- l\sin (\theta _{eq}),&l\cos (\theta _{eq}),&l,&x_4,&l- l\cos (\theta _{eq}),&z_4\end{array}\right] }^T \ \text {with}\ x_4^2 + z_4^2 = l^2\sin ^2(\theta _{eq}) \right\} \end{aligned}$$

The potential energy for the dihedral angle has three local minima at \(\phi =-2\pi /3\) (bot-cis), \(\phi =0\)(trans) and \(\phi =2\pi /3\) (top-cis). Butane dynamics is well known for its conformation isomerism and can be treated as the unimolecular reaction of three states. The trans conformer has lower energy than the cis conformer, so the trans state is more stable than the cis state. The distance of the point from the slow manifold \({\mathcal {M}}_0\) is calculated as follows,

$$\begin{aligned}&\text {dist}(\textbf{z}, {\mathcal {M}}_{0})\nonumber \\ {}&=\sqrt{ \left( x_1+ l\sin (\theta _{eq})\right) ^2 + \left( y_1 - l\cos (\theta _{eq})\right) ^2 +\left( y_3- l \right) ^2 +\left( y_4+ l\cos (\theta _{eq})- l \right) ^2 +\left( \sqrt{x_4^2 + z_4^2 } - l\sin (\theta _{eq})\right) ^2} \end{aligned}$$
(C.17)

The proposed effective stochastic dynamics of the dihedral angle \(\phi \) is

$$\begin{aligned} \textrm{d}\phi _t = -\nabla V_{\text {torsion}}(\phi _t) \textrm{d}t + \sqrt{2\beta ^{-1}}\sigma (\phi _t) \textrm{d}W_t \end{aligned}$$
(C.18)

As suggested in Legoll and Lelièvre (2012), the \(\sigma (\phi _t)\) is \(\sigma ^2(\phi _t) = \mathbb {E}\left( |\nabla \phi |^2(\textbf{y}) |\phi (\textbf{y})=\phi _t\right) \). In this case, \(\sigma (\phi _t)\) can be explicitly calculated here, \(\sigma (\phi _t) = \frac{1}{ l \sin (\theta _{eq})}\). Then in \(\mathbb {R}^6\), the explicit form of the effective stochastic dynamics of the fourth and sixth coordinates, \(x_4, z_4\) are

$$\begin{aligned} \partial _t x_4&= \left( -z_4^2 \frac{c_1\left( x_4^2+z_4^2\right) +2c_2x_4\sqrt{x_4^2+z_4^2} +3c_3x_4^2 }{(x_4^2+z_4^2)^{5/2}} - \frac{x_4}{\beta l^2 \sin ^2(\theta _{eq})}\right) \textrm{d}t -\frac{z_4 \sqrt{2\beta ^{-1}}}{l \sin (\theta _{eq})} \textrm{d}W_t, \nonumber \\ \partial _t z_4&= \left( x_4z_4 \frac{c_1\left( x_4^2+z_4^2\right) +2c_2x_4\sqrt{x_4^2+z_4^2} +3c_3x_4^2 }{(x_4^2+z_4^2)^{5/2}} - \frac{z_4}{\beta l^2 \sin ^2(\theta _{eq})}\right) \textrm{d}t +\frac{x_4 \sqrt{2\beta ^{-1}}}{l \sin (\theta _{eq})}\textrm{d}W_t. \end{aligned}$$
(C.19)

Other variables, \(x_1, y_1, y_3, y_4\) have no drift and no diffusion in the effective stochastic dynamics.

Estimating the relevant timescale \(\tau \). Based on the strength of the bond angle and the largest parameter in the torsion potential, the contribution from the bond and bond angle part will be quickly relaxed and the scale of separation is approximately a factor of 20, which is not very large, both in absolute terms and when compared to the other examples we considered. It is therefore necessary to perform multiple rounds of refinement to ensure the initial condition is close enough to invariant manifold. Initially, we use the time window \([\tau _{\min {}}, \tau _{\max {}}]=[4 \times 10^{-5}, 5 \times 10^{-5}]\) to learn the parameters in each landmark and proceeds with rounds of refinement until the relative differences of estimated parameters within 5% (this resulted in no more than 10 rounds), as discussed in Sect. 3. In Fig. 9, both \({\textrm{tr}}(\hat{C}_t^l)\) and \(\Vert \hat{\textbf{m}}_t^l\Vert \) reach the linear regime at \(t=5 \times 10^{-6}\), as well as the fourth and sixth coordinates of \(\hat{\textbf{m}}_t^l\), which are TICA coordinates. We thus ran a last round of refinement with the true training interval \([1 \times 10^{-5}, 1.5 \times 10^{-5}]\) and \(\tau =1 \times 10^{-5}\).

Estimating dimension and tangent spaces of \( {\mathcal {M}}_{0}\), and direction of the fast modes. We report in Fig. 9 the behavior of the singular values of \(\hat{C}_\tau ^l\), \(\hat{\Gamma }^l_{D-d}\) and \({\hat{\Lambda }^{{l}}_d}\) at one landmark in descending order. In this example, the dominant singular vector of covariance matrix \(\hat{C}_t^l\) matches with the one of the diffusivity matrix \({\hat{\Lambda }^{{l}}_d}\) and the first dominant singular vector of covariance matrix \(\hat{\Gamma }^l_{D-d}\) is almost orthogonal with the slow direction. There are 5 dominant singular values for the covariance matrix \(\hat{\Gamma }^l_{D-d}\), then it shows the fast variable has 5 dimensions. Similar to the oscillating half-moon example, the part of slow manifold that these data cloud covers clearly shows the nonlinear effect. Therefore, \(\tau \) is chosen from the lower bound of the training interval and the special procedure to modify the landmark during the last round of refinement is necessary. On average, it requires 77 charts to fully describe the invariant manifold.

Identifying metastable states, and estimation of large-time properties of the process. From the top six eigenvalues of the transition matrix of a Markov state model constructed from ATLAS, we observe a significant gap between the third and fourth eigenvalues, correctly indicating that the system has three metastable states. As in the half-moon example, we express the metastable states in the latent (dihedral, in this case) angle \(\phi \): trans:=\(\{\phi \in (-\frac{\pi }{3}, \frac{\pi }{3})\}\), top-cis:=\(\{\phi \in (\frac{\pi }{3},\pi )\}\) and bot-cis:=\(\{\phi \in (-\pi , -\frac{\pi }{3})\}\). We simulate trajectories of time \(5 \times 10^{2}\) with the original simulator and with the ATLAS simulator, and from those we plot both invariant distributions in the dihedral angle \(\phi \). The standard bin width is \(\sigma \sqrt{\tau }= 0.07\) and we use it for the plot and calculation of the difference of two distributions. The \(L^1\)-norm of the difference of two approximated densities is \( 0.060\pm 0.013\) and \(L^2\)-norm of the difference is \(0.013\pm 0.003\).

Estimation of residence times. In the stage of simulating residence time, we generate a single long trajectory of time \(5 \times 10^{2}\) by the original simulator and uniformly sample \(N_{\text {IC}}\) initial conditions in each metastable state. The boundary of the residence set is the same as the region of the metastable states. In this example, we say the point is in the residence set if its dihedral angle \(\phi \) is in the corresponding metastable state.

Fig. 9
figure 9

butane: Top left: plot of trace of empirical covariance and norm of empirical mean, \({\textrm{tr}}(\hat{C}_t^l)\) and \(\Vert \hat{\textbf{m}}_t^l\Vert \) verse time at one landmark. Top center: plot the fourth and sixth coordinates of the empirical mean \( \hat{\textbf{m}}_t^l\) verse time at one landmark. Top right: singular values of \(\hat{C}_\tau ^l, \hat{\Gamma }^{l}_{D-d}\) and \({\hat{\Lambda }^{{l}}_d}\) at one landmark, the dominant singular values are marked as cross labels. Bottom left: scatter of the burst in the \((x_4, z_4)\) coordinate at \(t=0.5 \times 10^{-5}, 1 \times 10^{-5}, 1.5 \times 10^{-5}\), together with the slow manifold, \(\sqrt{\tau }\)-neighborhood and dominant singular vectors of \(\hat{C}_\tau ^l, \hat{\Gamma }^{l}_{D-d}\). Bottom right: top six eigenvalues of the Markov transition matrix

D: Error Analysis

The equations of the relative error of Euclidean norm of the estimated drift term \(\hat{\textbf{b}}^{\mathcal {A}}(\textbf{z})\), estimated diffusivity matrix \(\hat{\Lambda }^{\mathcal {A}}(\textbf{z})\), and between the estimated and theoretical tangent space, are defined, respectively, as

$$\begin{aligned}&{\text {relErr}}({\hat{\textbf{b}}}) = \frac{\Vert \hat{\textbf{b}}^{\mathcal {A}}(\textbf{z}) - \textbf{b}(\textbf{z})\Vert }{\Vert \textbf{b}(\textbf{z})\Vert }\,,\,{\text {relErr}}({{\hat{\Lambda }}}) =\frac{\Vert \hat{\Lambda }^{\mathcal {A}}_d(\textbf{z}) - \Lambda (\textbf{z})\Vert }{\Vert \Lambda (\textbf{z})\Vert } \,,\,{\text {AbsErr}}({{\hat{T}_{\hat{\textbf{z}}^{{l}}}}}) \nonumber \\ {}&= \arccos \left( \left\| (\hat{U}^{{l,\textrm{slw}}}_{d})^T U^{{l,\textrm{slw}}}_{d} \right\| \right) \,. \end{aligned}$$
(D.1)

For the estimation error of the invariant manifold, as discussed above we compared points generated by long ATLAS trajectories, which lie on \( \hat{{\mathcal {M}}}_{{\epsilon }} \) by definition (of \( \hat{{\mathcal {M}}}_{{\epsilon }} \)) with points on the slow manifold obtained by averaging the equations in the limit as \({\epsilon }\rightarrow 0\); this corresponds to the following:

  • in the Pinched sphere example, from the calculations above we let \({\text {AbsErr}}({{ \hat{{\mathcal {M}}}_{{\epsilon }}}}) = \left| r-r^\star (\theta )\right| \).

  • In the oscillating half-moon example, we use the distance of the point to the slow manifold,

    $$\begin{aligned} {\text {AbsErr}}({{ \hat{{\mathcal {M}}}_{{\epsilon }}}}) = \sqrt{(\sqrt{z_1^2+z_2^2}-\bar{r})^2+\sum _{i=3}^{20}(z_i-1)^2} \end{aligned}$$
  • In the butane example, we use the distance of the point to the slow manifold,

    $$\begin{aligned}{} & {} {\text {AbsErr}}({{ \hat{{\mathcal {M}}}_{{\epsilon }}}}) \\ {}{} & {} =\sqrt{ \left( x_1+ l\sin (\theta _{eq})\right) ^2 + \left( y_1 - l\cos (\theta _{eq})\right) ^2 +\left( y_3- l \right) ^2 +\left( y_4+ l\cos (\theta _{eq})- l \right) ^2 +\left( \sqrt{x_4^2 + z_4^2 } - l\sin (\theta _{eq})\right) ^2}. \end{aligned}$$

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Ye, F.XF., Yang, S. & Maggioni, M. Nonlinear Model Reduction for Slow–Fast Stochastic Systems Near Unknown Invariant Manifolds. J Nonlinear Sci 34, 22 (2024). https://doi.org/10.1007/s00332-023-09998-8

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s00332-023-09998-8

Keywords

Navigation