Search Results (24)

Search Parameters:
Keywords = Langevin Monte Carlo

19 pages, 2523 KiB  
Article
Hyperspectral Image Denoising by Pixel-Wise Noise Modeling and TV-Oriented Deep Image Prior
by Lixuan Yi, Qian Zhao and Zongben Xu
Remote Sens. 2024, 16(15), 2694; https://doi.org/10.3390/rs16152694 - 23 Jul 2024
Viewed by 653
Abstract
Model-based hyperspectral image (HSI) denoising methods have attracted continuous attention in the past decades, due to their effectiveness and interpretability. In this work, we aim to advance model-based HSI denoising through a sophisticated investigation of both the fidelity and regularization terms, or correspondingly the noise and the prior, by virtue of several recently developed techniques. Specifically, we formulate a novel unified probabilistic model for the HSI denoising task, within which the noise is assumed to be pixel-wise non-independent and identically distributed (non-i.i.d.) Gaussian predicted by a pre-trained neural network, and the prior for the HSI is designed by incorporating the deep image prior (DIP) with total variation (TV) and spatio-spectral TV. To solve the resulting maximum a posteriori (MAP) estimation problem, we design a Monte Carlo Expectation–Maximization (MCEM) algorithm, in which the stochastic gradient Langevin dynamics (SGLD) method is used for computing the E-step, and the alternating direction method of multipliers (ADMM) is adopted for solving the optimization in the M-step. Experiments on both synthetic and real noisy HSI datasets have been conducted to verify the effectiveness of the proposed method.
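The E-step above relies on stochastic gradient Langevin dynamics. As a minimal sketch of a generic SGLD sampler (the step size, iteration count, and toy target below are illustrative assumptions, not the paper's model):

```python
import numpy as np

def sgld_sample(grad_log_post, theta0, n_steps=5000, eps=1e-3, rng=None):
    """Stochastic gradient Langevin dynamics: at each step,
    theta <- theta + (eps/2) * grad_log_post(theta) + sqrt(eps) * noise."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float).copy()
    samples = np.empty((n_steps,) + theta.shape)
    for t in range(n_steps):
        theta = theta + 0.5 * eps * grad_log_post(theta) \
                + np.sqrt(eps) * rng.standard_normal(theta.shape)
        samples[t] = theta
    return samples

# Toy usage: approximate samples from a 2-D standard Gaussian,
# whose log-density gradient is simply -theta.
draws = sgld_sample(lambda th: -th, theta0=np.zeros(2))
```

In an MCEM loop of the kind described, such draws would replace the intractable posterior expectation in the E-step with a Monte Carlo average.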
Figures:
Figure 1. Denoising results of competing methods on the Beads HSI in the CAVE dataset with noise of Case 1.
Figure 2. Denoising results of competing methods on the Superball HSI in the CAVE dataset with noise of Case 2.
Figure 3. Denoising results of competing methods on the Stuffed Toy HSI in the CAVE dataset with noise of Case 3.
Figure 4. Denoising results of competing methods on the Peppers HSI in the CAVE dataset with noise of Case 4.
Figure 5. Denoising results of competing methods on the Pompoms HSI in the CAVE dataset with noise of Case 5.
Figure 6. Example denoising results of competing methods on the ICVL dataset with noise of Case 1.
Figure 7. Example denoising results of competing methods on the ICVL dataset with noise of Case 2.
Figure 8. Example denoising results of competing methods on the ICVL dataset with noise of Case 3.
Figure 9. Example denoising results of competing methods on the ICVL dataset with noise of Case 4.
Figure 10. Example denoising results of competing methods on the ICVL dataset with noise of Case 5.
Figure 11. Denoising results of the IndianPines HSI.
Figure 12. Denoising results of the Urban HSI.
Figure 13. Spectral signatures of competing methods on the IndianPines HSI.
Figure 14. Spectral signatures of competing methods on the Urban HSI.
Figure 15. Empirical convergence behavior, in terms of PSNR, of the proposed method.
16 pages, 3986 KiB  
Article
Accelerating Convergence of Langevin Dynamics via Adaptive Irreversible Perturbations
by Zhenqing Wu, Zhejun Huang, Sijin Wu, Ziying Yu, Liuxin Zhu and Lili Yang
Mathematics 2024, 12(1), 118; https://doi.org/10.3390/math12010118 - 29 Dec 2023
Viewed by 896
Abstract
Irreversible perturbations in Langevin dynamics have been widely recognized for their role in accelerating convergence in simulations of multi-modal distributions π(θ). A commonly used and easily computed standard irreversible perturbation is J∇log π(θ), where J is a skew-symmetric matrix. However, Langevin dynamics employing a fixed-scale standard irreversible perturbation encounter a trade-off between local exploitation and global exploration, associated with small and large scales of the standard irreversible perturbation, respectively. To address this trade-off, we introduce adaptive irreversible perturbation Langevin dynamics, where the scale of the standard irreversible perturbation changes adaptively. Through numerical examples, we demonstrate that adaptive irreversible perturbations in Langevin dynamics can enhance performance compared to fixed-scale irreversible perturbations.
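As a rough sketch of the underlying dynamics (the particular skew-symmetric matrix, step size, and toy target below are illustrative; the authors' adaptation rule for the scale is not reproduced), an Euler–Maruyama discretization of irreversibly perturbed Langevin dynamics dθ = (I + δJ)∇log π(θ) dt + √2 dW reads:

```python
import numpy as np

def irreversible_langevin(grad_log_pi, theta0, delta=1.0, dt=1e-2,
                          n_steps=10_000, rng=None):
    """Euler-Maruyama steps of
    d(theta) = (I + delta * J) grad log pi(theta) dt + sqrt(2) dW,
    where J is a fixed skew-symmetric matrix (J^T = -J), so the
    perturbation leaves the invariant distribution pi unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    d = len(theta0)
    J = np.triu(np.ones((d, d)), k=1)
    J = J - J.T                              # skew-symmetric
    theta = np.asarray(theta0, dtype=float).copy()
    path = np.empty((n_steps, d))
    for t in range(n_steps):
        drift = (np.eye(d) + delta * J) @ grad_log_pi(theta)
        theta = theta + dt * drift + np.sqrt(2 * dt) * rng.standard_normal(d)
        path[t] = theta
    return path

# Toy usage on a 2-D Gaussian target; larger delta produces wider,
# more exploratory sweeps, while delta = 0 recovers reversible dynamics.
path = irreversible_langevin(lambda th: -th, theta0=np.array([3.0, -3.0]))
```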
Figures:
Figure 1. The paths of a pair of Langevin dynamics, driven by two different temperatures, are depicted. The orange and red traces represent the lower and higher temperatures, respectively. The background corresponds to the contours of the objective function. It is worth noting that lines of the same color are separate from each other due to the swap operation, indicated by the dashed line.
Figure 2. MSE, squared bias, and variance of the resulting estimator for ϕ_1(μ, σ) = μ + σ. Real gradients are computed.
Figure 3. MSE, squared bias, and variance of the resulting estimator for ϕ_2(μ, σ) = μ^2 + σ^2. Real gradients are computed.
Figure 4. MSE, squared bias, and variance of the resulting estimator for ϕ_1(μ, σ) = μ + σ. Stochastic gradients are computed.
Figure 5. MSE, squared bias, and variance of the resulting estimator for ϕ_2(μ, σ) = μ^2 + σ^2. Stochastic gradients are computed.
Figure 6. Empirical behavior on 25 two-dimensional Gaussian distributions. Stochastic gradients are computed.
Figure 7. KL divergence of different sampling algorithms. Stochastic gradients are computed.
Figure 8. Trajectory of various sampling algorithms. Stochastic gradients are computed.
Figure 9. Empirical behavior on a multi-modal distribution. Stochastic gradients are computed.
Figure 10. KL divergence of different sampling algorithms. Stochastic gradients are computed.
Figure 11. Log-loss calculated on the test dataset for different algorithms. Stochastic gradients are computed.
25 pages, 7834 KiB  
Review
Models for Simulation of Fractal-like Particle Clusters with Prescribed Fractal Dimension
by Oleksandr Tomchuk
Fractal Fract. 2023, 7(12), 866; https://doi.org/10.3390/fractalfract7120866 - 5 Dec 2023
Cited by 4 | Viewed by 1962
Abstract
This review article delves into the growing recognition of fractal structures in mesoscale phenomena. The article highlights the significance of realistic fractal-like aggregate models and efficient modeling codes for comparing data from diverse experimental findings and computational techniques. Specifically, the article discusses the current state of fractal aggregate modeling, with a focus on particle clusters that possess adjustable fractal dimensions (Df). The study emphasizes the suitability of different models for various Df intervals, taking into account factors such as particle size, fractal prefactor, the polydispersity of structural units, and interaction potential. Through an analysis of existing models, this review aims to identify key similarities and differences and offer insights into future developments in colloidal science and related fields.
(This article belongs to the Section Mathematical Physics)
Figures:
Figure 1. Physico-chemical properties affected by the features of the fractal structure of nanoparticle clusters.
Figure 2. Visualization of the principal question of the review: how can a cluster of particles with a predetermined fractal dimension be assembled?
Figure 3. Simulation of aggregates using the morphological model of Ferri et al. [87]: (a) platelets of size ratio 20:8:3 and compactness parameters α = 0.1, β = 0.05; (b) cylinders with height/diameter ratio of 5, α = 0, β = 0; (c) spheres, α = 0.1, β = 0.05.
Figure 4. Three-dimensional DLCA clusters of almost 6000 particles with no readjusting rotations (a) and with one (b), two (c) or three (d) readjusting rotations after the initial contact. The four clusters are shown on the same scale. Adapted with permission from ref. [52]. 1988, AIP Publishing.
Figure 5. Three-dimensional BLCA clusters of 2^14 particles with no readjusting rotations (a) and after the first (b), second (c) and third (d) stages of restructuring, respectively. The four clusters are shown on the same scale. Adapted with permission from ref. [95]. 1988, Elsevier.
Figure 6. DLA- (a,b) and BLA-based (c,d) aggregates with varying effective aggregation radius, Λ, and a fractal dimension of 1.31 (a,c) and 1.51 (b,d), correspondingly [106].
Figure 7. Representative clusters formed from the same initial configuration by the attractive (AI), repulsive (RI), and non-interacting (NI) particles with and without (DLCA) rotational diffusion [90].
Figure 8. Typical aggregates of 1000 particles generated using the Porous Eden Model with increasing inactivation parameter, p. Adapted with permission from ref. [112]. 2019, Elsevier.
Figure 9. Typical particle–cluster fractal-like aggregates of monodisperse (a–c) and polydisperse (d–f) particles. (a) Sequential algorithm by Filippov et al., Df = 1.8. Adapted with permission from ref. [113]. 2000, Elsevier. (b) Sequential algorithm by Mackowski, Df = 1.9. Adapted with permission from ref. [115]. © The Optical Society. (c) Tunable Sequential Aggregation-based model, Df = 2.81. Adapted with permission from ref. [116]. 2020, Elsevier. (d) Polydisperse sequential algorithm by Dolotov et al., Df = 1.9. Adapted with permission from ref. [117]. 2007, Springer Nature. (e) Polydisperse sequential algorithm by Tomchuk et al., Df = 2.4. Adapted with permission from ref. [118]. 2015, American Chemical Society. (f) The modified polydisperse Tunable Sequential Aggregation model, Df = 2.5 [119].
Figure 10. On-lattice aggregates containing 2^14 particles each, with fractal dimension Df = 1.5 (a), Df = 2.0 (b), and Df = 2.5 (c), generated with the tunable dimension cluster–cluster aggregation model. Adapted with permission from ref. [139]. 1998, Elsevier.
Figure 11. (a–d) Examples of aggregates of 10^4 particles with fractal dimension Df = 2.2 (a), Df = 2.5 (b), Df = 2.8 (c), and Df = 3.0 (d) obtained with the modified tdCCA model and subsequent densification via Voronoi tessellation [122]. (e) An aggregate of 724 particles with fractal dimension Df = 1.8 generated using the tunable CCA algorithm. Adapted with permission from ref. [113]. 2000, Elsevier.
Figure 12. Typical aggregates generated by FracVAL with monodisperse (a) and polydisperse (b) primary particles. Adapted with permission from ref. [141]. 2019, Elsevier.
Figure 13. Examples of fractal clusters of 1000 polydisperse particles (log-normal distribution, polydispersity 10%) simulated by the generalized algorithm with different fractal dimensions: Df = 1.5 (a), Df = 2.0 (b), Df = 2.5 (c). Adapted with permission from ref. [121]. 2020, Elsevier.
Figure 14. Model stochastic fractal chains with various fractal dimensions: Df = 2.4 (a) [143], Df = 1.1 (b), Df = 2.0 (c), Df = 2.9 (d). Adapted with permission from ref. [144]. 2019, AIP Publishing.
Figure 15. (a) Möbius fractal for N = 5000 unique vertices. Adapted with permission from ref. [146]. 2021, AIP Publishing. (b) An illustration of the three-dimensional Yang and Wang model. A single cell of the processed cube is shown to explain the algorithm. Adapted with permission from ref. [148]. 2015, Elsevier.
Figure 16. Dynamic Lattice Liquid structures at the same scale depending on the initial concentration. Initial concentration, fractal dimension, and growing time are indicated in the figures. Adapted with permission from ref. [150]. 2007, Elsevier.
27 pages, 536 KiB  
Article
Convergence Rates for the Constrained Sampling via Langevin Monte Carlo
by Yuanzheng Zhu
Entropy 2023, 25(8), 1234; https://doi.org/10.3390/e25081234 - 18 Aug 2023
Viewed by 1414
Abstract
Sampling from constrained distributions, frequently encountered in statistical and machine-learning models, has posed significant challenges in terms of algorithmic design and non-asymptotic analysis. In this study, we propose three sampling algorithms based on Langevin Monte Carlo with Metropolis–Hastings steps to handle distributions constrained within a convex body. We present a rigorous analysis of the corresponding Markov chains and derive non-asymptotic upper bounds on the convergence rates of these algorithms in total variation distance. Our results demonstrate that the sampling algorithm, enhanced with Metropolis–Hastings steps, offers an effective solution for tackling some constrained sampling problems. Numerical experiments are conducted to compare our methods with several competing algorithms without Metropolis–Hastings steps, and the results further support our theoretical findings.
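As a flavor of the approach (not the paper's Algorithms 1–3; the step size, toy target, and rejection-based constraint handling are illustrative assumptions), a Metropolis-adjusted Langevin step that rejects proposals leaving the convex body might look like:

```python
import numpy as np

def constrained_mala(log_pi, grad_log_pi, in_body, x0, eps=1e-2,
                     n_steps=10_000, rng=None):
    """Langevin proposal + Metropolis-Hastings correction; proposals
    outside the convex body (in_body(y) is False) are rejected."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    samples = np.empty((n_steps,) + x.shape)
    for t in range(n_steps):
        mean_fwd = x + eps * grad_log_pi(x)
        y = mean_fwd + np.sqrt(2 * eps) * rng.standard_normal(x.shape)
        if in_body(y):
            mean_bwd = y + eps * grad_log_pi(y)
            log_q_fwd = -np.sum((y - mean_fwd) ** 2) / (4 * eps)
            log_q_bwd = -np.sum((x - mean_bwd) ** 2) / (4 * eps)
            if np.log(rng.uniform()) < log_pi(y) - log_pi(x) \
                                       + log_q_bwd - log_q_fwd:
                x = y
        samples[t] = x
    return samples

# Toy usage: a standard Gaussian restricted to the unit Euclidean ball.
draws = constrained_mala(lambda v: -0.5 * v @ v, lambda v: -v,
                         lambda v: np.linalg.norm(v) <= 1.0, x0=np.zeros(2))
```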
(This article belongs to the Collection Advances in Applied Statistical Mechanics)
Figures:
Figure 1. The trace graphs of x_1 of the Markov chain determined by the four sampling algorithms.
Figure 2. The densities of x_1 of the Markov chain determined by the four sampling algorithms.
Figure 3. Approximate mixing time with respect to dimension and error tolerance of Algorithm 2. (a) Dimension dependence for fixed error tolerance. (b) Error tolerance dependence for fixed dimension.
Figure 4. Approximate mixing time with respect to dimension and error tolerance for the four sampling algorithms. (a) Dimension dependence for fixed error tolerance. (b) Error tolerance dependence for fixed dimension.
Figure 5. Bayesian regularized regression via Algorithm 3, where distinct colors represent trajectories of parameter estimates for distinct variables. (a) L_1-norm constraint. (b) L_1.5-norm constraint. (c) L_2-norm constraint.
28 pages, 1583 KiB  
Article
Ornstein–Uhlenbeck Process on Three-Dimensional Comb under Stochastic Resetting
by Pece Trajanovski, Petar Jolakoski, Ljupco Kocarev and Trifce Sandev
Mathematics 2023, 11(16), 3576; https://doi.org/10.3390/math11163576 - 18 Aug 2023
Viewed by 974
Abstract
The Ornstein–Uhlenbeck (O-U) process with resetting is considered as anomalous transport taking place on a three-dimensional comb. The three-dimensional comb is a comb-inside-a-comb structure, consisting of backbones and fingers in the following geometrical correspondence: x-backbone → y-fingers-backbone → z-fingers. Realisation of the O-U process on the three-dimensional comb leads to anomalous (non-Markovian) diffusion. This specific anomalous transport in the presence of resets results in non-equilibrium stationary states. Explicit analytical expressions for the mean values and the mean squared displacements along all three directions of the comb are obtained and verified numerically. The marginal probability density functions for each direction are obtained numerically by Monte Carlo simulation of a random transport described by a system of coupled Langevin equations for the comb geometry.
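As a minimal sketch of the simulation ingredient (one-dimensional only; the comb coupling between backbone and fingers is omitted, and all parameter values are illustrative), an Euler–Maruyama scheme for an O-U process under Poissonian resetting is:

```python
import numpy as np

def ou_with_resetting(lam=1.0, sigma=1.0, r=1.0, x0=0.0,
                      dt=0.01, n_steps=100_000, rng=None):
    """Ornstein-Uhlenbeck dynamics dx = -lam * x dt + sigma dW,
    interrupted by resets to x0 occurring with probability r * dt
    in each time step (Poissonian resetting with rate r)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(n_steps)
    x[0] = x0
    for t in range(1, n_steps):
        if rng.uniform() < r * dt:       # resetting event
            x[t] = x0
        else:
            x[t] = x[t - 1] - lam * x[t - 1] * dt \
                   + sigma * np.sqrt(dt) * rng.standard_normal()
    return x

# A histogram of the late-time samples approximates the
# non-equilibrium stationary PDF induced by the resets.
traj = ou_with_resetting()
```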
(This article belongs to the Section Mathematical Physics)
Figures:
Figure 1. Three-dimensional comb structure with illustrated diffusive movement (orange balls and arrows, simulating movement of the particle) in its structural features, comprised of the backbone (x-axis), main fingers (y-axis), and secondary fingers (z-axis), and depicting the respective processes: Brownian motion (BM) and Ornstein–Uhlenbeck (O-U) in these directions.
Figure 2. Visual representation explaining the rationale behind the inclusion of a factor of 2 in σ_δ = 2σ′_δ, where σ′_δ is defined in Ref. [62]. The red arrows and points illustrate the consecutive movements of the particle in the comb structure.
Figure 3. Typical trajectories of the particles on the backbone (a), the main fingers (b), and secondary fingers (c) according to the Langevin Equations (43)–(45), where the black dots stand for the resetting events and the shaded and white areas represent the time between two consecutive resetting events, for Δt = 0.01, r = 1, μ = 3, σ_x = σ_y = σ_z = 1, x_0 = 0, λ_x = 1, λ_y = 0.001.
Figure 4. Typical trajectories of the particle on the backbone (a), the main fingers (b), and secondary fingers (c) according to the Langevin Equations (57)–(59), with the same resetting conventions and parameters as in Figure 3.
Figure 5. Typical trajectories of the particle on the backbone (a), the main fingers (b), and secondary fingers (c) according to the Langevin Equations (71)–(73), with the same resetting conventions and parameters as in Figure 3.
Figure 6. (a) Analytical results (42) (solid lines) and Monte Carlo simulations of Equations (43)–(45) (markers) for the MSDs in the case of global resetting; (b) simulations for the PDFs with different resetting rates of the global resetting to the initial position; (c) analytical results (56) (solid lines) and Monte Carlo simulations of Equations (57)–(59) (markers) for the MSDs in the case of resetting to the backbone of the comb; (d) simulations for the PDFs with different resetting rates of the resetting mechanism to the backbone of the comb; (e) analytical results (70) (solid lines) and Monte Carlo simulations of Equations (71)–(73) (markers) for the MSDs in the case of resetting to the main fingers; (f) simulations for the PDFs with different resetting rates of the resetting mechanism to the main fingers of the comb. For the analytical results, we use the Stehfest numerical inverse Laplace transform in MATHEMATICA [64]. We set x_0 = 0, σ_x = σ_y = σ_z = 1, μ = 3, λ_x = 1, λ_y = 0.001, dt = 0.001, N = 10^3, t = 10^3.
Figure 7. Analytical results for the MSD along the backbone (42) (solid lines) and simulation results (markers) according to the coupled Langevin Equations (43)–(45), for different mean-reverting rates λ_y, in the case of global resetting: λ_y = 0 on the left panel (a), λ_y = 0.1 on the central panel (b), and λ_y = 0.25 on the right panel (c), where an increase in MSD is visible for larger λ_y. We set x_0 = 0, σ_x = σ_y = σ_z = 1, μ = 5, λ_x = 3, r = 1, N = 10^3, dt = 0.001, t = 10^3.
Figure 8. Monte Carlo simulations of the PDF, according to the coupled Langevin equations in the case of (a) global resetting, Equations (43)–(45); (b) resetting to the backbone, Equations (57)–(59); (c) resetting to the main fingers, Equations (71)–(73), for different mean-reverting rates λ_y = {0, 0.1, 0.25}. We set x_0 = 0, σ_x = σ_y = σ_z = 1, μ = 5, λ_x = 3, r = 1, N = 10^3, dt = 0.001, t = 10^3.
21 pages, 7864 KiB  
Article
Variational Hybrid Monte Carlo for Efficient Multi-Modal Data Sampling
by Shiliang Sun, Jing Zhao, Minghao Gu and Shanhu Wang
Entropy 2023, 25(4), 560; https://doi.org/10.3390/e25040560 - 24 Mar 2023
Cited by 3 | Viewed by 1470
Abstract
The Hamiltonian Monte Carlo (HMC) sampling algorithm exploits Hamiltonian dynamics to construct efficient Markov Chain Monte Carlo (MCMC) methods, which have become increasingly popular in machine learning and statistics. Since HMC uses the gradient information of the target distribution, it can explore the state space much more efficiently than random-walk proposals, but it may suffer from high autocorrelation. In this paper, we propose Langevin Hamiltonian Monte Carlo (LHMC) to reduce the autocorrelation of the samples. Probabilistic inference involving multi-modal distributions is very difficult for dynamics-based MCMC samplers, which are easily trapped in a mode far away from other modes. To tackle this issue, we further propose a variational hybrid Monte Carlo (VHMC) method, which uses a variational distribution to explore the phase space and find new modes, and is capable of sampling from multi-modal distributions effectively. A formal proof is provided showing that the proposed method can converge to target distributions. Both synthetic and real datasets are used to evaluate its properties and performance. The experimental results verify the theory and show superior performance in multi-modal sampling.
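For reference, a bare-bones HMC transition with leapfrog integration is sketched below (a generic textbook step; the Langevin and variational components of LHMC/VHMC are not shown, and the toy target is an assumption):

```python
import numpy as np

def hmc_step(log_pi, grad_log_pi, x, eps=0.1, n_leapfrog=20, rng=None):
    """One HMC transition: draw a Gaussian momentum, simulate
    Hamiltonian dynamics with the leapfrog integrator, then
    accept or reject with a Metropolis correction."""
    rng = np.random.default_rng() if rng is None else rng
    p = rng.standard_normal(x.shape)
    x_new, p_new = x.copy(), p.copy()
    p_new += 0.5 * eps * grad_log_pi(x_new)       # half momentum step
    for _ in range(n_leapfrog - 1):
        x_new += eps * p_new                      # full position step
        p_new += eps * grad_log_pi(x_new)         # full momentum step
    x_new += eps * p_new
    p_new += 0.5 * eps * grad_log_pi(x_new)       # final half step
    h_old = -log_pi(x) + 0.5 * p @ p
    h_new = -log_pi(x_new) + 0.5 * p_new @ p_new
    return x_new if np.log(rng.uniform()) < h_old - h_new else x

# Toy usage: a chain targeting a 2-D standard Gaussian.
x = np.zeros(2)
chain = [x := hmc_step(lambda v: -0.5 * v @ v, lambda v: -v, x)
         for _ in range(1000)]
```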
Figures:
Figure 1. The comparison of autocorrelation and maximum mean discrepancy (MMD) for three different methods. The left column shows the relationship between autocorrelation and the lag of sample number after the burn-in period. The right column demonstrates the relationship between MMD and sample number (best viewed in color).
Figure 2. The guide points (red points) when using VHMC (best viewed in color).
Figure 3. The probability density function of parallel HMC versus the actual distribution.
Figure 4. Sampling experiment results for mixture Gaussian distributions. In the first row, HMC-C, MHMC-C and VHMC-C denote HMC, MHMC and VHMC sampling from a Gaussian mixture whose modes are close to each other, with mode means μ_0 = (2.5, −2.5) and μ_1 = (−2.5, 2.5), respectively. In the second row, HMC-F, MHMC-F and VHMC-F denote HMC, MHMC and VHMC sampling from a Gaussian mixture whose modes are far away from each other, with mode means μ_0 = (6.5, −6.5) and μ_1 = (−6.5, 6.5), respectively.
Figure 5. The comparison of MMD and autocorrelation for four different methods. The left column shows the relationship between MMD and sample number, and the right column demonstrates the relationship between autocorrelation and the lag of sample number after the burn-in period.
Figure 6. Relative error of the mean in high dimensions.
Figure 7. The performance of HMC and VHMC on the mixture of heterogeneous Gaussians. The first column shows the scatter diagrams of HMC (upper) and VHMC (bottom); the second column shows the histograms of HMC (upper) and VHMC (bottom).
27 pages, 422 KiB  
Article
From Bilinear Regression to Inductive Matrix Completion: A Quasi-Bayesian Analysis
by The Tien Mai
Entropy 2023, 25(2), 333; https://doi.org/10.3390/e25020333 - 11 Feb 2023
Cited by 2 | Viewed by 1438
Abstract
In this paper, we study the problem of bilinear regression, a type of statistical modeling that deals with multiple variables and multiple responses. One of the main difficulties that arise in this problem is the presence of missing data in the response matrix, a problem known as inductive matrix completion. To address these issues, we propose a novel approach that combines elements of Bayesian statistics with a quasi-likelihood method. Our proposed method starts by addressing the problem of bilinear regression using a quasi-Bayesian approach. The quasi-likelihood method that we employ in this step allows us to handle the complex relationships between the variables in a more robust way. Next, we adapt our approach to the context of inductive matrix completion. We make use of a low-rankness assumption and leverage the powerful PAC-Bayes bound technique to provide statistical properties for our proposed estimators and for the quasi-posteriors. To compute the estimators, we propose a Langevin Monte Carlo method to obtain approximate solutions to the problem of inductive matrix completion in a computationally efficient manner. To demonstrate the effectiveness of our proposed methods, we conduct a series of numerical studies. These studies allow us to evaluate the performance of our estimators under different conditions and provide a clear illustration of the strengths and limitations of our approach.
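As a schematic of the computational step (plain matrix completion without the side-information matrices of the inductive setting; the squared-error quasi-likelihood, Gaussian factor priors, and step size are illustrative assumptions), a Langevin Monte Carlo loop over low-rank factors might read:

```python
import numpy as np

def lmc_matrix_completion(Y, mask, rank=2, eps=1e-4, lam=1.0,
                          n_steps=5000, rng=None):
    """Langevin Monte Carlo over factors U, V of M = U V^T, targeting
    a quasi-posterior built from squared loss on observed entries
    (mask == 1) plus Gaussian priors; returns the running mean of M."""
    rng = np.random.default_rng() if rng is None else rng
    n, m = Y.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    mean_M = np.zeros_like(Y)
    for t in range(1, n_steps + 1):
        R = mask * (U @ V.T - Y)          # residual on observed entries
        grad_U = R @ V + lam * U          # gradients of the quasi-potential
        grad_V = R.T @ U + lam * V
        U += -eps * grad_U + np.sqrt(2 * eps) * rng.standard_normal(U.shape)
        V += -eps * grad_V + np.sqrt(2 * eps) * rng.standard_normal(V.shape)
        mean_M += (U @ V.T - mean_M) / t  # running quasi-posterior mean
    return mean_M

# Toy usage: recover a rank-2 matrix from half of its entries.
rng = np.random.default_rng(0)
M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
mask = (rng.uniform(size=M.shape) < 0.5).astype(float)
M_hat = lmc_matrix_completion(M * mask, mask, rng=rng)
```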
(This article belongs to the Special Issue Recent Advances in Statistical Theory and Applications)
20 pages, 13011 KiB  
Article
Finite Iterative Forecasting Model Based on Fractional Generalized Pareto Motion
by Wanqing Song, Shouwu Duan, Dongdong Chen, Enrico Zio, Wenduan Yan and Fan Cai
Fractal Fract. 2022, 6(9), 471; https://doi.org/10.3390/fractalfract6090471 - 26 Aug 2022
Cited by 4 | Viewed by 1213
Abstract
In this paper, an efficient prediction model based on the fractional generalized Pareto motion (fGPm), with Long-Range Dependent (LRD) and infinite-variance characteristics, is proposed. Firstly, we discuss the meaning of each parameter of the generalized Pareto distribution (GPD), and the LRD characteristics of the generalized Pareto motion are analyzed by taking into account the heavy-tailed characteristics of its distribution. Then, the mathematical relationship H = 1/α between the self-similar parameter H and the tail parameter α is obtained. Also, the generalized Pareto increment distribution is obtained using statistical methods, which enables the subsequent derivation of the iterative forecasting model based on the increment form. Secondly, the tail parameter α is introduced to generalize the integral expression of the fractional Brownian motion, and the integral expression of fGPm is obtained. Then, by discretizing the integral expression of fGPm, the statistical characteristic of infinite variance is shown. In addition, in order to study the LRD prediction characteristic of fGPm, LRD and self-similarity analyses are performed on fGPm, and the LRD prediction condition H > 1/α is obtained. Compared to the fractional Brownian motion, which describes LRD by a self-similar parameter H, fGPm introduces the tail parameter α, which increases the flexibility of the LRD description. However, the two parameters are not independent, because of the LRD condition H > 1/α. An iterative prediction model is obtained from the Langevin-type stochastic differential equation driven by fGPm. The prediction model inherits the LRD condition H > 1/α of fGPm, and the time series simulated by the Monte Carlo method shows the superiority of the prediction model in predicting data with high jumps. Finally, this paper uses power load data in two different situations (weekdays and weekends) to verify the validity and general applicability of the forecasting model, which is compared with the fractional Brownian prediction model, highlighting the "high jump data prediction advantage" of the fGPm prediction model.
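As a toy discretization of a Langevin-type iteration driven by heavy-tailed generalized-Pareto increments (the drift and all parameter values are illustrative, and the increments are drawn i.i.d., ignoring the long-range dependence of true fGPm increments):

```python
import numpy as np
from scipy.stats import genpareto

def langevin_gp_series(x0=0.6, lam=0.5, delta=0.05, alpha=1.5,
                       dt=1.0, n_steps=96, rng=None):
    """Langevin-type iteration x_{t+1} = x_t - lam * x_t * dt + jump,
    where jump is a symmetrized generalized-Pareto draw with tail
    parameter alpha (shape 1/alpha) and scale delta."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(n_steps)
    x[0] = x0
    for t in range(1, n_steps):
        sign = rng.choice([-1.0, 1.0])
        jump = sign * genpareto.rvs(c=1.0 / alpha, scale=delta,
                                    random_state=rng)
        x[t] = x[t - 1] - lam * x[t - 1] * dt + jump
    return x

# Smaller alpha gives heavier tails, i.e., more frequent high jumps
# of the kind the fGPm model is designed to capture.
series = langevin_gp_series()
```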
(This article belongs to the Special Issue New Trends in Fractional Stochastic Processes)
Figures:
Figure 1. Comparison of δ values in GPD PDF plots.
Figure 2. Comparison of α values in GPD PDF plots.
Figure 3. GP time series.
Figure 4. GP increment time series.
Figure 5. Probability density of the GP increment.
Figure 6. Comparison of δ values in GPD PDF plots.
Figure 7. Comparison of α values in GPD PDF plots.
Figure 8. fGPm generated with different α = 2, 1.75, 1.5, 1.25, respectively; H = 0.75.
Figure 9. fGPm increment data generated with different α = 2, 1.75, 1.5, 1.25, respectively; H = 0.75.
Figure 10. fGPm sequences generated with α = 1.5, µ = 0.4586, δ = 0.0396, H = 0.75, X_0 = 0.6, by the Monte Carlo simulation method.
Figure 11. Forecasting process.
Figure 12. The actual working-day 96-h power load of the fGPm forecast model and the subsequent 24-h forecast trend chart. (a) 6 h; (b) 12 h; (c) 18 h; (d) 24 h.
Figure 13. Comparison of the two forecasting models on working days. (a) 6 h; (b) 12 h; (c) 18 h; (d) 24 h.
Figure 14. Actual weekend 96-h power load and subsequent 24-h forecast trend chart. (a) 6 h; (b) 12 h; (c) 18 h; (d) 24 h.
Figure 15. Comparison of the two forecasting models on weekends. (a) 6 h; (b) 12 h; (c) 18 h; (d) 24 h.
19 pages, 5449 KiB  
Article
Monte Carlo Simulation of Stochastic Differential Equation to Study Information Geometry
by Abhiram Anand Thiruthummal and Eun-jin Kim
Entropy 2022, 24(8), 1113; https://doi.org/10.3390/e24081113 - 12 Aug 2022
Cited by 7 | Viewed by 2164
Abstract
Information Geometry is a useful tool to study and compare the solutions of Stochastic Differential Equations (SDEs) for non-equilibrium systems. As an alternative to solving the Fokker–Planck equation, we propose a new method to calculate time-dependent probability density functions (PDFs) and to study Information Geometry using Monte Carlo (MC) simulation of SDEs. Specifically, we develop a new MC SDE method to overcome the challenges in calculating a time-dependent PDF and information geometric diagnostics and to speed up simulations by utilizing GPU computing. Using MC SDE simulations, we reproduce Information Geometric scaling relations found from the Fokker–Planck method for the case of a stochastic process with linear and cubic damping terms. We showcase the advantage of MC SDE simulation over FPE solvers by calculating unequal-time joint PDFs. For the linear process with a linear damping force, the joint PDF is found to be Gaussian. In contrast, for the cubic process with a cubic damping force, the joint PDF exhibits a bimodal structure, even in a stationary state. This suggests a finite memory time induced by a nonlinear force. Furthermore, several power-law scalings in the characteristics of bimodal PDFs are identified and investigated.
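As a crude sketch of the ensemble approach (bin count, ensemble size, and the linear toy SDE are illustrative; the paper's GPU implementation and error control are not reproduced), one can estimate Γ(t)^2 = ∫ (∂_t p)^2 / p dx from histogram PDFs at adjacent time steps and accumulate the information length L = ∫ Γ dt:

```python
import numpy as np

def gamma_sq(samples_a, samples_b, dt, bins=200, lo=-1.0, hi=1.0):
    """Histogram estimate of Gamma^2 = integral (d_t p)^2 / p dx
    from two sample ensembles one time step apart."""
    edges = np.linspace(lo, hi, bins + 1)
    width = edges[1] - edges[0]
    pa, _ = np.histogram(samples_a, bins=edges, density=True)
    pb, _ = np.histogram(samples_b, bins=edges, density=True)
    mid = 0.5 * (pa + pb)
    ok = mid > 0                      # skip empty bins
    return np.sum(((pb[ok] - pa[ok]) / dt) ** 2 / mid[ok]) * width

# Ensemble Euler-Maruyama for dx = -x dt + 0.1 dW, started from a
# narrow Gaussian, accumulating the information length along the way.
rng = np.random.default_rng(0)
n_paths, dt, n_steps = 100_000, 1e-2, 500
x = 0.5 + 0.01 * rng.standard_normal(n_paths)
info_length = 0.0
for _ in range(n_steps):
    x_new = x - x * dt + 0.1 * np.sqrt(dt) * rng.standard_normal(n_paths)
    info_length += np.sqrt(gamma_sq(x, x_new, dt)) * dt
    x = x_new
```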
Show Figures

Figure 1

Figure 1
<p>Probability density at adjacent time steps for the SDE <math display="inline"><semantics> <mrow> <mi>d</mi> <msub> <mi>x</mi> <mi>t</mi> </msub> <mo>=</mo> <mo>−</mo> <msub> <mi>x</mi> <mi>t</mi> </msub> <mi>d</mi> <mi>t</mi> <mo>+</mo> <mn>0.1</mn> <mi>d</mi> <msub> <mi>W</mi> <mi>t</mi> </msub> </mrow> </semantics></math> with initial normal distribution <math display="inline"><semantics> <mrow> <mi mathvariant="script">N</mi> <mo stretchy="false">(</mo> <msup> <mn>10</mn> <mn>5</mn> </msup> <mo>,</mo> <msup> <mn>10</mn> <mrow> <mo>−</mo> <mn>10</mn> </mrow> </msup> <mo stretchy="false">)</mo> </mrow> </semantics></math>. Time steps were chosen adaptively to limit local error to <math display="inline"><semantics> <mrow> <mn>5</mn> <mo>×</mo> <msup> <mn>10</mn> <mrow> <mo>−</mo> <mn>4</mn> </mrow> </msup> </mrow> </semantics></math>. The specific time interval was chosen to showcase the lack of overlap between PDFs.</p>
Full article ">Figure 2
<p>(<b>left</b>) Error in <math display="inline"><semantics> <mi mathvariant="sans-serif">Γ</mi> </semantics></math> calculation for two Gaussians with same standard deviation (Std. Dev. = 1) but with different means. The error estimate of <math display="inline"><semantics> <mi mathvariant="sans-serif">Γ</mi> </semantics></math> is lowest when <math display="inline"><semantics> <mrow> <mi mathvariant="sans-serif">Δ</mi> <mi>Mean</mi> <mo>≈</mo> <mn>0.2</mn> <mspace width="0.166667em"/> <mrow> <mi>Std</mi> <mo>.</mo> </mrow> <mspace width="4.pt"/> <mi>Dev</mi> </mrow> </semantics></math>. (<b>right</b>) Error in <math display="inline"><semantics> <mi mathvariant="sans-serif">Γ</mi> </semantics></math> calculation for two Gaussians with same Mean (Mean = 0) but with different standard deviation. The error <math display="inline"><semantics> <mi mathvariant="sans-serif">Γ</mi> </semantics></math> estimate is lowest when ratio of standard deviation is approximately <math display="inline"><semantics> <mrow> <mn>0.9</mn> </mrow> </semantics></math> or <math display="inline"><semantics> <mrow> <mn>1.1</mn> </mrow> </semantics></math> (≈0.9<math display="inline"><semantics> <msup> <mrow/> <mrow> <mo>−</mo> <mn>1</mn> </mrow> </msup> </semantics></math>). Discretized version of Equation (<a href="#FD10-entropy-24-01113" class="html-disp-formula">10</a>) was used for the estimate and PDF was approximated by a histogram with 703 bins. We considered <math display="inline"><semantics> <mrow> <mn>2</mn> <mo>×</mo> <msup> <mn>10</mn> <mn>7</mn> </msup> </mrow> </semantics></math> samples for each distribution. <math display="inline"><semantics> <mrow> <mi>d</mi> <mi>t</mi> </mrow> </semantics></math> is chosen to be 1. Estimates were repeated 40 times and mean of the error was taken.</p>
Full article ">Figure 3
<p>Error in estimated (<b>left</b>) <math display="inline"><semantics> <mi mathvariant="sans-serif">Γ</mi> </semantics></math> and (<b>right</b>) <math display="inline"><semantics> <mi mathvariant="script">L</mi> </semantics></math> by simulating <math display="inline"><semantics> <mrow> <mn>2</mn> <mo>×</mo> <msup> <mn>10</mn> <mn>7</mn> </msup> </mrow> </semantics></math> parallel instances of the Equation <math display="inline"><semantics> <mrow> <mi>d</mi> <msub> <mi>x</mi> <mi>t</mi> </msub> <mo>=</mo> <mo>−</mo> <mn>1.0</mn> <msub> <mi>x</mi> <mi>t</mi> </msub> <mi>d</mi> <mi>t</mi> <mo>+</mo> <mn>0.1</mn> <mi>d</mi> <msub> <mi>W</mi> <mi>t</mi> </msub> </mrow> </semantics></math> with initial values sampled from the normal distribution <math display="inline"><semantics> <mrow> <mi mathvariant="script">N</mi> <mo stretchy="false">(</mo> <msub> <mi>x</mi> <mn>0</mn> </msub> <mo>,</mo> <msup> <mn>10</mn> <mrow> <mo>−</mo> <mn>10</mn> </mrow> </msup> <mo stretchy="false">)</mo> </mrow> </semantics></math> and computing the PDFs using histograms with 703 bins. The local error tolerance was <math display="inline"><semantics> <mrow> <mn>5</mn> <mo>×</mo> <msup> <mn>10</mn> <mrow> <mo>−</mo> <mn>4</mn> </mrow> </msup> </mrow> </semantics></math>. Note that an initial step size <math display="inline"><semantics> <mrow> <mi mathvariant="sans-serif">Δ</mi> <mi>t</mi> <mo>=</mo> <msup> <mn>10</mn> <mrow> <mo>−</mo> <mn>14</mn> </mrow> </msup> </mrow> </semantics></math> was chosen to produce more frequent estimates in the initial part of the simulation. See <a href="#app4-entropy-24-01113" class="html-app">Appendix D</a> for exact solution.</p>
Full article ">Figure 4
<p>The <math display="inline"><semantics> <mi mathvariant="sans-serif">Γ</mi> </semantics></math> of the (<b>left</b>) linear SDE and (<b>right</b>) cubic SDE for different initial conditions <math display="inline"><semantics> <mrow> <mi mathvariant="script">N</mi> <mo stretchy="false">(</mo> <msub> <mi>x</mi> <mn>0</mn> </msub> <mo>,</mo> <msup> <mn>10</mn> <mrow> <mo>−</mo> <mn>10</mn> </mrow> </msup> <mo stretchy="false">)</mo> </mrow> </semantics></math>, as described in <a href="#sec3-entropy-24-01113" class="html-sec">Section 3</a>. Note that towards the end of the range of <span class="html-italic">t</span>, the values will be dominated by errors, as shown in <a href="#entropy-24-01113-f003" class="html-fig">Figure 3</a>. Therefore, instead of dropping to zero, they retain a finite nonzero value. For the exact solution of the linear SDE, see <a href="#app4-entropy-24-01113" class="html-app">Appendix D</a>.</p>
Full article ">Figure 5
<p>The mean of the distribution for different initial conditions <math display="inline"><semantics> <mrow> <mi mathvariant="script">N</mi> <mo stretchy="false">(</mo> <msub> <mi>x</mi> <mn>0</mn> </msub> <mo>,</mo> <msup> <mn>10</mn> <mrow> <mo>−</mo> <mn>10</mn> </mrow> </msup> <mo stretchy="false">)</mo> </mrow> </semantics></math> of (<b>left</b>) linear SDE and (<b>right</b>) cubic SDE.</p>
Full article ">Figure 6
<p>The standard deviation of the distribution for different initial conditions <math display="inline"><semantics> <mrow> <mi mathvariant="script">N</mi> <mo stretchy="false">(</mo> <msub> <mi>x</mi> <mn>0</mn> </msub> <mo>,</mo> <msup> <mn>10</mn> <mrow> <mo>−</mo> <mn>10</mn> </mrow> </msup> <mo stretchy="false">)</mo> </mrow> </semantics></math> of (<b>left</b>) linear SDE and (<b>right</b>) cubic SDE. Note that for the linear SDE, all the lines overlap.</p>
Full article ">Figure 7
<p>The Information Length of (<b>left</b>) linear SDE and (<b>right</b>) cubic SDE for different initial conditions <math display="inline"><semantics> <mrow> <mi mathvariant="script">N</mi> <mo stretchy="false">(</mo> <msub> <mi>x</mi> <mn>0</mn> </msub> <mo>,</mo> <msup> <mn>10</mn> <mrow> <mo>−</mo> <mn>10</mn> </mrow> </msup> <mo stretchy="false">)</mo> </mrow> </semantics></math>. For the exact solution of the linear SDE, see <a href="#app4-entropy-24-01113" class="html-app">Appendix D</a>.</p>
Full article ">Figure 8
<p>Scaling behavior of <math display="inline"><semantics> <msub> <mi mathvariant="script">L</mi> <mo>∞</mo> </msub> </semantics></math> with respect to <math display="inline"><semantics> <msub> <mi>x</mi> <mn>0</mn> </msub> </semantics></math> for linear and cubic SDE with <math display="inline"><semantics> <mrow> <mi>θ</mi> <mo>=</mo> <mn>1.0</mn> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <mi>σ</mi> <mo>=</mo> <mn>0.1</mn> </mrow> </semantics></math>.</p>
Full article ">Figure 9
<p><math display="inline"><semantics> <mrow> <mi>P</mi> <mfenced separators="" open="(" close=")"> <mi>x</mi> <mfenced separators="" open="(" close=")"> <msub> <mi>t</mi> <mn>1</mn> </msub> </mfenced> <mo>,</mo> <mi>x</mi> <mfenced separators="" open="(" close=")"> <msub> <mi>t</mi> <mn>2</mn> </msub> </mfenced> </mfenced> </mrow> </semantics></math> for (<b>left</b>) linear SDE with <math display="inline"><semantics> <mrow> <msub> <mi>t</mi> <mn>1</mn> </msub> <mo>=</mo> <mn>200</mn> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <msub> <mi>t</mi> <mn>2</mn> </msub> <mo>=</mo> <mn>207</mn> </mrow> </semantics></math> and (<b>right</b>) cubic SDE with <math display="inline"><semantics> <mrow> <msub> <mi>t</mi> <mn>1</mn> </msub> <mo>=</mo> <mn>1500</mn> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <msub> <mi>t</mi> <mn>2</mn> </msub> <mo>=</mo> <mn>1507</mn> </mrow> </semantics></math>. For both the SDEs, <math display="inline"><semantics> <mrow> <mi>γ</mi> <mo>=</mo> <mn>1.0</mn> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <mi>σ</mi> <mo>=</mo> <mn>0.1</mn> </mrow> </semantics></math>. Both SDEs have reached their stationary states.</p>
Full article ">Figure 10
<p>Panels show the values of the <span class="html-italic">a</span>, <span class="html-italic">b</span>, <span class="html-italic">c</span>, <span class="html-italic">d</span> and <span class="html-italic">e</span> parameters of Equation (<a href="#FD22-entropy-24-01113" class="html-disp-formula">22</a>), obtained through nonlinear function fitting at each time point <math display="inline"><semantics> <msub> <mi>t</mi> <mn>2</mn> </msub> </semantics></math>, expressed as a function of <math display="inline"><semantics> <mrow> <mi mathvariant="sans-serif">Δ</mi> <mi>t</mi> <mo>:</mo> <mo>=</mo> <msub> <mi>t</mi> <mn>2</mn> </msub> <mo>−</mo> <msub> <mi>t</mi> <mn>1</mn> </msub> </mrow> </semantics></math> for different noise levels <math display="inline"><semantics> <mi>σ</mi> </semantics></math>. To ensure that the density function <math display="inline"><semantics> <mrow> <mi>p</mi> <mo stretchy="false">(</mo> <mi>x</mi> <mo>,</mo> <mi>t</mi> <mo stretchy="false">)</mo> </mrow> </semantics></math> for the cubic process had reached a stationary state, we chose <math display="inline"><semantics> <mrow> <msub> <mi>t</mi> <mn>1</mn> </msub> <mo>=</mo> <mn>1500</mn> </mrow> </semantics></math>. Note that for parameters <span class="html-italic">b</span>, <span class="html-italic">d</span> and <span class="html-italic">e</span>, the domain of the plots is restricted to the region where parameter <span class="html-italic">b</span> is nonzero. Noise starts dominating outside this region, since there is negligible contribution from the quadratic terms, and the nonlinear fit cannot find consistent, unique values for these parameters.</p>
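Equation (22) itself is not reproduced in this listing, so the fitting step can only be illustrated under an assumed functional form. The sketch below fits a hypothetical exponential-of-quadratic ansatz in (x(t1), x(t2)), with a through e in the exponent and an extra normalization parameter lnZ added purely for the illustration, to an empirical joint histogram via scipy.optimize.curve_fit; the actual model and fitting protocol are those of the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical stand-in for Equation (22): exponential of a quadratic
# polynomial in (x1, x2); lnZ is a normalization nuisance parameter
# added only for this illustration.
def model(X, a, b, c, d, e, lnZ):
    x1, x2 = X
    return np.exp(lnZ - (a * x1**2 + b * x1 * x2 + c * x2**2 + d * x1 + e * x2))

# Toy correlated samples standing in for (x(t1), x(t2)).
rng = np.random.default_rng(1)
cov = [[1.0, 0.5], [0.5, 1.0]]
xy = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)

# Empirical joint PDF on a grid.
H, xe, ye = np.histogram2d(xy[:, 0], xy[:, 1], bins=60, density=True)
xc, yc = 0.5 * (xe[:-1] + xe[1:]), 0.5 * (ye[:-1] + ye[1:])
X1, X2 = np.meshgrid(xc, yc, indexing="ij")

popt, _ = curve_fit(model, np.vstack([X1.ravel(), X2.ravel()]), H.ravel(),
                    p0=[1.0, 0.0, 1.0, 0.0, 0.0, 0.0], maxfev=20_000)
print(dict(zip(["a", "b", "c", "d", "e", "lnZ"], np.round(popt, 3))))
```

For the figures, a fit of this kind is repeated at every t2, and the fitted parameters are then tracked as functions of Δt and σ.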
Figure 10">
Full article ">Figure 11
<p>(<b>left</b>) Scaling behavior of the peak position of parameter <span class="html-italic">b</span> with respect to noise level <math display="inline"><semantics> <mi>σ</mi> </semantics></math> and (<b>right</b>) scaling behavior of the parameter values corresponding to the peak position of parameter <span class="html-italic">b</span>.</p>
Full article ">Figure 12
<p>Behavior of (<b>left</b>) Parameter <span class="html-italic">a</span> and (<b>right</b>) Parameter <span class="html-italic">b</span> for different <math display="inline"><semantics> <msub> <mi>t</mi> <mn>1</mn> </msub> </semantics></math> and <math display="inline"><semantics> <mrow> <msub> <mi>t</mi> <mn>2</mn> </msub> <mo>=</mo> <msub> <mi>t</mi> <mn>1</mn> </msub> <mo>+</mo> <mi mathvariant="sans-serif">Δ</mi> <mi>t</mi> </mrow> </semantics></math> values.</p>
Full article ">Figure A1
<p>(<b>left</b>) Scaling of runtime with respect to the number of samples <span class="html-italic">n</span>. (<b>right</b>) Scaling of runtime with respect to local error tolerance <math display="inline"><semantics> <mrow> <mi>T</mi> <mi>o</mi> <mi>l</mi> </mrow> </semantics></math>.</p>
Full article ">Figure A2
<p>(<b>left</b>) Scaling of runtime with respect to different initial position <math display="inline"><semantics> <msub> <mi>x</mi> <mn>0</mn> </msub> </semantics></math>. Note that unlike other plots in this section, the y-axis is not in log scale. (<b>right</b>) Scaling of runtime with respect to different noise levels <math display="inline"><semantics> <mrow> <mi>D</mi> <mo>:</mo> <mo>=</mo> <msup> <mi>σ</mi> <mn>2</mn> </msup> <mo>/</mo> <mn>2</mn> </mrow> </semantics></math>.</p>
Full article ">
20 pages, 1250 KiB  
Article
Locally Scaled and Stochastic Volatility Metropolis–Hastings Algorithms
by Wilson Tsakane Mongwe, Rendani Mbuvha and Tshilidzi Marwala
Algorithms 2021, 14(12), 351; https://doi.org/10.3390/a14120351 - 30 Nov 2021
Cited by 4 | Viewed by 2637
Abstract
Markov chain Monte Carlo (MCMC) techniques are commonly used to infer model parameters when closed-form inference is not feasible, with one of the simplest MCMC methods being the random walk Metropolis–Hastings (MH) algorithm. The MH algorithm suffers from random walk behaviour, which results [...] Read more.
Markov chain Monte Carlo (MCMC) techniques are commonly used to infer model parameters when closed-form inference is not feasible, with one of the simplest MCMC methods being the random walk Metropolis–Hastings (MH) algorithm. The MH algorithm suffers from random walk behaviour, which results in inefficient exploration of the target posterior distribution. This method has been improved upon, with algorithms such as the Metropolis Adjusted Langevin Algorithm (MALA) and Hamiltonian Monte Carlo being popular modifications of MH. In this work, we revisit the MH algorithm to reduce the autocorrelations in the generated samples without adding significant computational time. We present the: (1) Stochastic Volatility Metropolis–Hastings (SVMH) algorithm, which is based on using a random scaling matrix in the MH algorithm, and (2) Locally Scaled Metropolis–Hastings (LSMH) algorithm, in which the scaling matrix depends on the local geometry of the target distribution. For both these algorithms, the proposal distribution is still Gaussian and centred at the current state. The empirical results show that these minor additions to the MH algorithm significantly improve the effective sample rates and predictive performance over the vanilla MH method. The SVMH algorithm produces effective sample sizes similar to those of the LSMH method, with SVMH outperforming LSMH on an execution-time-normalised effective sample size basis. The performance of the proposed methods is also compared to MALA and to the current state-of-the-art method, the No-U-Turn sampler (NUTS). The analysis is performed using a simulation study based on Neal’s funnel and multivariate Gaussian distributions and using real-world data modeled using jump diffusion processes and Bayesian logistic regression. Although both MALA and NUTS outperform the proposed algorithms on an effective sample size basis, the SVMH algorithm has similar or better predictive performance when compared to MALA and NUTS across the various targets. In addition, the SVMH algorithm outperforms the other MCMC algorithms on a normalised effective sample size basis on the jump diffusion processes datasets. These results indicate the overall usefulness of the proposed algorithms. Full article
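The stochastic-volatility idea is simple enough to prototype directly: keep the Gaussian random-walk proposal centred at the current state, but redraw its scale at every iteration. Because the scale is drawn independently of the state, the scale-marginalized proposal stays symmetric and the plain MH acceptance ratio remains valid. The sketch below uses an illustrative log-normal per-dimension scale; the paper's actual scaling distributions, the locally scaled (LSMH) variant, and all tuning details are not reproduced here.

```python
import numpy as np

def svmh_sample(log_prob, x0, n_steps=5_000, base_scale=0.1, rng=None):
    """Random-walk Metropolis-Hastings with a per-step random proposal scale.

    A minimal sketch of the stochastic-volatility idea: the Gaussian
    proposal stays centred at the current state, but its scale is redrawn
    each iteration. Since the scale is drawn independently of the state,
    the marginal proposal remains symmetric and the usual MH ratio applies.
    """
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    lp = log_prob(x)
    chain = np.empty((n_steps, x.size))
    for i in range(n_steps):
        scale = base_scale * np.exp(rng.normal(size=x.size))  # illustrative choice
        prop = x + scale * rng.standard_normal(x.size)
        lp_prop = log_prob(prop)
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

# Example target: standard 10-d Gaussian.
chain = svmh_sample(lambda z: -0.5 * np.sum(z**2), np.zeros(10))
print(chain.mean(axis=0).round(2))
```

The random scale lets individual iterations make both bold and conservative moves, which is the behaviour the effective-sample-rate comparisons above quantify.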
Show Figures

Figure 1

Figure 1
<p>Inference results across each dataset over thirty iterations of each algorithm. The first row is the effective sample size while the second row is the time-normalised effective sample size for each dataset.</p>
Full article ">Figure 2
<p>Negative log-likelihood diagnostic trace plots for various targets. These are trace plots from a single run of the MCMC chain. (<b>a</b>) Diagnostic negative log-likelihood trace plots for the Fraud dataset and (<b>b</b>) diagnostic negative log-likelihood trace plots for the multivariate Gaussian with <math display="inline"> <semantics> <mrow> <mi>D</mi> <mo>=</mo> <mn>10</mn> </mrow> </semantics> </math>. The other targets produce similar convergence behavior.</p>
Full article ">Figure 3
<p>Autocorrelation plots for the first dimension across the various targets. These are autocorrelations from a single run of the MCMC chain. (<b>a</b>) Autocorrelation plot for the first dimension on the Neal funnel for <math display="inline"> <semantics> <mrow> <mi>D</mi> <mo>=</mo> <mn>20</mn> </mrow> </semantics> </math> and (<b>b</b>) Autocorrelation plot for the first dimension on the multivariate Gaussian with <math display="inline"> <semantics> <mrow> <mi>D</mi> <mo>=</mo> <mn>20</mn> </mrow> </semantics> </math>.</p>
Full article ">Figure 4
<p>Predictive performance for all the sampling methods on the real world classification datasets. The results were averaged over thirty runs of each algorithm. (<b>a</b>) ROC curves for the Heart dataset and (<b>b</b>) ROC curves for Australian credit dataset.</p>
Full article ">Figure 5
<p>Predictive performance for all the sampling methods on the real world classification datasets. The results were averaged over thirty runs of each algorithm. (<b>a</b>) ROC curves for the Fraud dataset and (<b>b</b>) ROC curves for German dataset.</p>
Full article ">
19 pages, 6036 KiB  
Article
The X-ray Sensitivity of an Amorphous Lead Oxide Photoconductor
by Oleksandr Grynko, Tristen Thibault, Emma Pineau and Alla Reznik
Sensors 2021, 21(21), 7321; https://doi.org/10.3390/s21217321 - 3 Nov 2021
Cited by 8 | Viewed by 2379
Abstract
The photoconductor layer is an important component of direct conversion flat panel X-ray imagers (FPXI); thus, it should be carefully selected to meet the requirements for the X-ray imaging detector, and its properties should be clearly understood to develop an optimal detector [...] Read more.
The photoconductor layer is an important component of direct conversion flat panel X-ray imagers (FPXI); thus, it should be carefully selected to meet the requirements for the X-ray imaging detector, and its properties should be clearly understood to develop an optimal detector design. Currently, amorphous selenium (a-Se) is the only photoconductor utilized in commercial direct conversion FPXIs for low-energy mammographic imaging, but it is not practically feasible for higher-energy diagnostic imaging. The amorphous lead oxide (a-PbO) photoconductor is considered a replacement for a-Se in radiography, fluoroscopy, and tomosynthesis applications. In this work, we investigated the X-ray sensitivity of a-PbO, one of the most important parameters for X-ray photoconductors, and examined the underlying mechanisms responsible for charge generation and recombination. The X-ray sensitivity in terms of electron–hole pair creation energy, W±, was measured over a range of electric fields, X-ray energies, and exposure levels. W± decreases with the electric field and X-ray energy, saturating at 18–31 eV/ehp, depending on the energy of the X-rays, but increases with the exposure rate. The peculiar dependencies of W± on these parameters lead to the conclusion that, at electric fields relevant to detector operation (~10 V/μm), the columnar recombination and bulk recombination mechanisms interplay in the a-PbO photoconductor. Full article
(This article belongs to the Section Optical Sensors)
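As a back-of-the-envelope companion to these measurements: W± is, by definition, the absorbed X-ray energy per collected electron–hole pair, so a single operating point reduces to a ratio of absorbed energy to collected charge. Every number in the sketch below is a hypothetical placeholder, not a value from the paper.

```python
# Back-of-the-envelope W_+- from one (hypothetical) operating point.
E_CHARGE = 1.602e-19          # C per elementary charge

e_absorbed_eV = 2.5e12        # X-ray energy absorbed in the layer per pulse (placeholder)
q_collected_C = 16e-9         # charge collected during the same pulse (placeholder)

n_ehp = q_collected_C / E_CHARGE      # number of collected electron-hole pairs
w_pm = e_absorbed_eV / n_ehp          # eV per collected pair

print(f"W_+- = {w_pm:.1f} eV/ehp")    # ~25 eV/ehp with these placeholder numbers
```

The field and energy dependence then enter through how the absorbed energy and, via recombination, the collected charge vary with the operating conditions.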
Show Figures

Figure 1

Figure 1
<p>Schematics of the XPM setup (not to scale).</p>
Full article ">Figure 2
<p>A typical X-ray response to 60 kVp irradiation at different electric fields.</p>
Full article ">Figure 3
<p>(<b>a</b>) <math display="inline"><semantics> <mrow> <msub> <mi>W</mi> <mo>±</mo> </msub> </mrow> </semantics></math> as a function of the electric field at different X-ray tube voltages. <span class="html-italic">W</span><sub>±</sub> decreases with the field and the energy of X-rays. (<b>b</b>) The same values replotted as a function of the reciprocal field 1/<span class="html-italic">F</span>.</p>
Full article ">Figure 4
<p><math display="inline"><semantics> <mrow> <msub> <mi>W</mi> <mo>±</mo> </msub> </mrow> </semantics></math> as a function of the exposure rate in different electric fields and at different tube voltages. <math display="inline"><semantics> <mrow> <msub> <mi>W</mi> <mo>±</mo> </msub> </mrow> </semantics></math> increases with the exposure rate.</p>
Full article ">Figure 5
<p><math display="inline"><semantics> <mrow> <msub> <mi>W</mi> <mo>±</mo> </msub> </mrow> </semantics></math> as a function of tube voltage at different electric fields. <math display="inline"><semantics> <mrow> <msub> <mi>W</mi> <mo>±</mo> </msub> </mrow> </semantics></math> decreases as the energy of X-rays increases.</p>
Full article ">Figure 6
<p><math display="inline"><semantics> <mrow> <msub> <mi>W</mi> <mo>±</mo> </msub> </mrow> </semantics></math> as a function of the mean energy of X-ray photons at different tube voltages and electric field strengths. <math display="inline"><semantics> <mrow> <msub> <mi>W</mi> <mo>±</mo> </msub> </mrow> </semantics></math> decreases as the energy of X-rays increases.</p>
Full article ">Figure 7
<p>Typical electron trajectories in PbO for a 37.7-keV electron beam.</p>
Full article ">Figure 8
<p>Charge collected versus the exposure for (<b>a</b>) the constant exposure rate and (<b>b</b>) the constant pulse duration at 60 kVp irradiation. (<b>c</b>) Slope values for the case of the constant pulse duration at different kVp and field strengths.</p>
Full article ">Figure A1
<p>(<b>a</b>) The simulated X-ray spectra at different tube voltages, normalized to 1 R of exposure. The inset shows typical (see text) parameters of the beams. (<b>b</b>) Exposure measured as a function of added Al filtration thickness for different tube voltages. The dashed lines correspond to 50% of the original exposure and HVL of Al. The inset shows calculated and measured HVL values for a naked tube.</p>
Full article ">Figure A2
<p>Schematic illustration of the charge generation and recombination processes. (<b>a</b>) A primary photoelectron ejected by an incident X-ray photon creates multiple ehps in each ionization event. (<b>b</b>) Secondary charge carriers thermalize in a drift-diffusion process and form a spur. (<b>c</b>) Overlapping spurs produce an ionization column along the track of the primary photoelectron. Oppositely charged carriers with a separation smaller than the Coulombic capture radius recombine. (<b>d</b>) The remaining carriers escape columnar recombination and (<b>e</b>) drift through the bulk of the photoconductor. (<b>f</b>) Drifting carriers from different columns and spurs recombine in the bulk.</p>
Full article ">
45 pages, 704 KiB  
Article
Accelerated Diffusion-Based Sampling by the Non-Reversible Dynamics with Skew-Symmetric Matrices
by Futoshi Futami, Tomoharu Iwata, Naonori Ueda and Issei Sato
Entropy 2021, 23(8), 993; https://doi.org/10.3390/e23080993 - 30 Jul 2021
Cited by 3 | Viewed by 3969
Abstract
Langevin dynamics (LD) has been extensively studied theoretically and practically as a basic sampling technique. Recently, the incorporation of non-reversible dynamics into LD is attracting attention because it accelerates the mixing speed of LD. Popular choices for non-reversible dynamics include underdamped Langevin dynamics [...] Read more.
Langevin dynamics (LD) has been extensively studied theoretically and practically as a basic sampling technique. Recently, the incorporation of non-reversible dynamics into LD is attracting attention because it accelerates the mixing speed of LD. Popular choices for non-reversible dynamics include underdamped Langevin dynamics (ULD), which uses second-order dynamics, and perturbations with skew-symmetric matrices. Although ULD has been widely used in practice, the application of skew acceleration is limited, even though it is theoretically expected to show superior performance. Current work lacks a theoretical understanding of issues that are important to practitioners, including selection criteria for skew-symmetric matrices, quantitative evaluations of the acceleration, and the large memory cost of storing skew matrices. In this study, we theoretically and numerically clarify these problems by analyzing the acceleration, focusing on how the skew-symmetric matrix perturbs the Hessian matrix of the potential function. We also present a practical algorithm that accelerates the standard LD and ULD, which uses novel memory-efficient skew-symmetric matrices under parallel-chain Monte Carlo settings. Full article
(This article belongs to the Special Issue Approximate Bayesian Inference)
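The skew perturbation itself is a one-line change to discretized Langevin dynamics: premultiply the gradient by (I + αJ) with J skew-symmetric, which preserves the target distribution in continuous time while making the dynamics non-reversible. The sketch below applies an unadjusted Euler discretization to a stiff Gaussian target; the random J used here is for illustration only and is not the memory-efficient construction proposed in the paper.

```python
import numpy as np

def skew_langevin(grad_u, x0, alpha=1.0, h=1e-3, n_steps=20_000, rng=None):
    """Unadjusted Langevin with a skew-symmetric (non-reversible) perturbation.

    Discretizes dx = -(I + alpha * J) grad U(x) dt + sqrt(2) dW with a
    fixed random skew-symmetric J; alpha = 0 recovers plain Langevin.
    """
    rng = rng or np.random.default_rng(0)
    d = len(x0)
    A = rng.standard_normal((d, d))
    J = A - A.T                      # skew-symmetric: J^T = -J
    x = np.array(x0, dtype=float)
    out = np.empty((n_steps, d))
    for i in range(n_steps):
        drift = -(np.eye(d) + alpha * J) @ grad_u(x)
        x = x + h * drift + np.sqrt(2 * h) * rng.standard_normal(d)
        out[i] = x
    return out

# Ill-conditioned Gaussian target U(x) = 0.5 x^T diag(1, 100) x.
P = np.diag([1.0, 100.0])
samples = skew_langevin(lambda x: P @ x, np.array([3.0, 3.0]))
print(samples.var(axis=0).round(3))   # should approach [1.0, 0.01]
```

Setting alpha = 0 recovers plain Langevin dynamics, which makes the mixing comparison between the two a one-parameter experiment.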
Show Figures

Figure 1

Figure 1
<p>Double-potential example: Poincaré constant is related to the eigenvalue at <math display="inline"><semantics> <msup> <mi>x</mi> <mo>∗</mo> </msup> </semantics></math>.</p>
Full article ">Figure 2
<p>Eigenvalue changes (averaged over ten trials).</p>
Full article ">Figure 3
<p>Convergence behavior of toy data in MMD (averaged over ten trials).</p>
Full article ">Figure 4
<p>Final performances of LDA under different values of <math display="inline"><semantics> <mi>α</mi> </semantics></math> (averaged over ten trials).</p>
Full article ">Figure 5
<p>LDA experiments (averaged over ten trials).</p>
Full article ">Figure 6
<p>MNIST classification (averaged over ten trials).</p>
Full article ">Figure A1
<p>The convergence rate of ULD under the different Poincaré constants.</p>
Full article ">
16 pages, 2230 KiB  
Article
Comparative Modeling of Frequency Mixing Measurements of Magnetic Nanoparticles Using Micromagnetic Simulations and Langevin Theory
by Ulrich M. Engelmann, Ahmed Shalaby, Carolyn Shasha, Kannan M. Krishnan and Hans-Joachim Krause
Nanomaterials 2021, 11(5), 1257; https://doi.org/10.3390/nano11051257 - 11 May 2021
Cited by 10 | Viewed by 3257
Abstract
Dual frequency magnetic excitation of magnetic nanoparticles (MNP) enables enhanced biosensing applications. This was studied from an experimental and theoretical perspective: nonlinear sum-frequency components of MNP exposed to dual-frequency magnetic excitation were measured as a function of static magnetic offset field. The Langevin [...] Read more.
Dual frequency magnetic excitation of magnetic nanoparticles (MNP) enables enhanced biosensing applications. This was studied from an experimental and theoretical perspective: nonlinear sum-frequency components of MNP exposed to dual-frequency magnetic excitation were measured as a function of static magnetic offset field. The Langevin model in thermodynamic equilibrium was fitted to the experimental data to derive the parameters of the lognormal core size distribution. These parameters were subsequently used as inputs for micromagnetic Monte-Carlo (MC)-simulations. From the hysteresis loops obtained from MC-simulations, sum-frequency components were numerically demodulated and compared with both experiment and Langevin model predictions. From the latter, we derived that approximately 90% of the frequency mixing magnetic response signal is generated by the largest 10% of MNP. We therefore suggest that small particles do not contribute to the frequency mixing signal, which is supported by the MC-simulation results. Both theoretical approaches describe the experimental signal shapes well, but with notable differences between experiment and micromagnetic simulations. These deviations could result from Brownian relaxation, which is included in the MC-simulations although experimentally inhibited; from (as yet unconsidered) cluster effects of MNP; or from inaccurately derived inputs to the MC-simulations, since the largest particles dominate the experimental signal but do not fulfill the precondition of thermodynamic equilibrium required by Langevin theory. Full article
(This article belongs to the Special Issue Applications and Properties of Magnetic Nanoparticles)
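The Langevin-model half of this comparison can be sketched by driving the equilibrium Langevin function with the two-tone field plus a static offset and demodulating the moment at the mixing frequencies with a software lock-in. The drive parameters below mirror the MC-simulation settings quoted in the figure captions, but the single core size and saturation magnetization are illustrative stand-ins, and the lognormal size average performed in the paper is omitted.

```python
import numpy as np

MU0, KB, T = 4e-7 * np.pi, 1.381e-23, 300.0

def langevin(xi):
    # L(xi) = coth(xi) - 1/xi, with the small-argument limit handled.
    return np.where(np.abs(xi) > 1e-8, 1.0 / np.tanh(xi) - 1.0 / xi, xi / 3.0)

# Illustrative single-core parameters (not fitted values from the paper).
d_core = 20e-9                                   # core diameter, m
m_p = 4.8e5 * np.pi / 6 * d_core**3              # particle moment, A m^2 (Ms = 480 kA/m)

f1, f2 = 40_000.0, 2_000.0                       # HF and LF drive frequencies, Hz
H1, H2, H0 = 1e-3 / MU0, 16e-3 / MU0, 5e-3 / MU0 # amplitudes and static offset, A/m

t = np.arange(0, 0.05, 1 / (50 * f1))            # 50 samples per HF period
H = H0 + H1 * np.sin(2 * np.pi * f1 * t) + H2 * np.sin(2 * np.pi * f2 * t)
m = langevin(MU0 * m_p * H / (KB * T))           # normalized equilibrium moment

# Lock-in demodulation at the mixing frequencies f1 + n*f2.
for n in (1, 2, 3, 4):
    ref = np.exp(-2j * np.pi * (f1 + n * f2) * t)
    amp = 2 * np.abs(np.mean(m * ref))
    print(f"f1 + {n}*f2: amplitude {amp:.3e}")
```

Sweeping H0 and repeating the demodulation reproduces the offset-field dependence plotted in the figures; weighting the result by the lognormal PDF gives the full Langevin-model prediction.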
Show Figures

Figure 1

Figure 1
<p>MNP nonlinear magnetic moment for dual frequency excitation at mixing frequencies <math display="inline"><semantics> <mrow> <msub> <mi>f</mi> <mrow> <mi>L</mi> <mi>F</mi> </mrow> </msub> <mo>+</mo> <mi>n</mi> <mo>⋅</mo> <msub> <mi>f</mi> <mrow> <mi>H</mi> <mi>F</mi> </mrow> </msub> </mrow> </semantics></math> with <math display="inline"><semantics> <mrow> <mi>n</mi> <mo>=</mo> <mn>1</mn> <mo>,</mo> <mn>2</mn> <mo>,</mo> <mn>3</mn> <mo>,</mo> <mn>4</mn> </mrow> </semantics></math> from experimental measurement (<math display="inline"><semantics> <mrow> <msub> <mi>H</mi> <mn>1</mn> </msub> <mo>=</mo> <mn>1.29</mn> </mrow> </semantics></math> mT/<span class="html-italic">µ</span><sub>0</sub>, <math display="inline"><semantics> <mrow> <msub> <mi>f</mi> <mn>1</mn> </msub> <mo>=</mo> <mn>30</mn> <mo>,</mo> <mn>534</mn> </mrow> </semantics></math> Hz, <math display="inline"><semantics> <mrow> <msub> <mi>H</mi> <mn>2</mn> </msub> <mo>=</mo> <mn>16.4</mn> </mrow> </semantics></math> mT/<span class="html-italic">µ</span><sub>0</sub>, <math display="inline"><semantics> <mrow> <msub> <mi>f</mi> <mn>2</mn> </msub> <mo>=</mo> <mn>62.95</mn> </mrow> </semantics></math> Hz) and fitted with the Langevin model of Equation (10) with the same parameters.</p>
Full article ">Figure 2
<p>Exemplary magnetization curves (<span class="html-italic">M</span>(<span class="html-italic">H</span>)-loops) generated from micromagnetic MC-simulations for different static offset fields <math display="inline"><semantics> <mrow> <msub> <mi>H</mi> <mn>0</mn> </msub> <mo>=</mo> <mfenced> <mrow> <mn>0</mn> <mo>,</mo> <mn>10</mn> <mo>,</mo> <mn>20</mn> </mrow> </mfenced> <mo> </mo> </mrow> </semantics></math>mT/<span class="html-italic">µ</span><sub>0</sub>. Inset shows magnification of small applied fields, revealing a slight opening of the loops.</p>
Full article ">Figure 3
<p>Normalized MNP nonlinear magnetic moment for dual frequency excitation at mixing frequencies <math display="inline"><semantics> <mrow> <msub> <mi>f</mi> <mn>1</mn> </msub> <mo>+</mo> <mi>n</mi> <mo>⋅</mo> <msub> <mi>f</mi> <mn>2</mn> </msub> </mrow> </semantics></math> with <math display="inline"><semantics> <mrow> <mi>n</mi> <mo>=</mo> <mn>1</mn> <mo>,</mo> <mn>2</mn> <mo>,</mo> <mn>3</mn> <mo>,</mo> <mn>4</mn> </mrow> </semantics></math> comparing experimental results (<math display="inline"><semantics> <mrow> <msub> <mi>H</mi> <mn>1</mn> </msub> <mo>=</mo> <mn>1.29</mn> </mrow> </semantics></math> mT/<span class="html-italic">µ</span><sub>0</sub>, <math display="inline"><semantics> <mrow> <msub> <mi>f</mi> <mn>1</mn> </msub> <mo>=</mo> <mn>30</mn> <mo>,</mo> <mn>534</mn> </mrow> </semantics></math> Hz, <math display="inline"><semantics> <mrow> <msub> <mi>H</mi> <mn>2</mn> </msub> <mo>=</mo> <mn>16.4</mn> </mrow> </semantics></math> mT/<span class="html-italic">µ</span><sub>0</sub>, <math display="inline"><semantics> <mrow> <msub> <mi>f</mi> <mn>2</mn> </msub> <mo>=</mo> <mn>62.95</mn> </mrow> </semantics></math> Hz) and predictions from micromagnetic MC-simulations (<math display="inline"><semantics> <mrow> <msub> <mi>H</mi> <mn>1</mn> </msub> <mo>=</mo> <mn>1</mn> <mrow> <mo> </mo> <mi>mT</mi> </mrow> <mo>/</mo> <mi>µ</mi> <mn>0</mn> <mo>,</mo> <mo> </mo> <msub> <mi>f</mi> <mn>1</mn> </msub> <mo>=</mo> <mn>40</mn> <mo>,</mo> <mn>000</mn> <mrow> <mo> </mo> <mi>Hz</mi> </mrow> <mo>,</mo> <msub> <mi>H</mi> <mn>2</mn> </msub> <mo>=</mo> <mn>16</mn> </mrow> </semantics></math> mT/<span class="html-italic">µ</span><sub>0</sub>, <math display="inline"><semantics> <mrow> <msub> <mi>f</mi> <mn>2</mn> </msub> <mo>=</mo> <mn>2000</mn> </mrow> </semantics></math> Hz).</p>
Full article ">Figure 4
<p>Normalized MNP nonlinear magnetic moment for dual frequency excitation at mixing frequencies <math display="inline"><semantics> <mrow> <msub> <mi>f</mi> <mn>1</mn> </msub> <mo>+</mo> <mi>n</mi> <mo>⋅</mo> <msub> <mi>f</mi> <mn>2</mn> </msub> </mrow> </semantics></math> with <math display="inline"><semantics> <mrow> <mi>n</mi> <mo>=</mo> <mn>1</mn> <mo>,</mo> <mn>2</mn> <mo>,</mo> <mn>3</mn> <mo>,</mo> <mn>4</mn> </mrow> </semantics></math> from micromagnetic MC-simulations fitted with the thermodynamic Langevin model with fixed core diameter <math display="inline"><semantics> <mrow> <msub> <mi>d</mi> <mi>C</mi> </msub> <mo>=</mo> <mn>7.813</mn> </mrow> </semantics></math> nm.</p>
Full article ">Figure 5
<p>PDF of lognormal distribution (solid line) with <span class="html-italic">d</span><sub>0</sub> = 7.81 nm and <span class="html-italic">σ</span> = 0.346 and its reverse CDF, counted from large sizes (dashed line). The quantiles that yield 90% (99%; 99.9%) of the FMMD signal are shaded in dark gray (gray; light gray); they consist of 10.3% (31.8%; 56.2%) of the particles on the large-sized tail of the distribution.</p>
Full article ">Figure A1
<p>Estimated corridor for magnetic dipole–dipole interaction energy between neighboring MNPs (shaded) versus the average interparticle distance, estimated from Equation (A2). Thermal excitation energy is shown for reference.</p>
Full article ">Figure A2
<p>PDF of lognormal distribution (solid line) with <span class="html-italic">d</span><sub>0</sub> = 7.81 nm and <span class="html-italic">σ</span> = 1.466 and its reverse CDF, counted from large sizes (dashed line). The quantiles that yield 90% (99%; 99.9%) of the FMMD signal are shaded in dark gray (gray; light gray); they consist of 3.0% (11.4%; 24.3%) of the particles on the large-sized tail of the distribution.</p>
Full article ">
48 pages, 8203 KiB  
Review
An Overview of the Lagrangian Dispersion Modeling of Heavy Particles in Homogeneous Isotropic Turbulence and Considerations on Related LES Simulations
by Daniel G. F. Huilier
Fluids 2021, 6(4), 145; https://doi.org/10.3390/fluids6040145 - 8 Apr 2021
Cited by 14 | Viewed by 3972
Abstract
Particle tracking is a competitive technique widely used in two-phase flows and best suited to simulate the dispersion of heavy particles in the atmosphere. Most Lagrangian models in the statistical approach to turbulence are based either on the eddy interaction model (EIM) and [...] Read more.
Particle tracking is a competitive technique widely used in two-phase flows and best suited to simulate the dispersion of heavy particles in the atmosphere. Most Lagrangian models in the statistical approach to turbulence are based either on the eddy interaction model (EIM) and the Monte-Carlo method or on random walk models (RWMs) making use of Markov chains and a Langevin equation. In the present work, both discontinuous and continuous random walk techniques are used to model the dispersion of heavy spherical particles in homogeneous isotropic stationary turbulence (HIST). Their efficiency in predicting particle long-time dispersion, mean-square velocity and Lagrangian integral time scales is discussed. Computation results with zero and non-zero mean drift velocity are reported; they are intended to quantify the inertia, gravity, crossing-trajectory and continuity effects controlling the dispersion. The calculations concern dense monodisperse spheres in air, with particle Stokes numbers ranging from 0.007 to 4. Due to the weaknesses of such models, a more sophisticated matrix method, able to simulate the true fluid turbulence experienced by the particle, will also be explored for long-time dispersion studies. Advances in computer performance have since made it possible to complement Reynolds-Averaged Navier-Stokes (RANS)-based studies with large eddy simulation (LES) and direct numerical simulation (DNS) of turbulence coupled to Generalized Langevin Models. A short review of the progress of Lagrangian simulations based on LES is therefore also provided, highlighting preferential concentration. The theoretical framework for the fluid time correlation functions along the heavy particle path is that suggested by Wang and Stock. Full article
(This article belongs to the Section Flow of Multi-Phase Fluids and Granular Materials)
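In continuous form, the random-walk machinery reviewed here reduces to two coupled updates: an exponentially correlated (Langevin) process for the fluid velocity seen by the particle, and Stokes drag for the particle response. The sketch below implements one velocity component in stationary HIST under these assumptions; the crossing-trajectory and continuity corrections to the seen time scales (in the spirit of Wang and Stock) and the matrix method are deliberately left out.

```python
import numpy as np

def disperse(n_p=50_000, n_steps=4_000, dt=1e-3,
             u_rms=0.5, t_l=0.1, tau_p=0.05, g=0.0, rng=None):
    """One-component continuous random walk for heavy particles in HIST.

    The fluid velocity seen by the particle follows an exponentially
    correlated Langevin process; the particle responds through Stokes
    drag (with an optional gravitational drift g). Returns the particle
    position variance over time.
    """
    rng = rng or np.random.default_rng(0)
    u = u_rms * rng.standard_normal(n_p)   # fluid velocity seen
    v = np.zeros(n_p)                      # particle velocity
    x = np.zeros(n_p)                      # particle position
    var = np.empty(n_steps)
    a = np.exp(-dt / t_l)                  # exact OU decrement over one step
    b = u_rms * np.sqrt(1.0 - a * a)
    for i in range(n_steps):
        u = a * u + b * rng.standard_normal(n_p)   # Langevin update
        v += dt * ((u - v) / tau_p - g)            # Stokes drag (+ gravity)
        x += dt * v
        var[i] = x.var()
    return var

dt = 1e-3
var = disperse(dt=dt)
# Long-time dispersion coefficient: half the slope of <x^2>(t).
print(0.5 * (var[-1] - var[-501]) / (500 * dt))
```

With dt well below both tau_p and t_l, the long-time slope of the position variance recovers the dispersion coefficient plotted in the figures (about u_rms² · T_L in the passive limit).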
Show Figures

Figure 1

Figure 1
<p>Sketch of the eddy interaction model.</p>
Full article ">Figure 2
<p>Construction principle of an inertial particle trajectory in the random walk model (RWM).</p>
Full article ">Figure 3
<p>Matrices A and B used to calculate the turbulent fluid velocity seen by the particle along its path.</p>
Full article ">Figure 4
<p>Particle Lagrangian time scale (g = 0).</p>
Full article ">Figure 5
<p>Normalized variance of particle velocity (g = 0).</p>
Full article ">Figure 6
<p>Particle dispersion coefficient (g = 0).</p>
Full article ">Figure 7
<p>Normalized variance of particle velocity (g = 0).</p>
Full article ">Figure 8
<p>Particle dispersion coefficient (g = 0).</p>
Full article ">Figure 9
<p>Particle Lagrangian time scale (g ≠ 0).</p>
Full article ">Figure 10
<p>Particle dispersion coefficient (g ≠ 0).</p>
Full article ">Figure 11
<p>Normalized variance of particle velocity (g ≠ 0).</p>
Full article ">Figure 12
<p>Normalized variance of particle velocity (g = 0).</p>
Full article ">Figure 13
<p>Particle Lagrangian time scale (g = 0).</p>
Full article ">Figure 14
<p>Particle Dispersion coefficient (g = 0).</p>
Full article ">Figure 15
<p>Normalized variance of particle velocity (g ≠ 0).</p>
Full article ">Figure 16
<p>Particle dispersion coefficients (g ≠ 0).</p>
Full article ">Figure 17
<p>Particle Lagrangian time scale (g ≠ 0).</p>
Full article ">Figure 18
<p>Normalized variance of particle velocity (g ≠ 0).</p>
Full article ">Figure 19
<p>Particle dispersion coefficient (g ≠ 0).</p>
Full article ">Figure 20
<p>Particle Lagrangian time scale (g = 0).</p>
Full article ">Figure 21
<p>Normalized variance of particle velocity (g = 0).</p>
Full article ">Figure 22
<p>Particle dispersion coefficient (g = 0).</p>
Full article ">Figure 23
<p>Particle Lagrangian time scale (g ≠ 0).</p>
Full article ">Figure 24
<p>Normalized variance of particle velocity (g ≠ 0).</p>
Full article ">Figure 25
<p>Particle dispersion coefficients (g ≠ 0).</p>
Full article ">Figure A1
<p>Autocorrelation coefficients as proposed by Equation (A2).</p>
Full article ">
18 pages, 1871 KiB  
Article
A Neural Network MCMC Sampler That Maximizes Proposal Entropy
by Zengyi Li, Yubei Chen and Friedrich T. Sommer
Entropy 2021, 23(3), 269; https://doi.org/10.3390/e23030269 - 25 Feb 2021
Cited by 4 | Viewed by 2962
Abstract
Markov Chain Monte Carlo (MCMC) methods sample from unnormalized probability distributions and offer guarantees of exact sampling. However, in the continuous case, unfavorable geometry of the target distribution can greatly limit the efficiency of MCMC methods. Augmenting samplers with neural networks can potentially [...] Read more.
Markov Chain Monte Carlo (MCMC) methods sample from unnormalized probability distributions and offer guarantees of exact sampling. However, in the continuous case, unfavorable geometry of the target distribution can greatly limit the efficiency of MCMC methods. Augmenting samplers with neural networks can potentially improve their efficiency. Previous neural network-based samplers were trained with objectives that either did not explicitly encourage exploration, or contained a term that encouraged exploration but only for well-structured distributions. Here we propose to maximize proposal entropy for adapting the proposal to distributions of any shape. To optimize proposal entropy directly, we devised a neural network MCMC sampler that has a flexible and tractable proposal distribution. Specifically, our network architecture utilizes the gradient of the target distribution for generating proposals. Our model achieved significantly higher efficiency than previous neural network MCMC techniques in a variety of sampling tasks, sometimes by more than an order of magnitude. Further, the sampler was demonstrated through the training of a convergent energy-based model of natural images. The adaptive sampler achieved unbiased sampling with significantly higher proposal entropy than a Langevin dynamics sampler. The trained sampler also achieved better sample quality. Full article
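For a Gaussian proposal the objective is easy to state: with per-dimension scale exp(s), the proposal entropy grows linearly in s, so a sampler can push s up while rejections push it back down. The toy sketch below adapts a single log-scale of a random-walk proposal with exactly that tension. It is only a gesture at the idea: the paper's sampler uses a neural, gradient-informed proposal with a tractable entropy objective, and strict unbiasedness requires freezing any adaptation.

```python
import numpy as np

def entropy_adapted_mh(log_prob, x0, n_steps=20_000, target_accept=0.6,
                       lr=0.01, rng=None):
    """Random-walk MH whose proposal log-scale rises on accepts, falls on rejects.

    The entropy of N(x, exp(2 s) I) grows linearly in s, so the accept branch
    is 'entropy ascent' and the reject branch is the penalty; the updates
    balance exactly at the target acceptance rate. Adaptation should be
    frozen for strictly unbiased sampling.
    """
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    s, lp = 0.0, log_prob(x)
    chain = np.empty((n_steps, x.size))
    for i in range(n_steps):
        prop = x + np.exp(s) * rng.standard_normal(x.size)
        lp_prop = log_prob(prop)
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
            s += lr                                          # raise proposal entropy
        else:
            s -= lr * target_accept / (1.0 - target_accept)  # rejection penalty
        chain[i] = x
    return chain, np.exp(s)

# Toy target: 10-d standard Gaussian.
chain, scale = entropy_adapted_mh(lambda z: -0.5 * z @ z, np.zeros(10))
print(round(scale, 3))
```

Balancing at a fixed acceptance rate is far weaker than maximizing the entropy of a flexible learned proposal, which is where the order-of-magnitude gains reported above come from.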
Show Figures

Figure 1

Figure 1
<p>Illustration of learning for exploring a state space. Left panel: a Langevin sampler that has poor exploration. Middle panel: our proposed method—samples travel far within the target distribution. Right panel: a sampler with a higher L2 jump than ours—the exploration is still worse. In each panel, the yellow dot on the top left is the initial point <span class="html-italic">x</span>; blue and black dots are accepted and rejected samples, respectively.</p>
Full article ">Figure 2
<p>Comparison of our method with Hamiltonian Monte Carlo (HMC) on the 20d Funnel-3 distribution. (<b>a</b>) Chain and samples of <math display="inline"><semantics> <msub> <mi>x</mi> <mn>0</mn> </msub> </semantics></math> (from neck to base direction) for HMC. (<b>b</b>) Same as (<b>a</b>) but for our learned sampler. Note, samples in (<b>a</b>) look significantly more correlated than those in (<b>b</b>), although they are plotted over a longer time scale.</p>
Full article ">Figure 3
<p>Training of the convergent energy-based model (EBM) with pixel space sampling. (<b>a</b>) Samples from replay buffer after training. (<b>b</b>) Proposal entropy of trained sampler vs. Metropolis-adjusted Langevin algorithm (MALA) early during training—note that the entropy of the learned sampler is significantly higher. (<b>c</b>) Samples from 100,000 sampling steps by the learned sampler, initialized at samples from replay buffer. Large transitions like the one in the first row are rare; this atypical example was selected for display.</p>
Full article ">Figure A1
<p>Illustrating the role of gradient information. (<b>a</b>) Comparison of the proposal entropy (up to a common constant) during training. Full: full model with gradient; LD: model variant 2; no grad: model variant 1. (<b>b</b>–<b>d</b>): Example proposal distribution of full, LD and no grad models.</p>
Full article ">Figure A2
<p>Visualizations of proposal distributions learned on the Funnel-3 distribution. Yellow dot: <span class="html-italic">x</span>. Blue dots: accepted <math display="inline"><semantics> <msup> <mi>x</mi> <mo>′</mo> </msup> </semantics></math>. Black dots: rejected <math display="inline"><semantics> <msup> <mi>x</mi> <mo>′</mo> </msup> </semantics></math>. The sampler has an accept rate of 0.6. Although not perfectly covering the target distribution, the proposed samples travel far from the previous sample and in a manner that complies with the geometry of the target distribution.</p>
Full article ">Figure A3
<p>Further results for Deep EBM. (<b>a</b>–<b>c</b>) Proposal entropy, Fréchet Inception Distance (FID) of replay buffer and energy difference during training. Results for MALA are also included in (<b>a</b>,<b>b</b>). (<b>a</b>) shows that the learned sampler achieves better proposal entropy early during training. (<b>b</b>) shows that the learned sampler converges faster than MALA. (<b>c</b>) shows the EBM remains stable with a mixture of positive and negative energy difference. (<b>d</b>) Compares the L2 expected jump of MALA and the learned sampler, plotted in log scale. It has almost the exact same shape as the proposal’s entropy plot in the main text. (<b>e</b>) More samples from sampling process of 100,000 steps with the learned sampler. (<b>f</b>,<b>g</b>) Samples from the replay buffer and the corresponding visualization of the pixel-wise variance of displacement vector <span class="html-italic">z</span> evaluated at the samples. Images in (<b>f</b>,<b>g</b>) are arranged in the same order. Image-like structures that depend on the sample of origin are clearly visible in (<b>g</b>). A MALA sampler would give uniform variance.</p>
Full article ">Figure A4
<p>Checking the correctness of samples and the EBM training process. (<b>a</b>) Comparing the dimension-wise means, standard deviations and 4th moments of samples obtained from HMC and the learned sampler on the Bayesian logistic regression datasets. The moments match very closely, indicating that the learned sampler generates samples from the correct target distribution. (<b>b</b>) One hundred thousand sampling steps by the MALA sampler on an EBM energy function trained by the adaptive sampler; samples are initialized from the replay buffer. Samples look plausible throughout the sampling process. This indicates that stable attractor basins are formed that are not specific to the learned sampler, and that EBM training is not biased by the adaptive sampler.</p>
Full article ">