done! · aejensen/TrendinessOfTrends@e124c9b · GitHub

Commit

done!
aejensen committed Oct 3, 2020
1 parent cc4dc83 commit e124c9b
Showing 4 changed files with 8 additions and 8 deletions.
10 changes: 5 additions & 5 deletions manuscript/ToT.Rmd
@@ -181,7 +181,7 @@ Y^\ast(t^\ast) \mid t^\ast, \mathbf{Y}, \mathbf{t}, \bm{\Theta} \sim N\left(\mu_
\end{align}
where $Y^\ast$ is a new random variable predicted at some arbitrary time point $t^\ast$.

-Proposition \ref{prop:GPposterior} grants the foundation for the rest of the methodological development. While the Gaussian Process priors might seem restrictive it has the computational advantage that the posterior of $(f, df, d^2\!f)$ is characterized by the finite dimensional joint distributions which in turn are given by the mean and covariance functions. Further, for fixed $\Theta$ there exists one-step update formulas for the posteriors that can be useful and efficient for real-time or online applications. A theoretical result motivating the applicability is that the model possesses the universal approximation property meaning that it can approximate any continuous function uniformly on a closed interval of the real line to any desired tolerance given sufficient data [@micchelli2006universal].
+Proposition \ref{prop:GPposterior} grants the foundation for the rest of the methodological development. While the Gaussian Process prior might seem restrictive it has the computational advantage that the posterior of $(f, df, d^2\!f)$ is characterized by the finite dimensional joint distributions which in turn are given by the mean and covariance functions. Further, for fixed $\Theta$ there exists one-step update formulas for the posteriors that can be useful and efficient for real-time or online applications. A theoretical result motivating the applicability is that the model possesses the universal approximation property meaning that it can approximate any continuous function uniformly on a closed interval of the real line to any desired tolerance given sufficient data [@micchelli2006universal].

In the following two subsections we show how the Trend Direction Index and the Expected Trend Instability can be expressed under the data generating model in Equation (\ref{eq:generatingProcess}) using the results of Proposition \ref{prop:GPposterior}.

@@ -350,7 +350,7 @@ We note that with our model being implemented in Stan, efficient approximate lea

To assess the performance of our method we performed a simulation study. We generated $r = 1,\ldots, 10,000$ random Gaussian Processes on the unit interval with zero mean and the Squared Exponential (SE) covariance function (see Equation (\ref{eq:covariancefunctions})) with parameters $\alpha = 1$ and $\rho = \frac{\sqrt{3}}{{2\pi}}$ in 15 different scenarios in which we varied the number of observation points ($n = 25, 50, 100$) and the measurement noise ($\sigma = 0.025, 0.05, 0.1, 0.15, 0.2$). The Supplementary Material shows 50 random sample paths for each scenario.

-In each $r$ simulation we know the true latent functions $(f_r, df_r)$, and by fitting our model we obtain estimates $\widehat{f_r^\text{GP}}$ and $\widehat{df_r^\text{GP}}$ corresponding to the posterior expectations in Proposition \ref{prop:GPposterior}, and $\mathrm{TDI}_r$ and $d\mathrm{ETI}_r$ from Propositions \ref{prop:TDIposterior} and \ref{prop:ETIposterior}. We compare these estimates to the truths using two different measures: an integrated residual and the squared $L^2$ norm. The integrated residuals are defined as $\int_0^1 (f_r(t) - \widehat{f_r^\text{GP}}(t))\mathrm{d}t$ and similarly for $\widehat{df_r^\text{GP}}$. For the Trend Direction Index the cumulative residual is defined as $\int_0^1 (1(df_r(t) > 0) - \mathrm{TDI}_r(t))\mathrm{d}t$ where $1$ denotes the indicator function. For the Expected Trend Instability the cumulative residual is defined as $\int_0^1(N_r(t) - d\mathrm{ETI}_r(t))\mathrm{d}t$ where $N_r(t)$ is the càdlàg counting process that jumps with a value of 1 every time $df_r$ has a root on the interval. If our estimates are unbiased we expect these integrated residuals to have zero mean across the simulations. The squared $L^2$ norms are defined in a similar manner for all the quantities as e.g., $\int_0^1 (f_r(t) - \widehat{f_r^\text{GP}}(t))^2\mathrm{d}t$ and reflect the variability of the estimates.
+In each of the $r$ simulations we know the true latent functions $(f_r, df_r)$, and by fitting our model we obtain estimates $\widehat{f_r^\text{GP}}$ and $\widehat{df_r^\text{GP}}$ corresponding to the posterior expectations in Proposition \ref{prop:GPposterior}, and $\mathrm{TDI}_r$ and $d\mathrm{ETI}_r$ from Propositions \ref{prop:TDIposterior} and \ref{prop:ETIposterior}. We compare these estimates to the truths using two different measures: an integrated residual and the squared $L^2$ norm. The integrated residuals are defined as $\int_0^1 (f_r(t) - \widehat{f_r^\text{GP}}(t))\mathrm{d}t$ and similarly for $\widehat{df_r^\text{GP}}$. For the Trend Direction Index the cumulative residual is defined as $\int_0^1 (1(df_r(t) > 0) - \mathrm{TDI}_r(t))\mathrm{d}t$ where $1$ denotes the indicator function. For the Expected Trend Instability the cumulative residual is defined as $\int_0^1(N_r(t) - d\mathrm{ETI}_r(t))\mathrm{d}t$ where $N_r(t)$ is the càdlàg counting process that jumps with a value of 1 every time $df_r$ has a root on the interval. If our estimates are unbiased we expect these integrated residuals to have zero mean across the simulations. The squared $L^2$ norms are defined in a similar manner for all the quantities as e.g., $\int_0^1 (f_r(t) - \widehat{f_r^\text{GP}}(t))^2\mathrm{d}t$ and reflect the variability of the estimates.

For comparison we employed the Trend Filtering method implemented in the R package \texttt{genlasso} [@genlasso] on the same simulated data and reported similar measures for its estimated mean, $\widehat{f^\text{TF}}$, and derivative, $\widehat{df^\text{TF}}$, using 10-fold cross-validation of the penalty parameter. We only compare the estimates of the latent mean and its derivative between the two approaches as Trend Filtering does not provide a probability distribution for the derivative.

@@ -417,9 +417,9 @@ A naive approach for trying to answer questions Q1 and Q2 is to apply sequential

\begin{table}[htbp]
\center
-\begin{tabular}{c|rrrrr} \hline
-2018 & 2017 & 2016 & 2015 & 2014 & 2013\\
-p-value & 0.074 & \textbf{0.020} & 0.495 & \textbf{0.012} & 0.576\\ \hline
+\begin{tabular}{c|rrrrr}
+& 2017 & 2016 & 2015 & 2014 & 2013\\ \hline
+p-value & 0.074 & \textbf{0.020} & 0.495 & \textbf{0.012} & 0.576\\
\end{tabular}
\caption{p-values obtained from $\chi^2$-tests of independence between the proportion of smokers in 2018 and the five previous years. Numbers in bold are statistically significant differences at the $5\%$ level.}
\label{tab:chisqtests}
Binary file modified manuscript/ToT.pdf
Binary file not shown.
Binary file modified manuscript/final tex version/ToT.pdf
Binary file not shown.
6 changes: 3 additions & 3 deletions manuscript/final tex version/ToT.tex
@@ -1042,9 +1042,9 @@ \subsection*{Trend of proportion of Danish

\begin{table}[htbp]
\center
-\begin{tabular}{c|rrrrr} \hline
-& 2017 & 2016 & 2015 & 2014 & 2013\\
-p-value & 0.074 & \textbf{0.020} & 0.495 & \textbf{0.012} & 0.576\\ \hline
+\begin{tabular}{c|rrrrr}
+& 2017 & 2016 & 2015 & 2014 & 2013\\ \hline
+p-value & 0.074 & \textbf{0.020} & 0.495 & \textbf{0.012} & 0.576
\end{tabular}
\caption{p-values obtained from $\chi^2$-tests of independence between the proportion of smokers in 2018 and the five previous years. Numbers in bold are statistically significant differences at the $5\%$ level.}
\label{tab:chisqtests}