
EC 306

KI 2024

R – Session
Customising computer-intensive procedures

The programming capabilities of R can be combined with available statistical functions to carry out (1) non-
traditional statistical analyses and (2) simulations that allow you to explore the suitability of different
statistical techniques for specific data-analysis situations.

1. Bootstrapping
Resampling to construct uncertainty measures, e.g., 95% Confidence Interval, around parameter estimates

What do you do when you wish to present confidence intervals around a measure, such as the mean of a
sample, but the sample is not normally distributed? What do you do when you fit statistical models to your
data but the model assumptions are violated, so that, while the parameter estimates are okay, distribution-based
uncertainty measures (e.g., standard errors of model coefficients, confidence intervals of model
coefficients) cannot be trusted? What if you try out all the different models you have at hand but the
assumptions are still violated? The most robust option available (i.e., the one that is likely to give you the
most robust uncertainty measures) is non-parametric bootstrapping (for p values, you would use permutation
tests, described in the next section).

The idea behind bootstrapping is that, if you have no other information about a population, the distribution
of values in your sample is the best guide to the distribution in the population. The main assumption is that
your sample is a random sample from the population and represents it well. To get confidence limits around
the statistic that you calculate from your sample, you resample the sample (with replacement). This
approximates what would happen if the population itself were resampled. (The population is to the sample
what the sample is to the bootstrap samples.)

The basic procedure is to generate N samples (with replacement) from the original dataset; perform the
analysis of interest on each sample and store the parameter estimates of interest; and then find the 95%
confidence intervals from the distribution of the parameter estimates.
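
As a minimal sketch of this procedure for the mean of a sample (not the course script; x and n.boot below are placeholder names used only for illustration):

x <- rexp(50)                               ## an example skewed sample
n.boot <- 10000                             ## number of bootstrap samples
boot.means <- numeric(length=n.boot)
for (b in 1:n.boot){
  resample <- sample(x, size=length(x), replace=TRUE)
  boot.means[b] <- mean(resample)           ## store the statistic of interest
}
quantile(boot.means, probs=c(0.025, 0.975)) ## percentile 95% confidence interval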

The procedure described above is a non-parametric bootstrapping procedure, which makes the fewest
assumptions about the nature of the data. There are other bootstrapping procedures that make assumptions
about the nature of stochasticity in the data. REF explains these different procedures well.
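
For example, one such procedure is a parametric bootstrap for a linear model, in which new responses are simulated from the fitted model rather than resampling the data themselves. A rough sketch, assuming normally distributed errors (x, y, and n.boot are placeholder names):

x <- runif(30); y <- 1 + 0.5*x + rnorm(30)   ## example data
fit <- lm(y~x)
n.boot <- 10000
boot.slopes <- numeric(length=n.boot)
for (b in 1:n.boot){
  y.sim <- fitted(fit) + rnorm(length(y), mean=0, sd=sigma(fit)) ## simulate from fitted model
  boot.slopes[b] <- coef(lm(y.sim~x))[2]     ## slope from refitted model
}
quantile(boot.slopes, probs=c(0.025, 0.975)) ## parametric-bootstrap 95% CI for the slope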

The R script “randomisation_non-parametric_bootstrap_code.R” has examples of calculating bootstrapped
confidence intervals for (1) the mean of a data set, (2) model coefficients for a linear model with continuous
predictors, and (3) model coefficients for a linear model with continuous and categorical predictors.

Calculating bootstrapped confidence intervals for coefficients in a mixed model requires that resampling be
carried out with the grouped structure (random effects) of the data in mind. If you are interested in this,
please get in touch and I can send you example code.

2. Permutation (Randomisation) tests
Statistical tests of hypotheses by creating a null distribution of the test statistic

We often have to deal with data that do not conform to the assumptions of parametric tests. Suppose we are
interested in the effect of some variable X on a trait Y, and the assumptions of ordinary regression analysis
are likely to be violated, increasing the Type I error rate. Suppose also that we are unsure of the underlying
distribution of Y, either because there is not enough information in our data or for theoretical reasons, so
parametric tests other than ordinary regression are also ruled out.


So, we turn to randomisation tests: our overall goal is to create a distribution of the test statistic (e.g., slope
of the line describing the relationship between X and Y) under the null hypothesis that there is no effect of X
on Y. We can then compare our observed test statistic with the null distribution we have created to assess its
significance.

To create a null distribution of the test statistic, we shuffle the Y values across the X values. This breaks up
any pattern between X and Y, so any apparent relationship in a shuffled data set can only have arisen by
chance. If we repeat this procedure many times, we get a distribution of the test statistic under the null
hypothesis of no effect. We can then compare the actual test statistic from the original data with this null
distribution to see how likely we are to obtain our observed statistic under the null hypothesis (i.e., what is
the probability of getting the observed statistic, or a more extreme one, under the null hypothesis?).
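
As a minimal sketch of such a permutation test for a regression slope (x, y, and n.perm are placeholder names; the script mentioned below contains fuller examples):

x <- runif(30)                        ## example predictor
y <- 0.5*x + rexp(30)                 ## example (skewed) response
obs.slope <- coef(lm(y~x))[2]         ## observed test statistic
n.perm <- 10000
null.slopes <- numeric(length=n.perm)
for (p in 1:n.perm){
  y.shuffled <- sample(y)             ## shuffle Y across X values
  null.slopes[p] <- coef(lm(y.shuffled~x))[2]
}
## two-sided p value: prop. of null slopes at least as extreme as the observed slope
mean(abs(null.slopes) >= abs(obs.slope))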

The R script “randomisation_non-parametric_bootstrap_code.R” has examples of performing permutation
tests for (1) comparing two means and (2) model coefficients for a linear model.

3. Monte-Carlo simulations
You may sometimes wish to check whether the use of a particular statistical procedure is appropriate when
analysing your data. For example, does an ordinary regression perform well even when you expect error
variance to be strongly unequal? Can one use ordinary regression if the response variable is highly skewed?

To answer these questions, this would be the general procedure:


1. Generate artificial data according to the process you think might be occurring
2. Run your chosen analysis (or analyses) using these data
3. Repeat steps 1 and 2 many times to measure
a. Type I error (proportion of times P≤0.05 when the null hypothesis is true)
b. Statistical power (proportion of times P≤0.05 when the null hypothesis is false)
c. Confidence interval around parameter estimates to check precision

We’ll address the first question above to illustrate this approach. Both questions are simple, but the same
procedure can be used to investigate much more complex situations.

Does ordinary regression perform well when error variance is strongly unequal?
One of the assumptions of simple regression analysis is that error variance is equal at all points along the
regression line. In reality, error variance often increases with the predictor variable and/or with the dependent
variable. Here, we will investigate a different cause of unequal error variance: that created by different
sample sizes.

Suppose we were interested in the relationship between the number of helpers in a group and offspring
growth rate in cooperatively breeding babblers. As an estimate of the growth rate of offspring in a group we
take an average of all offspring measured in each group. However, groups differ in the number of offspring
measured, say from 1 to 7. Clearly, an average taken from 7 offspring is more reliable (is subject to less
uncertainty or error) as an estimate for that group than is an average taken from 2 offspring. In other words,
given that there is a relationship between y and x, a y-value calculated by averaging a large sample is likely
to deviate less from the predicted line than is a y-value calculated by averaging a small sample. Simple
regression, however, assumes that there is no such systematic change in error variance. Are there serious
consequences when this assumption is violated?

Our aim is to simulate the process that we think might underlie our data and test different potential statistical
techniques to see which one is most appropriate for our data situation. In the current example, we will
generate data in which the sample size for different observations varies, analyse these data using two
techniques (simple regression and weighted regression), and see which of these two techniques is better at
recovering the original simulated relationship. Imagine that the true intercept is 2 g/hr and each additional
helper adds 0.3 g/hr to this (so the slope is 0.3 g/hr per helper).


1. Generate data
helper <- c(1,2,2,3,3,5,6,9,10,12)   ## suppose these are the no. of helpers
s.size <- c(1,1,1,1,2,3,4,6,7,7)     ## no. of offspring measured in each group

To generate averages from different sample sizes, we write a small function containing a loop:


helper.fun <- function(int, slope, helper, s.size){
  growth <- numeric(length=length(helper))
  for (i in 1:length(helper)){
    growth[i] <- mean(rnorm(s.size[i], int+slope*helper[i], sd=2))
  }
  growth
}

growth <- helper.fun(int=2, slope=0.3, helper=helper, s.size=s.size)
## these are our growth data
plot(helper, growth)

There are two ways we could analyse these data: by simple regression or by weighted regression. Weighted
regression can account for changes in error variance if we specify a series of weights. In this example, we
want to weight by sample size because we have more confidence in averages from larger samples.

2. Analyse data
summary(lm(growth~helper)) ## simple regression
summary(lm(growth~helper, weights=s.size)) ## weighted regression

Write a function to generate and analyse data (combine 1 and 2)


unequal.var.fun <- function(int, slope, helper, s.size, wts){
  growth <- helper.fun(int=int, slope=slope, helper=helper, s.size=s.size)
  simple.p <- coef(summary(lm(growth~helper)))[8]            ## P from ordinary regr.
  wtd.p <- coef(summary(lm(growth~helper, weights=wts)))[8]  ## P from wtd regr.
  simple.b <- coef(summary(lm(growth~helper)))[2]            ## slope from ordinary regr.
  wtd.b <- coef(summary(lm(growth~helper, weights=wts)))[2]  ## slope from wtd regr.
  c(simple.p, wtd.p, simple.b, wtd.b)                        ## output of function
}

3. Iterate 1 and 2 many times


iter <- 1000
pval.simple <- numeric(length=iter)   ## create vectors to store p values
pval.wtd <- numeric(length=iter)
slope.simple <- numeric(length=iter)  ## create vectors to store slope values
slope.wtd <- numeric(length=iter)

for (n in 1:iter){
  temp <- unequal.var.fun(int=2, slope=0.3, helper=helper,
                          s.size=s.size, wts=s.size)
  pval.simple[n] <- temp[1]
  pval.wtd[n] <- temp[2]
  slope.simple[n] <- temp[3]
  slope.wtd[n] <- temp[4]
}

length(pval.simple[pval.simple<=0.05])/iter   ## prop. of runs where P<=0.05: simple
length(pval.wtd[pval.wtd<=0.05])/iter         ## prop. of runs where P<=0.05: wtd
quantile(slope.simple, probs=c(0.025, 0.975)) ## 95% CI of the slope: simple
quantile(slope.wtd, probs=c(0.025, 0.975))    ## 95% CI of the slope: wtd

What do you conclude from these simulations? Will you use weighted
regression or simple regression to analyse your data?
