ANOVA, single, and multiple factor experiments
EXPERIMENTAL DESIGN IN R
Joanne Xiong
Data Scientist
ANOVA
Used to compare 3+ groups
An omnibus test:
it tells you whether at least one group mean differs, but not which groups' means differ, without additional post hoc testing
Two ways to implement in R:
# Option one: fit a linear model, then request its ANOVA table
model_1 <- lm(y ~ x, data = dataset)
anova(model_1)
# Option two: fit the ANOVA directly
aov(y ~ x, data = dataset)
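Both routes can be tried on a dataset that ships with R. The sketch below assumes the built-in PlantGrowth data (weight by a three-level group factor) as a stand-in for `dataset`, and adds TukeyHSD() as one possible post hoc test; neither choice comes from the slides.

```r
# PlantGrowth ships with base R: numeric `weight`, 3-level factor `group`.
# It is an assumed stand-in for the slide's `dataset`.
data(PlantGrowth)

# Option one: linear model, then the ANOVA table
model_lm <- lm(weight ~ group, data = PlantGrowth)
anova_lm <- anova(model_lm)

# Option two: aov(), then summary()
model_aov <- aov(weight ~ group, data = PlantGrowth)
anova_aov <- summary(model_aov)

print(anova_lm)
print(anova_aov)

# Both routes should report the same omnibus F statistic
f_lm  <- anova_lm[["F value"]][1]
f_aov <- anova_aov[[1]][["F value"]][1]
print(all.equal(f_lm, f_aov))

# The omnibus test does not say which groups differ;
# Tukey's HSD is one common post hoc follow-up on an aov fit
tukey <- TukeyHSD(model_aov)
print(tukey)
```

The tukey object holds one row per pair of groups, which is where the pairwise differences show up.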
Single factor experiments
model_1 <- lm(y ~ x)
y = outcome variable
Tensile strength of different cotton fabrics
x = explanatory factor variable
Percent cotton in the fabric
Multiple factor experiments
model_2 <- lm(y ~ x + r + s + t)
y = outcome
Tooth length in the ToothGrowth data
x, r, s, t = possible explanatory factor variables
Vitamin C dose and delivery method
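A minimal sketch of a multiple factor model, assuming the built-in ToothGrowth data with its columns len, dose, and supp. Converting dose to a factor is an assumption made here so that it is treated as a grouping variable rather than a numeric slope.

```r
# ToothGrowth ships with base R: len (tooth length), supp (delivery method),
# dose (vitamin C amount, stored as numeric).
data(ToothGrowth)

# Assumption: treat dose as a factor so each dose level gets its own mean
tooth <- transform(ToothGrowth, dose = factor(dose))

# Two explanatory factors: dose and delivery method
model_2 <- lm(len ~ dose + supp, data = tooth)
anova_2 <- anova(model_2)
print(anova_2)
```

The resulting table has one row per factor plus a residual row, so each factor gets its own F test.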
Intro to Lending Club data
Lending Club is a U.S.-based peer-to-peer lending company.
Data is openly available on Kaggle
Includes all loans issued from 2007-2015
Big!
890k observations and 75 variables
Model validation
Pre-modeling EDA
Mean and variance of outcome by variable of interest
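library(dplyr)
# Overall summaries of a few numeric columns
lendingclub %>%
  summarise(median(loan_amnt),
            mean(int_rate),
            mean(annual_inc))
# Mean and variance of the outcome within each verification status
lendingclub %>%
  group_by(verification_status) %>%
  summarise(mean(funded_amnt),
            var(funded_amnt))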
# A tibble: 3 x 3
  verification_status `mean(funded_amnt)` `var(funded_amnt)`
  <chr>                             <dbl>              <dbl>
1 Not Verified                     114.15          349.41953
2 Source Verified                  156.14          723.53265
3 Verified                         166.08          848.54561
Pre-modeling EDA continued
Boxplot of outcome (y-axis) by variable of interest (x-axis).
ggplot(data = lendingclub,
       aes(x = verification_status, y = funded_amnt)) +
  geom_boxplot()
Post-modeling model validation
Residual plot
QQ-plot for normality
Test ANOVA assumptions
Homogeneity of variances
Try non-parametric alternatives to ANOVA
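The checks listed above can be sketched with base R alone. This example assumes the built-in PlantGrowth data rather than the Lending Club data. It also picks Bartlett's test for homogeneity of variances and the Kruskal-Wallis test as the non-parametric alternative, which are common choices but are not named in the slides.

```r
# Assumed stand-in data: PlantGrowth (weight by 3-level group)
data(PlantGrowth)
model_aov <- aov(weight ~ group, data = PlantGrowth)

# Residuals vs. fitted values: look for roughly even spread around zero
plot(model_aov, which = 1)

# Normal QQ-plot of the residuals: points should follow the line
plot(model_aov, which = 2)

# Homogeneity of variances across groups (Bartlett's test is one option)
bartlett_res <- bartlett.test(weight ~ group, data = PlantGrowth)
print(bartlett_res)

# Non-parametric alternative to one-way ANOVA (Kruskal-Wallis rank sum test)
kruskal_res <- kruskal.test(weight ~ group, data = PlantGrowth)
print(kruskal_res)
```

A small p-value from the variance test suggests the equal-variance assumption is doubtful, in which case the rank-based test is a reasonable fallback.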
A/B testing
A/B testing
A type of controlled experiment with only two variants of something, for example:
One word changed in a marketing email
A red 'buy' button on a website vs. a blue button
How many consumers click through to create an account, given two different website headers?
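The website header example could be analysed as a comparison of two proportions. The sketch below uses prop.test() from base R. The click and visitor counts are made up purely for illustration and do not come from any real experiment.

```r
# Hypothetical counts, invented for illustration only
clicks   <- c(header_a = 120, header_b = 150)    # account sign-ups
visitors <- c(header_a = 1000, header_b = 1000)  # visitors shown each header

# Two-sample test of equal click-through proportions
ab_test <- prop.test(x = clicks, n = visitors)
print(ab_test)

# Estimated click-through rate for each variant
print(ab_test$estimate)
```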
Power and sample size in A/B tests
Calculate sample size, given some power, significance level, and effect size
Run your A/B test until you attain the sample size you calculated
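One way to do the sample size calculation is power.t.test() from base R, which solves for whichever argument is left out. The effect size, standard deviation, power, and significance level below are assumed values chosen for illustration; a two-sample t-test is also an assumption about how the A/B outcome will be analysed.

```r
# Assumed inputs (illustrative only):
#   delta     = smallest difference in means worth detecting
#   sd        = standard deviation of the outcome
#   sig.level = significance level
#   power     = desired power
sample_size <- power.t.test(delta = 0.5,
                            sd = 1,
                            sig.level = 0.05,
                            power = 0.8,
                            type = "two.sample",
                            alternative = "two.sided")
print(sample_size)

# n is reported per group; round up to a whole number of observations
n_per_group <- ceiling(sample_size$n)
print(n_per_group)
```

The test then runs until each variant has collected n_per_group observations.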
Lending Club A/B test