Chapter 2
ANOVA, single, and multiple factor experiments
EXPERIMENTAL DESIGN IN R

Joanne Xiong
Data Scientist
ANOVA
Used to compare the means of 3 or more groups
An omnibus test:
it shows whether at least one group mean differs, but not which ones; finding those requires additional post hoc testing

Two ways to implement in R:

# Option one: fit a linear model, then produce the ANOVA table
model_1 <- lm(y ~ x, data = dataset)
anova(model_1)

# Option two: fit the ANOVA model directly
aov(y ~ x, data = dataset)
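
As a minimal sketch of the two routes, the following uses R's built-in PlantGrowth data. That dataset is an assumption made for illustration and is not the data used in these slides. It also shows Tukey's HSD as one possible post hoc follow-up; the slides do not name a specific post hoc method.

```r
# Illustration only: PlantGrowth ships with base R and has one
# 3-level factor (group) and a numeric outcome (weight).
data(PlantGrowth)

# Route one: fit a linear model, then request the ANOVA table
model_lm <- lm(weight ~ group, data = PlantGrowth)
tab_lm <- anova(model_lm)

# Route two: fit the same model directly with aov()
model_aov <- aov(weight ~ group, data = PlantGrowth)
tab_aov <- summary(model_aov)[[1]]

# Both routes report the same omnibus F statistic
f_lm <- tab_lm[["F value"]][1]
f_aov <- tab_aov[["F value"]][1]
print(c(lm_route = f_lm, aov_route = f_aov))

# The omnibus test does not say which groups differ;
# Tukey's HSD is one common post hoc follow-up
posthoc <- TukeyHSD(model_aov)
print(posthoc)
```

The two routes fit the same model, so the choice is mostly about which object is more convenient downstream: `TukeyHSD()` expects an `aov` fit, while `lm` objects work with the usual regression tools.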

Single factor experiments
model_1 <- lm(y ~ x, data = dataset)

y = outcome variable
e.g. tensile strength of different cotton fabrics
x = explanatory factor variable
e.g. percent cotton in the fabric

Multiple factor experiments
model_2 <- lm(y ~ x + r + s + t, data = dataset)

y = outcome
e.g. tooth length in the ToothGrowth data
x, r, s, t = possible explanatory factor variables
e.g. vitamin C dose and delivery method
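
A sketch of the ToothGrowth example, assuming the built-in `ToothGrowth` data frame with its columns `len`, `supp` and `dose`. Converting `dose` to a factor is a choice made here so that it is treated as a grouping variable; the slide does not spell this step out.

```r
data(ToothGrowth)

# dose is stored as a number (0.5, 1, 2); convert it to a factor
# so it is treated as a grouping variable rather than a slope
tg <- transform(ToothGrowth, dose = factor(dose))

# Two explanatory factors: vitamin C dose and delivery method (supp)
model_2 <- lm(len ~ dose + supp, data = tg)
tab_2 <- anova(model_2)
print(tab_2)
```

Each explanatory factor gets its own row in the ANOVA table, followed by a row for the residuals.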

Intro to Lending Club data
Lending Club is a U.S.-based peer-to-peer lending company.
Data is openly available on Kaggle

Includes all loans issued from 2007 to 2015

Large dataset: 890k observations and 75 variables

Model validation
Pre-modeling EDA
Mean and variance of outcome by variable of interest

library(dplyr)

# Overall summaries of a few key variables
lendingclub %>%
  summarise(median(loan_amnt),
            mean(int_rate),
            mean(annual_inc))

# Mean and variance of the outcome within each verification status
lendingclub %>%
  group_by(verification_status) %>%
  summarise(mean(funded_amnt),
            var(funded_amnt))

# A tibble: 3 x 3
  verification_status `mean(funded_amnt)` `var(funded_amnt)`
  <chr>                             <dbl>              <dbl>
1 Not Verified                     114.15          349.41953
2 Source Verified                  156.14          723.53265
3 Verified                         166.08          848.54561

Pre-modeling EDA continued
Boxplot of outcome (y-axis) by variable of interest (x-axis).

library(ggplot2)

ggplot(data = lendingclub,
       aes(x = verification_status, y = funded_amnt)) +
  geom_boxplot()

Post-modeling model validation
Residual plot
QQ-plot for normality

Test ANOVA assumptions
Homogeneity of variances

Try non-parametric alternatives to ANOVA
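
The checks listed above could look roughly like this. This is a sketch on the built-in PlantGrowth data rather than the Lending Club data, and the specific tests are assumptions: Bartlett's test stands in for the homogeneity-of-variances check and the Kruskal-Wallis test for the non-parametric alternative, since the slide names neither.

```r
data(PlantGrowth)
model_aov <- aov(weight ~ group, data = PlantGrowth)

# Residuals vs fitted values: look for roughly equal spread per group
plot(model_aov, which = 1)

# Normal QQ-plot of the residuals
plot(model_aov, which = 2)

# Bartlett's test: the null hypothesis is equal variances across groups
bart <- bartlett.test(weight ~ group, data = PlantGrowth)
print(bart)

# Kruskal-Wallis rank sum test: a non-parametric alternative to ANOVA
kw <- kruskal.test(weight ~ group, data = PlantGrowth)
print(kw)
```

A small p-value from Bartlett's test would suggest the equal-variance assumption is doubtful, which is one situation where a rank-based test such as Kruskal-Wallis may be preferred.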

A/B testing
A type of controlled experiment with only two variants of something, for example:
One word changed in a marketing email

A red 'buy' button on a website vs. a blue button

Two different website headers, compared by how many consumers click through to create an account
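
For the website header example, one way to compare the two variants is a two-sample test of proportions. The sketch below uses base R's `prop.test()`; the visitor and sign-up counts are hypothetical numbers invented purely for illustration, and the slides do not prescribe this particular test.

```r
# Hypothetical counts, invented for illustration only
visitors <- c(header_a = 1000, header_b = 1000)
signups  <- c(header_a = 120,  header_b = 150)

# Two-sample test for equality of the click-through proportions
ab_result <- prop.test(x = signups, n = visitors)

# Observed proportion per variant, then the p-value of the test
print(ab_result$estimate)
print(ab_result$p.value)
```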

Power and sample size in A/B tests
Calculate sample size, given some power, significance level, and effect size
Run your A/B test until you attain the sample size you calculated
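
A sample size calculation of this kind can be sketched with base R's `power.t.test()`. The inputs here are assumptions chosen for illustration: a standardized effect size of 0.5, a 5% significance level, and 80% power. Other tools, such as the `pwr` package, offer similar calculations.

```r
# Assumed inputs: a standardized effect of 0.5 (delta / sd),
# 5% significance level, 80% power, two-sided two-sample t-test
size_calc <- power.t.test(delta = 0.5, sd = 1,
                          sig.level = 0.05, power = 0.80,
                          type = "two.sample",
                          alternative = "two.sided")

# n is reported per group; round up to a whole number of observations
n_per_group <- ceiling(size_calc$n)
print(n_per_group)
```

Leaving `n` unspecified tells `power.t.test()` to solve for it. Under these assumed inputs the result is 64 observations per group, so the test would run until each variant has at least that many.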

Lending Club A/B test
