ITCS 6100: Big Data for Competitive Advantage
Data Driven Decision Making: A/B Testing
Dr. Gabriel Terejanu
https://towardsdatascience.com/a-b-testing-a-complete-guide-to-statistical-testing-e3f1db140499
A/B Testing
• A/B testing is a method of comparing two versions of a web
page or app to see which one performs better.
• This is typically done by randomly dividing website visitors or
app users into two groups, and showing each group a different
version of the page or app.
• The version that performs better is then chosen for further use.
Example A/B testing use cases
• What to test:
• Navigation links
• Calls to action (CTAs)
• Design/layout
• Content offer
• Headline
• Email subject line
• Friendly email “from” address
• Images
• Social media buttons (or other buttons)
• Logos and taglines/slogans
• Where to test:
• mobile apps
• website pages
• components on web pages
• emails
• newsletters
• advertisements
• text messages
https://www.oracle.com/cx/marketing/what-is-ab-testing/
A/B testing process
• Randomly subset the users into two groups.
• Control group – interacts with the current state of your product.
• Treatment group – interacts with the variant(s) of the product that you want to test.
• Monitor which product (control or treatment) performs best.
Metrics – quantify performance
• Metrics are performance indicators that we want to minimize or
maximize.
• They indicate how engaged the users are with your product.
• conversion rates
• signups
• subscriptions
• time spent on the site
• click-through rates
Why control and treat at the same time?
• An alternative (bad idea) is to do it sequentially
• Measure the metrics on the current version
• Make the change to your product
• Then measure the metrics on the new version
• Flaw – misses factors such as external events, temporal trends
and seasonality.
• Differences in control and treatment performance will be
impacted by these factors that have nothing to do with the
proposed improvements of the product.
• Very challenging to estimate just the contribution of the
proposed improvements.
Why control and treat at the same time?
[Figure: timeline illustration. In the sequential approach (BAD idea), an external event – Oprah calls Kindle "her new favorite thing" – occurs between the two measurement periods and distorts the comparison. In the simultaneous approach (GOOD idea), the event affects control and treatment equally.]
The importance of randomization
• Random assignment isolates the impact of product changes.
• (Bad idea) Assignment criteria that are not random may introduce confounders, which make it challenging to estimate the contribution of the proposed changes. Example: assignments based on time, region, demographics, etc.
• Confounders are factors that affect the relationship between the treatment and the performance metric. They introduce bias into the A/B test. Example – type of device used:
• Version A optimized for desktop users
• Version B optimized for mobile users
• Even though version B may have a higher conversion rate, it could be because more mobile users are visiting the site, rather than because version B of the page is better.
Multivariate test
• Multivariate tests examine the effect of multiple variables on the outcome of interest.
• Unlike traditional A/B testing, multivariate testing allows you to
test multiple elements of a page or product simultaneously, by
creating multiple versions of the page that vary by more than
one element.
• For example, you can test different headlines, images, and call-
to-action buttons all at once.
• Full factorial design tests all possible combinations of the
multiple variables.
• This requires higher traffic as it depends on the number of
variables that you are testing. Most websites and email
campaigns would struggle to find the traffic to support that.
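To see why a full factorial design demands so much traffic, here is a minimal sketch; the elements and variant names are hypothetical:

```python
from itertools import product

# Hypothetical page elements and their candidate variants
headlines = ["H1", "H2"]
images = ["img1", "img2", "img3"]
ctas = ["Buy now", "Learn more"]

# Full factorial design: every combination of variants becomes a page version,
# and each version needs enough visitors for a reliable comparison
versions = list(product(headlines, images, ctas))
print(len(versions))  # 2 * 3 * 2 = 12 versions to test
```

The version count grows multiplicatively with each added element, which is why most sites lack the traffic for full factorial tests.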
Use cases for A/B testing
• Use A/B testing when users are impacted individually
• Test changes that can directly impact their behavior
• DO NOT use A/B testing when the problem exhibits network
effect among users
• Challenging to untangle the impact of the test
Class Experiment
What is your price?
Analyzing the results - Example
Time spent on a webpage [seconds]:
A: [ 8.50 0.93 2.18 6.83 9.14 9.12 1.23 6.47 9.60 9.26 ]
B: [ 6.93 2.16 3.07 4.06 1.76 7.78 2.09 6.75 10.24 8.79 ]
Analyzing the results – Sufficient?
Time spent on a webpage [seconds]:
mean(A) = 6.33 > mean(B) = 5.36
A: [ 8.50 0.93 2.18 6.83 9.14 9.12 1.23 6.47 9.60 9.26 ]
B: [ 6.93 2.16 3.07 4.06 1.76 7.78 2.09 6.75 10.24 8.79 ]
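The comparison above can be reproduced with a few lines of NumPy, using the data from the slide:

```python
import numpy as np

# Time spent on the webpage [seconds] for each variant
A = np.array([8.50, 0.93, 2.18, 6.83, 9.14, 9.12, 1.23, 6.47, 9.60, 9.26])
B = np.array([6.93, 2.16, 3.07, 4.06, 1.76, 7.78, 2.09, 6.75, 10.24, 8.79])

print(A.mean())  # ≈ 6.33
print(B.mean())  # ≈ 5.36
```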
Repeat with different users multiple times
Central Limit Theorem (CLT): for a large enough sample size,
the distribution of the sample mean will be approximately
normal, regardless of the underlying distribution of the
population from which the sample is drawn.
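The CLT can be illustrated with a quick simulation; this is a sketch where the exponential population and the sample size are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)

# Heavily skewed (non-normal) population: exponential with mean 5
population = rng.exponential(scale=5.0, size=100_000)

# Repeat the experiment: draw many samples of size n, record each sample mean
n = 50
sample_means = np.array(
    [rng.choice(population, size=n).mean() for _ in range(2000)]
)

# CLT: the sample means cluster around the population mean, approximately
# normally, with spread close to population std / sqrt(n)
print(sample_means.mean())  # close to 5
print(sample_means.std())   # close to 5 / sqrt(50) ≈ 0.71
```

Plotting a histogram of `sample_means` would show the familiar bell shape even though the population itself is strongly skewed.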
What is the error in our mean estimates?
Standard error (SE) is a measure of the
variability of a statistic, such as the mean,
estimated from a sample.
It is calculated as the standard deviation of
the sampling distribution of a statistic.
It is used to indicate the precision of the
estimate of a population parameter.
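For a single sample, the standard error of the mean is estimated as the sample standard deviation divided by sqrt(n). A minimal sketch on the group-A data from the earlier slide:

```python
import numpy as np

A = np.array([8.50, 0.93, 2.18, 6.83, 9.14, 9.12, 1.23, 6.47, 9.60, 9.26])

# Standard error of the mean: sample std (ddof=1 for the unbiased
# variance estimate) divided by sqrt(n)
se = A.std(ddof=1) / np.sqrt(len(A))
print(se)  # ≈ 1.12
```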
What is the error in our mean estimates?
Are we still confident to say that
mean(A) > mean(B) ?
(these are sample means and not
the true population means)
Sample mean vs True mean
• Denote sample_mean(A) = mean(A)
• true_mean(A) is not known but rather a random variable
sample_mean(A) ~ Normal( true_mean(A), SE(A)^2 )
( sample_mean(A) − true_mean(A) ) / SE(A) ~ Student-t(n−1)
Interested in Average Treatment Effect (ATE)
• ATE is a measure of the difference in outcomes between a
group of individuals who receive a treatment and a group of
individuals who do not receive the treatment
true_ATE = true_mean(B) - true_mean(A)
• We have access to an unbiased estimate given by our
observations:
sample_ATE = sample_mean(B) - sample_mean(A)
sample_ATE ~ Normal(true_ATE, SE(A)^2+SE(B)^2)
( sample_ATE − true_ATE ) / sqrt( SE(A)^2 + SE(B)^2 ) ~ Student-t(2n−2)
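Putting these formulas together for the example data; this is a sketch, and `sample_ATE` / `se_ATE` are my variable names:

```python
import numpy as np

A = np.array([8.50, 0.93, 2.18, 6.83, 9.14, 9.12, 1.23, 6.47, 9.60, 9.26])
B = np.array([6.93, 2.16, 3.07, 4.06, 1.76, 7.78, 2.09, 6.75, 10.24, 8.79])

# Unbiased estimate of the average treatment effect
sample_ATE = B.mean() - A.mean()

# Standard error of the difference of two independent sample means
se_A = A.std(ddof=1) / np.sqrt(len(A))
se_B = B.std(ddof=1) / np.sqrt(len(B))
se_ATE = np.sqrt(se_A**2 + se_B**2)

print(sample_ATE)  # ≈ -0.96
print(se_ATE)      # ≈ 1.49
```

Here the estimated effect is smaller than its standard error, which already hints that the difference may not be statistically significant.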
We want to perform Hypothesis testing
• In statistics, a hypothesis is a statement or claim about a
property of a population or a relationship between two or more
populations.
• There are two types of hypotheses:
• Null hypothesis (H0): This is the default assumption that there
is no significant difference or relationship between the groups or
variables being studied. It is usually a statement of equality (e.g.
the means of two groups are equal)
• Alternative hypothesis (Ha): This is the hypothesis that is put
forward as an alternative to the null hypothesis. It is usually a
statement of inequality (e.g. the means of two groups are not
equal) or a statement of difference (e.g. the mean of a sample is
different from a specified value).
Hypothesis testing
Null hypothesis H0: true_ATE = 0
Alternative hypothesis Ha: true_ATE > 0
• The hypothesis we want to test is if Ha is “likely” true.
• Two outcomes are possible
• Reject H0 and accept Ha because of sufficient evidence in
favor of Ha
• Do not reject H0 because of insufficient evidence
• This does not mean that the null hypothesis is true!
Null Hypothesis
H0: true_ATE = 0
[Figure: sampling distribution of sample_ATE under H0, centered at true_ATE = 0.]
Hypothesis testing
t-statistic = sample_ATE / sqrt( SE(A)^2 + SE(B)^2 ) ~ Student-t(2n−2)
Significance level – usually 0.05
p-value is a probability value that is used in statistical hypothesis testing to indicate the level of evidence against a null hypothesis. The p-value represents the probability of obtaining a test statistic as extreme or more extreme than the one observed, assuming that the null hypothesis is true. Usually reject the null hypothesis if p-value < 0.05.
[Figure: Student-t density with the rejection region beyond t-critic shaded; the p-value is the tail area beyond t-stat.]
https://web.northeastern.edu/dummit/teaching_su20_3081/probstat_4_hypothesis_testing_v1.01.pdf
Python Example A/B testing
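A sketch of such an example using `scipy.stats.ttest_ind` on the time-on-page data from the earlier slides. With `equal_var=True` this is the pooled Student-t(2n−2) test matching the slides' formula (SciPy ≥ 1.6 is needed for the `alternative` argument; `equal_var=False` would give the more robust Welch variant):

```python
import numpy as np
from scipy import stats

A = np.array([8.50, 0.93, 2.18, 6.83, 9.14, 9.12, 1.23, 6.47, 9.60, 9.26])
B = np.array([6.93, 2.16, 3.07, 4.06, 1.76, 7.78, 2.09, 6.75, 10.24, 8.79])

# H0: true_ATE = 0,  Ha: true_ATE > 0, where ATE = mean(B) - mean(A)
t_stat, p_value = stats.ttest_ind(B, A, equal_var=True, alternative="greater")

alpha = 0.05  # significance level
if p_value < alpha:
    print("Reject H0: evidence that B increases time on page")
else:
    print("Do not reject H0: insufficient evidence")
```

On this data mean(B) < mean(A), so the t-statistic is negative, the one-sided p-value is large, and we do not reject H0.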
Test anchoring bias – Mini-assignment
• Anchoring bias, also known as the anchoring effect, is a cognitive bias that occurs when an individual relies too heavily on an initial piece of information, known as the "anchor," when making subsequent judgments.
• For example, if an individual is asked to estimate the number of doctors in a city and is given an anchor of 1000, they will likely provide a higher estimate than if they were given an anchor of 100.
[Figures A and B: the two survey variants shown in class.]
H0: mean(B) = mean(A)
Ha: mean(B) > mean(A)