The change we are going to test: a student clicks the ‘Start free trial’ button → a pop-up window asks how much time (in hours) the student will dedicate to this course per week (the free trial screener):
1. >= 5 hours per week → go directly to the checkout page
2. < 5 hours per week → a friendly message hints that the student might not be suitable for this free trial and could access the course material directly instead → if they persist, they will be taken to the checkout page as well
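A minimal sketch of this branching logic, assuming a hypothetical `hours_per_week` answer from the pop-up and a `persists` flag for the "continue anyway" path (names are illustrative, not Udacity's actual code):

```python
SCREENER_THRESHOLD_HOURS = 5   # cutoff asked in the pop-up (hypothetical constant name)

def route_after_screener(hours_per_week: float, persists: bool) -> str:
    """Decide where the student lands after answering the screener."""
    if hours_per_week >= SCREENER_THRESHOLD_HOURS:
        return "checkout"          # option 1: enroll in the free trial
    if persists:                   # warned, but still wants the free trial
        return "checkout"
    return "course_materials"      # option 2: free access, no trial
```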
Control group: click ‘Start free trial’ → no screener, directly to the checkout page.
Experiment group: click ‘Start free trial’ → screener, which offers two options: 1. continue to enroll in the free trial; 2. access the course material without enrolling in the free trial.
Unit of diversion: cookie. A cookie uniquely identifies a user's browser session. When a user first visits the page, a cookie is generated and they are randomly assigned to either the control group (no screener) or the treatment group (with screener). One common way to implement this is sketched below.
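A minimal sketch of cookie-based diversion, assuming assignment is done by hashing the cookie id so the same browser always sees the same variant (the hashing scheme is an assumption for illustration, not Udacity's actual implementation):

```python
import hashlib

def assign_group(cookie_id: str, experiment: str = "free_trial_screener") -> str:
    """Deterministically assign a cookie to control or treatment (50/50)."""
    # Hashing the cookie together with the experiment name keeps the assignment
    # stable for a given browser and independent across different experiments.
    digest = hashlib.sha256(f"{experiment}:{cookie_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"
```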
Invariant metrics:
Number of cookies: since our unit of diversion is the cookie, this metric should be comparable between the two groups.
Number of user-ids (did not work):
● The intervention directly affects whether a user proceeds to enrollment and receives a user-id, so the number of user-ids will likely differ between the control and treatment groups.
● Because the number of user-ids is influenced by the screener, it cannot serve as a stable baseline for comparing the two groups before the intervention.
Number of clicks: the change (free trial screener) happens after students click the ‘Start free trial’ button, and since nothing changes in the user interface (e.g., button size/color), the number of clicks should be comparable between the experiment and control groups.
Click-through-probability: the number of unique cookies that click the "Start free trial" button divided by the number of unique cookies that view the course overview page. Since the screener only appears after the click, this ratio should also be unaffected by the change. A simple sanity check for these count-based invariants is sketched below.
Gross conversion (did not work): the number of user-ids that complete checkout and enroll in the free trial divided by the number of unique cookies that click the "Start free trial" button. Not invariant: the screener directly impacts the number of user-ids enrolled (the numerator). Users who click "Start free trial" in the control group automatically proceed to checkout, while those in the treatment group might be discouraged by the screener, leading to fewer enrollments in the treatment group than in the control group (which bypasses the screener). This difference isn't driven by the overall pool of users (reflected by clicks) but by the intervention itself (the screener).
Retention (did not work): the number of user-ids that remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of user-ids that complete checkout. Not invariant (indirectly): while the screener doesn't directly affect users who have already enrolled, it might indirectly change the pool of users who enroll (the denominator), as explained above for gross conversion. This creates an uneven baseline for comparing retention rates between groups.
Net conversion (did not work): the number of user-ids that remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies that click the "Start free trial" button.
Selection bias: the screener introduces selection bias. Users in the treatment group who click "Start free trial" and then choose to enroll after seeing the screener are more likely to be those with higher initial interest, because the screener potentially filters out some users. This creates an uneven starting point for the two groups (control vs. treatment) when comparing net conversion rates: the treatment group might show a higher net conversion rate simply because the screener filtered out users who were less likely to convert in the first place.
Evaluation metrics:
1. Retention (did not work): since the screener might discourage users with less time commitment from enrolling in the first place, the treatment group might have a higher retention rate simply because it started with a more engaged pool of users (see the simulation sketch after this list).
2. Net conversion (did not work): selection bias happens when the group of users taking part in your A/B test doesn't accurately represent your overall audience. Here the denominator, the number of unique cookies that click the "Start free trial" button, is comparable between the control and experiment groups; these are the people who show initial interest. But:
Control group: all users who click "Start free trial" proceed directly to checkout, representing a wider range of user motivations and commitment levels.
Treatment group: users see the screener before checkout, so this group might have a higher concentration of users with strong initial interest, because the screener potentially filters out some users with less interest. The treatment group's net conversion rate might therefore seem higher than the control group's, but only because the screener discourages some users from enrolling.
3. Gross conversion (works!): its numerator is the number of user-ids that complete checkout and enroll in the free trial, and its denominator is the comparable pool of clicks, so it directly captures the screener's effect on enrollment. By focusing on gross conversion and avoiding metrics heavily influenced by the screener's selection bias (retention, net conversion), we gain a clearer picture of the screener's true impact on the free trial process.
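The selection-bias story above can be made concrete with a toy simulation: give each clicker a hidden commitment level, let the screener discourage some low-commitment users from enrolling, and keep every individual's payment behavior unchanged. All the rates below are made up for illustration; they are not Udacity data:

```python
import random

random.seed(42)

def simulate_retention(n_clicks: int, screener: bool) -> float:
    """Toy model: retention = payments / enrollments for one group."""
    enrollments = payments = 0
    for _ in range(n_clicks):
        committed = random.random() < 0.4    # 40% of clickers are high-commitment (made up)
        # The screener discourages half of the low-commitment users from enrolling.
        if screener and not committed and random.random() < 0.5:
            continue
        enrollments += 1
        # Payment probability depends only on commitment, never on the screener.
        if random.random() < (0.70 if committed else 0.15):
            payments += 1
    return payments / enrollments

print(f"control:   {simulate_retention(200_000, screener=False):.3f}")  # ~0.37
print(f"treatment: {simulate_retention(200_000, screener=True):.3f}")   # ~0.46
```

Retention rises in the treatment group even though no individual behaves differently, which is exactly why it cannot isolate the screener's effect.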
Measuring variability:
Baseline values help set a benchmark for what is "normal" or expected in the current system without any changes.
The evaluation metric I choose here is gross conversion: the number of user-ids that complete checkout and enroll in the free trial divided by the number of unique cookies that click the "Start free trial" button (dmin = 0.01).
Given 5000 cookies visiting the course overview page, the denominator is 5000 * 0.08 = 400 clicks; the probability of enrolling given a click is p = 0.20625.
Because gross conversion is a probability, its analytic standard deviation follows the binomial formula:
standard deviation = sqrt(p * (1 - p) / N) = sqrt(0.20625 * (1 - 0.20625) / 400) ≈ 0.0202
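The same arithmetic in a few lines of Python (a sketch; the binomial standard-error formula is standard, and the inputs are the baseline values above):

```python
import math

pageviews = 5000      # cookies visiting the course overview page
ctp = 0.08            # click-through-probability on "Start free trial"
p = 0.20625           # baseline gross conversion (enrollments / clicks)

n_clicks = pageviews * ctp                  # 400 clicks = the metric's denominator
sd = math.sqrt(p * (1 - p) / n_clicks)      # binomial standard error of a proportion
print(f"clicks = {n_clicks:.0f}, SD = {sd:.4f}")   # SD ≈ 0.0202
```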
Sizing:
1. Choosing number of samples given power: how many pageviews in total (across both groups) would we need to collect to adequately power the experiment?
Use an alpha of 0.05 and a beta of 0.2; the baseline rate for gross conversion is 20.625% and the minimum detectable effect is 1% (absolute). Plugging these into the online calculator gives 25835 clicks (this number is for one group) → 25835 * 2 = 51670 clicks for both groups → divide by the click-through-probability to get the pageviews we need: 25835 * 2 / 0.08 = 645875.
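The calculator's number can be approximated with the standard two-proportion sample-size formula; a sketch (the result differs slightly from 25835 because online calculators vary in the exact formula they use):

```python
import math
from statistics import NormalDist

def clicks_per_group(p_base: float, d_min: float,
                     alpha: float = 0.05, beta: float = 0.2) -> int:
    """Two-proportion sample size (pooled-variance z-test approximation)."""
    p_alt = p_base + d_min
    p_bar = (p_base + p_alt) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(1 - beta)        # 0.84 for power = 0.8
    n = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_b * math.sqrt(p_base * (1 - p_base) + p_alt * (1 - p_alt))) ** 2 / d_min ** 2
    return math.ceil(n)

n = clicks_per_group(0.20625, 0.01)
print(n)                          # ~26156 clicks per group, close to the calculator's 25835
print(math.ceil(n * 2 / 0.08))    # total pageviews needed across both groups
```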
2. Choosing duration vs. exposure: what percentage of Udacity's traffic would we divert to this experiment (assuming there were no other experiments we wanted to run simultaneously)? Is the change risky enough that we wouldn't want to run it on all traffic?
According to our baseline estimates, 40000 unique cookies view the course overview page per day, and from our calculation we need 645875 pageviews (for both groups):
1. if we run on all traffic: 645875 / 40000 ≈ 16.15 days
2. if we run on 50% of traffic: 645875 / 20000 ≈ 32.29 days
3. if we run on 25% of traffic: 645875 / 10000 ≈ 64.59 days
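The same trade-off as a quick loop (inputs are the baseline values above):

```python
import math

pageviews_needed = 645_875   # total pageviews across both groups
daily_pageviews = 40_000     # unique cookies viewing the course overview page per day

for fraction in (1.00, 0.50, 0.25):
    days = pageviews_needed / (daily_pageviews * fraction)
    print(f"{fraction:.0%} of traffic -> {days:.2f} days (~{math.ceil(days)} days)")
```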