a/b testing - udacity project

6/10/2018

I recently took the A/B Testing course on Udacity. It ends with a final project, which helped me understand the method and the statistical concepts behind it.

A/B testing, at its most basic, is a way to compare two versions of something to figure out which performs better. -- Amy Gallo, A Refresher on A/B Testing, HBR

It is a very common method for web analytics and product improvement. This Udacity project gave me an overview of the method, step by step.

Project Overview: Udacity tested a change on its homepage. If the student clicked "start free trial", they were asked how much time they had available to devote to the course. If the student indicated 5 or more hours per week, they would be taken through the checkout process as usual. If they indicated fewer than 5 hours per week, a message would appear indicating that Udacity courses usually require a greater time commitment for successful completion, and suggesting that the student might like to access the course materials for free. At this point, the student would have the option to continue enrolling in the free trial, or access the course materials for free instead. This screenshot shows what the experiment looks like.

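Step 1: Choosing Invariant Metrics (used for the Sanity Checks later: these metrics shouldn't change during the test, otherwise the experiment setup is incorrect). All candidate metrics are listed; the ones marked as not affected by the experiment are the invariant metrics.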

Number of cookies: That is, number of unique cookies to view the course overview page. (dmin=3000) -- number of cookies should not be affected by the experiment.
Number of user-ids: That is, number of users who enroll in the free trial. (dmin=50)
Number of clicks: That is, number of unique cookies to click the "Start free trial" button (which happens before the free trial screener is triggered). (dmin=240) -- number of clicks should not be affected by the experiment.
Click through probability (CTR): That is, number of unique cookies to click the "Start free trial" button divided by number of unique cookies to view the course overview page. (dmin=0.01)
Gross conversion: That is, number of user-ids to complete checkout and enroll in the free trial divided by number of unique cookies to click the "Start free trial" button. (dmin= 0.01)
Retention: That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by number of user-ids to complete checkout. (dmin=0.01)
Net conversion: That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies to click the "Start free trial" button. (dmin= 0.0075)

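Step 2: Choosing Evaluation Metrics (the metrics used to evaluate the effect of the experiment). The same candidates are listed again; the ones marked with an expected direction are the evaluation metrics.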

Number of cookies: That is, number of unique cookies to view the course overview page. (dmin=3000)
Number of user-ids: That is, number of users who enroll in the free trial. (dmin=50)
Number of clicks: That is, number of unique cookies to click the "Start free trial" button (which happens before the free trial screener is triggered). (dmin=240)
Click through probability (CTR): That is, number of unique cookies to click the "Start free trial" button divided by number of unique cookies to view the course overview page. (dmin=0.01)
Gross conversion: That is, number of user-ids to complete checkout and enroll in the free trial divided by number of unique cookies to click the "Start free trial" button. (dmin= 0.01) -- this number is expected to decrease, because the screener should discourage some students from completing checkout and enrolling.
Retention: That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by number of user-ids to complete checkout. (dmin=0.01) -- this number is expected to increase, because the students who still enroll are more likely to stay past the trial.
Net conversion: That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies to click the "Start free trial" button. (dmin= 0.0075) -- this number should not decrease, and ideally increases: fewer students enroll, but a larger share of them should stay and pay.

Step 3: Calculating Standard Deviation
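Formula: sd = sqrt(p*(1-p)/n), where p is the baseline probability of the metric and n is the size of its denominator (enrollments for Retention, clicks for Net Conversion) in a sample of 5000 page views.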

Retention:
p = 0.53
n = 5000 * 0.08 * 0.20625 = 82.5
sd = sqrt(0.53*(1-0.53)/82.5) = 0.0549

Net Conversion:
p = 0.1093125
n = 5000 * 0.08 = 400
sd = sqrt(0.1093125*(1-0.1093125)/400) = 0.0156
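
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the function and variable names are my own, and the baseline values (5000 page views, 0.08 click-through probability, 0.20625 gross conversion) are the ones used in this post.

```python
import math

def analytic_sd(p, n):
    """Standard deviation of a proportion p estimated from n units."""
    return math.sqrt(p * (1 - p) / n)

PAGE_VIEWS = 5000       # sample size in page views
CLICK_PROB = 0.08       # probability that a page view leads to a click
GROSS_CONV = 0.20625    # probability of enrolling, given a click

clicks = PAGE_VIEWS * CLICK_PROB      # denominator for net conversion
enrollments = clicks * GROSS_CONV     # denominator for retention

sd_retention = analytic_sd(0.53, enrollments)
sd_net_conversion = analytic_sd(0.1093125, clicks)

print(f"Retention sd:      {sd_retention:.4f}")
print(f"Net conversion sd: {sd_net_conversion:.4f}")
```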

Step 4: Select an evaluation metric and calculate the page views the experiment needs

Choose Net conversion (the reason is given at the end of this step)
How many page views will you need? (Use alpha = 0.05 and beta = 0.2)
Sample sizes from an online calculator:
1. Gross Conversion: Probability of enrolling, given click: 20.625% base conversion rate, 1% min d.
Samples needed: 25,835
2. Retention: Probability of payment, given enroll: 53% base conversion rate, 1% min d.
Samples needed: 39,115
3. Net Conversion: Probability of payment, given click: 10.93125% base conversion rate, 0.75% min d.
Samples needed: 27,413

Page Views Needed (Control Group and Experiment Group together):
1. According to Gross Conversion: 2*(25835/0.08) = 645875
2. According to Retention: 2*(39115/(0.08 * 0.20625)) = 4741212
3. According to Net Conversion: 2*(27413/0.08) = 685325
Normally you would choose the biggest number. However, 4.7 million page views is too many (about 119 days even with all of the 40000 daily page views from step 5), so choose the second largest number, 685325, according to Net Conversion.
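
As a rough sketch, the conversion from samples per group to total page views looks like this in Python. The sample sizes are the calculator results quoted above; the function name and dictionary are hypothetical helpers of mine.

```python
def page_views_needed(samples_per_group, units_per_page_view, groups=2):
    """Page views needed so that each group collects enough units.

    units_per_page_view is the expected number of denominator units
    (clicks or enrollments) produced by a single page view.
    """
    return groups * samples_per_group / units_per_page_view

CLICK_PROB = 0.08       # clicks per page view
GROSS_CONV = 0.20625    # enrollments per click

needed = {
    "Gross Conversion": page_views_needed(25835, CLICK_PROB),
    "Retention": page_views_needed(39115, CLICK_PROB * GROSS_CONV),
    "Net Conversion": page_views_needed(27413, CLICK_PROB),
}

for metric, views in needed.items():
    print(f"{metric}: {views:,.0f} page views")
```

Retention needs far more page views than the other two because its unit is an enrollment, and only a small fraction of page views ends in one.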

Step 5: Choosing Duration and Exposure (how long the experiment lasts)

Number of Page Views: 685325 (step 4)
Fraction of traffic exposed: 0.5
Length of experiment: 35 days (685325/(40000*0.5) = 34.27, rounded up)
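
The duration is a one-line calculation. Here is a minimal sketch, assuming 40000 page views per day as in the formula above; `days_needed` is a name I made up.

```python
import math

def days_needed(page_views, daily_page_views, fraction_exposed):
    """Whole days required to collect page_views at the given exposure."""
    return math.ceil(page_views / (daily_page_views * fraction_exposed))

experiment_days = days_needed(685325, 40000, 0.5)
print(experiment_days)  # 35
```

Raising the fraction of traffic exposed shortens the experiment, but it also shows the change to more students.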

Step 6: Sanity Checks (use the invariant metrics from step 1)
With a 95% Confidence Interval. If traffic is split evenly, each group should receive a fraction of 0.5 of every invariant count.

Number of Cookies: sd = sqrt(0.5*0.5/(344660+345543)) = 0.0006
margin of error = 1.96 * sd = 0.00118
Confidence Interval: [0.5 - 0.00118, 0.5 + 0.00118] = [0.4988, 0.5012]
Observed fraction: 345543/(344660+345543) = 0.5006 -- within the range, passes the sanity check
Number of Clicks: sd = sqrt(0.5*0.5/(total clicks in both groups)) = 0.00209
margin of error = 1.96 * sd = 0.00411
Confidence Interval: [0.5 - 0.00411, 0.5 + 0.00411] = [0.4959, 0.5041]
Observed fraction: 0.5005 -- within the range, passes the sanity check
If any invariant metric falls outside its range, it fails the sanity check, which means something is wrong with the setup and you need to go through the experiment again.
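
The check can be sketched as a small Python function. The function name and return layout are my own; the cookie counts are the two numbers used above.

```python
import math

def sanity_check(count_a, count_b, expected=0.5, z=1.96):
    """Check whether group A's share of a count is consistent with `expected`.

    Returns (lower, upper, observed, passed).
    """
    total = count_a + count_b
    sd = math.sqrt(expected * (1 - expected) / total)
    margin = z * sd
    lower, upper = expected - margin, expected + margin
    observed = count_a / total
    return lower, upper, observed, lower <= observed <= upper

lower, upper, observed, passed = sanity_check(345543, 344660)
print(f"CI: [{lower:.4f}, {upper:.4f}], observed: {observed:.4f}, pass: {passed}")
```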

Step 7: Effect Size Tests

Do you use the Bonferroni correction: No (because the metrics are correlated)
Gross Conversion: Control Gross Conversion: 3785/17293 = 0.2189
Experiment Gross Conversion: 3423/17260 = 0.1983
--> difference between experiment and control: 0.1983 - 0.2189 = -0.0206
Pooled Probability: (3785+3423)/(17293+17260) = 0.2086
Standard Error: sqrt(0.2086*(1-0.2086)*(1/17293+1/17260)) = 0.0044
--> margin of error with 95% CI: 1.96*0.0044 = 0.0086
--> CI: [-0.0206-0.0086, -0.0206+0.0086] -> [-0.0292, -0.0120]
Statistically significant: Yes, because the CI doesn't contain zero.
Practically significant: Yes, because the whole CI lies below the negative dmin value, -0.01.
Retention: Control retention: 2033/3785 = 0.5371
Experiment retention: 1945/3423 = 0.5682
--> difference between experiment and control: 0.5682 - 0.5371 = 0.0311
Pooled Probability: (2033+1945)/(3785+3423) = 0.5519
Standard Error: sqrt(0.5519*(1-0.5519)*(1/3785+1/3423)) = 0.01173
--> margin of error with 95% CI: 1.96*0.01173 = 0.0230
--> CI: [0.0311-0.0230, 0.0311+0.0230] -> [0.0081, 0.0541]
Statistically significant: Yes, because the CI doesn't contain zero.
Practically significant: No, because the CI contains the dmin value 0.01.
Net Conversion: Control Net Conversion: 2033/17293 = 0.11756
Experiment Net Conversion: 1945/17260 = 0.11268
--> difference between experiment and control: 0.11268-0.11756 = -0.0049
Pooled Probability: (2033+1945)/(17293+17260) = 0.1151
Standard Error: sqrt(0.1151*(1-0.1151)*(1/17293+1/17260)) = 0.0034
--> margin of error with 95% CI: 1.96*0.0034 = 0.0067
--> CI: [-0.0049-0.0067, -0.0049+0.0067] -> [-0.0116, 0.0018]
Statistically significant: No, because the CI contains zero.
Practically significant: No, because the CI contains the negative dmin value, -0.0075.
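
The same calculation can be sketched in Python, shown here for Gross Conversion and Net Conversion. The function names are hypothetical. The rule for practical significance (the whole CI lies beyond +/- dmin) is my reading of the checks above, not an official definition.

```python
import math

def diff_ci(x_ctrl, n_ctrl, x_exp, n_exp, z=1.96):
    """Return (difference, lower, upper) for experiment minus control,
    using the pooled standard error."""
    p_pool = (x_ctrl + x_exp) / (n_ctrl + n_exp)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_ctrl + 1 / n_exp))
    diff = x_exp / n_exp - x_ctrl / n_ctrl
    return diff, diff - z * se, diff + z * se

def significance(lower, upper, d_min):
    """Statistical: the CI excludes 0. Practical: the CI lies beyond +/- d_min."""
    statistical = lower > 0 or upper < 0
    practical = lower > d_min or upper < -d_min
    return statistical, practical

gross = diff_ci(3785, 17293, 3423, 17260)
net = diff_ci(2033, 17293, 1945, 17260)

print(significance(gross[1], gross[2], d_min=0.01))    # (True, True)
print(significance(net[1], net[2], d_min=0.0075))      # (False, False)
```

Because the code does not round the intermediate values, the last digit of a bound can differ slightly from the hand calculation.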

Step 8: Sign Tests

Online Calculator 2: www.graphpad.com/quickcalcs/binomial1.cfm
Gross Conversion:

Number of Successes (days when the Experiment Group is greater than the Control Group): 4
Number of Failures (days when the Experiment Group is less than the Control Group): 19
Number of Days: 23
Probability: 0.5
Two tailed p-value: 0.0026 -- statistically significant (below alpha = 0.05)

Net Conversion:

Number of Successes (days when the Experiment Group is greater than the Control Group): 10
Number of Failures (days when the Experiment Group is less than the Control Group): 13
Number of Days: 23
Probability: 0.5
Two tailed p-value: 0.6776 -- not statistically significant (above alpha = 0.05)
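
If you prefer not to use the online calculator, the two-tailed p-value can be computed directly from the binomial distribution. This is a minimal sketch with a function name of my own. It assumes a success probability of 0.5, so the two tails are symmetric, and it needs Python 3.8+ for `math.comb`.

```python
from math import comb

def sign_test_p_value(successes, trials):
    """Two-tailed binomial p-value for a sign test with probability 0.5."""
    k = min(successes, trials - successes)
    # Probability of a result at least as extreme in one tail.
    tail = sum(comb(trials, i) for i in range(k + 1)) / 2 ** trials
    return min(1.0, 2 * tail)

p_gross = sign_test_p_value(4, 23)
p_net = sign_test_p_value(10, 23)
print(round(p_gross, 4))  # 0.0026
print(round(p_net, 4))    # 0.6776
```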

Summary: In the Effect Size Tests, net conversion is neither statistically nor practically significant, and the sign test agrees. This means the experiment did not increase the number of students who stay past the 14-day boundary and make a payment. Therefore, the change shouldn't be launched to a bigger group.

If you would like to take the course and have a try, here is the link: https://www.udacity.com/course/ab-testing--ud257
And it's FREE!
