Simulating different split test methodologies:
Suppose you are trying to maximise the number of clicks. You start off with two versions and evenly split impressions between them until each has 1000 impressions.
Next you do some maths to decide what to do. Depending on what the maths tells you, you either continue to run the test or you pause one variation and introduce another one. Then you wait another 1000 impressions and repeat...
After 10,000 impressions, how many clicks will you have?
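To make the game concrete, here is a minimal Python sketch of one run under one reading of the rules: every live variation gets 1000 impressions per round, and the game stops at 10,000 impressions in total. The 1-5% range of true click-through rates and the helper names are my own assumptions for illustration, not necessarily what the original simulation does.

```python
import random

ROUND_IMPRESSIONS = 1000     # impressions each live variation gets per round
TOTAL_IMPRESSIONS = 10_000   # the game stops once this many have been served

def new_variant():
    # A fresh advert with a true CTR we don't get to see; the 1-5% range is an assumption
    return {"true_ctr": random.uniform(0.01, 0.05), "impressions": 0, "clicks": 0}

def serve(variant, n):
    # Show the variant n times and record how many impressions turn into clicks
    clicks = sum(random.random() < variant["true_ctr"] for _ in range(n))
    variant["impressions"] += n
    variant["clicks"] += clicks
    return clicks

def run_game(decide):
    # Play one game; `decide(a, b)` returns the variant to pause, or None to keep both
    live = [new_variant(), new_variant()]
    shown = total_clicks = 0
    while shown < TOTAL_IMPRESSIONS:
        for variant in live:
            total_clicks += serve(variant, ROUND_IMPRESSIONS)
            shown += ROUND_IMPRESSIONS
        loser = decide(live[0], live[1])
        if loser is not None:            # pause the loser and introduce a new advert
            live.remove(loser)
            live.append(new_variant())
    return total_clicks
```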
Obviously, the answer depends a bit on what maths you do. The chart below shows the result of simulating 100 runs of the above game with six different strategies:
1. If 99% confident that the worst ad really is worse, pause it and add a new one (one possible version of this rule is sketched after the list)
2. Same as 1. but with 95% confidence
3. 90% confidence
4. 80% confidence
5. Just pick the advert that is performing the best when you look (best observed CTR)
6. Cheat (because this is a simulation we know which version is actually better - so pick that one)
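The post doesn't say exactly which significance test the simulation uses, so treat the following as a sketch of one reasonable set of decision rules that plug into the run_game sketch above: a one-sided two-proportion z-test for strategies 1-4, plus the "best observed CTR" and "cheat" rules. The function names are mine.

```python
import math

def confidence_rule(level):
    # Strategies 1-4: pause the observed loser only when a one-sided two-proportion
    # z-test says we are at least `level` confident it really is worse
    def decide(a, b):
        worse, better = sorted((a, b), key=lambda v: v["clicks"] / v["impressions"])
        n1, n2 = better["impressions"], worse["impressions"]
        p1, p2 = better["clicks"] / n1, worse["clicks"] / n2
        pooled = (better["clicks"] + worse["clicks"]) / (n1 + n2)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        if se == 0:
            return None                  # no clicks at all yet, keep testing
        z = (p1 - p2) / se
        confidence = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # one-sided normal CDF
        return worse if confidence >= level else None
    return decide

def best_observed_ctr(a, b):
    # Strategy 5: always pause whichever advert has the lower observed CTR
    return min(a, b, key=lambda v: v["clicks"] / v["impressions"])

def cheat(a, b):
    # Strategy 6: peek at the true CTRs, which only the simulation can see
    return min(a, b, key=lambda v: v["true_ctr"])

# e.g. average clicks per game over 100 runs for strategy 2:
# sum(run_game(confidence_rule(0.95)) for _ in range(100)) / 100
```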
The results of the simulation show that, as long as you can keep up by producing enough new variations, waiting for statistical significance does not improve outcomes.
This result is very surprising to me. I'm sure it is at least partly because my simulation is overly simplistic; if you can precisely define what aspects of real life I'm missing here then I will have a go at coding it up.
Want more stuff like this? I'm starting a mailing list/newsletter http://www.eanalytica.com/subscribe/