I love things like this, which quite simply demonstrate how to do something many people think they already know how to do. In this instance, it’s how to sort-by-ratings for the upteen number of sites which depend on user ratings to bubble things up to the top. And it’s not like these are unsophisticated sites doing this – Amazon, e.g., gets it wrong.
The simple solution is actually ‘complex’ – the lower bound of Wilson score confidence interval for a Bernoulli parameter. As Miller points out: “We need to balance the proportion of positive ratings with the uncertainty of a small number of observations. Fortunately, the math for this was worked out in 1927 by Edwin B. Wilson. What we want to ask is: Given the ratings I have, there is a 95% chance that the ‘real’ fraction of positive ratings is at least what? Wilson gives the answer.”
So the solution is really something like:
But in an age of programming, this complex solution is itself simple.
This reminds me of a problem I’ve kicked around in my head for years now, and I’m avoiding going to a statistician for a ‘true’ answer (which I imagine to be devlishly complex, but which probably is not). It’s to figure out how to maximize the joint probability of winning the maximum amount of money on a 50/50 bet, and not losing that money. That is, if you have a $100, do you bet $1 one hundred times? Or $100 once? Or $25 four times? Puzzle puzzle.
Comments are disabled for this post