I love things like this, which quite simply demonstrate how to do something many people think they already know how to do. In this instance, it’s how to sort-by-ratings for the upteen number of sites which depend on user ratings to bubble things up to the top. And it’s not like these are unsophisticated sites doing this – Amazon, e.g., gets it wrong.
The simple solution is actually ‘complex’ – the lower bound of Wilson score confidence interval for a Bernoulli parameter. As Miller points out: “We need to balance the proportion of positive ratings with the uncertainty of a small number of observations. Fortunately, the math for this was worked out in 1927 by Edwin B. Wilson. What we want to ask is: Given the ratings I have, there is a 95% chance that the ‘real’ fraction of positive ratings is at least what? Wilson gives the answer.”
But in an age of programming, this complex solution is itself simple.
This reminds me of a problem I’ve kicked around in my head for years now, and I’m avoiding going to a statistician for a ‘true’ answer (which I imagine to be devlishly complex, but which probably is not). It’s to figure out how to maximize the joint probability of winning the maximum amount of money on a 50/50 bet, and not losing that money. That is, if you have a $100, do you bet $1 one hundred times? Or $100 once? Or $25 four times? Puzzle puzzle.