Digg, YouTube, Slashdot, and many other sites employ user voting to generate collaborative rankings for their content. This is a great idea, but simply counting votes is a horrible way to do it. Fortunately, the fix is simple.
A basic ranking system allows each user to add a vote to the items they like, then builds a “top rated” list by counting votes. The problem with this scheme is that users can only vote on items they’ve seen, and they are far more likely to see items near the top of the list. In fact, anything off the front page may get essentially no views at all, and therefore has virtually no chance of rising to the top.
This is rather serious if the content being rated is serious. It’s fine for Digg to have weird positive-feedback popularity effects, but it’s not fine if we are trying to decide what goes on the front page of a news site. Potentially important stories might never make it to the top simply because they started a little lower in the rankings for whatever reason.
Slightly more sophisticated systems allow users to rate items on a scale, typically 1-5 stars. This seems better, but still introduces weird biases. Adding up the stars assigned by all users to a single item doesn’t work, because users still have to see an item to vote on it. Averaging the ratings doesn’t work either: if the first user to view an item rates it only one star, that average can push the item permanently to the bottom of the list.
There are lots of subtle hacks that one can make to try to fix the system, but it turns out there might actually be a right way to do things.
If every item was rated by every user, there would be no problem with popularity feedback effects.
That’s completely impractical with thousands or even millions of items. But we can actually get close to the same result with much less work, if we take random samples. As with a telephone poll, the opinion of a small group of randomly selected people will be an accurate indicator, to within a few percent, of the result that we would get if we asked everyone.
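To make the telephone-poll intuition concrete, here is a minimal simulation. The population size, the true approval rate of 0.37, and the sample size of 1,000 are all invented for illustration; the point is only that a small random sample tracks the full-population result closely.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 users, each of whom would approve
# this item with true probability 0.37 if asked.
population = [1 if random.random() < 0.37 else 0 for _ in range(100_000)]
true_rate = sum(population) / len(population)

# Instead of asking everyone, poll a random sample of 1,000 users.
sample = random.sample(population, 1_000)
sample_rate = sum(sample) / len(sample)

print(f"true approval:    {true_rate:.3f}")
print(f"sampled approval: {sample_rate:.3f}")

# For a proportion p estimated from n responses, the standard error is
# roughly sqrt(p * (1 - p) / n); with n = 1000 that is about 0.015,
# so the sample estimate is typically within a couple of percentage points.
```

The same back-of-the-envelope standard-error formula tells you how big a sample you need for a given accuracy, independent of how large the full audience is.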
In practice, this would mean adding a few select “sampling” stories to each front page served, different every time. Items can then be ranked simply by their average rating, with no skewing due to who got to the front page first. (In fact, basic sampling math will tell us which items have the most uncertain ratings and need to be seen with the highest priority.) In effect, we are distributing the work of rating a huge body of items across a huge body of users — true collaborative filtering, using sampling methods to remove the “can’t see it can’t vote on it” bias.
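The “sampling math” mentioned above can be sketched in a few lines: rank items by their average rating, and pick which items to inject as sampling stories by the standard error of each item’s mean rating. The item names and rating data below are invented for illustration.

```python
import math

# Hypothetical item -> star ratings collected so far.
ratings = {
    "story-a": [5, 4, 5, 4, 5, 4, 4, 5],
    "story-b": [2],               # only one rating so far
    "story-c": [3, 3, 4, 2, 3],
}

def mean(xs):
    return sum(xs) / len(xs)

def standard_error(xs):
    # Standard error of the mean rating. Items with few or widely
    # scattered ratings get a large value, i.e. high sampling priority.
    if len(xs) < 2:
        return float("inf")       # one rating tells us almost nothing
    m = mean(xs)
    var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return math.sqrt(var / len(xs))

# Rank by average rating, not by raw vote totals.
ranking = sorted(ratings, key=lambda k: mean(ratings[k]), reverse=True)

# Choose which items to show as "sampling" stories: most uncertain first.
to_sample = sorted(ratings, key=lambda k: standard_error(ratings[k]), reverse=True)

print(ranking)    # ['story-a', 'story-c', 'story-b']
print(to_sample)  # 'story-b' first: its rating is the least certain
```

A production system would also cap how often any one item is injected, but the core idea is just this: the ranking uses the mean, and the sampling schedule uses the uncertainty of that mean.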
This is not a complete solution to the problem of distributed agenda-setting. User ratings are not necessarily the ideal criterion for measuring “relevance.” One problem is that not every user will take the trouble to assign a rating, so you will only be sampling from particularly motivated individuals. Other metrics such as length of time on page might be better — did this person read the whole thing?
Even more fundamentally, it’s not clear that popularity, however defined, is really the right way to set a news agenda in the public interest.
However, any attempt to use user polling for collaborative agenda setting needs to be aware of basic statistical bias issues. Sampling is a simple and very well-developed way to think about such problems.
7 thoughts on “Rating Items by Number of Votes: Ur Doin It Rong”
All I ask is that I be invited to be a beta tester!
This is brilliant work, Jonathan. I think you can provide some motivations for voting by keeping analytics of when people bother to vote as well as how often they trend with the eventual consensus. A vote on a site like this is essentially a guess as to what other people find interesting, and I’m guessing people would find it interesting to know how often and about what topics they’re right.
There’s also the truism that people without much of a strong opinion can safely keep silent on their ambiguous feelings without biasing the sample.