Tag: collaborative filtering

What is news when the audience is editor?

January 15, 2011; January 16, 2011 | collaborative filtering, journalism, news agenda | 2 Comments

This is a paper I wrote in December 2009. I’ve decided to post it now, partially because it contains a previously unreported 30-day content comparison of Digg versus the New York Times. Looking back on this work, I think that its greatest weakness is an under-appreciation of the importance of production processes in determining what gets reported and how. In other words, I believe now that the intense pressure of daily deadlines shapes the news far more than external influences such as political and commercial pressures, at least in countries where the press is relatively free. Also available as a pdf.

Abstract
There are now several websites which allow users to assemble news content from around the internet by means of voting systems. The result is a new kind of front page that directly reflects what the audience believes to be salient, as opposed to what the editorial staff of a newsroom believes the audience should know. Content analyses of such sites show that they have little overlap with mainstream media agendas (5% in a previous study). In fact, many of the items selected by users would not traditionally be considered “news” at all. This paper examines the shift from editor to audience agendas in the context of previous theories of news production, discusses existing content analysis work on the subject, and reports on a new 30 day study of Digg.com versus NYTimes.com.

Introduction
No news organization can cover everything. Traditionally, it is ultimately the editor of a news publication who decides what is newsworthy: what stories reporters will follow, and what stories will be published. It has been considered part of the value of a news organization to determine what its audiences need to know about.

It’s never been entirely clear how professional journalists decide which events are worth reporting, out of all the events taking place in the world. Neither has it been obvious how editorial choices relate to the audience’s personal judgments about what is important, but such questions were largely theoretical before the advent of the web. “I own a newspaper, you do not” was always the implicit end to discussions about who got to decide what was news.

Today, publishing is near-free and the news package has been disaggregated. An online audience member can select single stories that interest them, without reading or even really being aware of the traditional news package. Alongside this disaggregation we find a new class of online applications that re-aggregate content from multiple sources. Readers vote on pages from across the web, and the top-rated items are displayed on the aggregator’s home page.

News consumers are literally tearing the world’s newspapers apart and re-assembling them to fit their own agendas, including lots of content not traditionally considered news at all.

This paper examines what we can learn about the online audience’s judgment not only of what is important but what is news at all, and how it differs from that of traditional newsrooms. I review previous work on “news values” and “news agenda” in professional journalism, look at measurements of what audiences view online, and report on my own 30 day quantitative study of Digg as compared to the New York Times.

Features of the audience-generated agenda

What’s the point of social news?

November 17, 2010; August 9, 2023 | collaborative filtering, journalism, reporting, social media, social news | 16 Comments

According to Facebook, social news seems to be mostly about knowing what all my friends are reading. I’m not so sure. But I think there really is something to the idea of “social news” for journalism, and for journalism product design.

I take “social” to mean “interacting with other people.” That’s a fundamental technical possibility of digital media, as basic to the internet as moving pictures are to television. I’m not sure that anyone really knows yet what to do with that possibility, but happily there are already at least two very well-developed uses. Maybe social news isn’t about “friends” at all, but about filtering and news-gathering.

Twitter is really a filter
I get most of both my general and special interest news from Twitter. I rarely go to the home page of a news site, or use a news app. It’s not the tweets themselves that are informative, but the links within them to articles posted elsewhere. I follow a large set of people with varied interests, and some of them work for news organizations, but most do not. My Twitter feed is faster, more diverse, and available across more platforms (all of them) than any one news organization’s output.

This doesn’t mean that Twitter is a perfect news delivery system, but to me it’s proven better than just about anything else at getting me the news mix that I want, and keeping me interested in the world at large. (Admittedly, I follow people I’ve met in other countries, so yeah, travel is way better than Twitter for that.) I am not alone in this opinion. The structure of follower relationships among Twitter users suggests that it’s more of a news network than a social network.

The usefulness of Twitter for news has a lot to do with certain basic design choices. First, a tweet is really as short as you can get and still communicate a complete concept, so it’s basically an extended headline. Second, Twitter differs from Facebook in that relationships can be unidirectional: I don’t need anyone’s permission to follow them, and they may not know or care that I do. Following someone on Twitter also differs from following a blog via RSS because most tweets refer to someone else’s work through a link — Twitter is more about re-publishing than publishing. Retweets also include the name of the original tweeter, which enables discovery of interesting new curators.

Filtering is much more valuable than it used to be, in this era of information overload, and these properties make Twitter an excellent filtering system. There are several news products based almost entirely on displaying links tweeted by the people you follow, such as The Twitter Tim.es and Flipboard. The medium that Twitter invented — global public short messaging with links — has already been endlessly replicated and will be with us forever.

There is a sense in which news organizations have always seen filtering as a big part of their value. One of the duties of the professional editor is to decide what you need to see. But at least one thing has upset that model irretrievably: the internet is not a broadcast medium. While each person reads an identical copy of the Times and watches an identical CNN broadcast, there’s no reason my internet has to look the same as your internet. A small team of human editors can’t personalize the headlines for every reader, so that leaves algorithmic filtering, such as Google News’ personalization features, or social filtering, such as Twitter.

The point is, there’s probably something to learn from how Twitter uses social relationships to route information. As the Nieman Journalism Lab said: “social news isn’t about the people you know so much as the people with whom you share interests.” To put this in terms of the product I wish I had: when I use your news product, I want to be able to follow the recommended reading of other members of the audience, if they so allow. Also, can I follow a particular reporter? And does your product integrate with the other methods I already use for getting information, so I don’t have to choose?

Social networks are great for reporting
Audience-journalist collaboration, blah blah blah. If the idea that professionals are no longer the only players in news is new to you, see blogging and Wikipedia. But a news organization probably has to look at this from a different angle. For me, the core idea of social news-gathering is that the audience is, or could be, an extension of the news organization’s source network.

Hopefully, a newsroom knows about interesting developments before anyone else, and then verifies and publicizes them, but that’s getting near impossible when anyone can publish, and when virality can amplify primary sources without the involvement of a media organization. We don’t know yet very much about collective news-gathering, but there are promising directions. It seems like maybe there are two broad categories of breaking news: public events that anyone could have witnessed, and private events initially known only to privileged observers.

Social media is now routinely used to augment reporting of public events. There are entire units in news organizations dedicated to getting stories from the audience, often under the awkward rubric of “user-generated content.” But why sift for events online when you can give your audience the tools to give you the story directly? Right now if I see a plane land in a river, I tweet it. Wouldn’t a news organization prefer that I send my eye-witness photo to the UGC editor instead? To this end, several mobile news apps include the ability to submit pictures. CNN’s iReport app and website is probably the best developed of these. Ideally, I could send that breaking news tweet to the newsroom and to my friends at the same time, within the same application.

Fast reporting of private events has always depended on having the right sources. A well established source may call the reporter or send an email when something newsworthy happens. Someone with a much looser connection to the organization may not, and perhaps this is an opportunity for social news tools. When someone knows something — or can talk about something — you want them to contact the newsroom first. The potential of this weak-tie news sourcing approach hasn’t really been studied, to my knowledge, but I imagine that it would require, at minimum, a trusted brand, an easily-reachable editorial staff, and frictionless communication tools. If it’s easier just to tweet or blog the news, the source will.

There are several other good examples of social news-gathering, on the theme of asking your audience for help. Crowdsourcing is usually thought of as the recruitment of many unspecialized helpers, as the Guardian did with its MP expenses project. But the Guardian also reached out to its audience to find that one specialist attorney who could unravel the mystery of Tony Blair’s tax returns. Hopefully the specialists a newsroom needs to consult are already among the audience, and they will see the call for experts when a reporter sends one out. For that matter, a smart and engaged audience can correct you quickly when you are wrong. Nothing says “we care about accuracy” like a fact check box on every story.

But is it journalism?
Yes, absolutely. The job of journalism is to collect accurate information on an ongoing basis and ensure that the audience for each story learns about that story. Any way you can deliver that service is fair game. People depend on each other for the news all the time, so journalists better get in those conversations.

Rating Items by Number of Votes: Ur Doin It Rong

September 25, 2009 | collaborative filtering, information, news | 7 Comments

Digg, YouTube, Slashdot, and many other sites employ user voting to generate collaborative rankings for their content. This is a great idea, but simply counting votes is a horrible way to do it. Fortunately, the fix is simple.

A basic ranking system allows each user to add a vote to the items they like, then builds a “top rated” list by counting votes. The problem with this scheme is that users can only vote on items they’ve seen, and they are far more likely to see items near the top of the list. In fact, anything off the front page may get essentially no views at all, and therefore has virtually no chance of rising to the top.
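To make the feedback loop concrete, here is a toy simulation. It is a sketch under my own simplifying assumptions, not a model of how Digg or any real site works: each visitor views exactly one randomly chosen front-page item, `quality` is a made-up probability that a viewer likes an item, and nobody ever browses past the front page. All names and parameters are hypothetical.

```python
import random


def simulate_vote_counting(quality, front_page_size=10, visits=5000, seed=0):
    """Toy model: each visitor views one random front-page item and may vote for it."""
    rng = random.Random(seed)
    votes = [0] * len(quality)
    views = [0] * len(quality)
    for _ in range(visits):
        # Front page = items with the most votes. sorted() is stable, so ties
        # fall back to submission order (lower index first).
        ranked = sorted(range(len(quality)), key=lambda i: -votes[i])
        front_page = ranked[:front_page_size]
        item = rng.choice(front_page)
        views[item] += 1
        # quality[item] is the assumed chance that a viewer likes the item.
        if rng.random() < quality[item]:
            votes[item] += 1
    return votes, views


# 20 items; the ten best ones happen to be submitted last.
quality = [0.2] * 10 + [0.9] * 10
votes, views = simulate_vote_counting(quality)
print("views received by the ten best items:", sum(views[10:]))
```

In this deliberately extreme model the ten strongest items never receive a single view, because the first ten submissions fill the front page and nothing else can collect the votes needed to displace them. Real sites leak some traffic to lower-ranked items, but the direction of the bias is the same.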

This is rather serious if the content being rated is serious. It’s fine for Digg to have weird positive-feedback popularity effects, but it’s not fine if we are trying to decide what goes on the front page of a news site. Potentially important stories might never make it to the top simply because they started a little lower in the rankings for whatever reason.

Slightly more sophisticated systems allow users to rate items on a scale, typically 1-5 stars. This seems better, but still introduces weird biases. Adding up the stars assigned by all users to a single item doesn’t work, because users still have to see an item to vote on it. Averaging all the ratings assigned to a single item doesn’t work either, because it can push something permanently to the bottom of the list, if the first user to view it rates it only one star.
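A tiny example shows both failure modes side by side. The item names and star ratings below are invented for illustration.

```python
def rank_by_sum(ratings):
    """Order items by total stars received, highest first."""
    return sorted(ratings, key=lambda item: sum(ratings[item]), reverse=True)


def rank_by_average(ratings):
    """Order items by mean star rating, highest first."""
    return sorted(
        ratings,
        key=lambda item: sum(ratings[item]) / len(ratings[item]),
        reverse=True,
    )


# Hypothetical data: one heavily viewed mediocre item, one barely seen
# excellent item, and one item whose only viewer so far disliked it.
ratings = {
    "popular but mediocre": [3] * 50,
    "barely seen gem": [5, 5],
    "unlucky first rating": [1],
}

print(rank_by_sum(ratings))      # total stars mostly measure exposure
print(rank_by_average(ratings))  # one early vote decides the last item's rank
```

Summing puts the mediocre item first simply because more people saw it. Averaging puts the last item at the bottom on the strength of a single vote, and if low-ranked items are rarely shown, it may never get a second one.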

There are lots of subtle hacks that one can make to try to fix the system, but it turns out there might actually be a right way to do things.

If every item was rated by every user, there would be no problem with popularity feedback effects.

That’s completely impractical with thousands or even millions of items. But we can actually get close to the same result with much less work, if we take random samples. Like a telephone poll, the opinion of a small group of randomly selected people will be an accurate indicator, to within a few percent, of the result that we would get if we asked everyone.
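The “within a few percent” figure comes from the standard margin-of-error formula for a sampled proportion. The sketch below assumes a simple random sample from a large population and a yes/no question (say, “would you vote this item up?”). The function names are mine, and 1.96 is the usual multiplier for a 95% confidence level.

```python
import math

Z_95 = 1.96  # normal-distribution multiplier for a 95% confidence level


def margin_of_error(sample_size, proportion=0.5, z=Z_95):
    """Approximate margin of error for a proportion estimated from a random sample.

    proportion=0.5 is the worst case, giving the widest margin.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


def sample_size_for_margin(margin, proportion=0.5, z=Z_95):
    """Smallest sample size whose margin of error is at most `margin`."""
    return math.ceil(z**2 * proportion * (1 - proportion) / margin**2)


print(round(margin_of_error(1000), 3))  # about 0.031, i.e. roughly 3 points
print(sample_size_for_margin(0.03))     # roughly a thousand respondents
```

The key property is that the required sample size depends on the precision you want, not on how many users the site has.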

In practice, this would mean adding a few select “sampling” stories to each front page served, different every time. Items can then be ranked simply by their average rating, with no skewing due to who got to the front page first. (In fact, basic sampling math will tell us which items have the most uncertain ratings and need to be seen with the highest priority.) In effect, we are distributing the work of rating a huge body of items across a huge body of users: true collaborative filtering, using sampling methods to remove the “can’t see it can’t vote on it” bias.
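One way this could look in code is sketched below. It is an illustration, not a tested production design: I am assuming star ratings stored per item, using the standard error of the mean as the measure of uncertainty, and treating items with fewer than two ratings as maximally uncertain. The function names and sample data are hypothetical.

```python
import math
import random
import statistics


def standard_error(stars):
    """Standard error of the mean rating; infinite when there is too little data."""
    if len(stars) < 2:
        return math.inf
    return statistics.stdev(stars) / math.sqrt(len(stars))


def pick_sampling_items(ratings, k, rng=random):
    """Choose the k items whose average rating is currently most uncertain."""
    items = list(ratings)
    rng.shuffle(items)  # random tie-breaking, so the picks differ between page loads
    items.sort(key=lambda item: standard_error(ratings[item]), reverse=True)
    return items[:k]


def rank_by_average(ratings):
    """Rank items that have at least one rating by their mean, highest first."""
    rated = [item for item in ratings if ratings[item]]
    return sorted(rated, key=lambda item: statistics.mean(ratings[item]), reverse=True)


# Hypothetical ratings collected so far.
ratings = {
    "well sampled": [4, 4, 5, 4, 5, 4, 4, 5, 4, 4],
    "noisy": [1, 5, 1, 5],
    "single vote": [2],
    "brand new": [],
}

print(pick_sampling_items(ratings, k=2))  # the two items with almost no data
print(rank_by_average(ratings))
```

Each page served would show the top of `rank_by_average` plus a few slots filled by `pick_sampling_items`. One caveat: two identical ratings give a standard error of zero, which understates the real uncertainty, so a practical version would probably also enforce a minimum number of ratings per item.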

This is not an end-all solution to the problem of distributed agenda-setting. User ratings are not necessarily the ideal criterion for measuring “relevance.” One problem is that not every user is going to take the trouble to assign a rating, so you will only be sampling from particularly motivated individuals. Other metrics such as length of time on page might be better — did this person read the whole thing?

Even more fundamentally, it’s not clear that popularity, however defined, is really the right way to set a news agenda in the public interest.

However, any attempt to use user polling for collaborative agenda setting needs to be aware of basic statistical bias issues. Sampling is a simple and very well-developed way to think about such problems.

Jonathan Stray

Information, culture, and belief
