Dec 12 2009

The Structure of Social Journalism

Shortest way I can describe how I think journalism must change: the internet is not just for distribution, but production too. I’m not saying that “citizen journalists” will be making all the news. I suspect a complex collaboration between many people, including something like a newsroom full of pro journalists. In this article I’m going to explore what that might look like, by asking what the component tasks are that make up “journalism”, and thinking about who can do those most efficiently. And I’m going to sketch out the design for a piece of social software to support this.

Here’s a list of things that professional journalists do:

  • decide what should be more broadly known
  • decide what should be more deeply investigated
  • collect information from sources both public and private
  • check that information for factual accuracy
  • construct narratives to make sense of that information
  • produce content to convey those narratives
  • publish and market that content

This list is by no means definitive or exhaustive. It’s just illustrative, a starting point for a thought experiment. Who could do each of these things best? And what tools do they need to do it?

Having a network of people producing journalism around a newsroom is not a new idea. Jeff Jarvis has been discussing networked journalism since at least 2006, and naturally I think he’s on to something. In this essay I want to concentrate on process and roles. If cheap networks make new types of collaboration possible, they also set the stage for new types of specialization. I think one of the problems of the traditional, mainstream media newsroom is that it tries to handle the entire journalistic process internally, even the parts that it’s not actually very good at.

An example

On November 25, a video appeared on YouTube which appears to be the testimonial of a young woman recently fired from the credit card collections division of Bank of America. She had been allowing the bank’s most desperate customers to enroll in fixed-payment debt recovery schemes. Many of these customers are currently paying 30% interest as a result of recent rate hikes, so this was a great kindness. It was also against company policy.

The video is powerful. It’s an amazing first-person testimonial of the greed and heartlessness of large corporations.

So is this journalism?


Dec 02 2009

What is the Right Number of Journalists?

How many journalists does the world need to adequately serve the public? It’s a difficult question, and I’m going to argue that the market won’t tell us. But here’s a thought: for the first time in history, it’s trivial to check how many outlets covered any particular story.

From Google News just now:

[Image: Google News screenshot, “TooMuchCoverage”]

In this case, there were 2,907 articles on the web covering the story of Iran freeing the Brits. Even accounting for the fact that most of these pages will be mirrors of the same text, we can see that Al Jazeera, The Telegraph, the Associated Press, CNN, the New York Times, and many others covered the story. And they all did their own reporting, a horde of journalists from all over the world making phone calls and doing interviews.

Meanwhile, written-word reporters are getting laid off right and left, essentially due to the end of the print-based information distribution monopolies. Last year, 35,000 journalists lost their jobs in the US. This has made a number of quite clever people fret about the end of “accountability journalism,” the press that keeps people honest by serving as a watchdog.

This talk between Clay Shirky and Alex Jones is a great introduction to the argument that we’re “losing the news,” to use Jones’ phrase.

But what the Google numbers suggest to me is that, if we really are losing the kind of journalism that is essential for democracy, it’s not because we have too few people. It’s because they’re doing the wrong things. According to people like Shirky and the Knight Foundation, local news in particular is vastly under-served.

I discussed this with a journalist from the International Herald Tribune yesterday. He raised the point that we definitely need more than one organization covering each story. Competition is important, as is a plurality of viewpoints; I think we really do want to preserve the difference between CNN and Al Jazeera.

So the right number of newsrooms on each story is greater than one. But I bet it’s less than 10 or 20, which is what we have now. Sadly, I think this means that there is currently a tremendous duplication of effort in the global news-gathering system.

The world’s journalists need to get better organized and stop wasting their efforts. Markets are very good at maximizing efficiency for all sorts of things, but I think this is a case where just letting the market strip apart newspapers until they’re profitable again (if ever) is unlikely to give us the right answer.

Journalism is, arguably, a public good in the economic sense. This means that everyone benefits from it, but everyone also shares it — you will tell your friends the hot news, which makes it difficult to charge for. It’s been understood for at least a century that a competitive market tends to produce too little of a public good. The only reason we had so many newspapers before the internet is that they had a monopoly on distribution. Presses and paper are expensive.

I am not arguing that newspapers need to be run as non-profits or government subsidized, as some people have (and it’s worth noting that the world-wide BBC operation runs on UK government money.) Personally I favor hybrid “social-venture” models that subsidize news through other means. I find Jay Rosen’s list of ways that the news has been subsidized in the past to be particularly enlightening.

It seems likely to me that the bare market is going to under-produce journalism, at least the general-interest public service sort of journalism (financial journalism is currently quite profitable, thank you very much). If we believe this, and we are going to start designing models and policies to ensure that we get more, then the question of when we have enough needs to be asked in a serious, empirical way.

Counting story duplication isn’t a very complete answer to “how do we know when we have enough journalists?” but it’s a start.

What I really don’t know how to address is the question of how many stories should have been written, if only someone had been there to cover them.

Nov 20 2009

Advertising Got There First

Phantom 3D objects floating in the air, visible only through the portal of your phone? An urban game played with same? Mobile ad boutique The Hyper Factory seems to have got there first. Their recent ad campaign for Nike used image recognition of printed targets (on posters, in magazines, on the ground of a football field, etc.) to superimpose hovering shoes over the real world.

This is, without a doubt, creative. But looking at it strictly as a creative work, it is severely hamstrung by the fact that the objective is to sell shoes. My guess is that it will be games that push the aesthetic and technical boundaries of this technology. We’re going to see strange reality-fantasy hybrids that will make World of Warcraft and Second Life look old, boring, and flat. Then again, it might also make LARPing socially acceptable, and do we really want that?

And after the technology is ubiquitous and cheap, we’re going to use it to put deep labels on our environment in real time — this is already starting with a sort of Wikipedia for objects. If you’re one of those people who feel sorta blind without your smartphone, just wait until it’s built into your sunglasses.

Oct 12 2009

The Search Problem vs. The News Problem

I think I’ve found a useful distinction between the “search” and “news” problems. News organizations like to complain that search engines are taking their business, but that’s only because no one has yet built a passable news engine.

Search is when the user asks the computer for a particular type of information, and the computer finds it.

News is when the computer has to figure out, by itself, what information a user wants in each moment.

This definition has useful consequences. For example, it says that accurately modeling the user and their needs is going to be absolutely essential for news, because the news problem doesn’t have a query to go on. All a news selection algorithm can know is what the user has done in the past. For this reason, I don’t believe that online news systems can truly be useful until they take into account everything of ourselves that we’ve put online, including Facebook profiles, emails, and viewing histories.

And yes, I do want my news engine to keep track of cool YouTube uploads and recommend videos to me. This in addition to telling me that Iran has a secret uranium enrichment facility. In the online era, “news” probably just means recently published useful information, of which journalistic reporting is clearly a very small segment.

It’s worth remembering that keyword web search wasn’t all that useful until Google debuted in 1998 with an early version of the now-classic PageRank algorithm.  I suspect that we have not yet seen the equivalent for news. In other words, the first killer news app has yet to be deployed. Because such an app will need to know a great deal about you, it will probably pull in data from Facebook and Gmail, at a minimum. But no one really knows yet how to turn a pile of emails into a filter that selects from the best of the web, blogosphere, Twitter, and mainstream media.

Classic journalism organizations are at a disadvantage in designing modern news apps, because broadcast media taught them bad habits. News organizations still think in terms of editors who select content for the audience. This one-size-fits-all attitude seems ridiculous in the internet era, a relic of the age when it would have been inconceivably expensive to print a different paper for each customer.

Of course, there are some serious potential problems with the logical end-goal of total customization. The loss of a socially shared narrative is one; the Daily Me effect where an individual is never challenged by anything outside of what they already believe is another. But shared narratives seem to emerge in social networks regardless of how we organize them — this is the core meaning of something “going viral.” And I believe the narcissism problem can be addressed through information maps. In fact, maps are so important that we should add another required feature to our hypothetical killer news app: it must in some way present a useful menu of the vast scope of available information. This is another function that existing search products have hardly begun to address.

Not that we have algorithms today that are as good as human editors at putting together a front page. But we will. Netflix’s recent million dollar award for a 10% improvement in their film recommendation system is a useful reminder of how seriously certain companies are taking the problem of predicting user preferences.

The explosion of blog, Twitter, and Wikipedia consumption demonstrates that classic news editors may not have been so good at giving us what we want, anyway.

Sep 25 2009

Rating Items by Number of Votes: Ur Doin It Rong

Digg, YouTube, Slashdot, and many other sites employ user voting to generate collaborative rankings for their content. This is a great idea, but simply counting votes is a horrible way to do it. Fortunately, the fix is simple.

A basic ranking system allows each user to add a vote to the items they like, then builds a “top rated” list by counting votes. The problem with this scheme is that users can only vote on items they’ve seen, and they are far more likely to see items near the top of the list. In fact, anything off the front page may get essentially no views at all, and therefore has virtually no chance of rising to the top.

This is rather serious if the content being rated is serious. It’s fine for Digg to have weird positive-feedback popularity effects, but it’s not fine if we are trying to decide what goes on the front page of a news site. Potentially important stories might never make it to the top simply because they started a little lower in the rankings for whatever reason.

Slightly more sophisticated systems allow users to rate items on a scale, typically 1-5 stars.  This seems better, but still introduces weird biases. Adding up the stars assigned by all users to a single item doesn’t work, because users still have to see an item to vote on it. Averaging all the ratings assigned to a single item doesn’t work either, because it can push something permanently to the bottom of the list, if the first user to view it rates it only one star.

There are lots of subtle hacks that one can make to try to fix the system, but it turns out there might actually be a right way to do things.

If every item was rated by every user, there would be no problem with popularity feedback effects.

That’s completely impractical with thousands or even millions of items. But we can actually get close to the same result with much less work, if we take random samples. Like a telephone poll, the opinion of a small group of randomly selected people will be an accurate indicator, to within a few percent, of the result that we would get if we asked everyone.
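The “within a few percent” claim can be checked with the standard margin-of-error formula for a proportion. This is a minimal sketch, assuming a simple random sample and a 95% confidence level; the function name `margin_of_error` and its defaults are illustrative choices, not anything taken from a particular polling tool.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample.

    proportion=0.5 is the worst case (widest margin); z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)

if __name__ == "__main__":
    # The margin depends on the sample size, not on the size of the whole population.
    for n in (100, 400, 1000):
        print(n, "respondents:", round(margin_of_error(n) * 100, 1), "percentage points")
```

Under these assumptions, roughly a thousand randomly chosen raters are enough to pin down an item's approval to within about three percentage points, however many users the site has.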

In practice, this would mean adding a few select “sampling” stories to each front page served, different every time. Items can then be ranked simply by their average rating, with no skewing due to who got to the front page first. (In fact, basic sampling math will tell us which items have the most uncertain ratings and need to be seen with the highest priority.) In effect, we are distributing the work of rating a huge body of items across a huge body of users: true collaborative filtering, using sampling methods to remove the “can’t see it can’t vote on it” bias.
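One way the sampling scheme might look in code is sketched below. It is only an illustration under stated assumptions: ratings are 1-5 stars held in a plain dictionary, “uncertainty” is measured by the standard error of the mean, and the names `mean_and_stderr`, `rank_by_average`, `pick_sampling_items`, and the `MIN_SPREAD` floor are all hypothetical.

```python
import math
import random

# Assumed floor on the rating spread (in stars), so that two or three
# identical votes are not mistaken for a settled rating.
MIN_SPREAD = 0.5

def mean_and_stderr(ratings):
    """Return (mean, standard error); items with 0 or 1 ratings are maximally uncertain."""
    n = len(ratings)
    if n == 0:
        return 0.0, math.inf
    mean = sum(ratings) / n
    if n == 1:
        return mean, math.inf
    variance = sum((r - mean) ** 2 for r in ratings) / (n - 1)
    spread = max(math.sqrt(variance), MIN_SPREAD)
    return mean, spread / math.sqrt(n)

def rank_by_average(ratings_by_item):
    """Order items by average rating, highest first."""
    return sorted(
        ratings_by_item,
        key=lambda item: mean_and_stderr(ratings_by_item[item])[0],
        reverse=True,
    )

def pick_sampling_items(ratings_by_item, slots, rng=random):
    """Choose items for the 'sampling' slots: most uncertain first, ties broken at random."""
    items = list(ratings_by_item)
    rng.shuffle(items)  # the sort below is stable, so this randomizes ties
    items.sort(key=lambda item: mean_and_stderr(ratings_by_item[item])[1], reverse=True)
    return items[:slots]

example_ratings = {
    "a": [5, 4, 5, 4, 5, 4, 5, 4],
    "b": [1],   # a single one-star vote should not bury this item for good
    "c": [],    # never seen, so never rated
    "d": [3, 3],
}

if __name__ == "__main__":
    print("ranking:", rank_by_average(example_ratings))
    print("show next:", pick_sampling_items(example_ratings, slots=2))
```

Here the barely rated items get the sampling slots, so an early one-star vote gets corrected by later viewers instead of pinning the item to the bottom of the list.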

This is not an end-all solution to the problem of distributed agenda-setting. User ratings are not necessarily the ideal criterion for measuring “relevance.” One problem is that not every user is going to take the trouble to assign a rating, so you will only be sampling from particularly motivated individuals. Other metrics such as length of time on page might be better — did this person read the whole thing?

Even more fundamentally, it’s not clear that popularity, however defined, is really the right way to set a news agenda in the public interest.

However, any attempt to use user polling for collaborative agenda setting needs to be aware of basic statistical bias issues. Sampling is a simple and very well-developed way to think about such problems.

Sep 16 2009

American Press Covers Debate, Not Health Care

Representative Joe Wilson yelled “you lie!” at the president, and the papers loved it. Unfortunately, by a count of more than three to one, the major media articles covering the event did not bother to comment on the substance of the issue that provoked Wilson’s outburst: whether or not illegal immigrants would be provided health care under proposed reforms. There is no health care debate in the mainstream American press. There is only political drama.

The president did not lie. All of the proposed health care reform bills contain language excluding those residing illegally in the US from government-subsidized coverage. This single-sentence fact check was entirely absent from 50 of the 70 articles mentioning “wilson” and “lie” on the New York Times and Washington Post websites as of Monday night. Of the 20 which discussed actual policy, only nine articles mentioned it in the first two paragraphs. (Spreadsheet here.)

Wilson’s outburst will be forgotten long after millions of Americans are insured (or not) under Obama’s plan. It’s just noise and heat. Yet some of the most reputable newspapers in the world have led with it for the last five days. In fact, the press has in some cases actively dodged the underlying issue. Consider this exchange from an online Q&A session with Dana Milbank of the Washington Post:

Cincinnati: Are you saying the President wasn’t lying when he said illegal immigrants won’t be covered? Why not look at the House bill and tell us whether or not it allows illegals to be covered? The Congressional Research service issued a report last week saying there was NOTHING in the House bill that excludes illegals from receiving government-run health care. In other words, be a REPORTER instead of a hack for Barack.

Dana Milbank:  Actually I wasn’t addressing the factual nature of Obama’s speech. The issue wasn’t that Wilson thought the president wasn’t telling the truth; part of the presidential job description calls for expertise in truth shading. The issue was shouting “you lie!” at the president on the House floor during an address to a joint session of Congress.

(For the record, the CRS report in question notes that HR 3200 says “Nothing in this subtitle shall allow Federal payments for affordability credits on behalf of individuals who are not lawfully present in the United States.” Which has, oddly, been spun as meaning that illegals would be subsidized!)

It should be no surprise that there is actually substance to the question of coverage for illegal immigrants. Only nine of the 70 pieces get into it: yes, a few undocumented workers could end up getting subsidized health care. No, it’s not worth taxpayer money to add an enforcement mechanism.

But even this is one level removed, and only one article grappled with the fundamental question: would it really be so bad if the poorest workers in America got a break? In fact we might even owe it to them. On average, migrant labor is thought to be a small net gain to the American economy.

I get that Wilson’s little moment is a great story, right up there with the guy who threw a shoe at Bush (who was imprisoned for his prank, with far less coverage.) And I do understand the logic of a populist press as the paper ship sinks. What cannot be excused is the omission of any mention of the substantive content of the debate from the majority of coverage — 50 out of 70 articles said nothing at all about anything that will last.

We are reporting on court theatrics while the citizens starve.

Jun 02 2009

Twitter is Not Reality, Even in Guatemala

Guatemalans took to the streets in protest over the alleged murder of a prominent attorney by the country’s president, and an unrelated man was arrested for tweeting about it. The protests were reportedly organized on Facebook and other social networking sites, and streamed live to the world by laptop. Xeni Jardin of Boing Boing has been reporting from Guatemala directly for the past two weeks, and in an essay two days ago she calls this the “Twitter Revolution“. I love the story of new technology enabling mass social dissent and change, but I’m not at all sure it’s true.

The background: Attorney Rodrigo Rosenberg was shot while riding his bicycle on May 10th, just a few days after recording a video message which begins,

If you are watching this message, it is because I was assassinated by President Álvaro Colom.

The video implicates not only the president but the major state-owned bank, and indeed much of the current government, and there were mass protests in the capital city. Xeni has been covering the story from Guatemala since the 20th, and I can only commend her for actually being there. However, her coverage has focussed on the role of the internet in these protests.

Google is not reality and Twitter is not reality in exactly the same way that television is not reality. Part of the reason that Middle-Eastern peasants have such a warped view of America is that they too watch Desperate Housewives (via satellite or bootleg VCD), but never get the chance to actually meet some Americans. To them, all American women are blonde and slutty. There’s no reason to believe that we’re not getting a similarly warped view of other cultures when we watch their internet.

In other words, Twitter and Facebook aren’t the point, and we can’t yet interpret what they might be telling us. I fear that all of us who have seen the social potential of new media are just making up stories of our own; there are pointed questions that I haven’t yet seen asked in the coverage of the Revolution.

Mar 05 2009

Escaping the News Hall of Mirrors

We live in a cacophony of news, but most of it is just echoes. Generating news is expensive; collecting it is not. This is the central insight of the news aggregator business model, be it a local paper that runs AP Wire and Reuters stories between ads, or web sites like Topix, Newser, and Memeorandum, or for that matter Google News. None of these sites actually pay reporters to research and write stories, and professional journalism is in financial crisis. Meanwhile there are more bloggers, but even more re-blogging. Is there more or less original information entering the web this year than last year? No one knows.

A computer could answer this question. A computer could trace the first, original source of any particular article or statement. The effect would be like donning special glasses in the hall of mirrors that is current news coverage, being able to spot the true sources without distraction from reflections. The required technology is nearly here.

This is more than geekery if you’re in a position of needing to know the truth of something. Last week I was researching a man named Michael D. Steele, after reading a newly leaked document containing his name. Steele gained fame as one of the stranded commanders in Black Hawk Down, but several of his soldiers later killed three unarmed Iraqi men. I rapidly discovered many news stories (1, 2, 3, 4, 5, 6, 7, etc.) claiming that Steele had ordered his men to “kill all military-age males.” This is a serious accusation, and widely reprinted — but no number of news articles, blog posts, and reblogs can make a false statement more true. I needed to know who first reported this statement, and its original source.

Feb 04 2009

How Many World Wide Webs Are There?

[Image: blogosphere map by Matthew Hurst]

How much overlap is there between the web in different languages, and what sites act as gateways for information between them? Many people have constructed partial maps of the web (such as the  blogosphere map by Matthew Hurst, above) but as far as I know, the entire web has never been systematically mapped in terms of language.

Of course, what I actually want to know is, how connected are the different cultures of the world, really? We live in an age where the world seems small, and in a strictly technological sense it is. I have at my command this very instant not one but several enormous international communications networks; I could email, IM, text message, or call someone in any country in the world. And yet I very rarely do.

Similarly, it’s easy to feel like we’re surrounded by all the international information we could possibly want, including direct access to foreign news services, but I can only read articles and watch reports in English. As a result, information is firewalled between cultures; there are questions that could very easily be answered by any one of tens or hundreds of millions of native speakers, yet are very difficult for me to answer personally. For example, what is the journalistic slant of al-Jazeera, the original one in Arabic, not the English version which is produced by a completely different staff? Or, suppose I wanted to know what the average citizen of Indonesia thinks of the sweatshops there, or what is on the front page of the Shanghai Times today, and does such a newspaper even exist? What is written on the 70% of web pages that are not in English?

Sep 04 2008

Intelligent News Agents, With Real New

You cannot read all of the news, every day. There is simply too much information for even a dedicated and specialized observer to consume it all, so someone or something has to make choices. Traditionally, we rely on some other person to tell us what to see: the editor of a newspaper decides what goes on the front page, the reviewer tells us what movies are worth it. Recently, we have been able to distribute this mediation process across wider communities: sites like Digg, StumbleUpon, or Slashdot all represent the collective opinions of thousands of people.

The next step is intelligent news agents. Google (search, news, reader, etc.) can already be configured to deliver to us only that information we think we might want to see. It’s not hard to imagine much more sophisticated agents that would scour the internet for items of interest.

In today’s context, it’s easy to see how such agents could actually be implemented. Sophisticated customer preference engines are already capable of telling us what products we might like to consume; the best example is Amazon’s recommendation engine. It’s not a big leap to imagine using the same sort of algorithms to model the kinds of blog articles, web pages, YouTube videos, etc. that we might enjoy consuming, and then deliver these things to us.

There is a serious problem with this. You’re going to get exactly what you ask for, and only that.

True, we all do this already. We read books and consume media which more or less confirm our existing opinions. This effect is visible as clustering in what we consume, as in this example of Amazon sales data for political books in 2008.

Social network graph of Amazon sales of political books, 2008

This image is from a beautiful analysis by orgnet.com. Basically, people buy either the red books or the blue books, but usually not both. The same sorts of patterns hold for movies, blogs, newspapers, ideologies, religions, and human beliefs of all kinds. This is a problem; but at least you can usually see the other color of books when you walk into Borders. If we end up relying on trainable agents for all of our information, we risk completely blacking out anything that disagrees with what we already believe.

I propose a simple solution. Automatic network analyses like the one above (of books, or articles, or web pages) could easily pinpoint the information sources that would expose me to the maximum novelty in the minimum time. If my goal is to gain a deep understanding of the entire scope of human discourse, rather than just the parts of it I already agree with, then it would be very simple to program my agent to bring to me exactly those things that would most rapidly give me insight into those regions of information space which are most vital and least known to me. I imagine some metric like “highest degree node most distant from the nodes I’ve already visited” would work handily.
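Here is a rough sketch of one reading of that metric, assuming the network is an undirected graph stored as a dictionary of neighbor sets. “Most distant” is taken to mean the most hops from the nearest already-visited node, with degree as the tie-breaker; that ordering, and the names `distance_from_visited`, `most_novel_source`, and `example_graph`, are assumptions made for illustration.

```python
from collections import deque

def distance_from_visited(graph, visited):
    """Multi-source breadth-first search: hops from each reachable node to the nearest visited node."""
    dist = {node: 0 for node in visited if node in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def most_novel_source(graph, visited):
    """Pick the unvisited node farthest from everything visited; break ties by degree."""
    dist = distance_from_visited(graph, visited)
    candidates = [node for node in graph if node not in visited]
    if not candidates:
        return None
    # Nodes that cannot be reached at all count as infinitely far away.
    return max(
        candidates,
        key=lambda node: (dist.get(node, float("inf")), len(graph[node])),
    )

# A toy two-cluster network, loosely in the spirit of the red/blue book graph.
example_graph = {
    "blue_a": {"blue_b", "bridge_1"},
    "blue_b": {"blue_a", "bridge_2"},
    "bridge_1": {"blue_a", "red_hub", "red_minor"},
    "bridge_2": {"blue_b", "red_hub"},
    "red_hub": {"bridge_1", "bridge_2", "red_minor"},
    "red_minor": {"bridge_1", "red_hub"},
}

if __name__ == "__main__":
    print(most_novel_source(example_graph, visited={"blue_a", "blue_b"}))
```

Putting distance strictly first tends to favor nodes on the fringe of the graph, so a real agent would probably want to blend distance and degree into a single score rather than use degree only to break ties.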

You can infer a lot about someone from the information they currently consume. If my agent noticed that I was a liberal, it could make me understand the conservative world-view, and vice-versa. If my agent detected that I was ignorant of certain crucial aspects of Chinese culture and politics, it could recommend a primer article. Or it might deduce that I needed to understand just slightly more physics to participate meaningfully in the climate change debate, or decide (based on my movie viewing habits) that it was high time I review the influential films of Orson Welles. Of course, I might in turn decide that I actually, truly, don’t care about film at all; but the very act of excluding specific subjects or categories of thought would force us, consciously, to admit to the boundaries of our mental worlds.

We could program our information gathering systems to challenge us, concisely and effectively, if we so want. Intelligent agents could be mere sycophants, or they could be teachers.
