Jonathan Stray

Journalism for makers

September 22, 2011December 10, 2011financial crisis, journalism, making30 Comments

I find myself wondering what it would take to fix the global financial system, but most financial journalism doesn’t help me to answer this question. Something seems wrong here. The modern world is built on a series of vast systems, intricate combinations of people and machines, but our journalism isn’t really built to help us understand them. It’s not a journalism for the people who will put together the next generation of civic institutions.

My friend Maha Atal — whose profile of incoming IMF chief Christine Lagarde recently graced the cover of Forbes — tells me there are two histories of financial journalism, two types that have been practiced since the dawn of newsprint. One tradition began with lists of market prices and insurance rates and evolved into the financial data services and newswires we have today, a journalism of utility for people who want to make money. The other tradition she called “muckraking,” a journalism which interests itself in shady deals, insider trading, and undue influence. It looks for hypocrisy and offenses against the interests of the broader public.

Service to the status quo, and zealous suspicion of power. Are these really the only two stands that a journalist can take? When I imagine the global financial system improving, actually improving in the sense of changing in a way that makes the lives of very many people better — say, becoming less prone to the sort of systemic collapse that puts tens of millions out of work — I don’t see it much assisted by either of these approaches to reporting, necessary though they might be.

The financial system is just that: a system, sprawling, messy, very complex, built of people and laws and machines. It serves a great many ends, both humane and monstrously avaricious. It won’t be much improved by forcing a few traders to resign in disgrace, or focusing public fury on the bonuses of bank executives (which, obscene though they may be, remain just a drop in the bucket.) It seems rather that improvement will require international agreement on arcane solutions both political and technical, things like risk models, capital reserve requirements, and trading platforms. This is regulation both in the sense of law and in the sense of code-as-law, because software is a deep part of the infrastructure of modern finance. Markets don’t just happen; they are human desires channeled by what we have agreed to allow, and by what our technology has been built to support. Markets are designed things.

So maybe what we need are designers. Geeks who like to understand very complex systems, and tinker with them. I want to borrow from the culture of “makers,” because maker culture plants a flag on this idea. It draws on the hacker tradition of technical mastery, the DIY aesthetic perfected by the punks, and the best disruptive tendencies of global counter-culture. It lives in online forums and nerdy meetups and on the dingy couches of hack spaces. This is the chaotic ecosystem that powers Silicon Valley, and I bet it’s the secret ingredient that government planners miss when they build huge technology parks that end up empty.

But most of all, makers are deeply participatory. Where the political activist sees persuasion as the ultimate goal, the maker wants to personally rewire the system. This requires a deep love of the inner workings of things, the finicky, empirical details of how the real world is put together. A maker combines the democratic instinct with the technologist’s hands-on ability. And increasingly, makers are directing their attention to social problems. Efforts such as crisis mapping and Code For America and the whole information and communication technologies for development (ICT4D) movement are evidence of this. Maker language has recently been spotted at the White House and the United Nations.

The global financial system is just the sort of complex, intricate, part technical and part social system that makers would love, if only they could open it up and look inside. There are textbooks, but you can’t learn how the world actually works from textbooks. What would it take to open the global financial system to independent minds? Because it will be these independent minds — smart, deeply informed, creative — who will pore over the arcania of today in order to conceive of the better worlds to come.

Consider the latest draft of the Basel III standards for international banking. Who reads such dense and technical stuff? The professional regulator is obliged to sit at their desk and study this document. The financier wants only to understand how these rules will make or cost them money. The muckraker might ask who is making the rules and why. Another journalist will look for headlines of broad interest, but almost certainly won’t have the technical background to trace the subtle implications. But a maker would read these standards because they are changes in the operating system of global finance. And of these, it might be the maker, the specialized outsider, who is most qualified to understand the detailed, systemic effects on everyone else. The systems that underlie finance have become so fast and so complex that we don’t really understand the interactions. The people who know it best are mostly too busy making money to explain it to the rest of us. The public interest is in dire need of geeks who are not on the payroll.

There is a journalism to be done here, but it’s not the journalism of making people money, penning morality tales, or interesting articles in the Sunday paper. It’s a techno-social investigative journalism for those who have chosen to use their specialized knowledge in the interests of the rest of us. It’s a journalism that generalist reporters may be ill equipped to do.

We already have models for this. Dowser.org practices “solutions journalism,” writing about how best to solve societal problems. I appreciate that, but I don’t think they’ve conceived of their audience as the policy and technology geeks who will one day flesh out and implement those solutions. The contemporary science journalism ecosystem might be a better example. There are science reporters at news organizations, but the best science reporting now tends to come from elsewhere. Science, like finance, is absurdly specialized, and so its chronicling has been taken over by networks of specialists — very often scientists themselves, the ones who have learned to write. Science blogging is thriving. Its audience is the general public, yes, but also other scientists, because it’s the real thing. Even better, science writing exists in a web of knowledge: you can follow the links and go arbitrarily deep into the original research papers. And if you still have questions, the experts are already active online. Compare this to the experience of reading an economics article in the paper.

We don’t have much truly excellent journalism on deep, intricate topics, issues with enormous technical and institutional complexity. There’s some, but it’s mostly in trade publications with little sense of the social good, or tucked away in expensive journals which speak to us in grown-up tones and don’t know how to listen for the questions of the uninitiated. And yet our world abounds in complex problems! Sustainability, climate change, and energy production. Security, justice, and the delicate tradeoffs of civic freedoms. Health care for the individual, and for entire countries. The policies of government from the international to the municipal. And governments themselves, in all their gruesome operational detail. These things are not toys. But when journalists write about such issues, they satisfy themselves with discovering some flavor of corruption, or they end up removing so much of the substance that readers cannot hope to make a meaningful contribution. Perhaps this is because it has always been assumed that there is no audience for wonkish depth. And perhaps that’s true. Perhaps there won’t ever be a “mainstream” audience for this type of reporting, because the journalism of makers is directed to those who have some strange, burning desire to know the gory details, and are willing to invest years of their life acquiring background knowledge and building relationships. Can we not help these people? Could we encourage more of them to exist, if we served them better?

This is a departure from the broadcast-era idea of “the public.” It gives up on the romantic notion of great common narratives and tries instead to serve particular pieces of the vast mosaic of communities that comprise a society. But we are learning that when done well, this kind of deep, specialist journalism can strike surprising chords in a global population that is more educated than it has ever been. And the internet is very, very good at routing niche information to communities of interest. We have the data to show this. As Atlantic editor Alexis Madrigal put it, “I love analytics because I owe them my ability to write weird stories on the Internet.”

Where is the journalism for the idealist doer with a burning curiosity? I don’t think we have much right now, but we can imagine what it could be. The journalism of makers aligns itself with the tiny hotbeds of knowledge and practice where great things emerge, the nascent communities of change. Its aim is a deep understanding of the complex systems of the real world, so that plans for a better world may constructed one piece at a time by people who really know what they’re talking about. It never takes itself too seriously, because it knows that play is necessary for exploration and that a better understanding will come along tomorrow. It serves the talent pools that give rise to the people who are going to do the work of bringing us a potentially better world — regardless of where in society these people may be found, and whether or not they are already within existing systems of power. This is a theory of civic participation based on empowering the people who like to get their hands dirty tinkering with the future. Maybe that’s every bit as important as informing voters or getting politicians fired.

Visualizing communities

August 1, 2011November 21, 2020belief, community, knowledge, visualization30 Comments

There are in fact no masses; there are only ways of seeing people as masses.
–Raymond Williams

Who are the masses that the “mass media” speaks to? What can it mean to ask what “teachers” or “blacks” or “the people” of a country think? These words are all fiction, a shorthand which covers over our inability to understand large groups of unique individuals. Real people don’t move in homogeneous herds, nor can any one person be neatly assigned to a single category. Someone might view themselves simultaneously as the inhabitant of a town, a new parent, and an active amateur astronomer. Now multiply this by a million, and imagine trying to describe the overlapping patchwork of beliefs and allegiances.

But patterns of association leave digital traces. Blogs link to each other, we have “friends” and “followers” and “circles,” we share interesting tidbits on social networks, we write emails, and we read or buy things. We can visualize this data, and each type of visualization gives us a different answer to the question “what is a community?” This is different from the other ways we know how to describe groups. Anecdotes are tiny slices of life that may or may not be representative of the whole, while statistics are often so general as to obscure important distinctions. Visualizations are unique in being both universal and granular: they have detail at all levels, from the broadest patterns right down to individuals. Large scale visualizations of the commonalities between people are, potentially, a new way to represent and understand the public — that is, ourselves.

I’m going to go through the major types of community visualizations that I’ve seen, and then talk about what I’d like to do with them. Like most powerful technologies, large scale visualization is a capability that can also be used to oppress and to sell. But I imagine social ends, worthwhile ways of using visualization to understand the “public” not as we imagine it, but as something closer to how we really exist.

Continue reading Visualizing communities

The new structure of stories: a reading list

July 26, 2011July 26, 2011inter, journalism, linked data, storytelling21 Comments

Different medium, different story form. It’s clear that each new technology — photography, radio, television — has brought with it different ways of constructing a narrative, and different ways those narratives fit into the audience’s lives. Online media are no different, and the step between analog and digital is in many ways much larger than any that has come before, because the internet connects the audience to each other as well as to the newsroom.

Here’s my attempt at a little reading list of recent work on the structure of stories. Pointers to additional material are welcome!

The Big Picture
What’s wrong with the article, anyway? Jeff Jarvis explores this question in “The article as luxury or by-product.” This essay provoked lots of interesting reaction, such as from Mathew Ingram.

So how do we understand ways to fix this? Vadim Lavrusik takes a shot at this question and comes up with the building blocks of context, social, personalization, mobile, participation. It’s a good taxonomy so I’m going to partially steal it for this post.

At the NYC Hacks Hackers meetup last week, Trei Brundrett took us through SB Nation’s “story stream” product, and Gideon Lichfield of The Economist gave a really nice run through of the “news thing” concept that was fleshed-out collaboratively last month at Spark Camp by Gideon, Matt Thompson, and a room full of others. Very meaty, detailed, up-to-the-minute discussions, for serious news nerds. Video here.

Context
You just can’t do better than Matt Thompson’s “An antidote for web overload.” I also recommend Matt’s wonderful “The three key parts of news stories that are usually missing.” Another good primer is Jay Rosen’s “Future of Context” talk at SXSW.

See also my “Short doesn’t mean shallow,” about hyperlinks as a contextual storytelling form.

For an example of these ideas in action, consider Mother Jone’s Egypt Explainer page — which Gideon Lichfield critiques in the video linked above.

Social
What does it mean for news to be social anyway? Henry Jenkins argues for the power of “spreadable media” as a new distribution model.

In “What’s the point of social news?” I discuss two areas where social media have a huge impact on news: the use of social networks as a personalized filter, and distributed sourcing of tips and material.

Personalization
News is now personalized by a variety of filters, both social and algorithmic. Eli Pariser argues this puts us in a “filter bubble.” He may be right, but research by Pew and others [1,2] consistently shows that when users are allowed to recommend any URL to one other, the “news agenda” that the audience constructs has only 5%-30% of stories in common with mainstream media.

A comparison of questions asked of the White House by a Twitter audience vs. by journalists shows a remarkable difference in focus. All of this this suggests to me that whatever else is happening, personalization meets an audience need that traditional broadcast journalism does not.

Besides, maybe not every person needs to see every story, if we view the goal of journalism as empowerment.

Participation
What do we know and what don’t we know about public participation in the journalism project, and what has worked or failed so far? Jay Rosen has an invaluable summary.

I also recommend the work of Amanda Michel as someone who does crowd-based reporting every day, and my own speculations on distributed investigative reporting.

Structured information
Is the product of journalism narratives or (potentially machine-readable) facts? Adrian Holovaty seems to be the first to have explored this in his 2006 essay “A fundamental way newspaper websites need to change.” This mantle has been more recently taken up by Stijn Debrouwere in his “Information Architecture for News Websites” series, and in Reg Chua’s “structured journalism,” and in a wide-ranging series at Xark.

There are close connections here to semantic web efforts, and occasional overlap between the semweb and journalism communities.

Mobile
I haven’t seen any truly good roundup posts on what mobile will mean for news story form, but there are some bits and pieces. Mobile is by-definition location aware, and Mathew Ingram examines how location is well used by Google News (and not by newsrooms.)

Meanwhile, Zach Seward of the Wall Street Journal has done some interesting news-related things with Foursquare.

Real time
Emily Bell, formerly of the Guardian and now at Columbia, explains why every news organization needs to be real-time.

For a granular look at how informations spreads in real time, consider Mathew Ingram on “Osama bin Laden and the new ecosystem of news.” For a case study of real-time mobile reporting, we have Brian Stelter’s “What I learned in Joplin.”

A job posting that really doesn’t suck

July 8, 2011July 12, 2011computational journalism, computational linguistics, journalism, overview, visualization8 Comments

I just got a pile of money to build a piece of state-of-the-art open-source visualization software, to allow journalists and curious people everywhere to make sense of enormous document dumps, leaked or otherwise.

Huzzah!

Now I am looking for a pair of professional developers to make it a reality. It won’t be hard for the calibre of person I’m trying to find to get some job, but I’m going to try to convince you that this is the best job.

The project is called Overview. You can read about it at overview.ap.org. It’s going to be a system for the exploration of large to very large collections of unstructured text documents. We’re building it in New York in the main newsroom of The Associated Press, the original all-formats global news network. The AP has to deal with document dumps constantly. We download them from government sites. We file over 1000 freedom of information requests each year. We look at every single leak from Wikileaks, Anonymous, Lulzsec. We’re drowning in this stuff. We need better tools. So does everyone else.

So we’re going make the killer app for document set analysis. Overview will start with a visual programming language for computational linguistics algorithms. Like Max/MSP for text. The output of that will be connected to some large-scale visualization. All of this will be backed by a distributed file store and computed through map-reduce. Our target document set size is 10 million. The goal is to design a sort of visualization sketching system for large unstructured text document sets. Kinda like Processing, maybe, but data-flow instead of procedural.

We’ve already got a prototype working, which we pointed at the Wikileaks Iraq and Afghanistan data sets and learned some interesting things. Now we have to engineer an industrial-strength open-source product. It’s a challenging project, because it requires production implementation of state-of-the-art, research-level algorithms for distributed computing, statistical natural language processing, and high-throughput visualization. And, oh yeah, a web interface. So people can use it anywhere, to understand their world.

Because that’s what this is about: a step in the direction of applied transparency. Journalists badly need this tool. But everyone else needs it too. Transparency is not an end in itself — it’s what you can do with the data that counts. And right now, we suck at making sense of piles of documents. Have you ever looked at what comes back from a FOIA request? It’s not pretty. Governments have to give you the documents, but they don’t have to organize them. What you typically get is a 10,000 page PDF. Emails mixed in with meeting minutes and financial statements and god-knows what else. It’s like being let into a decrepit warehouse with paper stacked floor to ceiling. No boxes. No files. Good luck, kiddo.

Intelligence agencies have the necessary technology, but you can’t have it. The legal profession has some pretty good “e-discovery” software, but it’s wildly expensive. Law enforcement won’t share either. There are a few cheapish commercial products but they all choke above 10,000 documents because they’re not written with scalable, distributed algorithms. (Ask me how I know.) There simply isn’t an open, extensible tool for making sense of huge quantities of unstructured text. Not searching it, but finding the patterns you didn’t know you were looking for. The big picture. The Overview.

So we’re making one. Here are the buzzwords we are looking for in potential hires:

We’re writing this in Java or maybe Scala. Plus JavaScript/WebGL on the client side.
Be a genuine computer scientist, or at least be able to act like one. Know the technologies above, and know your math.
But it’s not just research. We have to ship production software. So be someone who has done that, on a big project.
This stuff is complicated! The UX has to make it simple for the user. Design, design, design!
We’re open-source. I know you’re cool with that, but are you good at leading a distributed development community?

And that’s pretty much it. We’re hiring immediately. We need two. It’s a two-year contract to start. We’ve got a pair of desks in the newsroom in New York, with really nice views of the Hudson river. Yeah, you could write high-frequency trading software for a hedge fund. Or you could spend your time analyzing consumer data and trying to get people to click on ads. You could code any of a thousand other sophisticated projects. But I bet you’d rather work on Overview, because what we’re making has never been done before. And it will make the world a better place.

For more information, see :

Writeups in Nieman Journalism Lab, O’Reilly Radar, Journalism.co.uk
Video of a talk and live demo of the prototype.
The official job posting.

Thanks for your time. Please contact jstray@ap.org if you’d like to work on this.

The challenges of distributed investigative journalism

May 25, 2011May 26, 2011intelligence, investigation, journalism, social news, social software102 Comments

One of the clearest ideas to emerge from the excitement around the new media transformation of journalism is the notion that the audience should participate in the process. This two way street has been nicely described by Guardian editor Alan Rusbridger as the “mutualization of journalism.” But how to do it? What’s missing from what has been tried so far? Despite many experiments, the territory is still so unexplored that it’s almost impossible to say what will work without trying it. With that caveat, here are some more or less wild speculations about the sorts of tools that “open” investigative journalism might need to work.

There have been many collaborative journalism projects, from the Huffington Post’s landmark “Off The Bus” election campaign coverage to the BBC’s sophisticated “user-generated content hub” to CNN’s iReport. One lesson in all of this is that form matters. Take the lowly comment section. News site owners have long complained, often with good reason, that comments are a mess of trolls and flame wars. But the prompt is supremely important in asking for online collaboration. Do journalists really want “comments”? Or do they want error corrections, smart additions, leads, and evidence that furthers the story?

Which leads me to investigative reporting. It’s considered a specialty within professional journalism, dedicated to getting answers to difficult questions — often answers that are embarrassing to those in power. I don’t claim to be very good at journalistic investigations, but I’ve done enough reporting to understand the basics. Investigative reporting is as much about convincing a source to talk as it is about filing a FOIA request, or running a statistical analysis on a government data feed. At heart, it seems to be a process of assembling widely dispersed pieces of information — connecting the distributed dots. Sounds like a perfect opportunity for collaborative work. How could we support that?

A system for tracking what’s already known
Reporters keep notes. They have files. They write down what was said in conversations, or make recordings. They collect documents. All of this material is typically somewhere on or around a reporter’s desk or sitting on their computer. That means it’s not online, which means no one else can build on it. Even within the same newsroom, notes and source materials are seldom shared. We have long had customer relationship management systems that track every contact with a customer. Why not a “source relationship management” system that tracks every contact with every source by every reporter in the newsroom? Ideally, such a system would be integrated into the reporter’s communications tools: when I make a phone call and hit record (after getting the source’s permission of course) that recording could be automatically entered into system’s files, stamped by time, date, and source, then transcribed by machine to make it searchable. Primary documents would be also be filed in the system, along with notes and links and comments from everyone working on the story. The entire story of the story could be in one place.

There have been experiments in collaborative journalistic files, such as OpenFile.ca or even good local wikis. But I don’t believe there has yet been a major professional newsroom which operated with open files. For that matter, I am not aware of this type of information filing system in existence anywhere in journalism, though I suspect it’s what intelligence services do.

Public verification processes
Journalism aims to be “true,” a goal which requires elaborate verification processes. But in every newsroom I’ve worked with, essential parts of the verification standards are not codified. “At least two sources” is a common maxim, but are there any situations where one is enough? For that matter, who counts as a definitive source? When is a conflict of interest serious enough to disqualify what someone is telling you? The answers to these questions and many more are a matter of professional practice and culture. This is confusing enough for a new reporter joining staff, let alone outsiders who might want to help.

Verification is necessarily contextual. Both the costs of verification and the consequences of being in error vary widely with circumstance, so journalists must make situational choices. How sure do we have to be before we say something is true, how do we measure that certainty, and what would it take to be more sure? Until this sort of nuanced guidance is made public, and the public is provided with experienced support to encourage good calls in complex or borderline cases, it won’t be possible to bring enthusiastic outsiders fully into the reporting process. They simply won’t know what’s expected of them, to be able to participate in the the production of a product to certain standards. Those standards depend on what accuracy/cost/speed tradeoffs best serve the communities that a newsroom writes for, which means that there is audience input here too.

What is secret, or, who gets to participate?
Traditionally, a big investigative story is kept completely secret until it’s published. This is shifting, as some journalists begin to view investigation as more of a process than a product. However, you may not want the subject of an investigation to know what you already know. It might, for example, make your interview with a bank CEO tricky if they know you’ve already got the goods on them from a former employee. There are also off-the-record interviews, embargoed material, documents which cannot legally be published, and a multitude of concerns around the privacy rights of individuals. I agree with Jay Rosen when he says that “everything a journalist learns that he cannot tell the public alienates him from the public,” but that doesn’t mean that complete openness is the solution in all cases. There are complex tradeoffs here.

So access to at least some files must be controlled, for at least some period of time. Ok then — who gets to see what, when? Is there a private section that only staff can see and a public section for everyone else? Or, what about opening some files up to trusted outsiders? That might be a powerful way to extend investigations outside the boundaries of the newsroom, but it brings in all the classic problems of distributed trust, and more generally, all the issues of “membership” in online communities. I can’t say I know any good answers. But because the open flow of information can be so dramatically productive, I’d prefer to start open and close down only where needed. In other words, probably the fastest way to learn what truly needs to be secret is to blow a few investigations when someone says something they shouldn’t have, then design processes and policies to minimize those failure modes.

There is also a professional cultural shift required here, towards open collaboration. Newsrooms don’t like to get scooped. Fair enough, but my answer to this is to ask what’s more important: being first, or collectively getting as much journalism done as possible?

Safe places for dangerous hypotheses
Investigative journalism requires speculation. “What if?” the reporter must say, then go looking for evidence. (And equally, “what if not?” so as not to fall prey to confirmation bias.) Unfortunately, “what if the district attorney is a child molester?” is not a question that most news organizations can tolerate on their web site. In the worst case, the news organization could be sued for libel. How can we make a safe and civil space — both legally and culturally — for following speculative trains of thought about the wrongdoings of the powerful? One idea, which is probably a good idea for many reasons, is to have very explicit marking of what material is considered “confirmed,” “vetted,” “verified,” etc. and what material is not. For example, iReport has such an endorsement system. A report marked “verified” would of course have been vetted according to the public verification process. In the US, that marking plus CDA section 230 might solve the legal issues.

A proposed design goal: maximum amplification of staff effort
There are very many possible stories, and very few paid journalists. The massive amplification of staff effort that community involvement can provide may be our only hope for getting the quantity and quality of journalism that we want. Consider, for example, Wikipedia. With a paid staff of about 35 they produce millions of near-real time topic pages in dozens of languages.

But this is also about the usability of the social software designed to facilitate collaborative investigations. We’ll know we have the design right when lots of people want to use it. Also: just how much and what types of journalism could volunteers produce collaboratively? To find out, we could try to get the audience to scale faster than newsroom staff size. To make that happen, communities of all descriptions would need to find the newsroom’s public interface a useful tool for uncovering new information about themselves even when very little staff time is available to help them. Perhaps the best way to design a platform for collaborative investigation would be to imagine it as encouraging and coordinating as many people as possible in the production of journalism in the broader society, with as few full time staff as possible. These staff would be experts in community management and information curation. I don’t believe that all types of journalism can be produced this way or that anything like a majority of people will contribute to the process of journalism. Likely, only a few percent will. But helping the audience to inform itself on the topics of its choice on a mass scale sounds like civic empowerment to me, which I believe to be a fundamental goal of journalism.

Measuring and improving accuracy in journalism

April 20, 2011December 5, 2020accuracy, journalism, metrics, truth35 Comments

Professional journalism is supposed to be “factual,” “accurate,” or just plain true. Is it? Has news accuracy been getting better or worse in the last decade? How does it vary between news organizations, and how do other information sources rate? Is professional journalism more or less accurate than everything else on the internet? These all seem like important questions, so I’ve been poking around, trying to figure out what we know and don’t know about the accuracy of our news sources. Meanwhile, the online news corrections process continues to evolve, which gives us hope that the news will become more accurate in the future.

Accuracy is a hard thing to measure because it’s a hard thing to define. There are subjective and objective errors, and no standard way of determining whether a reported fact is true or false. But a small group of academics has been grappling with these questions since the early 20th century, and undertaking periodic news accuracy surveys. The results aren’t encouraging. The last big study of mainstream reporting accuracy found errors (defined below) in 59% of 4,800 stories across 14 metro newspapers. This level of inaccuracy — where about one in every two articles contains an error — has persisted for as long as news accuracy has been studied, over seven decades now.

With the explosion of available information, more than ever it’s time to get serious about accuracy, about knowing which sources can be trusted. Fortunately, there are emerging techniques that might help us to measure media accuracy cheaply, and then increase it. We could continuously sample a news source’s output to produce ongoing accuracy estimates, and build social software to help the audience report and filter errors. Meticulously applied, this approach would give a measure of the accuracy of each information source, and a measure of the efficiency of their corrections process (currently only about 3% of all errors are corrected.) The goal of any newsroom is to get the first number down and the second number up. I am tired of editorials proclaiming that a news organization is dedicated to the truth. That’s so easy to say that it’s meaningless. I want an accuracy process that gives us something more than a rosy feeling.

This is a long post, but there are lots of pretty pictures. Let’s begin with what we know about the problem.

Continue reading Measuring and improving accuracy in journalism

On “Balance”

April 12, 2011April 20, 2011editorial judgement, journalism, polarization2 Comments

What does it mean to say that someone’s reporting is “balanced”? I think it’s supposed to suggest something like “not one-sided,” which is really supposed to imply “fair.” I really do believe in fairness in journalism, but the whole “balance” metaphor seems completely wrong to me. Anyway it’s become clear to me that this word means different things to different people. Continue reading On “Balance”

The publishing industry needs a lightweight, open, paid syndication technology

April 3, 2011April 4, 2011business models, open standards, syndication19 Comments

I see a key open standard missing, crucial to the development of both user reading experience and publisher business models. Users want the “best” content from wherever it may have been published, presented beautifully and seamlessly within a single interface. Publishers want users to go to their site/app so that they can get paid, whether that’s through a subscription or by showing the user ads.

This tension is hugely visible in the content economy of today. It’s why Flipboard, Zite, and Google News are so loved by consumers yet so hated by publishers. It’s a manifestation of the “producers vs. aggregators” spat, a senseless culture war which reflects a legacy publishing industry structure that is ill-equipped to serve a digital public. This has spawned many lawsuits. These battles make no sense to the consumer, and indeed, the content supply chain is not the consumer’s problem. Nonetheless there is a real problem here, and lawyers alone cannot solve it.

What I start from is user experience. My user experience. Your user experience. I want whatever I want, all in one convenient cross-platform place. The product itself might be an expertly curated selection of articles, an algorithmic aggregator (Google News), a socially-filtered stream of content (Flipboard), or a system that tries to learn my content preferences over time (Zite.) The best method of content selection is far from settled, but it’s clear that it’s going to be very hard for a general-interest publishing brand to reliably attract my attention if all they can offer me is what they can create in-house. To adapt Bill Joy’s maxim, “most of the best content is on someone else’s site.”

The practice of pointing to content created by someone else within your product has come to be known, for better or for worse, as “aggregation”, though “curation” has also been used to describe the more manual version. (Personally I suspect distinction is meaningless, because algorithms embody editorial judgement, and there are strong hybrid approaches.) Because of the way the internet developed, many people have conflated aggregated content with free content. But this is not necessarily so. Aggregation has mostly been done by using links, and it’s not the aggregator who decides if the page on the other end of the link is free to view.

In an era of massive information abundance, filtering creates enormous value, and that’s what aggregation is. Aggregation in all its guises is really, really useful to all of us, and it’s here to stay. But linking as an aggregation method is starting to fall apart in important ways. It’s doesn’t provide a great user experience, and it doesn’t get producers paid. I strongly believe that we don’t want to discourage the linked, open nature of the internet, because widespread linking is an important societal good. Linking is both embodied knowledge and a web navigation system, and linking is incredibly valuable to journalism. Nonetheless, I see an alternative to linking that aligns the interests of publishers and consumers.

When Google News sends you to read an article, that article has a different user interface on each publisher’s site. When the Twitter or Flipboard apps show you an article they display only a stub, then require you to open a Safari window for the rest. This is a frustrating user experience, which Zite tried to remedy by using the readability engine to show you the clean article text right within the application. But of course this strips the ads from the original page, so the publisher doesn’t get paid, hence this cease and decist letter. For many kinds of content, somebody needs to get paid somewhere. (I’m not going to step today into the minefield of amateur-free vs. professional-paid content, except to say that both are valuable and will always be with us.) Payment means taking either some cash or some attention from the consumer. Lots of companies are working on payment systems to collect money from the consumer, and there have long been ad networks that distribute advertising to wherever it might be most valuable. What is missing is a syndication technology that moves content to where the user is, and money to the producer. The user gets an intergrated, smooth user experience that puts content from anywhere within one UI, and the publisher gets paid wherever their content goes.

This would be a B2B technology; payment would be between “aggregators” and “content creators,” though of course both roles are incredibly fluid and lots of companies do both at different times. To succeed, it needs to be a simple, open standard. Both complexity and legal encumbrances can be hurdles to wide adoption, and without wide adoption you can’t give consumers a wide choice of integrated sources. I’m imagining something like RSS, but with purchased authentication tokens. For the sake of simplicity and flexibility, both payment and rights enforcement need to be external to the standard. A publisher can use whatever e-business platform they like to sell authentication tokens at whatever pricing structure suits them, while merely expressing online rights — let alone enforcing them — is an incredibly complicated unsolved problem. Those problems will have to be worked on, but meanwhile, there’s no reason we can’t leverage our existing payment and rights infrastructures and solve just the technical problem of a simple, open, authenticated B2B content syndication system.

What I am trying to create is a fluid content marketplace which allows innovation in content packaging, filtering, and presentation. There is no guarantee that such a market will pay publishers anything like what they used to get for their content, and in fact I expect it won’t. But nothing can change the fact that there is way, way more content than there used to be, and much of it is both high-quality and legally free. If publishers want to extract the same level of revenue from you and I, they’re going to have to offer us something better than what we had before — such as, for example, an app that learns what I like to read and assembles that material into one clean interface. But it’s clear by now that no one content creator can ever satisfy the complete spectrum of consumer demand, so we need a mechanism to separate creation and packaging, while allowing revenue to flow from the companies that build the consumer-facing interfaces to the companies that supply the content. That means a paid syndication marketplace, which requires a paid syndication standard.

This idea has close links with the notion of content APIs, and what I am proposing amounts to an industry-standard paid-content API. Let’s make it possible for those who know what consumers want to give it to them, while getting producers paid.

Knight News Challenge 2011 Finalists

April 1, 2011April 6, 2011knight news challenge2 Comments

Knight says there are 28. These are the ones I am aware of who have publicly declared their finalist status, so far. In no particular order:

Scraperwiki – announcement, proposal, site
Overview (that’s me and colleagues) – announcement, proposal, demo
Recast – announcement, proposal
Newscloud – announcement, proposal, site
The Tiziano Project – announcement, proposal, site
PANDA – announcement, proposal
BlankOnBlank – announcement, can’t find the proposal (closed?), site
Mobile news submission/verification/delivery system for the Palestinian Territories – announcement, proposal, site
Plan Philly – announcement, proposal, site
Bringing OpenBlock to Rural Amercia – announcement, can’t find the proposal (closed?)
@waldojaquith says he got in, – announcement, closed proposal but he describes it in the comments below
@fin got in too – announcement, close proposal, but he gave hints: 1 2 3

I mostly started making this list as a way to learn about cool projects — and you should definitely check these out — but I admit it’s also a sort of obsessive-compulsive curation instinct at work.

Last updated 6 April. Do let me know of others.

The editorial search engine

March 26, 2011April 3, 2011computational journalism, reporting, search, storytelling51 Comments

It’s impossible to build a computer system that helps people find or filter information without at some point making editorial judgements. That’s because search and collaborative filtering algorithms embody human judgement about what is important to know. I’ve been pointing this out for years, and it seems particularly relevant to the journalism profession today as it grapples with the digital medium. It’s this observation which is the bridge between the front page and the search results page, and it suggests a new generation of digital news products that are far more useful than just online translations of a newspaper.

It’s easy to understand where human judgement enters into information filtering algorithms, if you think about how such things are built. At some point a programmer writes some code for, say, a search engine, and tests it by looking at the output on a variety of different queries. Are the results good? In what way do they fall short of the social goals of the software? How should the code be changed? It’s not possible to write a search engine without a strong concept of what “good” results are, and that is an editorial judgement.

I bring this up now for two reasons. One is an ongoing, active debate over “news applications” — small programs designed with journalistic intent — and their role in journalism. Meanwhile, for several years Google’s public language has been slowly shifting from “our search results are objective” to “our search results represent our opinion.” The transition seems to have been completed a few weeks ago, when Matt Cutts spoke to Wired about Google’s new page ranking algorithm:

In some sense when people come to Google, that’s exactly what they’re asking for — our editorial judgment. They’re expressed via algorithms. When someone comes to Google, the only way to be neutral is either to randomize the links or to do it alphabetically.

There it is, from the mouth of the bot. “Our editorial judgment” is “expressed via algorithms.” Google is saying that they have and employ editorial judgement, and that they write algorithms to embody it. They use algorithms instead of hand-curated lists of links, which was Yahoo’s failed web navigation strategy of the late 1990s, because manual curation doesn’t scale to whole-web sizes and can’t be personalized. Yet hand selection of articles is what human editors do every day in assembling the front page. It is valuable, but can’t fulfill every need.

Informing people takes more than reporting
Like a web search engine, journalism is about getting people the accurate information they need or want. But professional journalism is built upon pre-digital institutions and economic models, and newsrooms are geared around content creation, not getting people information. The distinction is important, and journalism’s lack of attention to information filtering and organization seems like a big omission, an omission that explains why technology companies have become powerful players in news.

I don’t mean to suggest that going out and getting the story — aka “reporting” — isn’t important. Obviously, someone has to provide the original report that then ricochets through the web via social media, links, and endless reblogging. Further, there is evidence that very few people do original reporting. Last year I counted the percentage of news outlets did their own reporting on one big story, and found that only 13 of 121 stories listed on Google News did not simply copy information found elsewhere. A contemporaneous Pew study of the news ecosystem of Baltimore found that most reporting was still done by print newspapers, with very little contributed by “new media,” though this study has been criticized for a number of potentially serious category problems. I’ve also repeatedly experienced the power that a single original report can have, as when I made a few phone calls to discover that Jurgen Habermas is not on Twitter, or worked with AP colleagues to get the first confirmation from network operators that Egypt had dropped off the internet. Working in a newsroom, obsessively watching the news propagate through the web, I see this every day: it’s amazing how few people actually pump original reports into the ecosystem.

But reporting isn’t everything. It’s not nearly enough. Reporting is just one part of ensuring that important public information is available, findable, and known. This is where journalism can learn something from search engines, because I suspect what we really want is a hybrid of human and algorithmic judgement.

As conceived in the pre-digital era, news is a non-personalized, non-interactive stream of updates about a small number of local or global stories. The first and most obvious departure from this model would be the ability to search within a news product for particular stories of interest. But the search function on most news websites is terrible, and mostly fails at the core task of helping people find the best stories about a topic of interest. If you doubt this, try going to your favorite news site and searching for that good story that you read there last month. Partially this is technical neglect. But at root this problem is about newsroom culture: the primary product is seen to be getting the news out, not helping people find what is there. (Also, professional journalism is really bad at linking between stories, and most news orgs don’t do fine-grained tracking of social sharing of their content, which are two of primary signals that search engines use to determine which articles are the most relevant.)

Story-specific news applications
We are seeing signs of a new kind of hybrid journalism that is as much about software as it is about about reporting. It’s still difficult to put names to what is happening, but terms like “news application” are emerging. There has been much recent discussion of the news app, including a session at the National Institute of Computer-Assisted Reporting conference in February, and landmark posts on the topic at Poynter and NiemanLab. Good examples of the genre include ProPublica’s dialysis facility locator, which combines investigative reporting with a search engine built on top of government data, and the Los Angeles Time’s real-time crime map, which plots LAPD data across multiple precincts and automatically detects statistically significant spikes. Both can be thought of as story-specific search engines, optimized for particular editorial purposes.

Yet the news apps of today are just toes in the water. It is no disrespect to all of the talented people currently working in the field say this, because we are at the beginning of something very big. One common thread in recent discussion of news apps has been a certain disappointment at the slow rate of adoption of the journalist-programmer paradigm throughout the industry. Indeed, with Matt Waite’s layoff from Politifact, despite a Pulitzer Prize for his work, some people are wondering if there’s any future at all in the form. My response is that we haven’t even begun to see the full potential of software combined with journalism. We are under-selling the news app because we are under-imagining it.

I want to apply search engine technology to tell stories. “Story” might not even be the right metaphor, because the experience I envision is interactive and non-linear, adapting to the user’s level of knowledge and interest, worth return visits and handy in varied circumstances. I don’t want a topic page, I want a topic app. Suppose I’m interested in — or I have been directed via headline to — the subject of refugees and internal migration. A text story about refugees due to war and other catastrophes is an obvious introduction, especially if it includes maps and other multimedia. And that would typically be the end of the story by today’s conventions. But we can do deeper. The International Organization for Migration maintains detailed statistics on the topic. We could plot that data, make it searchable and linkable. Now we’re at about the level of a good news app today. Let’s go further by making it live, not a visualization of a data set but a visualization of a data feed, an automatically updating information resource that is by definition evergreen. And then let’s pull in all of the good stories concerning migration, whether or not our own newsroom wrote them. (As a consumer, the reporting supply chain is not my problem, and I’ve argued before that news organizations need to do much more content syndication and sharing.) Let’s build a search engine on top of every last scrap of refugee-related content we can find. We could start with classic keyword search techniques, augment them by link analysis weighted toward sources we trust, and ingest and analyze the social streams of whichever communities deal with the issue. Then we can tune the whole system using our editorial-judgment-expressed-as-algorithms to serve up the most accurate and relevant content not only today, but every day in the future. Licensed content we can show within our product, and all else we can simply link to, but the search engine needs to be a complete index.

Rather than (always, only) writing stories, we should be trying to solve the problem of comprehensively informing the user on a particular topic. Web search is great, and we certainly need top-level “index everything” systems, but I’m thinking of more narrowly focussed projects. Choose a topic and start with traditional reporting, content creation, in-house explainers and multimedia stories. Then integrate a story-specific search engine that gathers together absolutely everything else that can be gathered on that topic, and applies whatever niche filtering, social curation, visualization, interaction and communication techniques are most appropriate. We can shape the algorithms to suit the subject. To really pull this off, such editorially-driven search engines need to be both live in the sense of automatically incorporating new material from external feeds, and comprehensive in the sense of being an interface to as much information on the topic as possible. Comprehensiveness will keep users coming back to your product and not someone else’s, and the idea of covering 100% of a story is itself powerful.

Other people’s content is content too
The brutal economics of online publishing dictate that we meet the needs of our users with as little paid staff time as possible. That drives the production process toward algorithms and outsourced content. This might mean indexing and linking to other people’s work, syndication deals that let a news site run content created by other people, or a blog network that bright people like to contribute to. It’s very hard for the culture of professional journalism to accept this idea, the idea that they should leverage other people’s work as far as they possibly can for as cheap as they can possibly get it, because many journalists and publishers feel burned by aggregation. But aggregation is incredibly useful, while the feelings and job descriptions of newsroom personnel are irrelevant to the consumer. As Sun Microsystems founder Bill Joy put it, “no matter who you are, most of the smartest people work for someone else,” and the idea that a single newsroom can produce the world’s best content on every topic is a damaging myth. That’s the fundamental value proposition of aggregation — all of the best stuff in one place. The word “best” represents editorial judgement in the classic sense, still a key part of a news organization’s brand, and that judgement can be embodied in whatever algorithms and social software are designed to do the aggregation. I realize that there are economic issues around getting paid for producing content, but that’s the sort of thing that needs to be solved by better content marketplaces, not lawsuits and walled gardens.

None of this means that reporters shouldn’t produce regular stories on their beats, or that there aren’t plenty of topics which require lots of original reporting and original content. But asking who did the reporting or made the content misses the point. A really good news application/interactive story/editorial search engine should be able to teach us as much as we care to learn about the topic, regardless of the state of our previous knowledge, and no matter who originally created the most relevant material.

What I am suggesting comes down to this: maybe a digital news product isn’t a collection of stories, but a system for learning about the world. For that to happen, news applications are going to need to do a lot of algorithmically-enhanced organization of content originally created by other people. This idea is antithetical to current newsroom culture and the traditional structure of the journalism industry. But it also points the way to more useful digital news products: more integration of outside sources, better search and personalization, and story-specific news applications that embody whatever combination of original content, human curation, and editorial algorithms will best help the user to learn.

[Updated 27 March with more material on social signals in search, Bill Joy’s maxim, and other good bits.]
[Updated 1 April with section titles.]