The challenges of distributed investigative journalism

One of the clearest ideas to emerge from the excitement around the new media transformation of journalism is the notion that the audience should participate in the process. This two-way street has been nicely described by Guardian editor Alan Rusbridger as the “mutualization of journalism.” But how to do it? What’s missing from what has been tried so far? Despite many experiments, the territory is still so unexplored that it’s almost impossible to say what will work without trying it. With that caveat, here are some more or less wild speculations about the sorts of tools that “open” investigative journalism might need to work.

There have been many collaborative journalism projects, from the Huffington Post’s landmark “Off The Bus” election campaign coverage to the BBC’s sophisticated “user-generated content hub” to CNN’s iReport. One lesson in all of this is that form matters. Take the lowly comment section. News site owners have long complained, often with good reason, that comments are a mess of trolls and flame wars. But the prompt is supremely important in asking for online collaboration. Do journalists really want “comments”? Or do they want error corrections, smart additions, leads, and evidence that furthers the story?

Which leads me to investigative reporting. It’s considered a specialty within professional journalism, dedicated to getting answers to difficult questions — often answers that are embarrassing to those in power. I don’t claim to be very good at journalistic investigations, but I’ve done enough reporting to understand the basics. Investigative reporting is as much about convincing a source to talk as it is about filing a FOIA request, or running a statistical analysis on a government data feed. At heart, it seems to be a process of assembling widely dispersed pieces of information — connecting the distributed dots. Sounds like a perfect opportunity for collaborative work. How could we support that?

A system for tracking what’s already known
Reporters keep notes. They have files. They write down what was said in conversations, or make recordings. They collect documents. All of this material is typically somewhere on or around a reporter’s desk or sitting on their computer. That means it’s not online, which means no one else can build on it. Even within the same newsroom, notes and source materials are seldom shared. We have long had customer relationship management systems that track every contact with a customer. Why not a “source relationship management” system that tracks every contact with every source by every reporter in the newsroom? Ideally, such a system would be integrated into the reporter’s communications tools: when I make a phone call and hit record (after getting the source’s permission, of course), that recording could be automatically entered into the system’s files, stamped by time, date, and source, then transcribed by machine to make it searchable. Primary documents would also be filed in the system, along with notes and links and comments from everyone working on the story. The entire story of the story could be in one place.
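To make the idea a little more concrete, here is a minimal sketch of what the core of such a “source relationship management” log might look like. Everything in it is an assumption for illustration: the names `ContactRecord` and `SourceLog`, the fields, and the in-memory storage are all hypothetical, and the transcript is assumed to arrive from some separate machine transcription step that is not shown.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ContactRecord:
    """One contact between a reporter and a source (hypothetical fields)."""
    reporter: str
    source: str
    timestamp: datetime
    transcript: str  # assumed output of a machine transcription step
    attachments: list = field(default_factory=list)  # e.g. primary documents


class SourceLog:
    """In-memory log of every contact, searchable by source or by text."""

    def __init__(self):
        self._records = []

    def add(self, record):
        self._records.append(record)

    def by_source(self, source):
        """Return every contact with one source, oldest first."""
        matches = [r for r in self._records if r.source == source]
        return sorted(matches, key=lambda r: r.timestamp)

    def search(self, phrase):
        """Case-insensitive substring search over all transcripts."""
        needle = phrase.lower()
        return [r for r in self._records if needle in r.transcript.lower()]
```

A real system would need persistent storage, proper full-text search, and permissions, but even this much shows the point: once every contact is a record, anyone in the newsroom can ask what a given source has already told a colleague.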

There have been experiments in collaborative journalistic files, such as or even good local wikis. But I don’t believe there has yet been a major professional newsroom which operated with open files. For that matter, I am not aware of this type of information filing system in existence anywhere in journalism, though I suspect it’s what intelligence services do.

Public verification processes
Journalism aims to be “true,” a goal which requires elaborate verification processes. But in every newsroom I’ve worked with, essential parts of the verification standards are not codified. “At least two sources” is a common maxim, but are there any situations where one is enough? For that matter, who counts as a definitive source? When is a conflict of interest serious enough to disqualify what someone is telling you? The answers to these questions and many more are a matter of professional practice and culture. This is confusing enough for a new reporter joining staff, let alone outsiders who might want to help.

Verification is necessarily contextual. Both the costs of verification and the consequences of being in error vary widely with circumstance, so journalists must make situational choices. How sure do we have to be before we say something is true, how do we measure that certainty, and what would it take to be more sure? Until this sort of nuanced guidance is made public, and the public is provided with experienced support to encourage good calls in complex or borderline cases, it won’t be possible to bring enthusiastic outsiders fully into the reporting process. They simply won’t know what’s expected of them if they are to help produce work to a given standard. Those standards depend on what accuracy/cost/speed tradeoffs best serve the communities that a newsroom writes for, which means that there is audience input here too.

What is secret, or, who gets to participate?
Traditionally, a big investigative story is kept completely secret until it’s published. This is shifting, as some journalists begin to view investigation as more of a process than a product. However, you may not want the subject of an investigation to know what you already know. It might, for example, make your interview with a bank CEO tricky if they know you’ve already got the goods on them from a former employee. There are also off-the-record interviews, embargoed material, documents which cannot legally be published, and a multitude of concerns around the privacy rights of individuals. I agree with Jay Rosen when he says that “everything a journalist learns that he cannot tell the public alienates him from the public,” but that doesn’t mean that complete openness is the solution in all cases. There are complex tradeoffs here.

So access to at least some files must be controlled, for at least some period of time. Ok then — who gets to see what, when? Is there a private section that only staff can see and a public section for everyone else? Or, what about opening some files up to trusted outsiders? That might be a powerful way to extend investigations outside the boundaries of the newsroom, but it brings in all the classic problems of distributed trust, and more generally, all the issues of “membership” in online communities. I can’t say I know any good answers. But because the open flow of information can be so dramatically productive, I’d prefer to start open and close down only where needed. In other words, probably the fastest way to learn what truly needs to be secret is to blow a few investigations when someone says something they shouldn’t have, then design processes and policies to minimize those failure modes.
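The “start open, close down only where needed” policy can be sketched in a few lines. This is only an illustration under assumptions of my own: the three audience tiers, the names `Audience` and `visible_files`, and the example file names are all hypothetical. The key design choice is that a file with no explicit restriction is treated as public.

```python
from enum import IntEnum


class Audience(IntEnum):
    """Hypothetical access tiers, ordered from least to most trusted."""
    PUBLIC = 0
    TRUSTED = 1  # vetted outsiders
    STAFF = 2


def visible_files(all_files, restrictions, viewer):
    """Return the files a viewer may see, sorted by name.

    `restrictions` maps a file name to the minimum Audience required.
    Anything not listed there defaults to PUBLIC: open unless closed.
    """
    return sorted(
        name
        for name in all_files
        if viewer >= restrictions.get(name, Audience.PUBLIC)
    )
```

The hard part, of course, is not the check itself but deciding who counts as “trusted” and when a restriction should expire, which code like this does nothing to answer.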

There is also a professional cultural shift required here, towards open collaboration. Newsrooms don’t like to get scooped. Fair enough, but my answer to this is to ask what’s more important: being first, or collectively getting as much journalism done as possible?

Safe places for dangerous hypotheses
Investigative journalism requires speculation. “What if?” the reporter must say, then go looking for evidence. (And equally, “what if not?” so as not to fall prey to confirmation bias.) Unfortunately, “what if the district attorney is a child molester?” is not a question that most news organizations can tolerate on their web site. In the worst case, the news organization could be sued for libel. How can we make a safe and civil space — both legally and culturally — for following speculative trains of thought about the wrongdoings of the powerful? One idea, which is probably a good idea for many reasons, is to have very explicit marking of what material is considered “confirmed,” “vetted,” “verified,” etc. and what material is not. For example, iReport has such an endorsement system. A report marked “verified” would of course have been vetted according to the public verification process. In the US, that marking plus CDA section 230 might solve the legal issues.

A proposed design goal: maximum amplification of staff effort
There are very many possible stories, and very few paid journalists. The massive amplification of staff effort that community involvement can provide may be our only hope for getting the quantity and quality of journalism that we want. Consider, for example, Wikipedia. With a paid staff of about 35 they produce millions of near-real-time topic pages in dozens of languages.

But this is also about the usability of the social software designed to facilitate collaborative investigations. We’ll know we have the design right when lots of people want to use it. Also: just how much and what types of journalism could volunteers produce collaboratively? To find out, we could try to get the audience to scale faster than newsroom staff size. To make that happen, communities of all descriptions would need to find the newsroom’s public interface a useful tool for uncovering new information about themselves even when very little staff time is available to help them. Perhaps the best way to design a platform for collaborative investigation would be to imagine it as encouraging and coordinating as many people as possible in the production of journalism in the broader society, with as few full time staff as possible. These staff would be experts in community management and information curation. I don’t believe that all types of journalism can be produced this way or that anything like a majority of people will contribute to the process of journalism. Likely, only a few percent will. But helping the audience to inform itself on the topics of its choice on a mass scale sounds like civic empowerment to me, which I believe to be a fundamental goal of journalism.

Identity, Anonymity, and Controlling Trolls

Multiple personalities

Flame wars and jihadist rants and generally worthless behavior in the comments: that’s the problem I’m trying to solve here.

And I’m trying to do it while preserving anonymity. Internet conversation can get nasty when the participants are anonymous, which has led to proposals of tying all online identities to “real” identities. This is the wrong solution to the troll problem, because it destroys privacy in a serious way. I want to build discussion systems that allow anonymous comments, yet remain orderly, civil, and enlightening. I think this can be done with filtering systems based on reputation.

Reputation is a thing that sticks to an identity. Historically most people had only one identity, closely tied to their physical presence. But now, online, every one of us has multiple identities: think of how many user names and logins you have. There’s some consolidation going on, in the increasing acceptance of Google, Twitter, and Facebook logins across the web, and this is mostly a good thing.  But I don’t think we want to aim for a world where each person has only one online identity. Multiple identities are good and useful.

Multiple identities are closely related to anonymity. Anonymity doesn’t mean having no identity; it means not being able to tie one of my identities to the others. I want to be very careful about who gets to tie the different parts of me together. I’m going to give two arguments for this, which I’ll call the “does your mother know” and “totalitarian state” arguments. They’re both really important. I’d be really sad if we lost anonymity in either case. And after I’ve convinced you that we need anonymity, I’ll talk about how we get people to behave even if they don’t leave a name.

Keeping the different facets of ourselves apart is the essence of privacy. We’ve always been different people in different contexts, but this was only possible because we could expect that word of what we did with our friends last night would not get back to our mother. This expectation depends upon the ability to separate our actions in different contexts;  your mom or your boss knows that someone in the community is going on a bender/having kinky sex/voting Republican, but she doesn’t know it’s you. The ability to have different identities in different contexts is intricately tied to privacy, and in my mind no different than setting a post to “friends only” or denying the details of your personal life. Although the boundaries around what is “personal” are surely changing, if you really think we’re heading toward a world where everybody knows everything about everyone, you’re mad. For one thing, secrets are immensely valuable to the business world.

And then there’s China. I live right next door to the most invasive regime in the world. The Chinese government, and certain others such as Korea, are trying very hard to tie online and corporeal identities together by instituting real name policies. This makes enforcement of legal and social norms easier. Which is great until you disagree. Every damn blog comment everywhere is traceable to you. Every Wikipedia edit. Everything. China is trying as hard as it can to make opposing speech literally impossible. This is not theoretical. As of last week, you can’t send dirty words through SMS.

When the digital panopticon is a real possibility, I think that the ability to speak without censure is vital to the balance of power in all sectors. Anonymity is important to a very wide range of interests, as the diversity of the Tor project shows us. Tor is a tool and a network for anonymity online, and it is sponsored by everyone from rights activist groups to the US Department of Defense to journalists and spies. Anonymity is very, very useful, and is deeply tied to the human right of privacy.

Right, but… how do we get sociopaths to play nice in the comments section if they can say anything they want without repercussions?

The general answer is that we encourage social behavior online in exactly the way we encourage it offline: social norms and peer pressure. We can build social tools into our online systems, just like we already do. A simple example is the “flag this” link on many commenting systems. Let’s teach people to click it when they mean “this is a useless post by a troll.” Collaborative moderation systems, such as “rate this post” features of all kinds, work similarly.

Collaborative moderation is a really big, important topic, and I’ll write more about it later. There are voting systems of all kinds, and the details matter. Compare Slashdot versus Digg versus Reddit. But all of these systems rate comments, not users, and I think this makes them weaker than they could be at suppressing trolls and spam. Identities matter, because identities have reputations.

Reputation is an expectation about how an identity will behave. It is built up over time. Crucially, a throw-away “anonymous” identity doesn’t have it. That’s why systems based on reputation in various forms work to produce social behavior. There are “currency” systems like StackOverflow’s karma where one user can give another credit for answering a question. There are voting systems such as the Huffington Post’s “I’m a fan of (comment poster)” which are designed to identify trustworthy users. Even Twitter Lists are a form of reputation system, where one user can choose to continuously rebroadcast someone else’s tweets.

And in the context of online discussion, you use reputation to direct attention.

That’s what filtering is: directing attention. And this is how you deal with trolls without restricting freedom of speech: you build collaborative filters based on reputation. Reputation is powerful precisely because it predicts behavior. New or “anonymous” identities would have no reputation and thus command little attention (at least until they said a few interesting things) while repeat offenders would sink to the bottom. Trolls would still exist, but they simply wouldn’t be heard.
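Here is a minimal sketch of that kind of reputation-based filter. It is an illustration, not a description of any existing site’s system: the class `ReputationFilter`, the integer scores, and the threshold are all assumptions of mine. Reputation attaches to an identity (here just a user name), never to a real-world person, and an identity nobody has voted on starts at zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    author: str  # an online identity, not a real name
    text: str


class ReputationFilter:
    """Directs attention by author reputation (hypothetical scoring)."""

    def __init__(self):
        self._scores = {}

    def reputation(self, author):
        """Unknown or brand-new identities have a reputation of zero."""
        return self._scores.get(author, 0)

    def vote(self, author, delta):
        """Other users raise (+) or lower (-) an identity's reputation."""
        self._scores[author] = self.reputation(author) + delta

    def rank(self, comments):
        """Highest-reputation authors first; ties keep their posted order."""
        return sorted(
            comments, key=lambda c: self.reputation(c.author), reverse=True
        )

    def visible(self, comments, threshold=0):
        """Ranked comments whose author meets the reputation threshold."""
        return [
            c for c in self.rank(comments)
            if self.reputation(c.author) >= threshold
        ]
```

Nothing is deleted here. A repeat offender’s comments are still stored and could still be shown to anyone who lowers the threshold; they simply fall out of the default view, while a newcomer sits in the middle until they earn some votes.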

NB, none of this requires tying online identities to corporeal people. Rather than being frightened of anonymity and multiple identities, I think we need to embrace them. We need to trust that we can evolve the right mixes of software and norms so that collaboration overwhelms vandalism, just as Wikipedia did. This field is mostly unexplored. We need to learn how identity relates to trust and reputation and action. And we need to think of social software as architecture, a space that shapes and channels the behavior of the people in it.

Simply trying to make it impossible to do anything bad will destroy much that is great about the internet. And it lacks imagination.