Dec 28 2011

Wiki variations

In the beginning there was Wikipedia, and it was brilliant. Somehow, making a set of pages that anyone could edit worked. The result was not cacophony but the greatest public collection of knowledge that the world has ever known. And that’s pretty much where we’ve left things, which is a great shame, because there’s so much more to be explored here.

A set of revision-controlled, hyperlinked topic pages is a stupidly useful form. It seems too simple to improve. What we can experiment with is how the pages are produced — which really seems like a far more interesting problem anyway. We can also look at novel ways to use a wiki. Here’s a brain dump of all the different directions I can imagine pushing the classic concept.

Who can edit? Just because Wikipedia is open to all doesn’t mean that all wikis must be. Actually, not even Wikipedia is open to everyone; admins can “protect” pages, restricting editing in various ways temporarily or permanently, or in extreme cases ban users entirely. But the presumption is openness. There are other wikis that start the other way around, such as news organizations’ “topic pages” which are only editable by staff. This control often results in a much more consistent product and may also serve to minimize errors, though I’ve never been able to find a quantitative comparison with pro journalism’s error rate. But the cost of being closed is that no one else can contribute. And sure enough, on most topics I find Wikipedia to be more comprehensive and up-to-date. Compare NYT vs Wikipedia on global warming.

Between entirely closed and entirely open there is a huge unexplored design space. The Washington Post’s WhoRunsGov, a directory of American government personnel, was an example of what I’m going to call a “moderated wiki.” Anyone could submit an edit, but the changes had to be approved by staff before going up. WhoRunsGov is no longer up, so perhaps it was not considered a success, but I don’t know anything about why.

There are lots of other in-between possibilities. We could have a post-moderated wiki where changes are visible immediately but checked later, or employ any of the various reputation systems that are commonly used in community moderation; the basic idea is that proven editors have greater privilege and control. I can also imagine a system where all content is written by a small closed group, perhaps the staff of some organization, but the community votes on what articles need to be updated, and submits suggestions, links, etc. The staff then updates the pages according to the community priority. Openfile.ca embodies certain aspects of this.

Another simple variation: I have not yet seen a publicly visible wiki that is editable by everyone within a large organization (as opposed to a few sanctioned authors.) Organizations and communities already have elaborate structures for deciding who is “in” and who is “out,” and this could translate very naturally into editing rights.

Specialized Wikis. It’s going to be extraordinarily hard to produce a better general reference work than Wikipedia, with its millions of articles in dozens of languages and tens of thousands of editors. But your organization or community might know far more about finance, or green roofs, or global media law, or… Each topic potentially has its own community and its own dynamics that could lend itself to different types of editing schemes.

For that matter, Wikipedia’s content is freely re-usable under its Creative Commons CC-BY-SA license. It would be perfectly permissible to build a wiki interface that displayed specialized pages where they exist, and used Wikipedia content where they do not. Essentially, this is the choice to take editorial control of a certain small set of pages, while retaining the broad utility of a general reference.
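As a minimal sketch of how that fallback might work, here is a lookup that serves a locally edited page when one exists and otherwise pulls the article from the public MediaWiki API. The local_pages store and the surrounding structure are hypothetical; only the API endpoint and its parameters are real.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical store of locally written, specialized pages (title -> HTML).
local_pages = {
    "Green roof": "<p>Our own, editorially controlled article on green roofs.</p>",
}

def get_page_html(title):
    """Serve our specialized page if we have one; otherwise fall back to Wikipedia.

    Wikipedia's text is CC-BY-SA, so fallback pages must credit the source
    and be shared under the same license."""
    if title in local_pages:
        return local_pages[title]
    params = urllib.parse.urlencode({
        "action": "parse",
        "page": title,
        "prop": "text",
        "format": "json",
        "formatversion": 2,
    })
    url = "https://en.wikipedia.org/w/api.php?" + params
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return data["parse"]["text"]

print(get_page_html("Green roof")[:80])       # served locally
print(get_page_html("Global warming")[:80])   # pulled from Wikipedia
```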

Combine wiki and news content. For most people, the news isn’t really comprehensible without detailed background information. And vice versa: after reading a wiki article, I’m probably far more interested in the most recent news on that topic. It seems natural to build a user interface that combines a wiki page with a news stream on that topic, and several news organizations have tried this. But I haven’t found an example that really sings. For me, this is largely because they don’t leverage the broader world of available content. Where is the Wikipedia/Google News mashup?

The revision history of a page, the list of every edit over time, is also a form of recorded news. James Bridle’s 12-volume edit history of “The Iraq War” makes this point beautifully. His work is paper performance art, but the concept has a natural online interpretation: a wiki that automatically highlights the sentences that have changed since the reader last visited that page. Rather than asking readers to construct the whole story from the updates, we would be showing them where the updates fit into the whole story. At least one experimental news site has tried this.
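As a rough sketch of the underlying computation, assuming we already have the wiki text of the revision the reader last saw and the text of the current revision (the naive sentence splitting is just for illustration):

```python
import difflib
import re

def changed_sentences(last_seen_text, current_text):
    """Return the sentences of current_text that are new or rewritten since
    last_seen_text, so the page can highlight just those for this reader."""
    split = lambda t: re.split(r"(?<=[.!?])\s+", t.strip())
    old, new = split(last_seen_text), split(current_text)
    changed = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=old, b=new).get_opcodes():
        if op in ("replace", "insert"):
            changed.extend(new[j1:j2])
    return changed

last_seen = "The war began in 2003. Casualty figures are disputed."
current = ("The war began in 2003. Casualty figures are disputed. "
           "A formal withdrawal was completed in December 2011.")
print(changed_sentences(last_seen, current))
# ['A formal withdrawal was completed in December 2011.']
```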

Authorship tracking. Although it is possible in principle to use the revision history on any Wikipedia article to determine who wrote what, both the culture and the user interface discourage this. This is not the only option. The U.S. intelligence community has Intellipedia, which logs authorship:

It’s the Wikipedia on a classified network, with one very important difference:  it’s not anonymous.  We want people to establish a reputation.  If you’re really good, we want people to know you’re good.  If you’re making contributions, we want that known.  If you’re an idiot, we want that known too.

This also works the other way around, where reputation of the author translates into credibility of the text. I’m not clear on exactly how Intellipedia’s attribution system works; perhaps it simply requires authenticated user logins, or maybe it includes UI features such as appending a user name to each contributed paragraph. One could also imagine systems that constructed a list of bylines based on who wrote how much in the current article. The “blame” function of software version control systems is a technical precedent for automatically tracking individual contributions in a collaboratively edited file.
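Here is a toy sketch of that blame idea applied to wiki text: walk the revision history in chronological order and record which editor introduced each sentence of the current article. Real tools, including git blame itself, handle moved and partially rewritten text far more carefully, and the revision format below is invented for the example.

```python
import difflib
import re

def sentence_blame(revisions):
    """revisions: [(editor, full_text), ...] in chronological order.
    Returns [(sentence, editor_who_introduced_it)] for the latest revision."""
    split = lambda t: re.split(r"(?<=[.!?])\s+", t.strip())
    prev, blame = [], []  # sentences of the previous revision and their authors
    for editor, text in revisions:
        cur, new_blame = split(text), []
        for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=prev, b=cur).get_opcodes():
            if op == "equal":
                new_blame.extend(blame[i1:i2])          # unchanged: keep old attribution
            elif op in ("replace", "insert"):
                new_blame.extend([editor] * (j2 - j1))  # new or rewritten: credit this editor
            # deletions simply drop out of the current text
        prev, blame = cur, new_blame
    return list(zip(prev, blame))

history = [
    ("alice", "The dam was approved in 2008."),
    ("bob", "The dam was approved in 2008. Construction began in 2010."),
]
for sentence, editor in sentence_blame(history):
    print(editor, "wrote:", sentence)
```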

Sourcing and attribution. Wikipedia has three core content policies: neutral point of view (NPOV), no original research, and verifiability. Together, these policies describe what counts as “truth” in the Wikipedia world. NPOV is roughly equivalent to the classic notion of journalistic “objectivity,” no original research says that Wikipedia can never be a primary source, and verifiability says that all statements of fact must be cited (and defines, loosely, what counts as a reputable source for citations.)

The citation system used to enforce verifiability has its roots in age-old scholarship practices, while the no original research policy was originally drafted to exclude kooks with fringe theories. Together they have another extremely important effect: they offload the burden of credibility. Without these policies, the credibility of information on Wikipedia would have to lean far more heavily on the reputation of its authors; difficult to establish, since neither authorship nor authors are well tracked. By depending on the credibility of outside sources, Wikipedia was able to bootstrap from existing systems for authoritative knowledge, while maintaining the flexibility to incorporate any reasonable source.

There’s no reason that an already-credible organization couldn’t choose differently. Scientific journals, news organizations, government agencies etc. routinely act as the original publisher of crucial information, and it seems a small step to say that they could put that information in a wiki. The wiki would be credible to the extent that the organization is considered a credible author, which means that authorial tracking would also be required; perhaps certain “source” pages could be designated read-only, or all edits could be moderated, or there could be fine-grained attribution of text. The key point is that the user interface clearly distinguishes text that has been authoritatively vetted from text that has not.

Shared text. We need shared texts because we need shared understandings of the world. Without them, collective action becomes impossible and we all suffer. Wikipedia is an ambitious project to create a global knowledge system that is more or less acceptable to all people. The neutral point of view policy is important here, but the wide-open nature of Wikipedia is perhaps more essential to this vision. By definition, a consensus article is something that everyone is happy with; if an article ever reaches a state where it is acceptable to all factions, there is no motivation for anyone to edit it further. That this happens for so many pages, even on contentious topics, is remarkable. The mechanics of this process are actually fairly extensive, including an elaborate tiered volunteer dispute resolution process that usually stabilizes edit wars.

There are variations here too. We could explore other methods of dispute resolution, or we could get more sophisticated about Wikipedia’s policy of representing multiple points of view. We could try to map the viewpoints of different authors directly, or we could have multiple versions of a page, each open to a different faction, and then compare the resulting texts to better understand where the differences lie. As always, there is no reason to imagine that “completely open” is the only option; but some openness seems essential.

And this cuts to the heart of what is unique about the wiki form. Open texts have a special legitimacy precisely because they are fragile: they can only exist when all who have an interest in the outcome manage to work together to create and preserve them. Wikipedia shows that this is possible in many more cases than we thought, but it is hardly the final word.

 


Nov 29 2011

What should the digital public sphere do?

Earlier this year, I discovered there wasn’t really a name for the thing I wanted to talk about. I wanted a word or phrase that includes journalism, social media, search engines, libraries, Wikipedia, and parts of academia, the idea of all these things as a system for knowledge and communication. But there is no such word. Nonetheless, this is an essay asking what all this stuff should do together.

What I see here is an ecosystem. There are narrow real-time feeds such as expertly curated Twitter accounts, and big general reference works like Wikipedia. There are armies of reporters working in their niches, but also colonies of computer scientists. There are curators both human and algorithmic. And I have no problem imagining that this ecosystem includes certain kinds of artists and artworks. Let’s say it includes all public acts and systems which come down to one person trying to tell another, “I didn’t just make this up. There’s something here of the world we share.”

I asked people what to call it. Some said “media.” That captures a lot of it, but I’m not really talking about the art or entertainment aspects of media. Also I wanted to include something of where ideas come from, something about discussions, collaborative investigation, and the generation of new knowledge. Other people said “information” but there is much more here than being informed. Information alone doesn’t make us care or act. It is part of, but only part of, what it means to connect to another human being at a distance.  Someone else said “the fourth estate” and this is much closer, because it pulls in all the ideas around civic participation and public discourse and speaking truth to power, loads of stuff we generally file under “democracy.” But the fourth estate today means “the press” and what I want to talk about is broader than journalism.

I’m just going to call this the “digital public sphere”, building on Jürgen Habermas’ idea of a place for the discussion of shared concerns, public yet apart from the state. Maybe that’s not a great name — it’s a bit dry for my taste — but perhaps it’s the best that can be done in three words, and it’s already in use as a phrase to refer to many of the sorts of things I want to talk about. “Public sphere” captures something important, something about the societal goals of the system, and “digital” is a modifier that means we have to account for interactivity, networks, and computation. Taking inspiration from Michael Schudson’s essay “Six or seven things that news can do for democracy,” I want to ask what the digital public sphere can do for us. I think I see three broad categories, which are also three goals to keep in mind as we build our institutions and systems.

1. Information. It should be possible for people to find things out, whatever they want to know. Our institutions should help people organize to produce valuable new knowledge. And important information should automatically reach each person at just the right moment.

2. Empathy. The vast majority of people in the world, we will only know through media. We must strive to represent the “other” to each-other with compassion and reality. We can’t forget that there are people on the other end of the wire.

3. Collective action. What good is public deliberation if we can’t eventually come to a decision and act? But truly enabling the formation of broad agreement also requires that our information systems support conflict resolution. In this age of complex overlapping communities, this role spans everything from the local to the global.

Each of these is its own rich area, and each of these roles already cuts across many different forms and institutions of media.

Information
I’d like to live in a world where it’s cheap and easy for anyone to satisfy the following desires:

  1. “I want to learn about X.”
  2. “How do we know that about X?”
  3. “What are the most interesting things we don’t know about X?”
  4. “Please keep me informed about X.”
  5. “I think we should know more about X.”
  6. “I know something about X and want to tell others.”

These desires span everything from mundane queries (“what time does the store close?”) to complex questions of fact (“what will be the effects of global climate change?”). And they apply at all scales; I might have a burning desire to know how the city government is going to deal with bike lanes, or I might be curious about the sum total of humanity’s knowledge of breast cancer — everything we know today, plus all the good questions we can’t yet answer. Different institutions exist to address each of these needs in various ways. Libraries have historically served the need to answer specific questions, desires #1 and #2, but search engines also do this. Journalism strives to keep people abreast of current events, the essence of #4. Academia has focused on how we know and what we don’t yet know, which is #2 and #3.

This list includes two functions related to the production of new knowledge, because it seems to me that the public information ecosystem should support people working together to become collectively smarter. That’s why I’ve included #5, which is something like casting a vote for an unanswered question, and #6, the peer-to-peer ability to provide an answer. These seem like key elements in the democratic production of knowledge, because the resources which can be devoted to investigating answers are limited. There will always be a finite number of people well placed to answer any particular question, whether those people are researchers, reporters, subject matter experts, or simply well-informed. I like to imagine that their collective output is dwarfed by human curiosity. So efficiency matters, and we need to find ways to aggregate the questions of a community, and route each question to the person or people best positioned to find out the answer.

In the context of professional journalism, this amounts to asking what unanswered questions are most pressing to the community served by a newsroom. One could devise systems for asking the audience (like Quora and StackExchange) or analyze search logs (à la Demand Media.) That newsrooms don’t frequently do these things is, I think, an artifact of industrial history — and an unfilled niche in the current ecosystem. Search engines know where the gaps between supply and demand lie, but they’re not in the business of researching new answers. Newsrooms can produce the supply, but they don’t have an understanding of the demand. Today, these two sides of the industry do not work together to close this loop. Some symbiotic hybrid of Google and The Associated Press might be an uncannily good system for answering civic questions.
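As a back-of-the-envelope sketch of the search-log side of this: count what the community keeps asking, subtract what the newsroom has already answered, and surface the gap as a kind of assignment queue. The log format and the already_covered set below are stand-ins, not any real system.

```python
from collections import Counter

# Hypothetical: normalized queries pulled from a site search or referral log.
search_log = [
    "when will the 5th avenue bike lane open",
    "city budget shortfall 2012",
    "when will the 5th avenue bike lane open",
    "school board election candidates",
    "when will the 5th avenue bike lane open",
]

# Hypothetical: questions the newsroom has already answered well.
already_covered = {"city budget shortfall 2012"}

def assignment_queue(log, covered, top_n=5):
    """Rank the most-demanded questions that nobody has answered yet."""
    demand = Counter(q for q in log if q not in covered)
    return demand.most_common(top_n)

for query, count in assignment_queue(search_log, already_covered):
    print(f"{count}x  {query}")
```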

When new information does become available, there’s the issue of timing and routing. This is #4 again, “please keep me informed.” Traditionally, journalism has answered the question “who should know when?” with “everyone everything as fast as possible” but this is ridiculous today. I really don’t want my phone to vibrate for every news article ever written, which is why only “important” stories generate alerts. But taste and specialization dictate different definitions of “important” for each person, and old answers delivered when I need them might be just as valuable as new information delivered hot and fresh. Google is far down this track with its thinking on knowing what I want before I search for it.

Empathy 
There is no better way to show one person to another, across a distance, than the human story. These stories about other people may be informative, sure, but maybe their real purpose is to help us feel what it is like to be someone else. This is an old art; one journalist friend credits Homer with the last major innovation in the form.

But we also have to show whole groups to each other, a very “mass media” goal. If I’ve never met a Cambodian or hung out with a union organizer, I only know what I see in the media. How can and should entire communities, groups, cultures, races, interests or nations be represented?

A good journalist, anthropologist, or writer can live with a community for a while, observing and learning, then articulate generalizations. This is important and useful. It’s also wildly subjective. But then, so is empathy. Curation and amplification can also be empathetic processes: someone can direct attention to the genuine voices of a community. This “don’t speak, point” role has been articulated by Ethan Zuckerman and practiced by Andy Carvin.

But these are still at the level of individual stories. Who is representative? If I can only talk to five people, which five people should I know? Maybe a human story, no matter how effective, is just a single sample in the sense of a tiny part standing for the whole. Turning this notion around, making it personal, I come to an ideal: If I am to be seen as part of some group, then I want representations of that group to include me in some way. This is an argument that mass media coverage of a community should try to account for every person in that community. This is absurd in practical terms, but it can serve as a signpost, a core idea, something to aim for.

Fortunately, more inclusive representations are getting easier. Most profoundly, the widespread availability of peer-to-peer communication networks makes it easier than ever for a single member of a community to speak and be heard widely.

We also have data. We can compile the demographics of social movements, or conduct polls to find “public opinion.” We can learn a lot from the numbers that describe a particular population, which is why surveys and censuses persist. But data are terrible at producing the emotional response at the core of empathy. For most people, learning that 23% of the children in some state live in poverty lacks the gut-punch of a story about a child who goes hungry at the end of every month. In fact there is evidence that making someone think analytically about an issue actually makes them less compassionate.

The best reporting might combine human stories with broader data. I am impressed by CNN’s interactive exploration of American casualties in Iraq, which links mass visualization with photographs and stories about each individual. But that piece covers a comparatively small population, only a few thousand people. There are emerging techniques to understand much larger groups, such as by visualizing the data trails of online life, all of the personal information that we leave behind. We can visualize communities, using aggregate information to see the patterns of human association at all scales. I suspect that mass data visualization represents a fundamentally new way of understanding large groups, a way that is perhaps more inclusive than anecdotes yet richer than demographics. Also, visualization forces us into conversations about who exactly is a member of the community in question, because each person is either included in a particular visualization or not. Drawing such a hard boundary is often difficult, but it’s good to talk about the meanings of our labels.

And yet, for all this new technology, empathy remains a deeply human pursuit. Do we really want statistically unbiased samples of a community? My friend Quinn Norton says that journalism should “strive to show us our better selves.” Sometimes, what we need is brutal honesty. At other times, what we need is kindness and inspiration.

Collective action

What a difficult challenge advances in communication have become in recent decades. On the one hand they are definitely bringing us closer to each other, but are they really bringing us together?

- Ryszard Kapuściński, The Other

I am sensitive to the idea of filter bubbles and concerns about the fragmentation of media, the worry that the personalization of information will create a series of insular and homogenous communities, but I cannot abide the implied nostalgia for the broadcast era. I do not see how one-size-fits-all media can ever serve a diverse and specialized society, and so: let a million micro-cultures bloom! But I do see a need for powerful unifying forces within the public sphere, because everything from keeping a park clean to tackling global climate change requires the agreement and cooperation of a community.

We have long had decision making systems at all scales — from the neighborhood to the United Nations — and these mechanisms span a range from very lightweight and informal to global and ritualized. In many cases decision-making is built upon voting, with some majority required to pass, such as 51% or 66%. But is a vicious, hard-fought 51% in a polarized society really the best we can do? And what about all the issues that we will not be voting on — that is to say, most of them?

Unfortunately, getting agreement among even very moderate numbers of people seems phenomenally difficult. People disagree about methods, but in a pluralistic society they often disagree even more strongly about goals. Sometimes presenting all sides with credible information is enough, but strongly held disagreements usually cannot be resolved by shared facts; experimental work shows that, in many circumstances, polarization deepens with more information. This is the painful truth that blows a hole in ideas like “informed public” and “deliberative democracy.”

Something else is needed here. I want to bring the field of conflict resolution into the digital public sphere. As a named pursuit with its own literature and community, this is a young subject, really only begun after World War II. I love the field, but it’s in its infancy; I think it’s safe to say that we really don’t know very much about how to help groups with incompatible values find acceptable common solutions. We know even less about how to do this in an online setting.

But we can say for sure that “moderator” is an important role in the digital public sphere. This is old-school internet culture, dating back to the pre-web Usenet days, and we have evolved very many tools for keeping online discussions well-ordered, from classic comment moderation to collaborative filtering, reputation systems, online polls, and various other tricks. At the edges, moderation turns into conflict resolution, and there are tools for this too. I’m particularly intrigued by visualizations that show where a community agrees or disagrees along multiple axes, because the conceptually similar process of “peace polls” has had some success in real-world conflict situations such as Northern Ireland. I bet we could also learn from the arduously evolved dispute resolution processes of Wikipedia.

It seems to me that the ideal of legitimate community decision making is consensus, 100% agreement. This is very difficult, another unreachable goal, but we could define a scale from 51% agreement to 100%, and say that the goal is “as consensus as possible” decision making, which would also be “as legitimate as possible.” With this sort of metric — and always remembering that the goal is to reach a decision on a collective action, not to make people agree for the sake of it — we could undertake a systematic study of online consensus formation. For any given community, for any given issue, how fragmented is the discourse? Do people with different opinions hang out in different places online? Can we document examples of successful and unsuccessful online consensus formation, as has been done in the offline case? What role do human moderators play, and how can well-designed social software contribute? How do the processes of online agreement and disagreement play out at different scales and under different circumstances? How do we know when the process has converged to a “good” answer, and when it has degraded into hegemony or groupthink? These are mostly unexplored questions. Fortunately, there’s a huge amount of related work to draw on: voting systems and public choice theory, social network analysis, cognitive psychology, information flow and media ecosystems, social software design, issues of identity and culture, language and semiotics, epistemology…
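Purely as an illustration of the kind of metric I have in mind (not something taken from the conflict resolution literature), here is one naive way to score a decision on that scale:

```python
from collections import Counter

def consensus_level(votes):
    """Share of participants backing the winning option:
    0.51 is a bare majority, 1.0 is full consensus."""
    if not votes:
        return 0.0
    _, top_count = Counter(votes).most_common(1)[0]
    return top_count / len(votes)

votes = ["keep the park car-free"] * 68 + ["allow weekend parking"] * 32
print(consensus_level(votes))  # 0.68
```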

I would like conflict resolution to be an explicit goal of our media platforms and processes, because we cannot afford to be polarized and grid-locked while there are important collective problems to be solved. We may have lost the unifying narrative of the front page, but that narrative was neither comprehensive nor inclusive: it didn’t always address the problems of concern to me, nor did it ask me what I thought. Effective collective action, at all relevant scales, seems a better and more concrete goal than “shared narrative.” It is also an exceptionally hard problem — in some ways it is the problem of democracy itself — but there’s lots to try, and our public sphere must be designed to support this.

Why now?
I began writing this essay because I wanted to say something very simple: all of these things — journalism, search engines, Wikipedia, social media and the lot — have to work together to common ends. There is today no one profession which encompasses the entirety of the public sphere. Journalism used to be the primary bearer of these responsibilities — or perhaps that was a well-meaning illusion sprung from near monopolies on mass information distribution channels. Either way, that era is now approaching two decades gone. Now what we have is an ecosystem, and in true networked fashion there may not ever again be a central authority. From algorithm designers to dedicated curators to, yes, traditional on-the-scene pro journalists, a great many people in different fields now have a part in shaping the digital public sphere. I wanted to try to understand what all of us are working toward. I hope that I have at least articulated goals that we can agree are important.

 


Oct 05 2011

What’s with this programmer-journalist identity crisis?

I’ve felt it myself: somehow, people want me to declare an identity. Am I really a programmer or a journalist? And if people ask you something a lot, you can internalize it. But I think I just figured out my definitive personal answer.

Other people have been thinking about this too. Like this person and this person and just about every news nerd out there. Partially this is because there is recognizably a community of people who like to program with journalistic intent within the more or less traditional journalism industry. That community needed an identity to help it stick together, so we got language like programmer-journalist and hacks/hackers, and the hyphens are always awkward. Makes people wonder about the “right” balance. For that matter, I spend lots of time doing things that wouldn’t fit either label, yet somehow go together with both.

I’ve realized how to articulate my answer to “what’s your profession?” and such vexing questions as “what’s the difference between being a programmer-journalist and an IT person?” It’s this: Can you code? Are you good at helping people learn about their world? Do you see how software as civic media might contribute to some sort of democratic or social good, to making the world a better place? If so, excellent.

Now suppose you work as one of these hyphenated creatures. Your on-the-job mixture of more traditionally journalistic-y activities (like talking to people to get otherwise unobtainable information) and more traditionally geeky activities (like all-weekend coding binges) is a matter of personal preference. If you personally find that you’d rather be doing more of something, or believe that it might be the sort of activity that will improve the press in a way you believe is important, then you should try to do that. Choose different projects or talk to your boss or convince other people this is a good idea or change jobs or something. Do any of the things people do when they want to try to change the kind of work they’re paid to do.

Don’t worry about what the “right” mixture is or how you describe your affiliations. Just worry about living your life in a way that changes you and the world in a way that is pleasing.

And please, let’s not tell news organization IT people they’re not “journalists” or reporters they’re not real programmers. Are they creatively contributing to the mission of the organization? Then why deny the credit?


Sep 28 2011

Learn to program, then and now

Learning to program a computer is hard. While you can learn to make useful things in a few months, mastery may take a decade. It’s not like learning to bake a cake or shoot a video. It’s more like learning to play a musical instrument. It takes years of practice to get really good — or in the programmer’s case, tens of thousands of lines of production code. Meanwhile, you’re going to make the neighbors’ ears bleed.

Why would anyone do this? I think the reasons people invest such insane amounts of time in such a specialized skill are shifting. And I think that shift is healthy. It’s a shift in what it means to be a technologist. And the culture of our technical communities is shifting with it.

Back then
I learned to program in high school, early 90s. Looking back, I think my formative experiences as a technologist were pretty typical for my generation of programmers. I had three or four close friends in high school who also liked computers. They were all male. This was the dawn of the internet, around the time of the very first web browsers, and long before the first tech bubble made geeks into rich visionaries. We were not remotely cool. Technical information was somewhat harder to come by than today, but not excessively so. My first C++ compiler shipped in a big box which included a thick paper reference manual and a really nice language tutorial. We subscribed to Byte and Dr. Dobb’s Journal. We hacked on stuff at lunch time and after school and weekends, and traded shareware on floppies. The technology was different, but the substance of the experience was much the same as today. We spent a lot of time at the computer, and we were well-connected into a community of like-minded people. The community provided technical help but also motivation and inspiration.

We weren’t trying to change the world.  We were driven by an intense curiosity about the inner workings of machines, and we wanted to be admired for being good at something. I wrote the Windows port of Netrek, one of the very first multiplayer online games, and the local geeks knew who I was when I arrived at the University of Toronto. This kind of experience persisted through my undergraduate years studying computer science. Long nights in the computer lab; cool hacks. There’s a wonderful book which captures this culture as it evolved starting in the late 1950s.

Enter women
There were no women in the communities where I learned to program. Or, almost none. I did a head count in one of my classes: four out of 150 students. Sadly, this kind of ratio persists today in many technical fields. I didn’t really know why this was. Us nerdy boys would have welcomed geeky girls. For all sorts of the right and wrong reasons.

It’s only in the last few years that I’ve started to understand why the dominant nerd culture drove women away in droves. Simply put: it was a club full of very poorly socialized boys, and our peer-based motivation was all about status. We all wanted to be the alpha geek. We would jump all over each other to point out errors. We would never miss a chance to demonstrate our superior, elegant technical minds. We were completely insufferable to anyone else.

Fortunately, there are now more women in tech. And they’re starting to tell their tale. While I don’t want to generalize too much from the experiences of a single person, I found the account of Rebekah Cox to be really enlightening (there are lots more great stories in the same thread):

So, if you enter this environment as a woman without any sort of agenda or understanding of this culture the first thing you find is that if you actually say something the most likely reaction is for a guy to verbally hit you directly in the face. To the guys this is perfectly normal, expected and encouraged behavior but to women this is completely out of nowhere and extremely discouraging.

As a technical woman, this is your introduction and the first thing you have to learn is how to get back up and walk right back into a situation where the likelihood of getting punished for participating is one. How you choose to react to this determines the rest of your career in technology.

Now, I don’t want to give the wrong impression. It wasn’t all one-upmanship and verbal assaults. These geek scenes could also be wonderfully supportive, and often served as social groups too. You have to remember that this was before computers were cool, and it was an awkward adolescence when you were interested in things you couldn’t begin to explain to anyone else. Also, it was a great learning environment. Cox again:

Even the aforementioned nerd trash talk is actually a useful tool that can help you. The reason that culture exists is to make everyone in the group better. The fact that you are getting hit in the face means that someone is either wrong and you can hit back with a correct answer or that you are wrong and someone is letting you know that directly. Sticking that out means you are learning in an accelerated environment with instant correction.

Furthermore, if you stick around long enough, you can find people who aren’t completely insecure and are confident enough to not resort to insults to assert themselves. Those people make the tough environment actually tolerable. If you can help each other then you can establish a safer zone to talk through ideas. And since those more secure people are typically so secure because they are really, really good, you can find yourself in an informational jet-stream.

In this artificial high-pressure environment we got good fast. But it was certainly off-putting to women, and not just women. Lots and lots of people wanted no part of this, and for good reason. Yet for quite a long time it was these sorts of socially dysfunctional communities that produced the lion’s share of the best technologists.

Why program?
Learning to program is still ridiculously hard, and still requires a community of practice. And it still requires an absurd focus and motivation. But the sources of that motivation are broadening. I’ve been watching this shift for a while. The notion of programming for the social good has even crystallized into institutions such as Random Hacks of Kindness (for international development), Hacks/Hackers (for journalists), and Code for America (civic platforms.) For that matter, there’s Wikipedia. There are services and data all over the web. We don’t have to wonder whether software can change the world — it already has!

So by my old-school standards, the burgeoning hackers of today are very applied. I grew up desperately curious about the insides of things. Many of the programmers getting started now are far more extroverted than that. Here’s MIT Media Lab fellow Lisa Williams:

I want to learn to code because a lot of things piss me off.  

I believe a program can stand in opposition to Things That Suck, just like a documentary, a work of art, or a protest march.

I wanna code because SHIT IS BROKEN. I want to code because corruption is real, because people are getting thrown out of their houses, because veterans aren’t getting what they deserve, because racism is real and has real effects, because yes it does matter when you cancel a bus line, because it’s really hard to shut a computer program up, because you can’t say it’s an isolated incident when there’s a bigass Google Map in your face showing you it’s not.

This is Lisa demanding “computational journalism.” But pretty much every field of human endeavor uses lots and lots of software now. Software not only determines what is possible, in many ways, but what is not possible: code-as-law. It’s part of the system, and if you want to hack the system, well, at some point someone has to hack the code. That person could be you.

Today
At the Online News Association conference last week, I ran into Michelle Minkoff and Heather Billings standing in front of a couple dozen enthusiastic young journalists who had gathered in the hallway to hear about programming. Michelle works with me in the Interactive department at the AP, while Heather just started at the Chicago Tribune. Both are fearsome technologists, though I don’t think either would be offended if I said they are still near the beginning of their journey. That makes them the perfect people to talk to about learning to program.

Most of the people attending had some programming experience, but not much. There were 24 people listening to Michelle and Heather, 9 of whom were female. A great improvement. I sat in on this conversation for a while. It wasn’t what I was expecting. No code. Very little technical discussion at all actually.  One woman said she knew enough Python to write a Hangman game. “Great!” said Michelle. “You’re ready to learn Django!”

I guess I’m surprised anyone has to be told that they are ready to learn to program. But inclusion and connection was a major theme in the discussion. Here are some of the snippets of conversation I wrote down:

“You can make an anonymous account on StackOverflow and ask stupid questions.”

“Connect in person, build that mentor relationship.”

“But that documentation is for developers!”

This was a group of people who needed to be told that they could learn to program. That they could be one of them. This is understandable. When you can’t begin to decipher the supposed “instructions,” technology can seem like an occult priesthood. But you don’t need them. You just need to want to do it, really badly, and you need to find some other people who want to do it badly too (and obviously, expect to meet these people online.) Then one of them becomes one of us. Of course you can learn to program. It just takes a while, and a stupid amount of practice.

In fact it’s probably necessary to devote a few years of your life to it full time. That’s one of the advantages of a computer science degree — time to focus. Also, a CS degree is a fast track to the deep theory of computation; if you find yourself looking at programming languages and asking why they are the way they are, or staring hungrily across the awesome gap between your web apps and a search engine, you probably want to learn computer science, and formal education is one way to do that. But CS theory won’t make you a programmer. Only programming will do that.

Every truly good programmer I have known had some period of their life where they thought of nothing but code. Something around a year or two. It’s got to get under your skin at some point. I call this the hacker gestation period. You’ll know you’ve reached the other side of it, because software will stop being mysterious. Eventually code becomes clay.

And this formative period is why it’s so important to have a community. You’re going to need friends who are interested in talking about geeky stuff. You’ll be so excited about it for a while that you won’t be able to talk about much else. (Really. If this is not the case, go do something else. Programming takes so much soul that you’re going to hate your life if you don’t genuinely enjoy it.) Your community will help you when you get stuck, and they will help you develop your sense of style. Code is the most obscure art, because only another programmer can see all the layers of beauty in a truly lovely piece of code. But it’s very hard to become an artist alone, without influences and critics.

So it takes a village to make a programmer. I won’t say that our technical villages are now inhabited by “normal” people, by any stretch of the imagination, but the communities where programmers are now growing up seem far more diverse, supportive, and extroverted than in years past.

 

 


Sep 22 2011

Journalism for makers

I find myself wondering what it would take to fix the global financial system, but most financial journalism doesn’t help me to answer this question. Something seems wrong here. The modern world is built on a series of vast systems, intricate combinations of people and machines, but our journalism isn’t really built to help us understand them. It’s not a journalism for the people who will put together the next generation of civic institutions.

My friend Maha Atal – whose profile of incoming IMF chief Christine Lagarde recently graced the cover of Forbes – tells me there are two histories of financial journalism, two types that have been practiced since the dawn of newsprint. One tradition began with lists of market prices and insurance rates and evolved into the financial data services and newswires we have today, a journalism of utility for people who want to make money. The other tradition she called “muckraking,” a journalism which interests itself in shady deals, insider trading, and undue influence. It looks for hypocrisy and offenses against the interests of the broader public.

Service to the status quo, and zealous suspicion of power. Are these really the only two stands that a journalist can take? When I imagine the global financial system improving, actually improving in the sense of changing in a way that makes the lives of very many people better — say, becoming less prone to the sort of systemic collapse that puts tens of millions out of work — I don’t see it much assisted by either of these approaches to reporting, necessary though they might be.

The financial system is just that: a system, sprawling, messy, very complex, built of people and laws and machines. It serves a great many ends, both humane and monstrously avaricious. It won’t be much improved by forcing a few traders to resign in disgrace, or focusing public fury on the bonuses of bank executives (which, obscene though they may be, remain just a drop in the bucket.) It seems rather that improvement will require international agreement on arcane solutions both political and technical, things like risk models, capital reserve requirements, and trading platforms. This is regulation both in the sense of law and in the sense of code-as-law, because software is a deep part of the infrastructure of modern finance. Markets don’t just happen; they are human desires channeled by what we have agreed to allow, and by what our technology has been built to support. Markets are designed things.

So maybe what we need are designers. Geeks who like to understand very complex systems, and tinker with them. I want to borrow from the culture of “makers,” because maker culture plants a flag on this idea. It draws on the hacker tradition of technical mastery, the DIY aesthetic perfected by the punks, and the best disruptive tendencies of global counter-culture. It lives in online forums and nerdy meetups and on the dingy couches of hack spaces. This is the chaotic ecosystem that powers Silicon Valley, and I bet it’s the secret ingredient that government planners miss when they build huge technology parks that end up empty.

But most of all, makers are deeply participatory. Where the political activist sees persuasion as the ultimate goal, the maker wants to personally rewire the system. This requires a deep love of the inner workings of things, the finicky, empirical details of how the real world is put together. A maker combines the democratic instinct with the technologist’s hands-on ability. And increasingly, makers are directing their attention to social problems. Efforts such as crisis mapping and Code For America and the whole information and communication technologies for development (ICT4D) movement are evidence of this. Maker language has recently been spotted at the White House and the United Nations.

The global financial system is just the sort of complex, intricate, part technical and part social system that makers would love, if only they could open it up and look inside. There are textbooks, but you can’t learn how the world actually works from textbooks. What would it take to open the global financial system to independent minds? Because it will be these independent minds — smart, deeply informed, creative — who will pore over the arcana of today in order to conceive of the better worlds to come.

Consider the latest draft of the Basel III standards for international banking. Who reads such dense and technical stuff? The professional regulator is obliged to sit at their desk and study this document. The financier wants only to understand how these rules will make or cost them money. The muckraker might ask who is making the rules and why. Another journalist will look for headlines of broad interest, but almost certainly won’t have the technical background to trace the subtle implications. But a maker would read these standards because they are changes in the operating system of global finance. And of these, it might be the maker, the specialized outsider, who is most qualified to understand the detailed, systemic effects on everyone else. The systems that underlie finance have become so fast and so complex that we don’t really understand the interactions. The people who know it best are mostly too busy making money to explain it to the rest of us. The public interest is in dire need of geeks who are not on the payroll.

There is a journalism to be done here, but it’s not the journalism of making people money, penning morality tales, or interesting articles in the Sunday paper. It’s a techno-social investigative journalism for those who have chosen to use their specialized knowledge in the interests of the rest of us. It’s a journalism that generalist reporters may be ill equipped to do.

We already have models for this. Dowser.org practices “solutions journalism,” writing about how best to solve societal problems. I appreciate that, but I don’t think they’ve conceived of their audience as the policy and technology geeks who will one day flesh out and implement those solutions. The contemporary science journalism ecosystem might be a better example. There are science reporters at news organizations, but the best science reporting now tends to come from elsewhere. Science, like finance, is absurdly specialized, and so its chronicling has been taken over by networks of specialists — very often scientists themselves, the ones who have learned to write. Science blogging is thriving. Its audience is the general public, yes, but also other scientists, because it’s the real thing. Even better, science writing exists in a web of knowledge: you can follow the links and go arbitrarily deep into the original research papers. And if you still have questions, the experts are already active online. Compare this to the experience of reading an economics article in the paper.

We don’t have much truly excellent journalism on deep, intricate topics, issues with enormous technical and institutional complexity. There’s some, but it’s mostly in trade publications with little sense of the social good, or tucked away in expensive journals which speak to us in grown-up tones and don’t know how to listen for the questions of the uninitiated. And yet our world abounds in complex problems! Sustainability, climate change, and energy production. Security, justice, and the delicate tradeoffs of civic freedoms. Health care for the individual, and for entire countries. The policies of government from the international to the municipal. And governments themselves, in all their gruesome operational detail. These things are not toys. But when journalists write about such issues, they satisfy themselves with discovering some flavor of corruption, or they end up removing so much of the substance that readers cannot hope to make a meaningful contribution. Perhaps this is because it has always been assumed that there is no audience for wonkish depth. And perhaps that’s true. Perhaps there won’t ever be a “mainstream” audience for this type of reporting, because the journalism of makers is directed to those who have some strange, burning desire to know the gory details, and are willing to invest years of their life acquiring background knowledge and building relationships. Can we not help these people? Could we encourage more of them to exist, if we served them better?

This is a departure from the broadcast-era idea of “the public.” It gives up on the romantic notion of great common narratives and tries instead to serve particular pieces of the vast mosaic of communities that comprise a society. But we are learning that when done well, this kind of deep, specialist journalism can strike surprising chords in a global population that is more educated than it has ever been. And the internet is very, very good at routing niche information to communities of interest. We have the data to show this. As Atlantic editor Alexis Madrigal put it, “I love analytics because I owe them my ability to write weird stories on the Internet.”

Where is the journalism for the idealist doer with a burning curiosity? I don’t think we have much right now, but we can imagine what it could be. The journalism of makers aligns itself with the tiny hotbeds of knowledge and practice where great things emerge, the nascent communities of change. Its aim is a deep understanding of the complex systems of the real world, so that plans for a better world may be constructed one piece at a time by people who really know what they’re talking about. It never takes itself too seriously, because it knows that play is necessary for exploration and that a better understanding will come along tomorrow. It serves the talent pools that give rise to the people who are going to do the work of bringing us a potentially better world — regardless of where in society these people may be found, and whether or not they are already within existing systems of power. This is a theory of civic participation based on empowering the people who like to get their hands dirty tinkering with the future. Maybe that’s every bit as important as informing voters or getting politicians fired.


Aug 01 2011

Visualizing communities

There are in fact no masses; there are only ways of seeing people as masses.
- Raymond Williams

Who are the masses that the “mass media” speaks to? What can it mean to ask what “teachers” or “blacks” or “the people” of a country think? These words are all fiction, a shorthand which covers over our inability to understand large groups of unique individuals. Real people don’t move in homogeneous herds, nor can any one person be neatly assigned to a single category. Someone might view themselves simultaneously as the inhabitant of a town, a new parent, and an active amateur astronomer. Now multiply this by a million, and imagine trying to describe the overlapping patchwork of beliefs and allegiances.

But patterns of association leave digital traces. Blogs link to each other, we have “friends” and “followers” and “circles,” we share interesting tidbits on social networks, we write emails, and we read or buy things. We can visualize this data, and each type of visualization gives us a different answer to the question “what is a community?” This is different from the other ways we know how to describe groups. Anecdotes are tiny slices of life that may or may not be representative of the whole, while statistics are often so general as to obscure important distinctions. Visualizations are unique in being both universal and granular: they have detail at all levels, from the broadest patterns right down to individuals. Large scale visualizations of the commonalities between people are, potentially, a new way to represent and understand the public — that is, ourselves.

I’m going to go through the major types of community visualizations that I’ve seen, and then talk about what I’d like to do with them. Like most powerful technologies, large scale visualization is a capability that can also be used to oppress and to sell. But I imagine social ends, worthwhile ways of using visualization to understand the “public” not as we imagine it, but as something closer to how we really exist.


Jul 26 2011

The new structure of stories: a reading list

Different medium, different story form. It’s clear that each new technology — photography, radio, television — has brought with it different ways of constructing a narrative, and different ways those narratives fit into the audience’s lives. Online media are no different, and the step between analog and digital is in many ways much larger than any that has come before, because the internet connects the audience to each other as well as to the newsroom.

Here’s my attempt at a little reading list of recent work on the structure of stories. Pointers to additional material are welcome!

The Big Picture
What’s wrong with the article, anyway? Jeff Jarvis explores this question in “The article as luxury or by-product.” This essay provoked lots of interesting reaction, such as from Mathew Ingram.

So how do we understand ways to fix this? Vadim Lavrusik takes a shot at this question and comes up with the building blocks of context, social, personalization, mobile, participation. It’s a good taxonomy so I’m going to partially steal it for this post.

At the NYC Hacks Hackers meetup last week, Trei Brundrett took us through SB Nation’s “story stream” product, and Gideon Lichfield of The Economist gave a really nice run-through of the “news thing” concept that was fleshed out collaboratively last month at Spark Camp by Gideon, Matt Thompson, and a room full of others. Very meaty, detailed, up-to-the-minute discussions, for serious news nerds. Video here.

Context
You just can’t do better than Matt Thompson’s “An antidote for web overload.” I also recommend Matt’s wonderful “The three key parts of news stories that are usually missing.” Another good primer is Jay Rosen’s “Future of Context” talk at SXSW.

See also my “Short doesn’t mean shallow,” about hyperlinks as a contextual storytelling form.

For an example of these ideas in action, consider Mother Jones’ Egypt Explainer page — which Gideon Lichfield critiques in the video linked above.

Social
What does it mean for news to be social anyway? Henry Jenkins argues for the power of “spreadable media” as a new distribution model.

In “What’s the point of social news?” I discuss two areas where social media have a huge impact on news: the use of social networks as a personalized filter, and distributed sourcing of tips and material.

Personalization
News is now personalized by a variety of filters, both social and algorithmic. Eli Pariser argues this puts us in a “filter bubble.” He may be right, but research by Pew and others [1,2] consistently shows that when users are allowed to recommend any URL to one another, the “news agenda” that the audience constructs has only 5%-30% of stories in common with mainstream media.

A comparison of questions asked of the White House by a Twitter audience vs. by journalists shows a remarkable difference in focus. All of this suggests to me that whatever else is happening, personalization meets an audience need that traditional broadcast journalism does not.

Besides, maybe not every person needs to see every story, if we view the goal of journalism as empowerment.

Participation
What do we know and what don’t we know about public participation in the journalism project, and what has worked or failed so far? Jay Rosen has an invaluable summary.

I also recommend the work of Amanda Michel as someone who does crowd-based reporting every day, and my own speculations on distributed investigative reporting.

Structured information
Is the product of journalism narratives or (potentially machine-readable) facts? Adrian Holovaty seems to be the first to have explored this in his 2006 essay “A fundamental way newspaper websites need to change.” The mantle has more recently been taken up by Stijn Debrouwere in his “Information Architecture for News Websites” series, in Reg Chua’s “structured journalism,” and in a wide-ranging series at Xark.

There are close connections here to semantic web efforts, and occasional overlap between the semweb and journalism communities.

Mobile
I haven’t seen any truly good roundup posts on what mobile will mean for news story form, but there are some bits and pieces. Mobile is by definition location-aware, and Mathew Ingram examines how location is well used by Google News (and not by newsrooms).

Meanwhile, Zach Seward of the Wall Street Journal has done some interesting news-related things with Foursquare.

Real time
Emily Bell, formerly of the Guardian and now at Columbia, explains why every news organization needs to be real-time.

For a granular look at how information spreads in real time, consider Mathew Ingram on “Osama bin Laden and the new ecosystem of news.” For a case study of real-time mobile reporting, we have Brian Stelter’s “What I learned in Joplin.”


Jul 08 2011

A job posting that really doesn’t suck

I just got a pile of money to build a piece of state-of-the-art open-source visualization software, to allow journalists and curious people everywhere to make sense of enormous document dumps, leaked or otherwise.

Huzzah!

Now I am looking for a pair of professional developers to make it a reality. Someone of the calibre I’m trying to find won’t have any trouble getting some job, but I’m going to try to convince you that this is the best job.

The project is called Overview. You can read about it at overview.ap.org. It’s going to be a system for the exploration of large to very large collections of unstructured text documents. We’re building it in New York in the main newsroom of The Associated Press, the original all-formats global news network. The AP has to deal with document dumps constantly. We download them from government sites. We file over 1000 freedom of information requests each year. We look at every single leak from Wikileaks, Anonymous, Lulzsec. We’re drowning in this stuff. We need better tools. So does everyone else.

So we’re going to make the killer app for document set analysis. Overview will start with a visual programming language for computational linguistics algorithms. Like Max/MSP for text. The output of that will be connected to some large-scale visualization. All of this will be backed by a distributed file store and computed through map-reduce. Our target document set size is 10 million. The goal is to design a sort of visualization sketching system for large unstructured text document sets. Kinda like Processing, maybe, but data-flow instead of procedural.
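
To make “data-flow instead of procedural” concrete, here is a toy sketch of a text-analysis pipeline in which each stage feeds the next, built with scikit-learn. This is my illustration only, not Overview’s actual design or code, and the document texts are made up.

```python
# A toy data-flow pipeline: text -> term weights -> clusters, each stage
# feeding the next. My illustration, not Overview's design or code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [                                       # hypothetical documents
    "memo about the budget shortfall",
    "email re: budget meeting minutes",
    "field report on water quality testing",
]

vectorizer = TfidfVectorizer(stop_words="english")  # stage 1: weight terms
vectors = vectorizer.fit_transform(documents)

kmeans = KMeans(n_clusters=2, n_init=10)            # stage 2: group documents
labels = kmeans.fit_predict(vectors)

for doc, label in zip(documents, labels):
    print(label, doc)
```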

We’ve already got a prototype working; we pointed it at the Wikileaks Iraq and Afghanistan data sets and learned some interesting things. Now we have to engineer an industrial-strength open-source product. It’s a challenging project, because it requires production implementation of state-of-the-art, research-level algorithms for distributed computing, statistical natural language processing, and high-throughput visualization. And, oh yeah, a web interface. So people can use it anywhere, to understand their world.

Because that’s what this is about: a step in the direction of applied transparency. Journalists badly need this tool. But everyone else needs it too. Transparency is not an end in itself — it’s what you can do with the data that counts. And right now, we suck at making sense of piles of documents. Have you ever looked at what comes back from a FOIA request? It’s not pretty. Governments have to give you the documents, but they don’t have to organize them. What you typically get is a 10,000 page PDF. Emails mixed in with meeting minutes and financial statements and god-knows what else. It’s like being let into a decrepit warehouse with paper stacked floor to ceiling. No boxes. No files. Good luck, kiddo.

Intelligence agencies have the necessary technology, but you can’t have it. The legal profession has some pretty good “e-discovery” software, but it’s wildly expensive. Law enforcement won’t share either. There are a few cheapish commercial products but they all choke above 10,000 documents because they’re not written with scalable, distributed algorithms. (Ask me how I know.) There simply isn’t an open, extensible tool for making sense of huge quantities of unstructured text. Not searching it, but finding the patterns you didn’t know you were looking for. The big picture. The Overview.
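
A hedged aside of my own on why scale matters so much here: any approach that compares every document to every other one grows quadratically with collection size, which at these sizes is roughly the difference between feasible and hopeless, and is one reason a distributed design matters.

```python
# Back-of-the-envelope scaling, my illustration only: all-pairs comparison
# grows quadratically with the number of documents.
def pair_count(n: int) -> int:
    return n * (n - 1) // 2

print(f"{pair_count(10_000):,}")       # ~50 million pairs
print(f"{pair_count(10_000_000):,}")   # ~50 trillion pairs
```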

So we’re making one. Here are the buzzwords we are looking for in potential hires:

  • We’re writing this in Java or maybe Scala. Plus JavaScript/WebGL on the client side.
  • Be a genuine computer scientist, or at least be able to act like one. Know the technologies above, and know your math.
  • But it’s not just research. We have to ship production software. So be someone who has done that, on a big project.
  • This stuff is complicated! The UX has to make it simple for the user. Design, design, design!
  • We’re open-source. I know you’re cool with that, but are you good at leading a distributed development community?

And that’s pretty much it. We’re hiring immediately. We need two. It’s a two-year contract to start. We’ve got a pair of desks in the newsroom in New York, with really nice views of the Hudson river. Yeah, you could write high-frequency trading software for a hedge fund. Or you could spend your time analyzing consumer data and trying to get people to click on ads. You could code any of a thousand other sophisticated projects. But I bet you’d rather work on Overview, because what we’re making has never been done before. And it will make the world a better place.

For more information, see:

Thanks for your time. Please contact jstray@ap.org if you’d like to work on this.


May 25 2011

The challenges of distributed investigative journalism

One of the clearest ideas to emerge from the excitement around the new media transformation of journalism is the notion that the audience should participate in the process. This two-way street has been nicely described by Guardian editor Alan Rusbridger as the “mutualization of journalism.” But how to do it? What’s missing from what has been tried so far? Despite many experiments, the territory is still so unexplored that it’s almost impossible to say what will work without trying it. With that caveat, here are some more or less wild speculations about the sorts of tools that “open” investigative journalism might need to work.

There have been many collaborative journalism projects, from the Huffington Post’s landmark “Off The Bus” election campaign coverage to the BBC’s sophisticated “user-generated content hub” to CNN’s iReport. One lesson in all of this is that form matters. Take the lowly comment section. News site owners have long complained, often with good reason, that comments are a mess of trolls and flame wars. But the prompt is supremely important in asking for online collaboration. Do journalists really want “comments”? Or do they want error corrections, smart additions, leads, and evidence that furthers the story?

Which leads me to investigative reporting. It’s considered a specialty within professional journalism, dedicated to getting answers to difficult questions — often answers that are embarrassing to those in power. I don’t claim to be very good at journalistic investigations, but I’ve done enough reporting to understand the basics. Investigative reporting is as much about convincing a source to talk as it is about filing a FOIA request, or running a statistical analysis on a government data feed. At heart, it seems to be a process of assembling widely dispersed pieces of information — connecting the distributed dots. Sounds like a perfect opportunity for collaborative work. How could we support that?

A system for tracking what’s already known
Reporters keep notes. They have files. They write down what was said in conversations, or make recordings. They collect documents. All of this material is typically somewhere on or around a reporter’s desk or sitting on their computer. That means it’s not online, which means no one else can build on it. Even within the same newsroom, notes and source materials are seldom shared. We have long had customer relationship management systems that track every contact with a customer. Why not a “source relationship management” system that tracks every contact with every source by every reporter in the newsroom? Ideally, such a system would be integrated into the reporter’s communications tools: when I make a phone call and hit record (after getting the source’s permission, of course) that recording could be automatically entered into the system’s files, stamped by time, date, and source, then transcribed by machine to make it searchable. Primary documents would also be filed in the system, along with notes and links and comments from everyone working on the story. The entire story of the story could be in one place.
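
As a sketch of what the record behind such a system might minimally contain: every field name below is my own invention, not any existing product’s schema.

```python
# A minimal sketch of a "source relationship management" record. All field
# names are hypothetical; the point is that every contact is stamped,
# attributed, and searchable alongside its transcript and documents.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class SourceContact:
    reporter: str
    source: str
    timestamp: datetime
    medium: str                      # "phone", "email", "in person", ...
    recording_path: Optional[str]    # audio file, if the source consented
    transcript: str = ""             # machine transcription, for search
    notes: str = ""
    documents: List[str] = field(default_factory=list)

contact = SourceContact(
    reporter="A. Reporter",
    source="City budget office",
    timestamp=datetime(2011, 5, 25, 14, 30),
    medium="phone",
    recording_path="calls/2011-05-25-budget.mp3",
)
print(contact.source, contact.timestamp)
```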

There have been experiments in collaborative journalistic files, such as OpenFile.ca or even good local wikis. But I don’t believe there has yet been a major professional newsroom which operated with open files. For that matter, I am not aware of this type of information filing system in existence anywhere in journalism, though I suspect it’s what intelligence services do.

Public verification processes
Journalism aims to be “true,” a goal which requires elaborate verification processes. But in every newsroom I’ve worked with, essential parts of the verification standards are not codified. “At least two sources” is a common maxim, but are there any situations where one is enough? For that matter, who counts as a definitive source? When is a conflict of interest serious enough to disqualify what someone is telling you? The answers to these questions and many more are a matter of professional practice and culture. This is confusing enough for a new reporter joining staff, let alone outsiders who might want to help.

Verification is necessarily contextual. Both the costs of verification and the consequences of being in error vary widely with circumstance, so journalists must make situational choices. How sure do we have to be before we say something is true, how do we measure that certainty, and what would it take to be more sure? Until this sort of nuanced guidance is made public, and the public is provided with experienced support to encourage good calls in complex or borderline cases, it won’t be possible to bring enthusiastic outsiders fully into the reporting process. They simply won’t know what’s expected of them, so they won’t be able to participate in the production of a product to certain standards. Those standards depend on what accuracy/cost/speed tradeoffs best serve the communities that a newsroom writes for, which means that there is audience input here too.

What is secret, or, who gets to participate?
Traditionally, a big investigative story is kept completely secret until it’s published. This is shifting, as some journalists begin to view investigation as more of a process than a product. However, you may not want the subject of an investigation to know what you already know. It might, for example, make your interview with a bank CEO tricky if they know you’ve already got the goods on them from a former employee. There are also off-the-record interviews, embargoed material, documents which cannot legally be published, and a multitude of concerns around the privacy rights of individuals. I agree with Jay Rosen when he says that “everything a journalist learns that he cannot tell the public alienates him from the public,” but that doesn’t mean that complete openness is the solution in all cases. There are complex tradeoffs here.

So access to at least some files must be controlled, for at least some period of time. Ok then — who gets to see what, when? Is there a private section that only staff can see and a public section for everyone else? Or, what about opening some files up to trusted outsiders? That might be a powerful way to extend investigations outside the boundaries of the newsroom, but it brings in all the classic problems of distributed trust, and more generally, all the issues of “membership” in online communities. I can’t say I know any good answers. But because the open flow of information can be so dramatically productive, I’d prefer to start open and close down only where needed. In other words, probably the fastest way to learn what truly needs to be secret is to blow a few investigations when someone says something they shouldn’t have, then design processes and policies to minimize those failure modes.
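
One way to picture the “who gets to see what, when” question is as tiered access. The roles and file states below are hypothetical, a sketch rather than a recommendation, and they deliberately leave out the time dimension (embargoes, publication dates) entirely.

```python
# A toy sketch of tiered access to investigation files. Role names and file
# states are hypothetical; embargo timing is deliberately left out.
ROLE_LEVELS = {"public": 0, "trusted": 1, "staff": 2}
FILE_LEVELS = {"published": 0, "shared": 1, "private": 2}

def can_view(role: str, file_state: str) -> bool:
    """A user may view a file if their role level meets the file's level."""
    return ROLE_LEVELS[role] >= FILE_LEVELS[file_state]

assert can_view("staff", "private")
assert can_view("trusted", "shared")
assert not can_view("public", "shared")
```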

There is also a professional cultural shift required here, towards open collaboration. Newsrooms don’t like to get scooped. Fair enough, but my answer to this is to ask what’s more important: being first, or collectively getting as much journalism done as possible?

Safe places for dangerous hypotheses
Investigative journalism requires speculation. “What if?” the reporter must say, then go looking for evidence. (And equally, “what if not?” so as not to fall prey to confirmation bias.) Unfortunately, “what if the district attorney is a child molester?” is not a question that most news organizations can tolerate on their web site. In the worst case, the news organization could be sued for libel. How can we make a safe and civil space — both legally and culturally — for following speculative trains of thought about the wrongdoings of the powerful? One idea, which is probably a good idea for many reasons, is to have very explicit marking of what material is considered “confirmed,” “vetted,” “verified,” etc. and what material is not. For example, iReport has such an endorsement system. A report marked “verified” would of course have been vetted according to the public verification process. In the US, that marking plus CDA section 230 might solve the legal issues.
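
To show how lightweight such marking could be, here is a sketch of explicit status labels on claims. The statuses and the two-source rule encoded below are illustrative, not any newsroom’s actual codified policy, and not how iReport’s endorsement system works.

```python
# A sketch of explicit verification marking. The statuses and the two-source
# rule below are illustrative, not a real newsroom's codified policy.
from enum import Enum
from typing import List

class Status(Enum):
    SPECULATION = "speculation"   # a labeled hypothesis, not an assertion
    REPORTED = "reported"         # asserted by at least one source
    VERIFIED = "verified"         # vetted per the public verification process

def status_for(sources: List[str], vetted: bool) -> Status:
    if vetted and len(sources) >= 2:   # the "at least two sources" maxim
        return Status.VERIFIED
    if sources:
        return Status.REPORTED
    return Status.SPECULATION

print(status_for(["former employee", "internal memo"], vetted=True).value)
```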

A proposed design goal: maximum amplification of staff effort
There are very many possible stories, and very few paid journalists. The massive amplification of staff effort that community involvement can provide may be our only hope for getting the quantity and quality of journalism that we want. Consider, for example, Wikipedia. With a paid staff of about 35, they produce millions of near-real-time topic pages in dozens of languages.

But this is also about the usability of the social software designed to facilitate collaborative investigations. We’ll know we have the design right when lots of people want to use it. Also: just how much and what types of journalism could volunteers produce collaboratively? To find out, we could try to get the audience to scale faster than newsroom staff size. To make that happen, communities of all descriptions would need to find the newsroom’s public interface a useful tool for uncovering new information about themselves even when very little staff time is available to help them. Perhaps the best way to design a platform for collaborative investigation would be to imagine it as encouraging and coordinating as many people as possible in the production of journalism in the broader society, with as few full time staff as possible. These staff would be experts in community management and information curation. I don’t believe that all types of journalism can be produced this way or that anything like a majority of people will contribute to the process of journalism. Likely, only a few percent will. But helping the audience to inform itself on the topics of its choice on a mass scale sounds like civic empowerment to me, which I believe to be a fundamental goal of journalism.


Apr 20 2011

Measuring and improving accuracy in journalism

Professional journalism is supposed to be “factual,” “accurate,” or just plain true. Is it? Has news accuracy been getting better or worse in the last decade? How does it vary between news organizations, and how do other information sources rate? Is professional journalism more or less accurate than everything else on the internet? These all seem like important questions, so I’ve been poking around, trying to figure out what we know and don’t know about the accuracy of our news sources. Meanwhile, the online news corrections process continues to evolve, which gives us hope that the news will become more accurate in the future.

Accuracy is a hard thing to measure because it’s a hard thing to define. There are subjective and objective errors, and no standard way of determining whether a reported fact is true or false. But a small group of academics has been grappling with these questions since the early 20th century, and undertaking periodic news accuracy surveys. The results aren’t encouraging. The last big study of mainstream reporting accuracy found errors (defined below) in 59% of 4,800 stories across 14 metro newspapers. This level of inaccuracy — where about one in every two articles contains an error — has persisted for as long as news accuracy has been studied, over seven decades now.

With the explosion of available information, more than ever it’s time to get serious about accuracy, about knowing which sources can be trusted. Fortunately, there are emerging techniques that might help us to measure media accuracy cheaply, and then increase it. We could continuously sample a news source’s output to produce ongoing accuracy estimates, and build social software to help the audience report and filter errors. Meticulously applied, this approach would give a measure of the accuracy of each information source, and a measure of the efficiency of their corrections process (currently only about 3% of all errors are corrected). The goal of any newsroom is to get the first number down and the second number up. I am tired of editorials proclaiming that a news organization is dedicated to the truth. That’s so easy to say that it’s meaningless. I want an accuracy process that gives us something more than a rosy feeling.
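
As a sketch of what “continuously sample a news source’s output” could mean in practice, here is a rough error-rate estimate from a random sample of fact-checked stories, with a normal-approximation confidence interval. The sample numbers are made up for illustration.

```python
# A sketch of the continuous-sampling idea: fact-check a random sample of
# stories and estimate the outlet's error rate with a rough 95% interval.
# The sample sizes below are made up for illustration.
import math

def error_rate_estimate(checked: int, with_errors: int):
    p = with_errors / checked
    margin = 1.96 * math.sqrt(p * (1 - p) / checked)  # normal approximation
    return p, max(0.0, p - margin), min(1.0, p + margin)

p, low, high = error_rate_estimate(checked=200, with_errors=110)
print(f"Estimated error rate: {p:.0%} (95% CI roughly {low:.0%}-{high:.0%})")
```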

This is a long post, but there are lots of pretty pictures. Let’s begin with what we know about the problem.

