UN asks Ushahidi to produce Crisis Map of Libya

Patrick Meier and some of the rest of the team behind the amazing Haiti earthquake crisis mapping effort have set up a Libya Crisis Map which plots tweets and many other kinds of on-the-ground reports on a map of the country. The reports come from citizens, journalists, aid agencies, and anyone else, and each is tagged as “verified” or “unverified.” Sensitive information, which might be used to retaliate against sources or others, is redacted.

Meier recently wrote about how the Libyan situation differs from Haiti. The situation is much more politically complex than Haiti, but also, this time the UN’s OCHA relief coordination agency asked for the map:

The second reason why this is no Haiti is because the request for activation of the Standby Task Force to provide live crisis mapping support came directly from the UN OCHA’s Information Management unit in Geneva. This was not the case in Haiti since there was no precedent for the crisis mapping efforts we launched at the time. We did not have buy in from the humanitarian community and the latter was reluctant to draw on anything other than official sources of information. Crowdsourcing and social media were unchartered territories. OCHA also reached out to CrisisCommons and OpenStreetMap and we are all working together more closely than ever before.

Contrast this to the case of Libya this week which saw an established humanitarian organization specifically request a volunteer technical community for a live map of reports generated from Twitter, Facebook, Flickr, YouTube and mainstream media sources. Seriously, I have never been more impressed by the humanitarian community than I am today. The pro-active approach they have taken and their constructive engagement is absolutely remarkable. This is truly spectacular and the group deserve very high praise.

Definitely worth taking a look at the map. More on the process at UN Global Pulse.

A computational journalism reading list

[Last updated: 18 April 2011 — added statistical NLP book link]

There is something extraordinarily rich in the intersection of computer science and journalism. It feels like there’s a nascent field in the making, tied to the rise of the internet. The last few years have seen calls for a new class of “programmer journalist” and the birth of a community of hacks and hackers. Meanwhile, several schools are now offering joint degrees. But we’ll need more than competent programmers in newsrooms. What are the key problems of computational journalism? What other fields can we draw upon for ideas and theory? For that matter, what is it?

I’d like to propose a working definition of computational journalism as the application of computer science to the problems of public information, knowledge, and belief, by practitioners who see their mission as outside of both commerce and government. This includes the journalistic mainstay of “reporting” — because information not published is information not known — but my definition is intentionally much broader than that. To succeed, this young discipline will need to draw heavily from social science, computer science, public communications, cognitive psychology and other fields, as well as the traditional values and practices of the journalism profession.

“Computational journalism” has no textbooks yet. In fact, the term is barely recognized. The phrase seems to have emerged at Georgia Tech in 2006 or 2007. Nonetheless I feel like there are already important topics and key references.

Data journalism
Data journalism is obtaining, reporting on, curating and publishing data in the public interest. The practice is often more about spreadsheets than algorithms, so I’ll suggest that not all data journalism is “computational,” in the same way that a novel written on a word processor isn’t “computational.” But data journalism is interesting and important and dovetails with computational journalism in many ways.

Big data requires powerful exploration and storytelling tools, and increasingly that means visualization. But there’s good visualization and bad visualization, and the field has advanced tremendously since Tufte wrote The Visual Display of Quantitative Information. There is lots of good science that is too little known, and many open problems here.

  • Tamara Munzner’s chapter on visualization is the essential primer. She puts visualization on rigorous perceptual footing, and discusses all the major categories of practice. Absolutely required reading for anyone who works with pictures of data.
  • Ben Fry invented the Processing language and wrote his PhD thesis on “computational information design,” which is his powerful conception of the iterative, interactive practice of designing useful visualizations.
  • How do we make visualization statistically rigorous? How do we know we’re not just fooling ourselves when we see patterns in the pixels? This amazing paper by Wickham et al. has some answers.
  • Is a visualization a story? Segel and Heer explore this question in “Narrative Visualization: Telling Stories with Data.”
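One concrete answer from the Wickham et al. paper is the “lineup” protocol: hide the plot of the real data among decoy plots generated under a null hypothesis, and if a viewer can still pick out the real one, the pattern is probably not chance. A minimal sketch of the bookkeeping (plotting omitted; the function names are mine, not the paper’s):

```python
import random

def make_lineup(y, n_decoys=19, seed=42):
    """Hide the real data series among decoys generated under the
    null hypothesis (here: 'the order of y carries no signal'),
    implemented by permuting y. Returns the shuffled lineup and
    the position of the real series."""
    rng = random.Random(seed)
    decoys = []
    for _ in range(n_decoys):
        d = list(y)
        rng.shuffle(d)
        decoys.append(d)
    lineup = decoys + [list(y)]
    rng.shuffle(lineup)
    real_pos = lineup.index(list(y))
    return lineup, real_pos

# With 19 decoys plus the real data, a viewer who correctly picks
# the real plot by eye gives you a p-value of 1/20 = 0.05.
lineup, pos = make_lineup([1, 2, 3, 5, 8, 13, 21])
```

In practice each of the 20 series would be rendered as a small-multiples panel and shown to someone who hasn’t seen the data.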

Computational linguistics
Data is more than numbers. Given that the web is designed to be read by humans, it makes heavy use of human language. And then there are all the world’s books, and the archival recordings of millions of speeches and interviews. Computers are slowly getting better at dealing with language.

Communications technology and free speech
Code is law. Because our communications systems use software, the underlying mathematics of communication lead to staggering political consequences — including whether or not it is possible for governments to verify online identity or remove things from the internet. The key topics here are networks, cryptography, and information theory.

  • The Handbook of Applied Cryptography is a classic, and free online. But despite the title it doesn’t really explain how crypto is used in the real world, like Wikipedia does.
  • It’s important to know how the internet routes information, using TCP/IP and BGP, or at a somewhat higher level, things like the BitTorrent protocol. The technical details determine how hard it is to do things like block websites, suppress the dissemination of a file, or remove entire countries from the internet.
  • Anonymity is deeply important to online free speech, and very hard. The Tor project is the outstanding leader in anonymity-related research.
  • Information theory is stunningly useful across almost every technical discipline. Pierce’s short textbook is the classic introduction, while Tom Schneider’s Information Theory Primer seems to be the best free online reference.
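As a small taste of why information theory is useful here: Shannon entropy puts a hard lower bound on how far a message can be compressed, and hence on how much any channel (or censor) has to handle. A minimal sketch:

```python
from collections import Counter
from math import log2

def entropy_bits_per_symbol(text):
    """Shannon entropy of the symbol distribution: the average number
    of bits per character an optimal code needs, assuming characters
    are drawn independently from this distribution."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A repetitive message carries less information per character
# than a varied one.
low = entropy_bits_per_symbol("aaaaaaab")    # mostly one symbol
high = entropy_bits_per_symbol("abcdefgh")   # eight equally likely symbols
```

For eight equally likely symbols the entropy is exactly 3 bits per character; the repetitive string comes in far lower.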

Tracking the spread of information (and misinformation)
What do we know about how information spreads through society? Very little. But one nice side effect of our increasingly digital public sphere is the ability to track such things, at least in principle.

  • Memetracker was (AFAIK) the first credible demonstration of whole-web information tracking, following quoted soundbites through blogs and mainstream news sites and everything in between. Zach Seward has cogent reflections on their findings.
  • The Truthy Project aims for automated detection of astro-turfing on Twitter. They specialize in covert political messaging, or as I like to call it, computational propaganda.
  • We badly need tools to help us determine the source of any given online “fact.” There are many existing techniques that could be applied to the problem, as I discussed in a previous post.
  • If we had information provenance tools that worked across a spectrum of media outlets and feed types (web, social media, etc.) it would be much cheaper to do the sort of information ecosystem studies that Pew and others occasionally undertake. This would lead to a much better understanding of who does original reporting.
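To get a feel for the Memetracker approach, here is a toy version: extract quoted soundbites and group near-identical variants across sources. The normalization rule is my own crude stand-in, not Memetracker’s actual algorithm:

```python
import re
from collections import defaultdict

def extract_quotes(text):
    """Find quoted passages of at least four words."""
    quotes = re.findall(r'[“"]([^”"]+)[”"]', text)
    return [q for q in quotes if len(q.split()) >= 4]

def normalize(quote):
    """Crude canonical form: lowercase, strip punctuation."""
    return re.sub(r"[^\w\s]", "", quote.lower()).strip()

def track(documents):
    """Group quote variants across documents by normalized form,
    recording which sources carried each soundbite."""
    clusters = defaultdict(list)
    for source, text in documents.items():
        for q in extract_quotes(text):
            clusters[normalize(q)].append(source)
    return clusters

docs = {
    "blog_a": 'He said "we have nothing to fear but fear itself" today.',
    "paper_b": 'The line "We have nothing to fear but fear itself!" spread fast.',
}
clusters = track(docs)
```

The real system handled mutations of the quote text as well (dropped words, paraphrase), which is where it gets hard.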

Filtering and recommendation
With vastly more information than ever before available to us, attention becomes the scarcest resource. Algorithms are an essential tool in filtering the flood of information that reaches each person. (Social media networks also act as filters.)

  • The paper on preference networks by Turyen et al. is probably as good an introduction as anything to the state of the art in recommendation engines, those algorithms that tell you what articles you might like to read or what movies you might like to watch.
  • Before Google News there was Columbia Newsblaster, which incorporated a number of interesting algorithms, such as multi-lingual article clustering and automatic summarization, as described in this paper by McKeown et al.
  • Anyone playing with clustering algorithms needs to have a deep appreciation of the ugly duckling theorem, which says that there is no categorization without preconceptions. King and Grimmer explore this with their technique for visualizing the space of clusterings.
  • Any digital journalism product which involves the audience to any degree — that should be all digital journalism products — is a piece of social software, well defined by Clay Shirky in his classic essay, “A Group Is Its Own Worst Enemy.” It’s also a “collective knowledge system” as articulated by Chris Dixon.
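The basic machinery behind such recommendation engines is simple: score the items you haven’t seen by how similar other users’ tastes are to yours. This is a generic user-based collaborative filter, a sketch rather than any particular paper’s method, with invented ratings:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity computed over the items both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    na = sqrt(sum(a[i] ** 2 for i in shared))
    nb = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (na * nb)

def recommend(user, ratings):
    """Score items the user hasn't seen, weighting each neighbor's
    rating by that neighbor's similarity to the user."""
    scores, weights = {}, {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, r in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
                weights[item] = weights.get(item, 0.0) + sim
    return sorted(
        ((s / weights[i], i) for i, s in scores.items() if weights[i] > 0),
        reverse=True,
    )

ratings = {
    "alice": {"budget story": 5, "sports recap": 1, "wiki leak": 4},
    "bob":   {"budget story": 5, "sports recap": 1, "film review": 5},
    "carol": {"budget story": 1, "sports recap": 5, "film review": 1},
}
picks = recommend("alice", ratings)  # bob's tastes match alice's, carol's don't
```

Because Bob agrees with Alice and Carol doesn’t, the filter predicts Alice will like the “film review” far more than Carol’s low rating alone would suggest.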

Measuring public knowledge
If journalism is about “informing the public” then we must consider what happens to stories after publication — this is the “last mile” problem in journalism. There is almost none of this happening in professional journalism today, aside from basic traffic analytics. The key question here is, how does journalism change ideas and action? Can we apply computers to help answer this question empirically?

  • World Public Opinion’s recent survey of misinformation among American voters solves this problem in the classic way, by doing a randomly sampled opinion poll. I discuss their bleak results here.
  • Blogosphere maps and other kinds of visualizations can help us understand the public information ecosystem, such as this interactive visualization of Iranian blogs. I have previously suggested using such maps as a navigation tool that might broaden our information horizons.
  • UN Global Pulse is a serious attempt to create a real-time global monitoring system to detect humanitarian threats in crisis situations. They plan to do this by mining the “data exhaust” of entire societies — social media postings, online records, news reports, and whatever else they can get their hands on. Sounds like key technology for journalism.
  • Vox Civitas is an ambitious social media mining tool designed for journalists. Computational linguistics, visualization, and more.

Research agenda
I know of only one work which proposes a research agenda for computational journalism.

This paper presents a broad vision and is really a must-read. However, it deals almost exclusively with reporting, that is, finding new knowledge and making it public. I’d like to suggest that the following unsolved problems are also important:

  • Tracing the source of any particular “fact” found online, and generally tracking the spread and mutation of information.
  • Cheap metrics for the state of the public information ecosystem. How accurate is the web? How accurate is a particular source?
  • Techniques for mapping public knowledge. What is it that people actually know and believe? How polarized is a population? What is under-reported? What is well reported but poorly appreciated?
  • Information routing and timing: how can we route each story to the set of people who might be most concerned about it, or best in a position to act, at the moment when it will be most relevant to them?

This sort of attention to the health of the public information ecosystem as a whole, beyond just the traditional surfacing of new stories, seems essential to the project of making journalism work.

The state of The State of the Union coverage, online

The State of the Union is a big pre-planned event, so it’s a great place to showcase new approaches and techniques. What do digital news organizations do when they go all out? Here’s my roundup of online coverage Tuesday night.

Live coverage

The Huffington Post, the New York Times, the Wall Street Journal, ABC, CNN, Mashable, and many others, including even Mother Jones, had live web video. But you can get live video on television, so perhaps the digitally native form of the live blog is more interesting. This can include commentary from multiple reporters, reactions from social media, link round-ups, etc. The New York Times, the Boston Globe, the Wall Street Journal, CNN, MSNBC, and many others had live blogs. The Huffington Post’s effort was particularly comprehensive, continuing well into Wednesday afternoon.

Multi-format, socially-aware live coverage is now standard, and by my reckoning makes television look meagre. But the experience is not really available on tablet and mobile yet. For example, almost all of the live video feeds were in Flash and therefore unavailable on Apple devices, as CNET reports.

As for tools, there was some use of CoveritLive, but most live blogs seemed to be running nondescript custom software.


Visualizations

Lots of visualization love this year. But visualizations take time to create, so most of them were rooted in previously available SOTU information. The Wall Street Journal did an interactive topic and keyword breakdown of Obama’s addresses to Congress since 2009, which went live about an hour after Tuesday’s speech concluded.

The New York Times had a snazzy graphic comparing the topics of 75 years of SOTU addresses, by looking at the rates of certain carefully chosen words. There are rollovers for individual counts, but it’s mostly a static graphic.

The Guardian Data Blog took a similar historical approach, with Wordles for SOTU speeches from Obama and seven other presidents back to Washington. Being the Data Blog, they also put the word frequencies for these speeches into a downloadable spreadsheet. It’s a huge image, definitely intended for big print pages.

A shout-out to my AP colleagues for all their hard work on our SOTU interactive, which included the video, a fact-checked transcript, and an animated visualization of Twitter responses before, during, and after the State of the Union.

But it’s not clear what, if anything, we can actually learn from such visualizations. In terms of solid journalism content, possibly the best visualization came not from a news organization but from Nick Diakopoulos and co. at Rutgers University. Their Vox Civitas tool does filtering, search, and visualization of over 100,000 tweets captured during the address.

I find this interface a little too complex for general audience consumption — definitely a power user’s tool. But the algorithms are second to none. For example, Vox Civitas compares tweets to the text of the speech within the previous two minutes to detect “relevance,” and the automated keyword extraction — visible at the bottom of the interface — is based on tf-idf and seems to choose really interesting and relevant words. The interactive graph of keyword frequency over time clearly shows the sort of information that I had hoped to reveal with the AP’s visualization.
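Tf-idf, the weighting behind that keyword extraction, scores a word highly when it is frequent in one document but rare across the collection. A minimal sketch, with invented example tweets:

```python
from collections import Counter
from math import log

def tfidf_keywords(docs, doc_index, top_n=3):
    """Rank words in docs[doc_index] by term frequency times
    inverse document frequency across the whole collection."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(docs)
    df = Counter()                     # document frequency of each word
    for words in tokenized:
        df.update(set(words))
    tf = Counter(tokenized[doc_index])  # term frequency in the target doc
    total = len(tokenized[doc_index])
    scores = {w: (c / total) * log(n_docs / df[w]) for w, c in tf.items()}
    return [w for w, _ in sorted(scores.items(), key=lambda x: -x[1])][:top_n]

tweets = [
    "the speech was about jobs jobs jobs",
    "the speech was about the deficit",
    "salmon joke was the best part of the speech",
]
keywords = tfidf_keywords(tweets, 0)  # "jobs" outranks common words
```

Words like “the” and “speech” appear in every document, so their idf is zero and they drop to the bottom of the ranking, which is exactly the filtering behavior that makes the extracted keywords interesting.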

Fact Checking

A number of organizations did real-time or near real-time fact checking, as Yahoo reports. The Sunlight Foundation used its Sunlight Live system for real-time fact checks and commentary. This platform, which incorporates live video, social media monitoring, and other components, is expected to be available as an open-source web app for use by other news organizations by mid-2011.

The Associated Press published a long fact check piece (also integrated into the AP interactive), ABC had their own story, and CNN took a stab at it.

But the heaviest hitter was Politifact, which had a number of fact-check rulings within hours and several more by Wednesday evening. These are collected in a nice summary article, but, as is their custom, the individual fact checks are extensively documented and linked to primary sources.

Audience engagement

Pretty much every news organization had some SOTU action on social media, though with varying degrees of aggressiveness and creativity. Some of the more interesting efforts solicited audience responses of a specific kind. NPR asked people to describe their reaction to the State of the Union in three words, promoting this aggressively on Twitter and Facebook. They also asked for political affiliation, and split the roughly 4,000 responses into Democratic and Republican word clouds.
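NPR’s split word clouds boil down to counting word frequencies per self-reported affiliation. A sketch with invented responses (the three-word format is NPR’s; the data is not):

```python
from collections import Counter, defaultdict

def word_counts_by_group(responses):
    """Tally word frequencies separately for each group, given
    (group, three-word response) pairs."""
    counts = defaultdict(Counter)
    for group, text in responses:
        counts[group].update(text.lower().split())
    return counts

responses = [
    ("democrat", "hopeful inspiring unifying"),
    ("democrat", "hopeful practical long"),
    ("republican", "too much spending"),
    ("republican", "spending spending spending"),
]
counts = word_counts_by_group(responses)
```

Each group’s Counter then drives its own word cloud, with word size proportional to count.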

Apparently, Obama’s salmon joke went down well. The Wall Street Journal went live Tuesday morning with “The State of the Union is…”, asking viewers to leave a one-word answer. This was also promoted on Twitter. The results were presented in the same interactive, as a popularity-sorted list.

Aside from this type of interactive, we saw lots of aggressive social media engagement in general. The more social-media-savvy organizations were all over this, promoting their upcoming coverage and responding to their audiences. As usual, the Huffington Post was pretty seriously tweeting the event, posting about updates to their live blog, etc., going well into Wednesday morning. Perhaps inspired by NPR, they encouraged people to tweet their #3wordreaction to the speech. They also collected and highlighted reactions from teachers, Sarah Palin, etc.

But as an AP colleague of mine asked, engagement to what end? Getting people’s attention is great, but then how do we, as journalists, focus that attention in a way that makes people think or act?

The White House

No online media roundup of the SOTU would be complete without a discussion of the White House’s own efforts, including web and mobile app presences. Fortunately, Nieman Journalism Lab has done this for us. Here I’ll just add that the White House livestreamed a Q&A session in front of an audience immediately after the speech, in which Kal Penn (aka Kumar) of the White House Office of Public Engagement read questions from social media. Then Obama himself did an interview Thursday afternoon in which he answered questions submitted as videos on YouTube.

What’s the point of social news?

According to Facebook, social news seems to be mostly about knowing what all my friends are reading. I’m not so sure. But I think there really is something to the idea of “social news” for journalism, and for journalism product design.

I take “social” to mean “interacting with other people.” That’s a fundamental technical possibility of digital media, as basic to the internet as moving pictures are to television. I’m not sure that anyone really knows yet what to do with that possibility, but happily there are already at least two very well-developed uses. Maybe social news isn’t about “friends” at all, but about filtering and news-gathering.

Twitter is really a filter
I get most of my news, general and special interest alike, from Twitter. I rarely go to the home page of a news site, or use a news app. It’s not the tweets themselves that are informative, but the links within them to articles posted elsewhere. I follow a large set of people with varied interests; some of them work for news organizations, but most do not. My Twitter feed is faster, more diverse, and available across more platforms (all of them) than any one news organization’s output.

This doesn’t mean that Twitter is a perfect news delivery system, but to me it’s proven better than just about anything else at getting me the news mix that I want, and keeping me interested in the world at large. (Admittedly, I follow people I’ve met in other countries, so yeah, travel is way better than Twitter for that.) I am not alone in this opinion. The structure of follower relationships among Twitter users suggests that it’s more of a news network than a social network.

The usefulness of Twitter for news has a lot to do with certain basic design choices. First, a tweet is really as short as you can get and still communicate a complete concept, so it’s basically an extended headline. Second, Twitter differs from Facebook in that relationships can be unidirectional: I don’t need anyone’s permission to follow them, and they may not know or care that I do. Following someone on Twitter also differs from following a blog via RSS because most tweets refer to someone else’s work through a link — Twitter is more about re-publishing than publishing. Retweets also include the name of the original tweeter, which enables discovery of interesting new curators.

Filtering is much more valuable than it used to be, in this era of information overload, and these properties make Twitter an excellent filtering system. There are several news products based almost entirely on displaying links tweeted by the people you follow, such as The Twitter Tim.es and Flipboard. The medium that Twitter invented — global public short messaging with links — has already been endlessly replicated and will be with us forever.

There is a sense in which news organizations have always seen filtering as a big part of their value. One of the duties of the professional editor is to decide what you need to see. But at least one thing has upset that model irretrievably: the internet is not a broadcast medium. While each person reads an identical copy of the Times and watches an identical CNN broadcast, there’s no reason my internet has to look the same as your internet. A small team of human editors can’t personalize the headlines for every reader, so that leaves algorithmic filtering, such as Google News’ personalization features, or social filtering, such as Twitter.

The point is, there’s probably something to learn from how Twitter uses social relationships to route information. As the Nieman Journalism Lab said: “social news isn’t about the people you know so much as the people with whom you share interests.” To put this in terms of the product I wish I had: when I use your news product, I want to be able to follow the recommended reading of other members of the audience, if they so allow. Also, can I follow a particular reporter? And does your product integrate with the other methods I already use for getting information, so I don’t have to choose?

Social networks are great for reporting
Audience-journalist collaboration, blah blah blah. If the idea that professionals are no longer the only players in news is new to you, see blogging and Wikipedia. But a news organization probably has to look at this from a different angle. For me, the core idea of social news-gathering is that the audience is, or could be, an extension of the news organization’s source network.

Hopefully, a newsroom knows about interesting developments before anyone else, and then verifies and publicizes them, but that’s getting near impossible when anyone can publish, and when virality can amplify primary sources without the involvement of a media organization. We don’t yet know very much about collective news-gathering, but there are promising directions. It seems like there are two broad categories of breaking news: public events that anyone could have witnessed, and private events initially known only to privileged observers.

Social media is now routinely used to augment reporting of public events. There are entire units in news organizations dedicated to getting stories from the audience, often under the awkward rubric of “user-generated content.” But why sift for events online when you can give your audience the tools to give you the story directly? Right now if I see a plane land in a river, I tweet it. Wouldn’t a news organization prefer that I send my eye-witness photo to the UGC editor instead? To this end, several mobile news apps include the ability to submit pictures. CNN’s iReport app and website is probably the best developed of these. Ideally, I could send that breaking news tweet to the newsroom and to my friends at the same time, within the same application.

Fast reporting of private events has always depended on having the right sources. A well established source may call the reporter or send an email when something newsworthy happens. Someone with a much looser connection to the organization may not, and perhaps this is an opportunity for social news tools. When someone knows something — or can talk about something — you want them to contact the newsroom first. The potential of this weak-tie news sourcing approach hasn’t really been studied, to my knowledge, but I imagine that it would require, at minimum, a trusted brand, an easily-reachable editorial staff, and frictionless communication tools. If it’s easier just to tweet or blog the news, the source will.

There are several other good examples of social news-gathering, on the theme of asking your audience for help. Crowdsourcing is usually thought of as the recruitment of many unspecialized helpers, as the Guardian did with its MP expenses project. But the Guardian also reached out to its audience to find that one specialist attorney who could unravel the mystery of Tony Blair’s tax returns. Hopefully the specialists a newsroom needs to consult are already among the audience, and they will see the call for experts when a reporter sends one out. For that matter, a smart and engaged audience can correct you quickly when you are wrong. Nothing says “we care about accuracy” like a fact check box on every story.

But is it journalism?
Yes, absolutely. The job of journalism is to collect accurate information on an ongoing basis and ensure that the audience for each story learns about that story. Any way you can deliver that service is fair game. People depend on each other for the news all the time, so journalists better get in those conversations.

Designing journalism to be used

There are lots of reasons people might want to follow the news, but to me, journalism’s core mission is to facilitate agency. I don’t think current news products are very good at this.

Journalism, capital J, is supposed to be about ideals such as “democracy” and “the public interest.” It’s probably important to be an informed voter, but this is a very shallow theory of why journalism is desirable. Most of what we see around us isn’t built on votes. It’s built on people imagining that some part of the world should be some other way, and then doing what it takes to accomplish that. Democracy is fine, but a real civic culture is far more participatory and empowering than elections. This requires not just information, but information tools.

Newspaper stories online and streaming video on a tablet are not those tools. They are transplantations of what was possible with paper and television. Much more is now possible, and I’m going to try to sketch the outlines of how newsroom products might better support the people who are actually changing the world.

What’s a journalism “product”?

Continue reading Designing journalism to be used

Don’t throw that out! Editing like it’s paper destroys journalistic value


The village of Kangzhuang, in Henan Province, China, was built in 2006 next to the Tianrui cement factory (in background, above), to house villagers relocated after the government bought their land. A huge grey cloud of dust hangs permanently over the village. The villagers told me that the factory is shut down for a day or two whenever air quality authorities come to visit, then started up again as soon as they leave.

I visited Kangzhuang on another story. But eating dinner that night, covered in a film of dust, I suggested to a Chinese colleague that this issue of cheated environmental regulations deserved to be investigated further.

“That’s not news,” she said. “That happens all over China, every day.”

And she’s right: it’s not news. All chronic problems fade from attention, and old news won’t interest (most) readers, or sell papers. But we’re leaving the paper era, and this changes things. In the web era, documenting old, marginal, or incomplete stories is much more valuable — and much more affordable — than it used to be.

There has been much discussion of how web writing style must differ from paper writing style, but the difference in medium also changes what stories can and should be covered. Despite its physicality, paper is a deeply transient medium. Yesterday’s newspaper lines birdcages, but yesterday’s web stories will be showing up on Google five years from now. An editor selecting stories needs to be thinking about not only tomorrow’s page views but next year’s as well, and also, crucially, how the story will function in combination with stories from other outlets. There are close ties here to the concept of stock and flow in journalism, and the new-media notions of topic pages and context.

I ran into this paper-web cultural divide discussing a pitch for a story about the number of civilian casualties in the Iraq war. I had done a little bit of new reporting on the topic, but mostly I just wanted to write a thorough summary of the various estimates and a careful, clear analysis of how reliable each might be. An editor friend said that, since the studies I would be referencing were several years old, this wasn’t news and would be hard to sell to the outlet I was proposing; I said that I was planning to write an authoritative article on a topic of great international concern for a general audience, a piece which doesn’t really exist yet. I felt that this would be exactly the sort of public service that journalism is supposed to be, and further, we’d have a good shot at getting in the top five results on Google.

We clearly had somewhat different conceptions of what stories count as publishable journalism, which seemed to be derived from our focus on different media.

Not only is the web a permanent medium, it’s a distributed, accessible-from-anywhere medium (unless your government doesn’t want you to know about certain things, but there are ways around that.) A single report of environmental cheating at a cement factory is not going to change anything for the people who will get sick from cement dust, but hundreds of such reports all over China might. If editors, used to selling papers, think of stories as throwaway consumables rather than careful additions to a permanent store, they miss opportunities for collective action.

But how to fund such long-term, speculative projects? After all, every story costs money to produce. I think part of the answer lies in another medium-driven difference: the web is more amenable to journalism of different levels of quality and completeness. The New York Times aims to be “the paper of record,” which means it hopes always to tell the full story, and to never get a fact wrong. On the web, this is ridiculously inefficient. As a social medium, the web draws power from collaboration and conversation — say, between different papers in different places — and that process is severely hindered if only “finished” work makes it online.

Yes, there will always be a need for the authoritative voice and the carefully edited, sub-edited, copy-edited and fact-checked article. But what about all that other good stuff that journalists produce in the course of their work? What about the juicy trimmings that had to be cut from the main story, the tantalizing leads that the reporter hasn’t had time to follow up, and the small incidents that have meaning only in aggregate? The web demands that we put more online than we would publish on paper, and provides a place for information of all grades. In this new medium, amateur journalists (such as bloggers and thoughtful commenters) are often much more adept at creating value from information by-products than their professional peers. News organizations will have to find forms for publishing unpolished information, such as the beat blog.

The report of environmental malfeasance in Kangzhuang is not yet a story. It hasn’t been checked against other sources, we haven’t heard from the local government ministry about whether they did a proper environmental study prior to putting 2,300 people next to a factory, and anyway the journalists who visited the town (myself among them) are currently busy writing up an entirely different story. But this tidbit deserves to be aired in a public place, so that others can build on it in the future. This is a valuable public service that could be provided for low cost, a service that is possible only on the web. Will an industry trained in the paper era see this possibility?

Identity, Anonymity, and Controlling Trolls

Multiple personalities

Flame wars and jihadist rants and generally worthless behavior in the comments: that’s the problem I’m trying to solve here.

And I’m trying to do it while preserving anonymity. Internet conversation can get nasty when the participants are anonymous, which has led to proposals of tying all online identities to “real” identities. This is the wrong solution to the troll problem, because it destroys privacy in a serious way. I want to build discussion systems that allow anonymous comments, yet remain orderly, civil, and enlightening. I think this can be done with filtering systems based on reputation.

Reputation is a thing that sticks to an identity. Historically most people had only one identity, closely tied to their physical presence. But now, online, every one of us has multiple identities: think of how many user names and logins you have. There’s some consolidation going on, in the increasing acceptance of Google, Twitter, and Facebook logins across the web, and this is mostly a good thing.  But I don’t think we want to aim for a world where each person has only one online identity. Multiple identities are good and useful.

Multiple identities are closely related to anonymity. Anonymity doesn’t mean having no identity; it means not being able to tie one of my identities to the others. I want to be very careful about who gets to tie the different parts of me together. I’m going to give two arguments for this, which I’ll call the “does your mother know” and “totalitarian state” arguments. They’re both important, and I’d be sad if we lost anonymity in either case. And after I’ve convinced you that we need anonymity, I’ll talk about how we get people to behave even if they don’t leave a name.

Keeping the different facets of ourselves apart is the essence of privacy. We’ve always been different people in different contexts, but this was only possible because we could expect that word of what we did with our friends last night would not get back to our mother. This expectation depends upon the ability to separate our actions in different contexts: your mom or your boss knows that someone in the community is going on a bender/having kinky sex/voting Republican, but not that it’s you. The ability to have different identities in different contexts is intricately tied to privacy, and in my mind no different from setting a post to “friends only” or denying Amazon.com the details of your personal life. Although the boundaries around what is “personal” are surely changing, if you really think we’re heading toward a world where everybody knows everything about everyone, you’re mad. For one thing, secrets are immensely valuable to the business world.

And then there’s China. I live right next door to the most invasive regime in the world. The Chinese government, along with a few others such as South Korea’s, is trying very hard to tie online and corporeal identities together by instituting real-name policies. This makes enforcement of legal and social norms easier. Which is great until you disagree. Every damn blog comment everywhere is traceable to you. Every Wikipedia edit. Everything. China is trying as hard as it can to make opposing speech literally impossible. This is not theoretical. As of last week, you can’t send dirty words through SMS.

When the digital panopticon is a real possibility, I think that the ability to speak without censure is vital to the balance of power in all sectors. Anonymity is important to a very wide range of interests, as the diversity of the Tor project shows us. Tor is a tool and a network for anonymity online, and it is sponsored by everyone from rights activist groups to the US Department of Defense to journalists and spies. Anonymity is very, very useful, and is deeply tied to the human right of privacy.

Right, but… how do we get sociopaths to play nice in the comments section if they can say anything they want without repercussions?

The general answer is that we encourage social behavior online in exactly the way we encourage it offline: social norms and peer pressure. We can build social tools into our online systems, just like we already do. A simple example is the “flag this” link on many commenting systems. Let’s teach people to click it when they mean “this is a useless post by a troll.” Collaborative moderation systems — such as “rate this post” features of all kinds — work similarly.

Collaborative moderation is a really big, important topic, and I’ll write more about it later. There are voting systems of all kinds, and the details matter. Compare Slashdot versus Digg versus Reddit. But all of these systems rate comments, not users, and I think this makes them weaker than they could be at suppressing trolls and spam. Identities matter, because identities have reputations.

Reputation is an expectation about how an identity will behave. It is built up over time. Crucially, a throw-away “anonymous” identity doesn’t have it. That’s why systems based on reputation in various forms work to produce social behavior. There are “currency” systems like Stack Overflow’s reputation points, where one user can give another credit for answering a question. There are voting systems, such as the Huffington Post’s “I’m a fan of (comment poster),” which are designed to identify trustworthy users. Even Twitter Lists are a form of reputation system, where one user can choose to continuously rebroadcast someone else’s tweets.

And in the context of online discussion, you use reputation to direct attention.

That’s what filtering is: directing attention. And this is how you deal with trolls without restricting freedom of speech: you build collaborative filters based on reputation. Reputation is powerful precisely because it predicts behavior. New or “anonymous” identities would have no reputation and thus command little attention (at least until they said a few interesting things) while repeat offenders would sink to the bottom. Trolls would still exist, but they simply wouldn’t be heard.
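To make the idea concrete, here is a minimal sketch of a reputation-weighted filter of the kind described above. All of the names, weights, and formulas are illustrative assumptions, not any real site’s algorithm: the point is only that votes feed an identity’s reputation, and reputation in turn directs attention.

```python
from dataclasses import dataclass

@dataclass
class Identity:
    name: str
    reputation: float = 0.0  # accumulated over time from votes on past comments

@dataclass
class Comment:
    author: Identity
    text: str
    votes: int = 0

def attention_score(comment):
    # A comment earns attention from its own votes plus its author's
    # standing. Brand-new or throw-away identities start near zero, so
    # trolls aren't censored -- they're simply ranked out of sight.
    return comment.votes + 0.5 * comment.author.reputation

def ranked(comments):
    # Highest attention first: this is the "filter" that directs attention.
    return sorted(comments, key=attention_score, reverse=True)

def apply_vote(comment, up):
    # Votes feed back into the author's reputation, so good behavior
    # compounds over time and repeat offenders sink.
    delta = 1 if up else -1
    comment.votes += delta
    comment.author.reputation += 0.1 * delta
```

Notice that nothing here ties an identity to a corporeal person; reputation attaches to the persistent pseudonym, which is all the filter needs.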

NB, none of this requires tying online identities to corporeal people. Rather than being frightened of anonymity and multiple identities, I think we need to embrace them. We need to trust that we can evolve the right mixes of software and norms so that collaboration overwhelms vandalism, just as Wikipedia did. This field is mostly unexplored. We need to learn how identity relates to trust and reputation and action. And we need to think of social software as architecture, a space that shapes and channels the behavior of the people in it.

Simply trying to make it impossible to do anything bad will destroy much that is great about the internet. And it lacks imagination.

Not Quite Global New Year

Today I have been keeping a Twitter window open, watching messages tagged #10yearsago scroll by. It’s striking. This is the sort of grass-roots expression of hopes and dreams that adventurous journalists used to travel the world for, and compile into coffee table books. Now we can all see it live, for free.

aricaaa #10yearsago boys still had cooties. ah i miss those days!

davidwees Happy New Year! #10yearsago today I was in a dead-end job working in a warehouse. Now I love what I do and have a great family.

scottharrison: #10yearsago I was a sycophant and a drunk selling vodka to bankers in clubs. Grateful for God’s grace and sense of humor.

Sirenism #10yearsago I was eleven and one of my brothers friends tried to kiss me at midnight. I punched him in the nuts.

cosmicjester Holy Shit #10yearsago I met a girl at a friends birthday party, we both liked Red Dwarf and the Beatles. Then she became the girl.

shaunraney #10yearsago was the the saddest day of my life.

As striking as this is, I notice that almost all of the traffic is in English. The only other language reasonably well represented is Indonesian. Curious, though it makes sense: Indonesia is the fourth-largest country by population, and social media are hugely popular here.

I’ve also really enjoyed watching the clock strike midnight in different time zones. Here in Jakarta, the NYE conversations of my friends in California — 13 hours behind — seem so last night. I’m nursing a hangover, they’re working on one.

It’s so easy to forget the world outside what you know. I hope that global media like Twitter will help us to remember everyone else. The technological means have arrived with a roar, but we’re still not really talking to one another. What is the next step?

Comments on the New York Times’ Comments System

Here are some problems I see with the implementation of the commenting system on the New York Times web site, assuming that they want the discussions about their content to take place on their site. The way things stand now, I suspect they’re actively sending many readers to Facebook instead of keeping them on nytimes.com.

  • How come some articles allow comments and others don’t? Is the policy for which articles get comments explained anywhere? Arriving at Times content from a link, I’m confused about whether I can expect a good discussion or just a broadcast.
  • Even for articles that we are allowed to comment on, the comments are hidden. Sometimes there’s a pull-out quote (which is cool!) but more often we see only this:

    NYTimes hidden comments

  • The number of comments on each article is not visible from the front page or the section pages. There’s no way for readers to see, at a glance, which discussions are hot.
  • The “recommend” button on each comment is welcome, and serves as a useful way to filter comments. The “highlight” button, which seems to appear on the more recommended comments, is a little more obscure — does it just put the comment in the “highlight” list, or is there editor moderation involved? The “what’s this” tip doesn’t clear this up (click the image below for a larger version).

    Highlight what

  • There is no comment view that is sorted for both relevance and freshness, which is the most useful way to track a discussion. Digg and others get this right by adding a time-weighting to a comment’s position in the list.
  • There is no way to reply to someone else’s comment. This makes it impossible to have a real discussion on the site. Many other commenting systems organize comments into threads. By not supporting this, the Times is saying that we can talk to them, but not to each other.
  • The comments on an article close after a certain point. Although I can see that this might be due to moderator workload issues, it’s also a way to drive away future traffic — as when that link goes viral a week later, or the discussion lasts for months.
  • Speaking of which, the comments are moderated. This topic’s a little more complex: there are advantages and disadvantages to this. But I’d like to note that there are plenty of civil discussions happening on the internet in unmoderated places. The strategy of putting a “flag this” link on each comment that sends it for human review is relevant here.
  • From the comments FAQ: “We appreciate it when readers and people quoted in articles or blog posts point out errors of fact or emphasis and will investigate all assertions. But these suggestions should be sent by e-mail.” Really? Why? Wouldn’t claims of possible errors of fact be among the first things you’d want readers to see? Your comments are moderated anyway. And I’d like to point out that your own journalists and editors read the site too — you need those corrections up quickly for internal communication (see: the hilarious Washington Post vs. Public Enemy “911” correction saga).
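The time-weighted sort mentioned in the list above can be sketched in a few lines. This is not Digg’s or Reddit’s actual formula; the exponential decay and the 12-hour half-life are illustrative assumptions that show how relevance and freshness can be blended:

```python
import math
import time

HALF_LIFE_HOURS = 12.0  # illustrative decay constant, not any real site's value

def hot_score(recommends, posted_at, now=None):
    # Blend relevance (recommend count) with freshness: a comment's score
    # decays exponentially with age, so a fresh, popular comment rises
    # while older ones gradually make way for new discussion.
    now = time.time() if now is None else now
    age_hours = max(0.0, (now - posted_at) / 3600.0)
    return recommends * math.exp(-math.log(2) * age_hours / HALF_LIFE_HOURS)

def sort_comments(comments, now=None):
    # comments: list of (recommends, posted_at_unix_time, text) tuples
    return sorted(comments, key=lambda c: hot_score(c[0], c[1], now), reverse=True)
```

With a 12-hour half-life, a day-old comment with 10 recommends scores 2.5, so an hour-old comment with 6 recommends outranks it — exactly the “relevance plus freshness” ordering the Times lacks.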

In short, the comment system seems to have been designed by someone who has never responded to a message that says “there’s a great discussion about that going on at [link].”