Jonathan Stray’s blog | Jonathan Stray | Information, culture, and belief

Peace, Conflict, and Data

July 3, 2013 | March 3, 2014 | conflict resolution, data, non-violence, politics, twitter, visualization | 3 Comments

A talk I gave at the IPSI Bologna Symposium on conflict resolution. Slides here.

We might be able to do better at conflict resolution — making peace in violent conflicts — with the help of good data analysis. There have long been data sets about war and violent conflict at the state level, but we now have much more.

There are now extraordinarily detailed, open-source event data streams that can be used for violence prediction. Conflict “microdata” from social media and communications records can be used to visualize the divisions in society. I also suggest a long term program of conflict data collection to learn, over many cases, what works in conflict resolution and what doesn’t.

We’re really just at the beginning of all of this. There are huge issues around data collection, interpretation, privacy, security, and politics. But the potential is too great to ignore.

Update: two excellent resources have come to my attention in the days since I gave this talk (which is, of course, part of why I give talks.)

First, see the International Peace Institute’s paper on Big Data for Conflict Prevention. This paper was co-authored by Patrick Meier, who has been deeply involved in the crisis mapping work I mentioned in my talk.

But even more awesome, Erica Chenoweth has done exactly the sort of data-driven case-control study I was contemplating in my talk, and shown that non-violent political resistance succeeds twice as often as armed resistance. Her data set, the Nonviolent and Violent Campaigns and Outcomes (NAVCO) Data Project, also shows that non-violence is much more likely to lead to good democracies five years later, and that a movement that can recruit 10% of the population is almost guaranteed to succeed.

I highly recommend her talk.

Recent work

July 2, 2013 | 22 Comments

I realize I haven’t been posting here for some time. That’s because I’ve been posting elsewhere! Here’s some of what I’ve been up to, mostly in the last six months or so:

Good old-fashioned journalism:

The NSA Surveillance FAQ. A summary of all the mess of stories, leaks, and laws. My first piece for ProPublica.
Everything you always wanted to know about gun violence in America, for The Atlantic.
The Whole Dysfunctional National Conversation About Guns—on Twitter … in One Interactive Graph. A social network content analysis, showing extreme polarization.

Writing about journalism, for the Nieman Journalism Lab:

Work at the intersection of data and journalism:

Computer Science and Journalism: Two great tastes that taste great together. Public lecture at the University of Hong Kong.
Computational Journalism, all lectures, readings and assignments for the 8-part course I taught in Hong Kong (and will teach again at Columbia this fall).
Text analysis in transparency, a talk at the Sunlight Foundation. Bonus: includes a discussion of impact in transparency projects.
How a computer can organize thousands of documents for a reporter, about Overview, the document mining system I’m working on.

So, yeah, I’ve been busy.

Journalism is more than one thing

June 13, 2012 | 11 Comments

There’s a craving in the air for a definitive statement on what journalism is, something to rally around as everything changes. But I want to do the opposite. I want to explode journalism, to break it apart into its atomic acts. I’m beginning to suspect that taking it apart is the only way we can put it all back together again.

In the endless debate about what the “future of journalism” holds, “journalism” doesn’t have a very clear meaning. We’re in the midst of hot arguments over who is a journalist, whether social media is journalism, whether data is journalism, whether cherished tenets like objectivity are necessary for journalism. As the print advertising model that funded the bulk of working journalists collapses and forces transformation, it’s pressing to know what is worth preserving, or building anew.

After decades where “journalism is what journalists do” was good enough, there is a sudden bloom of definitions. Some claim that “original reporting” is the core, deliberately excluding curation, aggregation, and analysis. Others say “investigative reporting” is the thing that counts, while a recent FCC report uses the term “accountability journalism” liberally. These are all efforts to define some key journalistic act, some central thing we can rally around.

I don’t think I could tell you what the true core of journalism is. But I think I have a pretty good idea of what journalists actually do. It’s a lot of things, all of them valuable, none of them the exclusive province of the professional. Journalists go to the scene and write or narrate or shoot what is happening. They do months-long investigations and publish stories that hold power accountable. They ask pointed questions of authorities. They read public records and bring obscure but relevant facts to light. All of this is very traditional, very comfortable newswork.

But journalists do all sorts of other things too. They use their powerful communication channels to bring attention to issues that they didn’t, themselves, first report. They curate and filter the noise of the Internet. They assemble all of the relevant articles in one place. They explain complicated subjects. They liveblog. They retweet the revolution. And even in the age of the Internet, there is value to being nothing more than a reliable conduit for bits; just pointing a camera at the news — and keeping it live no matter what — is an important journalistic act.

There’s more. Journalists verify facts and set the record straight when politicians spin. (You’d think this would be uncontroversial among journalists, but it’s not.) They provide a place for public discussion, or moderate such a place. And even though magazine journalism can be of a very different kind, like Hunter S. Thompson writing for Rolling Stone, we still call it journalism. Meanwhile, newspaper journalists write an enormous number of interpretive pieces, a much larger fraction than is normally appreciated. The stereotypical “what just happened” report has become less and less common throughout the last 100 years, and fully 40 percent of front page stories are now analytical or interpretive, according to an excellent piece of forthcoming research. And, of course, there are the data journalists to cope with the huge rise in the availability and value of data.

Can we really say which of these is the “true” journalism?

I think it depends hugely on the context. If some important aspect of the present has never been represented anywhere else, then yes, original reporting is the key. But maybe what the public needs is already in a document somewhere, and just posting a link to it on a widely viewed channel is all that is needed. At the other end of the spectrum, verifying the most basic, on-the-ground facts can be challenge enough. I saw the process that the AP went through to confirm Gadhafi’s death, and it was a tricky undertaking in the middle of a conflict zone. In other cases, the missing piece might not require any new reporting at all, just a brilliant summary that pulls together all the loose threads.

There are a lot of different roles to play in the digital public sphere. A journalist might step into any or all of these roles. So might anyone else, as we are gradually figuring out.

But this, this broad view of all of the various important things that a journalist might do, this is not how the profession sees itself. And it’s not how newsrooms are built. “I’ll do a story” is a marvelous hammer, but it often leads to enormous duplication of effort and doesn’t necessarily best serve the user. Meanwhile, all the boundaries are in flux. Sources can reach the audience directly, and what we used to call “technology” companies now do many of the things above. Couple this with the massive, beautiful surge of participatory media creation, and it’s no longer clear where to draw the lines.

But that’s okay. Even now, news organizations do a huge number of different things, a sort of package service. Tomorrow, that might be a different package. Each of the acts that make up journalism might best be done inside or outside the newsroom, by professionals or amateurs or partners or specialists. It all depends upon the economics of the ecosystem and, ultimately, the needs of the users. Journalism is many good things, but it’s going to be a different set of good things in each time, place, and circumstance.

(originally published at Nieman Journalism Lab)

The hard part of solution journalism is agreeing on the problems

May 15, 2012 | 37 Comments

The only editorial mantra that ever made any sense to me comes from the Voice of San Diego new reporter guidelines: “Our bent: Reform. Things can always be better.” It’s been said that the role of journalism is to inform, but informing seems like a means, not an end, and I believe that a better world is the ultimate goal for journalism. The ambitious idea of solution journalism is to concentrate reporting on what could be improved and how, not just what is wrong. There are a small number of people practicing this today, such as David Bornstein who writes the New York Times’ “fixes” column, and Dowser.org.

But “things can always be better” is a supremely difficult phrase. It appeals to our hopes, while it hides our disagreements and our ignorance. Before we can come up with solutions, we have to agree on what the problems are. This is harder than it sounds; you can’t just sit down and make a list like “unemployment, education, crime, homelessness, global warming…” and get to reporting. People are going to disagree not only about priorities, but about how best to understand a problem, and even about whether or not certain things are problems. Dealing in solutions also tends to move the journalist from informer to advocate, which is tricky territory.

I think there’s a way to do solution journalism that deals with these difficulties, but first we have to understand why this is so hard.

What’s a social problem?
In my time as a journalist I’ve seen a lot of bitter complaining that some particular issue is under-covered. Often, there is merit to the complaints. But let’s take the larger view and ask how we should decide which problems are deserving of attention, and how much. How do we weigh homelessness versus crime, or compare it to failing schools, onerous taxes, corrupt financiers, AIDS, unemployment, and global warming? How do we rate the local against the global? How do we weigh one endangered species against another? (In practice, very inconsistently.)

Sociologists have understood for some time that social problems are “products of a process of collective definition,” as Stephen Hilgartner and Charles Bosk put it in 1988. “After all,” they wrote,

there are many situations in society that could be perceived as social problems but are not so defined. A theory that views social problems as mere reflections of objective conditions cannot explain why some conditions are defined as problems, demanding a great deal of societal attention, whereas others, equally harmful or dangerous, are not. … The extent of the harm in these cases cannot, in itself, explain these differences, and it is not enough to say that some of these situations become problems because they are more “important.” All of these issues are important — or at least capable of being seen as such.

“Social problems” are real, but they are not like trees and planets and atoms, things “out there” in the universe that will be discovered the same way by anyone who looks. Although there are surely things wrong in the world, the process that transforms real-world conditions into the “issues” of any particular time and place, the issues that journalists “should” be writing about, is social and subjective. This was one of the lessons of the social constructionists in the 1970s. Meanwhile, it was the architect, engineer, and urban planner Horst Rittel who gave us a way to think and talk about problems that are real, but extraordinarily hard to pin down.

Wicked problems
A “wicked problem” is one where defining the problem is part of the problem. Suppose we’re concerned about homelessness. All right, the problem is that there are people on the streets. Why is that? Maybe they lack any employable skills, and the true difficulty lies with the education system. Or maybe they’re mentally ill, in which case health care could be the root problem. Or, maybe we need to look broader. Perhaps something is wrong with the way that we are managing our economy, so that too many people are plunged into poverty. And if we notice that many homeless people are women, or black, perhaps this is an issue with systemic discrimination of one kind or another. The whole thing is a massive tangle of cause and effect.

In a brilliant 1973 essay, Rittel saw that top-down, institutional solutions to social problems based on “objective” criteria simply wouldn’t work, because there is no one clear “right” way to define a problem, let alone solve it.

The search for scientific bases for confronting problems of social policy is bound to fail, because of the nature of those problems. They are “wicked” problems, whereas science has developed to deal with “tame” problems. Policy problems cannot be definitively described. Moreover, in a pluralistic society there is nothing like the undisputable public good; there is no objective definition of equity. … Goal-finding is turning out to be an extraordinarily obstinate task.

Rittel goes on eloquently about the features that wicked problems share. Jay Rosen has a good summary:

Wicked problems have these features: It is hard to say what the problem is, to define it clearly or to tell where it stops and starts. There is no “right” way to view the problem, no definitive formulation. The way it’s framed will change what the solution appears to be. Someone can always say that the problem is just a symptom of another problem and that someone will not be wrong. There are many stakeholders, all with their own frames, which they tend to see as exclusively correct. Ask what the problem is and you will get a different answer from each. The problem is inter-connected to a lot of other problems; pulling them apart is almost impossible.

Trained in cybernetics, an early mathematical form of systems theory, Rittel thought in networks of cause and effect and saw how all of society operates as an irreducible whole. But he was also deeply involved in the practical realities of social undertakings as an architect, designer, and civic planner, and he appreciated the reality of our pluralistic cultures. The result is a very nuanced argument that social problems cannot be grasped in “objective” terms. In most cases there is no obviously right conception of a problem, and no single “correct” solution. Instead, Rittel became interested in the process of “design.”

Designing society
Rittel researched, practiced, and wrote on the subject of design, which he said was about planning a path from what “is” to what “ought” to be. Rather than a linear method, he saw design as an iterative process of imagining future worlds and investigating the tools available to reach them from the actual present. In “The Reasoning of Designers” he wrote,

A design problem keeps changing while it is treated, because the understanding of what ought to be accomplished, and how it might be accomplished is continually shifting. Learning what the problem is IS the problem.

Such a design process is flexible and amorphous enough to attack the wicked problems of society. But it is necessarily a subjective process, dependent on the background assumptions and values of the designer, and also necessarily a political process because design, especially social planning, affects many lives.

No plan has ever been beneficial to everybody. Therefore, many persons with varying, often contradictory interests and ideas are or want to be involved in plan-making. The resulting plans are usually compromises resulting from negotiation and the application of power. The designer is party in these processes; he takes sides. Designing entails political commitment — although many experts would rather see themselves as neutral, impartial, benevolent experts who serve the abstraction of “the common good.”

Rittel saw many parallels between design and discussion. In fact he saw design as “a process of argumentation” and asked how people could engage in productive discussions to come up with good plans. There are strong parallels here to the concept of deliberative democracy, and the idea that journalism “must provide a forum for public criticism and compromise” (according to the Elements of Journalism.)

The role of the solution journalist
A journalist is not an urban planner, a teacher, an economist, a police captain, or an epidemiologist. We already have those people in society, so I don’t know why we would imagine that journalists are supposed to invent good plans. Even the idea of journalists merely promoting particular solutions flies in the face of the orthodoxy that says journalism exists to inform, not to advise or act. Personally, I find the idea of total journalistic detachment to be nonsensical; if journalism has no effect, then it simply does not work. But neither do I think that journalists have any particular legitimacy to decide for everyone else. Chris Anderson nails this point when he asks,

by what right, and on what grounds, do journalists claim the authority to offer solutions to any particularly difficult problem? Journalists are neither elected, nor particularly accountable, nor all that expert in anything in particular.

I answer this by saying that I don’t want the journalist to offer solutions. The solution journalist ought to be well informed, certainly, and perhaps they ought to report and write on possible solutions to social problems, but I don’t think that’s their primary responsibility. Rather, I see the solution journalist as responsible for the process of public discussion by which problems are defined and turned into plans for the future.

This is the moderator’s role. There is wide scope here, beyond the daily nuts and bolts of moderating a networked discussion (for which there are already a great variety of models.) It would be very valuable if the journalist continually curated links that describe both potential issues and potential solutions within the community. It would be crucial to include a variety of voices in this discussion, or the conclusions may not be representative; I like John Dewey’s definition of a public as a group of people affected by some issue. And the journalist could step in at key moments to clarify basic points of fact, either by citing authoritative references or by doing some reporting. The point is to have a healthy discussion about just what are the most pressing public problems — and the possible solutions. “Healthy” might mean many things, such as reality-based, respectful, and productive. Deciding what kind of discussion we want to have and how best to go about having it is itself a wonderful design problem!

There is a great deal of room here for experimentation with software and process. As early as 1970 Rittel designed what we would now call “social software” to facilitate discussions, building his “issue-based information system” for government planning departments. But we know very little about how to make discussion systems work at web scale. We have a few tantalizing examples — the Slashdots, Wikipedias, and Reddits of the world — but no general principles. Meanwhile, we are just beginning to ask about the very human process of tending to an online community. What is the most effective and the fairest way to deal with trolls, crazies, and other spoilers? How do we make the hard decisions about excluding people? How can the users best contribute to the process? What is the right combination of norms, rules, and code? Unfortunately, we are going to have to learn how to do this differently for different sizes of groups. A neighborhood, a city, a country and a planet will all require different approaches, because social interactions do not scale cleanly (see, e.g., Dunbar’s number.)

So there is software, and there is process, and there are people bound up together who will see different aspects of their shared condition. Sometimes they will disagree violently about the truest representation and the worthiest goal. Perhaps the work of solution journalism is not to propose solutions, but to help a community come to a shared understanding of what its major problems are, which is the first and possibly hardest step in solving them.

Darfur and the limits of public outcry

May 6, 2012 | February 6, 2013 | 9 Comments

I just finished reading Rebecca Hamilton’s new book Fighting for Darfur: Public Action and the Struggle to Stop Genocide, and I must say I’m more confused than ever about the role that ordinary people can play in resolving international problems. But I think I’m confused in a good way, that kind of “this is a lot trickier than I thought” way that leads to learning. Hamilton was deeply involved in student activism for Darfur, but in 2006 she switched tracks to study whether this sort of advocacy had any real effect. Over the next few years she interviewed everyone involved: activists, people within the governments of the U.S., Sudan, and other countries, staff from the UN and the International Criminal Court, and of course lots of Darfuris on numerous visits to the region.

This is a story about the limitations of public outcry, which Hamilton also talks about in this excerpt (full video and transcript)

All of this seems especially interesting right now in light of the debate around the Kony 2012 video and Mike Daisey’s falsehoods about the working conditions of Apple employees in China. At what point does simplification or sensationalization of a message make broad public “awareness” ineffective or even harmful? A number of smart people have wrestled with this question recently, including Ethan Zuckerman, who co-founded the Global Voices international citizen media project, in a very thoughtful essay.

Hamilton explains that the U.S. Darfur advocacy movement began on the back of the lessons of Samantha Power’s hugely influential book A Problem from Hell:

“It is in the realm of [U.S.] domestic politics that the battle to stop genocide is lost,” was the key message from the mammoth research Samantha Power had undertaken into the genocides of the twentieth century. It was a mantra that could be seen scribbled on post-it notes on Darfur advocates’ desks and added at the sign off of their emails. The citizens who started to join the growing movement for Darfur believed that the power to make “never again” meaningful was in their hands, that if they created a loud enough outcry, they could generate the political will needed to get their political leaders to save Darfuri lives.

But this is only true if the problem is, in fact, a lack of political will — and if the political pressure that activists create pushes in the direction of solutions that actually work.

What happened next — during the six or seven years since the start of the attacks in Darfur and the writing of the book — is complicated. Secretary of State Colin Powell publicly called what was happening in Darfur a “genocide” in September 2004, marking the first time in history that an international leader had used “the g-word” while the violence was still ongoing, but the Darfur advocacy movement was really just in its infancy at that point, and Hamilton traces the internal politics of the decision to other factors. Then there was a UN resolution referring the matter to the newly-established International Criminal Court but, writes Hamilton, “contrary to conventional wisdom, the growing Darfur movement was not a significant part of this decision. Although some Darfur advocates voiced their support, the most influential advocates were those based in Africa.”

In 2006, advocates focussed their attention on getting a UN Security Council resolution authorizing a peacekeeping mission to Darfur. Getting the UN to deploy troops seemed like a way forward, but China, with its close connections to Sudan, would not support the necessary UN resolution. Here, perhaps, is a place where the citizens’ advocacy movement was clearly effective.

U.S. Darfur advocates realized that domestic pressure would not work to influence Chinese leaders. But the 2008 Olympics in Beijing were coming up. Activists executed a prolonged, international “genocide olympics” campaign to publicly link China with the events in Darfur. This included marches, a torch relay, and press campaigns such as a Wall Street Journal op-ed. This had real consequences for China, including the high-profile withdrawal of Steven Spielberg as an artistic advisor to the opening ceremonies. Eventually, China backed down, signing on to a UN Security Council “presidential statement” calling for Sudan to “cooperate fully” with the International Criminal Court.

As one U.S. government official put it, “Activists finally ‘cracked the code’ on moving China.” This didn’t mean that China moved into line with the activist position, but it did move from obstructing all outside involvement with Darfur back to a position of neutrality. In an admittedly rare instance, the Olympics, when activists in the West could threaten an image China actually cared about, public shaming had worked.

The only problem was that a UN peacekeeping mission was doomed to fail, because Sudan didn’t want peacekeepers there at all:

Any mission to protect civilians using outside forces without the consent of the Sudanese government would not only be tantamount to invasion in rhetorical and legal terms, it would bring with it logistical and military complications rising near the level of practical impossibility. No country, not even the United States, was willing to fight a real war with real costs in terms of lives lost in order to protect Darfuris. And until any country was willing to do that, the theoretical debates could continue ad infinitum. The reality was that Sudanese consent was a necessity.

This is just the barest outline of the story, which was (and is) an intricate international situation. But if this 1,000-word post can only barely outline the situation, how is an advocacy movement supposed to explain the details to large numbers of people? And how are regular people supposed to influence the decision makers in a different country? U.S. politicians have to listen to U.S. voters, but foreign politicians don’t.

International situations seem to require international advocacy — a much harder proposition. As Hamilton asks in this video, “more generally, beyond a state model at all, how are we building connections between different communities?”

Perhaps the most fundamental question here is, why do we believe that bringing something to the attention of a large number of people will have any real effect at all? Of course it’s impossible to know what would have happened in Darfur had there not been this sort of mass advocacy, but the fact remains that in many of the ways that count, the effort was a failure. Hamilton ends the introduction of her book on this point:

Until Darfur, the persistent failure of the U.S. government to protect civilians from genocidal violence could be all-too-easily attributed to and justified by the absence of a politically relevant outcry from citizens. The insufficiency of that alibi has now been revealed. By telling the story of what happened when citizens did create an outcry, Fighting for Darfur enables us to take the next step and begin to understand the other missing pieces of the puzzle.

What does Google gain by not letting me use any name I want?

January 26, 2012 | March 3, 2014 | 3 Comments

tl;dr: all you handle kids are scaring the straights away, and it’s a problem for us.

When Google+ launched this summer, it required users to register under their “real name” — not necessarily their legal name, but the vaguely defined “name your friends, family or co-workers usually call you.” A lot of people thought this was a bad idea, including me. Real names harm a variety of people in different situations, while pseudonyms are an important tool of privacy in a medium where every public utterance is recorded and forever searchable. Although the issue of online identity is far broader than Google+, the launch of this new service was seen by many as a chance to reexamine this point, and the policy triggered a public backlash which came to be known as the nymwars.

Last week, Google+ rolled out a partial reversal of this policy, allowing arbitrary names, but only for new users, and only if they are an already “established identity.” Again, this seems an impossibly vague standard. Also, why? Why can’t I just call myself whatever I want?

Certain answers come out in Google+ chief architect Yonatan Zunger’s recent thread on the topic. It seems that, rightly or wrongly, Google has certain strong ideas about what “kind of community” Google+ is supposed to be. Moreover, they claim that only a minority of people have strong feelings about the use of pseudonyms online, and that they have data showing that the use of “handles” drives other people away.

In this sense the use of real names is at heart a business issue, just as many folks suspected. But not the business issues that have been most talked about. One standard argument is that Google wants your real name for the benefit of advertisers, or for the benefit of state authorities. Of course Zunger and others could be deceiving us (or themselves), and I certainly believe that Google is engaged in a competitive deathmatch to be the dominant online identity provider. But the “advertisers” and “authorities” arguments for real names seem to me weaker than they first appear. After all, it’s not your name that drives personalized advertising algorithms, it’s the content you produce and where you are in the social network. On the commerce side, no one needs my real name to take my money, because payment systems are ultimately tied to credit cards, bank accounts, or phones. These sorts of social and financial links also make real names much less interesting/useful from a law enforcement point of view, especially given that Google will already turn over all your information to government authorities when asked. In either case, names are really only useful as a (very unreliable) key to match between multiple databases. Your behavior is much more telling. For an illustrative example, consider that it’s possible to accurately guess your age, gender, and political orientation from public Twitter data.

So let’s look closely at what Zunger said about the recent names policy change, in response to detailed questions. This is his stated reason for the original real names policy:

First of all, you might ask why we have a names policy at all. (i.e., why we don’t simply go with the JWZ proposal) One thing which we have discovered, while putting some miles on the system, is that it is indeed important to have a name-based service rather than a handle-based service. This isn’t a matter of functionality so much as of community: You get a different kind of community when people are known as Mary Smith than when they are known as captaincrunch42, and for a social product in particular we decided that the first kind of community is the one we want to build. In order to do that, we want to establish a general norm that the names you put in to the system should be names, not handles.

Zunger is talking here both about what “kind of community” Google+ is intended to be, and how he thinks that sort of community can be established — by making rules in an attempt to encourage certain norms. He distinguishes between “names” and “handles.” I’m not really sure I immediately know how to tell these apart. Further adding to the confusion, Zunger is also clear that Google+ is (now) not concerned with whether you are pseudonymous or not:

Our name check is therefore looking, not for things that don’t look like “your” name, but for things which don’t look like names, period. In fact, we do not give a damn whether the name posted is “your” name or not: we will not challenge you on this basis, nor is there any mechanism for other users to cause you to be challenged for this.

In response to a question about “anonymity,” he says:

it depends on what you mean by “anonymity.” If you mean that the name on your account isn’t associated with you in meatspace, I think that we support that right now.

Ok, so why bother restricting names at all? “You claim evidence that a no-handles policy is better for discourse,” wrote Sai. “I’ve seen zero proof of this, and indeed proof to the contrary.” Zunger responded that the policy is

not a no-handles policy, but a rare-handles policy. I don’t have data which I’m at liberty to share, but we got very strong feedback about this one, especially from less technical users, and also very disproportionately across genders: women liked handles a lot less than men. (This is somewhat reflected in the populations which have the highest density of handles: e.g., people who are old-time Internet users and whose handles date back to usernames)

Yet Zunger also admits that real names don’t constrain bad behavior the way he was hoping they would:

We thought this was going to be a huge deal: that people would behave very differently when they were and weren’t going by their real names. After watching the system for a while, we realized that this was not, in fact, the case. (And in particular, bastards are still bastards under their own names.) We’re focusing right now on identifying bad behaviors themselves, rather than on using names as a proxy for behavior.

What’s going on here? Zunger says he both “got very strong feedback” in favor of real names and “bastards are still bastards under their own names.” How can both these things be true? He explains a little farther down the thread:

Actually, it’s not that people think that nyms are abusive at all. It’s that people react differently to seeing that they’ve been circled by John Smith, versus seeing that they’ve been circled by CaptainCrunch49. Various categories of user tend to react very negatively to the latter, say something to the effect of “who are these strange people?!,” and log off and never come back.

…

The initial policy was different, and it was based on a number of reasons, such as the theory that permanent names encouraged good behavior (turned out not to be true) and the theory that name-based services have a different ambiance, and lead to different collective behavior, than handle-based services. (Seems to be true)

…

It’s definitely an issue of perception, not security. But handles are only used in a fairly limited subculture, and a lot of the past intersections of that subculture with the broader culture have been negative: people associate handles with trolls on forums. … Obviously, not everyone with a handle is a bad actor, but handle namespaces have acquired this rep in spades.

When pressed on whether real names are “more engaging and encourage interaction,” he says:

This does, in fact, seem to be the case — people seem to interact really differently when they see names and when they see handles. This is one of the main reasons why we continue to think that this distinction is worth preserving.

So Zunger is claiming that the goal of excluding “handles” is based on user behavior differences that can be seen in the data: not “bad behavior” but other things, the only one of which he’s specified is leaving and never coming back. This is a core business issue. But it’s also a user experience question. Zunger refers twice to negative experiences of women on G+, and this is consistent with what I’ve heard from my female friends who complain of “creepy” people adding them to circles. Even when not creepy, “who are all these people adding me?” has been a common refrain with G+. This is a problem which is made worse if people choose pseudonyms which they don’t already commonly use elsewhere: how do I know who you are when you add me to your circles? Of course, there are pseudonym-preserving potential answers to this question, such as seeing that a known friend vouched for them.

Part of the reason there is such heated argument about the use of pseudonyms is that there’s so little data. The best large scale evidence I know of is Disqus’s figures, which led the company to conclude that “pseudonyms are the most valuable contributors to communities” in terms of comment threads. Zunger isn’t releasing any data, but he drops many hints about the content of Google’s data set, which contains much richer information than comments. It appears that Google has done some social network analysis on pseudonym use:

There’s a lot of clustering asymmetry in this, however. Generally, if you know at least one person who has an unusual name, you’re likely to know a lot of such people; i.e., people with unusual names travel in tightly-connected clusters. That’s largely because these names tend to be tied to particular subcultures. The problem we’re really encountering here is of culture clashes: people from one culture absolutely freak out when they encounter people from a very alien culture. That’s actually a very deep problem which affects a lot more than names, and it’s one that I’m spending a lot of skull sweat on lately. (I can tell you more off-line) If we can find a good way to deal with that, then the handles problem goes away too, and we can just revert to the simple jwz solution.

And so we get to the current confused state of affairs: you don’t have to sign up under your “real” name (whatever that is), but you have to meet some vaguely defined standard for “established” names. The new criteria are spelled out in the post by VP Bradley Horowitz:

If we flag the name you intend to use, you can provide us with information to help confirm your established identity. This might include:

– References to an established identity offline in print media, news articles, etc
– Scanned official documentation, such as a driver’s license
– Proof of an established identity online with a meaningful following

As a matter of practice, Zunger explains that G+ uses machine classification to decide whether a name is allowed, augmented by humans in the uncertain cases:

The classifier is training to get the (huge number of) easy cases right, not the hard ones; those are always going to be passed off to actual humans. … The goal is that most things which are marked as “not a name” are genuinely cases of something being meant as either a nickname or an organization; whenever the appeals process is triggered, and even more so whenever something passes an appeal, that’s a sign that the first-stage check failed and we need to improve our rules. So then we can look at the pattern of appeals, see if there are classes of names which we are systematically getting wrong, and learn from this to improve the process and reduce the chance of someone being sent through it incorrectly.

But Zunger hasn’t yet really answered the question of what qualifies as a legitimate name at signup — what will pass the human review process that is used to train the machine classifiers? And while there is a new “nickname” feature available to people who have already created an account, you can’t only be known by your nickname, which shows up in addition to your “real” name.

So where does this leave us? On G+ you have to use some name you commonly use elsewhere. The policies are ambiguous but strongly favor the sorts of names most people use to introduce themselves in person, rather than “handles.” You can use a handle on G+, but only if you can convince a human that you have been widely calling yourself that elsewhere. In other words, we still can’t call ourselves whatever we want, because some number of people are going to fail the (unspecified) name check. You can’t create a brand new identity in order to explore, say, an openly gay existence online, or to see how people would treat you if they didn’t know you were a kid (which is something about the internet that was really important to me when I was 14.) The weird thing here is, overall the policy doesn’t sound like it’s really about whether your name is “real,” but whether it sounds like “not a handle” in a way that doesn’t frighten other people away.

So what’s so bad about this? Identity is complicated in real life. We are all different people in different contexts, and expressing yourself online has the risk of smushing all those contexts together in a way that loses something, such as the ability to reveal yourself fully without fear. Google understands that such problems exist, or they wouldn’t have done the sociology research that clearly influenced the “circles” feature. But as Moot has argued, maybe it’s only under anonymous conditions that we are authentic. The current name policy is not going to work for many people in many cases. Zunger knows this:

I completely agree that “well, shut them out then” is not the right thing to do. But I’m currently stuck between shutting out a small number of people, or creating an environment in which a large number of people (especially women, and especially people who are already feeling uncertain in the online environment) feel a hostile environment and get shut out, too. I do not like being in this situation and am actively trying to work on real solutions which will allow us to bridge the gap, and make this a good environment for everyone.

Which is a lovely sentiment, but how is this to happen? Zunger freely admits that the recent changes are supposed to work for most people, most of the time. In this sense the problem is more that G+ is aimed at the mainstream than that it specifically excludes you. It sucks to be different, and G+ is not trying to solve this.

And there’s one of the crucial questions in the whole nymwars debate: are we arguing about rights, or are we arguing about what serves “most” people? I think you could make a good argument that the ability to choose one’s name online is a right, just as it is a right offline. Where the waters get muddy is that Google+ and other huge networks are private property, and can set more or less any rules they want. Yet we all use them; we depend on them for public interaction. Rebecca MacKinnon explores this problem at book length in Consent of the Networked.

Meanwhile, Zunger seems intent on preserving the names/handles distinction:

I’m making a tradeoff in this service by restricting the space of names to things which are, by some criterion, “name-shaped.” On the one hand, the exclusion of handles has a nontrivial cultural effect, because handle-based cultures such as Internet fora, YouTube, some parts of fandom, etc., have established cultural norms which are (on the very large-scale average) ultimately somewhat similar to one another and very different from those in many name-based cultures, such as G+, FB, or meatspace. Since we have made an explicit decision to make G+ a name-based culture, and since the large bulk of our users come exclusively from such cultures (i.e., have little or no familiarity with handle-based cultures), there are significant culture clash risks associated with culture mixing and we’ve chosen to resolve those by basically excluding handles. (With rare exceptions for very established handles, which is an exception people are used to because they see those cases as intrinsically exceptional; as an extreme example, Lady Gaga) On the other hand, this excludes identities which come from handle-based cultures.

…

When the excluded identity is in the second category, then this is frankly working as intended: I’m trading off one virtue of social health (building up a unified culture on G+) against another virtue of social health (allowing as many identities as possible to be represented on the service).

…

The resolution that we’re aiming for amounts to attempting to structure the name restrictions as narrowly as possible in order to attain the social health virtue of building up a name-based culture.

Zunger hasn’t yet explained why a “name-based” culture is a “social health virtue.” He already said that “name-based” cultures don’t control “bad behavior” and “bastards.” But we do have one really important clue: he claims that only a small number of “subcultures” use handles, and that handles drive away people who aren’t used to such cultures. If Zunger is telling the whole story, G+’s critics are right in that this is a business issue, but it isn’t so much that advertisers want real names. Instead, this names policy seems solidly about being acceptable to as many potential users as possible. Which is not such a terrible goal, but it’s by definition anti-subcultural, and that does kill some of the genuine and important ways that people enjoy interacting with each other.

Norms, Laws, and Code

January 16, 2012 | January 17, 2012 | 16 Comments

If we want to organize a group of people to do something online, and we’re not planning to just hire everyone to perform specific tasks, then there are only a few basic options. We can try to build agreement, we can set up rules, or we can write code that creates or constrains the possible actions. I’ve begun to think of these as norms, laws, and code.

I’m thinking about these things because, like many people, I am trying to understand how to use online platforms to get people to work together in some productive way. As Joel Spolsky points out in “Modern Community Building,” this is a very new project and we know little about it. If an online community has a goal, or even any way to say whether it’s “doing well” or not, then it requires a particular type of more-or-less specific behavior from its members. That requirement is a constraint on free choice. Online communities have some combination of informal agreements, formal rules, and software that makes certain types of actions acceptable or unacceptable, possible or impossible, easy or hard.

Norms are agreements about behavior. Within any particular community or culture they define what “good” is, what “rude” is, and what “moral” and “valuable” are too. They can be thought of as more or less arbitrary social constraints on acceptable actions, such as the idea that men don’t wear dresses. But they also encourage things we want, such as the idea that helping a stranger is a worthwhile thing to do. In the online setting, norms are how we recognize “good” behavior such as helpfully answering someone’s question, versus “bad” behavior such as trolling and spam. Norms are “all in people’s heads” yet they are real in that there are social penalties for violating them, such as a loss in popularity, conflict with other members of a group, or shunning.

I understand laws as formalized rules backed by force. In many cases laws are codifications of norms (e.g. the criminalization of theft) along with pre-defined penalties for violation, backed by an authority with both the ability and the will to enforce these penalties. Laws can also govern the provision of rights or rewards, as in tax refunds or the requirement that health insurance companies cover pre-existing conditions. In the context of an online social platform, “laws” might also mean “rules,” standards of behavior enforced by the site’s owner or other authority and backed by threats such as account suspension.

And then there’s code. Code influences social action whenever people interact with one another through the use of software — that is, social software. Lawrence Lessig has famously argued that code is law, and I am indebted to him for getting me thinking about how code constrains our free choice. But I don’t think law is the right metaphor. You don’t get punished for “violating” code; you simply aren’t able to do things that your software doesn’t allow. Code is more like physics, the basic possibilities of a universe. Or it is like architecture, a built world that has walls which are no less real for being human inventions. Code is the background context that defines what is and is not physically possible. Even if I were completely unconstrained by norms and laws, I couldn’t use a software feature that does not exist.

Code also influences action in the positive sense of what is easy or convenient to do. This is the concept of an affordance in psychology and user interface design. Ease of action is important because changes in degree eventually become changes in kind; although it was possible to create a blog before platforms such as Blogger and WordPress, there wasn’t a lot of blogging then because it required hand-editing of HTML pages. Similarly, the fact that anyone can publish anything online doesn’t mean much if there is no way to find it. Web search doesn’t create a global network where anyone can publish, but it makes the network massively more practical.

Norms, laws, and code all apply together in any piece of social software. Consider a discussion forum. There are norms which regulate acceptable behavior — things like prohibitions against trolling, requests to stay on topic, the sort of things encapsulated in the netiquette that evolved with the very first internet discussion systems. Then there are laws, hard rules. “Anyone making racist or hateful remarks will be kicked off the forum,” and that sort of thing. Violations of the rules are judged by an authority who also has the ability to apply non-voluntary penalties, like a moderator who can suspend user accounts. Finally there is the code itself. Email is very different from recent social networks because it gives the user no way to make a completely public post, nor any “subscription” or filtering tools to let the user indicate which public posts they want to see.

Most functional online communities include all three elements. Wikipedia features a volunteer-mediated dispute resolution process, which starts with non-binding recommendations (norms) but can eventually escalate into administrator-enforced temporary or permanent bans (laws). There are various pieces of user interface which support user requests for moderation and allocation of moderators to cases (code).

Different online communities have different norms, laws, and code. “Be civil” is one of Wikipedia’s core policies, whereas 4chan is much more freewheeling. Most news sites retain the right not to publish your comment (rules enforced by an authority), whereas sites such as Slashdot and Reddit rarely delete anything, preferring instead to punish worthless contributions by downvoting and other mechanisms which starve them of attention. Of course downvoting is not possible without appropriate software, which means that code influences norms. More precisely, code shapes both problems and possible solutions. If it’s not possible to make public posts then we don’t need to worry about privacy; and if we have the ability to restrict the distribution of each post then we have a tool that can be used to address privacy concerns. But any solution which involves people voluntarily working together for their common good, like downvoting, still requires norms or rules.

In short, we can convince people, force people, or make certain things possible or impossible, easy or hard. These are the levers we have when trying to get people to work together online toward a common goal. The ability to pull these levers is not evenly distributed among the members of an online community: it’s easier for moderators or other community leaders to promote new norms; laws and rules are typically the creations of site owners or administrators; and code is ultimately under the control of whoever causes it to be written and installed. So there is a politics here. We can talk about whether a social platform is more authoritarian or more democratic, and every choice of rule or code has political implications in terms of who is empowered to do what.

Because they involve non-consensual limits to behavior, these constraints also have bearing on online rights. I find it hard to get upset about limits to “free speech” in someone else’s blog comments. Dave Winer eloquently makes the case that having your own blog means the ability to “craft your own medium,” and you get to set the rules. But when a privately owned online platform becomes integrated into the life of very many people, its norms, laws, and code start to intersect with things that could be considered basic freedoms. This is why there was such an outcry over the real names policy for Google+. Requiring the use of legal names online privileges the already powerful, and removes a crucial tool for online privacy. If personal freedom, empowerment, and participation are important to us, then norms, rules, and code of large online systems cannot be created arbitrarily; they have to serve the users’ basic human interests. This is a point thoroughly explored by Rebecca MacKinnon. But of course every online service provider is also constrained by the laws of the jurisdiction it operates in, which means that an online community can’t choose its rules arbitrarily. Not only does every online social system have its own politics, but those politics intersect with other spheres in complicated ways.

Wiki variations

December 28, 2011 | December 29, 2011 | 2 Comments

In the beginning there was Wikipedia, and it was brilliant. Somehow, making a set of pages that anyone could edit worked. The result was not cacophony but the greatest public collection of knowledge that the world has ever known. And that’s pretty much where we’ve left things, which is a great shame, because there’s so much more to be explored here.

A set of revision-controlled, hyperlinked topic pages is a stupidly useful form. It seems too simple to improve. What we can experiment with is how the pages are produced — which really seems like a far more interesting problem anyway. We can also look at novel ways to use a wiki. Here’s a brain dump of all the different directions I can imagine pushing the classic concept.

Who can edit? Just because Wikipedia is open to all doesn’t mean that all wikis must be. Actually, not even Wikipedia is open to everyone; admins can “protect” pages, restricting editing in various ways temporarily or permanently, or in extreme cases ban users entirely. But the presumption is openness. There are other wikis that start the other way around, such as news organizations’ “topic pages” which are only editable by staff. This control often results in a much more consistent product and may also serve to minimize errors, though I’ve never been able to find a quantitative comparison versus pro journalism’s error rate. But the cost of being closed is that no one else can contribute. And sure enough, on most topics I find Wikipedia to be more comprehensive and up-to-date. Compare NYT vs Wikipedia on global warming.

Between entirely closed and entirely open there is a huge unexplored design space. The Washington Post’s WhoRunsGov, a directory of American government personnel, was an example of what I’m going to call a “moderated wiki.” Anyone could submit an edit, but the changes had to be approved by staff before going up. WhoRunsGov is no longer up, so perhaps it was not considered a success, but I don’t know anything about why.

There are lots of other in-between possibilities. We could have a post-moderated wiki where changes are visible immediately but checked later, or employ any of the various reputation systems that are commonly used in community moderation; the basic idea is that proven editors have greater privilege and control. I can also imagine a system where all content is written by a small closed group, perhaps the staff of some organization, but the community votes on what articles need to be updated, and submits suggestions, links, etc. The staff then updates the pages according to the community priority. Openfile.ca embodies certain aspects of this.
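To make the in-between options concrete, here is a minimal Python sketch that mixes pre-moderation for new editors with post-moderation for proven ones. Everything in it is an assumption for illustration: the `Wiki` class, the `TRUSTED_AFTER` threshold, and the idea of counting approved edits as "reputation" are hypothetical and do not describe how WhoRunsGov, Openfile.ca, or any real wiki works.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: approved edits needed before an editor's
# changes go live without prior staff review.
TRUSTED_AFTER = 10


@dataclass
class Wiki:
    pages: dict = field(default_factory=dict)          # title -> live text
    approved: dict = field(default_factory=dict)       # editor -> approved edit count
    pending: list = field(default_factory=list)        # edits waiting for staff approval
    review_later: list = field(default_factory=list)   # live edits to be checked afterwards

    def submit(self, editor, title, text):
        """Publish immediately for proven editors, otherwise queue for review."""
        if self.approved.get(editor, 0) >= TRUSTED_AFTER:
            self.pages[title] = text
            self.review_later.append((editor, title, text))
            return "live"
        self.pending.append((editor, title, text))
        return "pending"

    def approve(self, index):
        """Staff approves a queued edit; the editor earns one reputation point."""
        editor, title, text = self.pending.pop(index)
        self.pages[title] = text
        self.approved[editor] = self.approved.get(editor, 0) + 1
```

The design choice being illustrated is only that privilege is a function of track record; a real system would also need to handle reverts, conflicting edits, and loss of reputation.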

Another simple variation: I have not yet seen a publicly visible wiki that is editable by everyone within a large organization (as opposed to a few sanctioned authors.) Organizations and communities already have elaborate structures for deciding who is “in” and who is “out,” and this could translate very naturally into editing rights.

Specialized Wikis. It’s going to be extraordinarily hard to produce a better general reference work than Wikipedia, with its millions of articles in dozens of languages and tens of thousands of editors. But your organization or community might know far more about finance, or green roofs, or global media law, or… Each topic potentially has its own community and its own dynamics that could lend itself to different types of editing schemes.

For that matter, Wikipedia’s content is freely re-usable under its Creative Commons CC-BY-SA license. It would be perfectly permissible to build a wiki interface that displayed specialized pages where available, and used Wikipedia content where it is not. Essentially, this is the choice to take editorial control of a certain small set of pages, while retaining the broad utility of a general reference.

Combine wiki and news content. For most people, the news isn’t really comprehensible without detailed background information. And vice versa: after reading a wiki article, I’m probably far more interested in the most recent news on that topic. It seems natural to build a user interface that combines a wiki page with a news stream on that topic, and several news organizations have tried this. But I haven’t found an example that really sings. For me, this is largely because they don’t leverage the broader world of available content. Where is the Wikipedia/Google News mashup?

The revision history of a page, the list of every edit over time, is also a form of recorded news. James Bridle’s 12 volume edit history of “The Iraq War” makes this point beautifully. His work is paper performance art, but the concept has a natural online interpretation: a wiki that automatically highlights the sentences that have changed since the reader last visited that page. Rather than asking readers to construct the whole story from the updates, we would be showing them where the updates fit into the whole story. At least one experimental news site has tried this.
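As a rough sketch of the "highlight what changed since your last visit" idea, the Python below compares the revision a reader last saw with the current one and returns the sentences that are new or altered. It is an illustration under stated assumptions, not how any existing site does it: the sentence splitter is deliberately naive, and the function names are made up for this example. Only the standard library's `difflib` is used.

```python
import difflib
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def changed_sentences(last_seen, current):
    """Return sentences of `current` that were added or rewritten since `last_seen`."""
    old = split_sentences(last_seen)
    new = split_sentences(current)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changed = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'replace' and 'insert' are the opcodes that introduce new text;
        # 'delete' removes text and 'equal' leaves it untouched.
        if tag in ("replace", "insert"):
            changed.extend(new[j1:j2])
    return changed
```

A page renderer could then wrap each returned sentence in a highlight style, so that the reader sees the updates in the context of the whole article.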

Authorship tracking. Although it is possible in principle to use the revision history on any Wikipedia article to determine who wrote what, both the culture and the user interface discourage this. This is not the only option. The U.S. intelligence community has Intellipedia, which logs authorship:

It’s the Wikipedia on a classified network, with one very important difference: it’s not anonymous. We want people to establish a reputation. If you’re really good, we want people to know you’re good. If you’re making contributions, we want that known. If you’re an idiot, we want that known too.

This also works the other way around, where reputation of the author translates into credibility of the text. I’m not clear on exactly how Intellipedia’s attribution system works; perhaps it simply requires authenticated user logins, or maybe it includes UI features such as appending a user name to each contributed paragraph. One could also imagine systems that constructed a list of bylines based on who wrote how much in the current article. The “blame” function of software version control systems is technical precedent for automatically tracking individual contributions in a collaboratively edited file.
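The byline idea can be sketched in a few lines of Python. The input here is assumed to look like the output of a version control "blame" tool, a list of (author, text) pairs for the current article; that input format, the `bylines` function, and the 5% cutoff are all hypothetical choices for illustration, and nothing here reflects how Intellipedia actually attributes text.

```python
from collections import Counter


def bylines(blamed_lines, min_share=0.05):
    """Rank authors by the share of the current text they wrote.

    blamed_lines: list of (author, text) pairs, one per surviving line.
    min_share: hypothetical cutoff; authors below it get no byline.
    Returns a list of (author, share) pairs, largest share first.
    """
    totals = Counter()
    for author, text in blamed_lines:
        totals[author] += len(text)

    total_chars = sum(totals.values())
    if total_chars == 0:
        return []

    shares = [(author, count / total_chars) for author, count in totals.items()]
    shares = [(author, share) for author, share in shares if share >= min_share]
    # Sort by share descending, then by name so ties are deterministic.
    return sorted(shares, key=lambda pair: (-pair[1], pair[0]))
```

Counting characters is the crudest possible measure of contribution; it says nothing about who supplied the key facts or who fixed the errors, which is part of why attribution is a design question and not just a bookkeeping one.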

Sourcing and attribution. Wikipedia has three core content policies: neutral point of view (NPOV), no original research, and verifiability. Together, these policies describe what counts as “truth” in the Wikipedia world. NPOV is roughly equivalent to the classic notion of journalistic “objectivity,” no original research says that Wikipedia can never be a primary source, and verifiability says that all statements of fact must be cited (and defines, loosely, what counts as a reputable source for citations.)

The citation system used to enforce verifiability has its roots in age-old scholarship practices, while the no original research policy was originally drafted to exclude kooks with fringe theories. Together they have another extremely important effect: they offload the burden of credibility. Without these policies, the credibility of information on Wikipedia would have to lean far more heavily on the reputation of its authors; difficult to establish, since neither authorship nor authors are well tracked. By depending on the credibility of outside sources, Wikipedia was able to bootstrap from existing systems for authoritative knowledge, while maintaining the flexibility to incorporate any reasonable source.

There’s no reason that an already-credible organization couldn’t choose differently. Scientific journals, news organizations, government agencies etc. routinely act as the original publisher of crucial information, and it seems a small step to say that they could put that information in a wiki. The wiki would be credible to the extent that the organization is considered a credible author, which means that authorial tracking would also be required; perhaps certain “source” pages could be designated read-only, or all edits could be moderated, or there could be fine-grained attribution of text. The key point is that the user interface clearly distinguishes text that has been authoritatively vetted from text that has not.

Shared text. We need shared texts because we need shared understandings of the world. Without them, collective action becomes impossible and we all suffer. Wikipedia is an ambitious project to create a global knowledge system that is more or less acceptable to all people. The neutral point of view policy is important here, but the wide-open nature of Wikipedia is perhaps more essential to this vision. By definition, a consensus article is something that everyone is happy with; if an article ever reaches a state where it is amenable to all factions, there is no motivation for anyone to edit it further. That this happens for so many pages, even on contentious topics, is remarkable. The mechanics of this process are actually fairly extensive, including an elaborate tiered volunteer dispute resolution process that usually stabilizes edit wars.

There are variations here too. We could explore other methods of dispute resolution, or we could get more sophisticated about Wikipedia’s policy of representing multiple points of view. We could try to map the viewpoints of different authors directly, or we could have multiple versions of a page, each open to a different faction, and then compare the resulting texts to better understand where the differences lie. As always, there is no reason to imagine that “completely open” is the only option; but some openness seems essential.

And this cuts to the heart of what is unique about the wiki form. Open texts have a special legitimacy precisely because they are fragile: they can only exist when all who have an interest in the outcome manage to work together to create and preserve them. Wikipedia shows that this is possible in many more cases than we thought, but it is hardly the final word.

What should the digital public sphere do?

November 29, 2011 | March 2, 2014 | journalism, knowledge, media, politics, public sphere, visualization | 26 Comments

Earlier this year, I discovered there wasn’t really a name for the thing I wanted to talk about. I wanted a word or phrase that includes journalism, social media, search engines, libraries, Wikipedia, and parts of academia, the idea of all these things as a system for knowledge and communication. But there is no such word. Nonetheless, this is an essay asking what all this stuff should do together.

What I see here is an ecosystem. There are narrow real-time feeds such as expertly curated Twitter accounts, and big general reference works like Wikipedia. There are armies of reporters working in their niches, but also colonies of computer scientists. There are curators both human and algorithmic. And I have no problem imagining that this ecosystem includes certain kinds of artists and artworks. Let’s say it includes all public acts and systems which come down to one person trying to tell another, “I didn’t just make this up. There’s something here of the world we share.”

I asked people what to call it. Some said “media.” That captures a lot of it, but I’m not really talking about the art or entertainment aspects of media. Also I wanted to include something of where ideas come from, something about discussions, collaborative investigation, and the generation of new knowledge. Other people said “information” but there is much more here than being informed. Information alone doesn’t make us care or act. It is part of, but only part of, what it means to connect to another human being at a distance. Someone else said “the fourth estate” and this is much closer, because it pulls in all the ideas around civic participation and public discourse and speaking truth to power, loads of stuff we generally file under “democracy.” But the fourth estate today means “the press” and what I want to talk about is broader than journalism.

I’m just going to call this the “digital public sphere”, building on Jürgen Habermas’ idea of a place for the discussion of shared concerns, public yet apart from the state. Maybe that’s not a great name — it’s a bit dry for my taste — but perhaps it’s the best that can be done in three words, and it’s already in use as a phrase to refer to many of the sorts of things I want to talk about. “Public sphere” captures something important, something about the societal goals of the system, and “digital” is a modifier that means we have to account for interactivity, networks, and computation. Taking inspiration from Michael Schudson’s essay “Six or seven things that news can do for democracy,” I want to ask what the digital public sphere can do for us. I think I see three broad categories, which are also three goals to keep in mind as we build our institutions and systems.

1. Information. It should be possible for people to find things out, whatever they want to know. Our institutions should help people organize to produce valuable new knowledge. And important information should automatically reach each person at just the right moment.

2. Empathy. The vast majority of people in the world, we will only know through media. We must strive to represent the “other” to each other with compassion and reality. We can’t forget that there are people on the other end of the wire.

3. Collective action. What good is public deliberation if we can’t eventually come to a decision and act? But truly enabling the formation of broad agreement also requires that our information systems support conflict resolution. In this age of complex overlapping communities, this role spans everything from the local to the global.

Each of these is its own rich area, and each of these roles already cuts across many different forms and institutions of media.

Information
I’d like to live in a world where it’s cheap and easy for anyone to satisfy the following desires:

“I want to learn about X.”
“How do we know that about X?”
“What are the most interesting things we don’t know about X?”
“Please keep me informed about X.”
“I think we should know more about X.”
“I know something about X and want to tell others.”

These desires span everything from mundane queries (“what time does the store close?”) to complex questions of fact (“what will be the effects of global climate change?”). And they apply at all scales; I might have a burning desire to know how the city government is going to deal with bike lanes, or I might be curious about the sum total of humanity’s knowledge of breast cancer — everything we know today, plus all the good questions we can’t yet answer. Different institutions exist to address each of these needs in various ways. Libraries have historically served the need to answer specific questions, desires #1 and #2, but search engines also do this. Journalism strives to keep people abreast of current events, the essence of #4. Academia has focused on how we know and what we don’t yet know, which are #2 and #3.

This list includes two functions related to the production of new knowledge, because it seems to me that the public information ecosystem should support people working together to become collectively smarter. That’s why I’ve included #5, which is something like casting a vote for an unanswered question, and #6, the peer-to-peer ability to provide an answer. These seem like key elements in the democratic production of knowledge, because the resources which can be devoted to investigating answers are limited. There will always be a finite number of people well placed to answer any particular question, whether those people are researchers, reporters, subject matter experts, or simply well-informed. I like to imagine that their collective output is dwarfed by human curiosity. So efficiency matters, and we need to find ways to aggregate the questions of a community, and route each question to the person or people best positioned to find out the answer.

In the context of professional journalism, this amounts to asking what unanswered questions are most pressing to the community served by a newsroom. One could devise systems of asking the audience (like Quora and StackExchange) or analyze search logs (à la Demand Media). That newsrooms don’t frequently do these things is, I think, an artifact of industrial history — and an unfilled niche in the current ecosystem. Search engines know where the gaps between supply and demand lie, but they’re not in the business of researching new answers. Newsrooms can produce the supply, but they don’t have an understanding of the demand. Today, these two sides of the industry do not work together to close this loop. Some symbiotic hybrid of Google and The Associated Press might be an uncannily good system for answering civic questions.

When new information does become available, there’s the issue of timing and routing. This is #4 again, “please keep me informed.” Traditionally, journalism has answered the question “who should know when?” with “everyone everything as fast as possible” but this is ridiculous today. I really don’t want my phone to vibrate for every news article ever written, which is why only “important” stories generate alerts. But taste and specialization dictate different definitions of “important” for each person, and old answers delivered when I need them might be just as valuable as new information delivered hot and fresh. Google is far down this track with its thinking on knowing what I want before I search for it.

Empathy
There is no better way to show one person to another, across a distance, than the human story. These stories about other people may be informative, sure, but maybe their real purpose is to help us feel what it is like to be someone else. This is an old art; one journalist friend credits Homer with the last major innovation in the form.

But we also have to show whole groups to each other, a very “mass media” goal. If I’ve never met a Cambodian or hung out with a union organizer, I only know what I see in the media. How can and should entire communities, groups, cultures, races, interests or nations be represented?

A good journalist, anthropologist, or writer can live with a community for a while, observing and learning, then articulate generalizations. This is important and useful. It’s also wildly subjective. But then, so is empathy. Curation and amplification can also be empathetic processes: someone can direct attention to the genuine voices of a community. This “don’t speak, point” role has been articulated by Ethan Zuckerman and practiced by Andy Carvin.

But these are still at the level of individual stories. Who is representative? If I can only talk to five people, which five people should I know? Maybe a human story, no matter how effective, is just a single sample in the sense of a tiny part standing for the whole. Turning this notion around, making it personal, I come to an ideal: If I am to be seen as part of some group, then I want representations of that group to include me in some way. This is an argument that mass media coverage of a community should try to account for every person in that community. This is absurd in practical terms, but it can serve as a signpost, a core idea, something to aim for.

Fortunately, more inclusive representations are getting easier. Most profoundly, the widespread availability of peer-to-peer communication networks makes it easier than ever for a single member of a community to speak and be heard widely.

We also have data. We can compile the demographics of social movements, or conduct polls to find “public opinion.” We can learn a lot from the numbers that describe a particular population, which is why surveys and censuses persist. But data are terrible at producing the emotional response at the core of empathy. For most people, learning that 23% of the children in some state live in poverty lacks the gut-punch of a story about a child who goes hungry at the end of every month. In fact there is evidence that making someone think analytically about an issue actually makes them less compassionate.

The best reporting might combine human stories with broader data. I am impressed by CNN’s interactive exploration of American casualties in Iraq, which links mass visualization with photographs and stories about each individual. But that piece covers a comparatively small population, only a few thousand people. There are emerging techniques to understand much larger groups, such as by visualizing the data trails of online life, all of the personal information that we leave behind. We can visualize communities, using aggregate information to see the patterns of human association at all scales. I suspect that mass data visualization represents a fundamentally new way of understanding large groups, a way that is perhaps more inclusive than anecdotes yet richer than demographics. Also, visualization forces us into conversations about who exactly is a member of the community in question, because each person is either included in a particular visualization or not. Drawing such a hard boundary is often difficult, but it’s good to talk about the meanings of our labels.

And yet, for all this new technology, empathy remains a deeply human pursuit. Do we really want statistically unbiased samples of a community? My friend Quinn Norton says that journalism should “strive to show us our better selves.” Sometimes, what we need is brutal honesty. At other times, what we need is kindness and inspiration.

Collective action

What a difficult challenge advances in communication have become in recent decades. On the one hand they are definitely bringing us closer to each other, but are they really bringing us together?

– Ryszard Kapuściński, The Other

I am sensitive to the idea of filter bubbles and concerns about the fragmentation of media, the worry that the personalization of information will create a series of insular and homogenous communities, but I cannot abide the implied nostalgia for the broadcast era. I do not see how one-size-fits-all media can ever serve a diverse and specialized society, and so: let a million micro-cultures bloom! But I do see a need for powerful unifying forces within the public sphere, because everything from keeping a park clean to tackling global climate change requires the agreement and cooperation of a community.

We have long had decision making systems at all scales — from the neighborhood to the United Nations — and these mechanisms span a range from very lightweight and informal to global and ritualized. In many cases decision-making is built upon voting, with some majority required to pass, such as 51% or 66%. But is a vicious, hard-fought 51% in a polarized society really the best we can do? And what about all the issues that we will not be voting on — that is to say, most of them?

Unfortunately, getting agreement among even very moderate numbers of people seems phenomenally difficult. People disagree about methods, but in a pluralistic society they often disagree even more strongly about goals. Sometimes presenting all sides with credible information is enough, but strongly held disagreements usually cannot be resolved by shared facts; experimental work shows that, in many circumstances, polarization deepens with more information. This is the painful truth that blows a hole in ideas like “informed public” and “deliberative democracy.”

Something else is needed here. I want to bring the field of conflict resolution into the digital public sphere. As a named pursuit with its own literature and community, this is a young subject, really only begun after World War II. I love the field, but it’s in its infancy; I think it’s safe to say that we really don’t know very much about how to help groups with incompatible values find acceptable common solutions. We know even less about how to do this in an online setting.

But we can say for sure that “moderator” is an important role in the digital public sphere. This is old-school internet culture, dating back to the pre-web Usenet days, and we have evolved very many tools for keeping online discussions well-ordered, from classic comment moderation to collaborative filtering, reputation systems, online polls, and various other tricks. At the edges, moderation turns into conflict resolution, and there are tools for this too. I’m particularly intrigued by visualizations that show where a community agrees or disagrees along multiple axes, because the conceptually similar process of “peace polls” has had some success in real-world conflict situations such as Northern Ireland. I bet we could also learn from the arduously evolved dispute resolution processes of Wikipedia.

It seems to me that the ideal of legitimate community decision making is consensus, 100% agreement. This is very difficult, another unreachable goal, but we could define a scale from 51% agreement to 100%, and say that the goal is “as consensus as possible” decision making, which would also be “as legitimate as possible.” With this sort of metric — and always remembering that the goal is to reach a decision on a collective action, not to make people agree for the sake of it — we could undertake a systematic study of online consensus formation. For any given community, for any given issue, how fragmented is the discourse? Do people with different opinions hang out in different places online? Can we document examples of successful and unsuccessful online consensus formation, as has been done in the offline case? What role do human moderators play, and how can well-designed social software contribute? How do the processes of online agreement and disagreement play out at different scales and under different circumstances? How do we know when the process has converged to a “good” answer, and when it has degraded into hegemony or groupthink? These are mostly unexplored questions. Fortunately, there’s a huge amount of related work to draw on: voting systems and public choice theory, social network analysis, cognitive psychology, information flow and media ecosystems, social software design, issues of identity and culture, language and semiotics, epistemology…
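
To make the 51%-to-100% scale a little more concrete, here is a minimal sketch in Python. The function name `consensus_score`, the linear mapping, and the choice to clamp anything below a bare majority to zero are all assumptions made for illustration; the essay does not specify how such a metric should be computed, and a real study would likely need something richer.

```python
def consensus_score(agree: int, total: int, floor: float = 0.51) -> float:
    """Map the share of a community that agrees onto a 0..1 scale.

    0.0 corresponds to a bare majority (the assumed 51% floor) or less,
    and 1.0 corresponds to full consensus (100% agreement).
    The linear mapping is an illustrative assumption, not an established metric.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= agree <= total:
        raise ValueError("agree must be between 0 and total")
    share = agree / total
    # Rescale the interval [floor, 1.0] to [0.0, 1.0]; clamp anything below the floor.
    return max(0.0, (share - floor) / (1.0 - floor))


# Hypothetical example: compare a hard-fought 51% with a two-thirds majority.
for votes in (51, 66, 100):
    print(votes, round(consensus_score(votes, 100), 3))
```

A linear scale is only one option. It treats the step from 51% to 61% the same as the step from 90% to 100%, which may understate how much harder the last few holdouts are to win over.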

I would like conflict resolution to be an explicit goal of our media platforms and processes, because we cannot afford to be polarized and grid-locked while there are important collective problems to be solved. We may have lost the unifying narrative of the front page, but that narrative was neither comprehensive nor inclusive: it didn’t always address the problems of concern to me, nor did it ask me what I thought. Effective collective action, at all relevant scales, seems a better and more concrete goal than “shared narrative.” It is also an exceptionally hard problem — in some ways it is the problem of democracy itself — but there’s lots to try, and our public sphere must be designed to support this.

Why now?
I began writing this essay because I wanted to say something very simple: all of these things — journalism, search engines, Wikipedia, social media and the lot — have to work together to common ends. There is today no one profession which encompasses the entirety of the public sphere. Journalism used to be the primary bearer of these responsibilities — or perhaps that was a well-meaning illusion sprung from near monopolies on mass information distribution channels. Either way, that era is now approaching two decades gone. Now what we have is an ecosystem, and in true networked fashion there may not ever again be a central authority. From algorithm designers to dedicated curators to, yes, traditional on-the-scene pro journalists, a great many people in different fields now have a part in shaping the digital public sphere. I wanted to try to understand what all of us are working toward. I hope that I have at least articulated goals that we can agree are important.

What’s with this programmer-journalist identity crisis?

October 5, 2011 | October 5, 2011 | 4 Comments

I’ve felt it myself: somehow, people want me to declare an identity. Am I really a programmer or a journalist? And if people ask you something a lot, you can internalize it. But I think I just figured out my definitive personal answer.

Other people have been thinking about this too. Like this person and this person and most of the news nerds out there. Partially this is because there is recognizably a community of people who like to program with journalistic intent within the more or less traditional journalism industry. That community needed an identity to help it stick together, so we got language like programmer-journalist and hacks/hackers, and the hyphens are always awkward. Makes people wonder about the “right” balance. For that matter, I spend lots of time doing things that wouldn’t fit either label, yet somehow go together with both.

I’ve realized how to articulate my answer to “what’s your profession?” and such vexing questions as “what’s the difference between being a programmer-journalist and an IT person?” It’s this: can you code, are you good at helping people learn about their world, and do you see how software as civic media might contribute to some sort of democratic or social good / making the world a better place? Excellent.

Now suppose you work as one of these hyphenated creatures. Your on-the-job mixture of more traditionally journalistic-y activities (like talking to people to get otherwise unobtainable information) and more traditionally geeky activities (like all-weekend coding binges) is a matter of personal preference. If you personally find that you’d rather be doing more of something, or believe that it might be the sort of activity that will improve the press in a way you believe is important, then you should try to do that. Choose different projects or talk to your boss or convince other people this is a good idea or change jobs or something. Do any of the things people do when they want to try to change the kind of work they’re paid to do.

Don’t worry about what the “right” mixture is or how you describe your affiliations. Just worry about living your life in a way that changes you and the world in a way that is pleasing.

And please, let’s not tell news organization IT people they’re not “journalists” or reporters they’re not real programmers. Are they creatively contributing to the mission of the organization? Then why deny the credit?