media – Jonathan Stray

What should the digital public sphere do?

Jonathan Stray — Wed, 30 Nov 2011 01:12:46 +0000

Earlier this year, I discovered there wasn’t really a name for the thing I wanted to talk about. I wanted a word or phrase that includes journalism, social media, search engines, libraries, Wikipedia, and parts of academia, the idea of all these things as a system for knowledge and communication. But there is no such word. Nonetheless, this is an essay asking what all this stuff should do together.

What I see here is an ecosystem. There are narrow real-time feeds such as expertly curated Twitter accounts, and big general reference works like Wikipedia. There are armies of reporters working in their niches, but also colonies of computer scientists. There are curators both human and algorithmic. And I have no problem imagining that this ecosystem includes certain kinds of artists and artworks. Let’s say it includes all public acts and systems which come down to one person trying to tell another, “I didn’t just make this up. There’s something here of the world we share.”

I asked people what to call it. Some said “media.” That captures a lot of it, but I’m not really talking about the art or entertainment aspects of media. Also I wanted to include something of where ideas come from, something about discussions, collaborative investigation, and the generation of new knowledge. Other people said “information” but there is much more here than being informed. Information alone doesn’t make us care or act. It is part of, but only part of, what it means to connect to another human being at a distance. Someone else said “the fourth estate” and this is much closer, because it pulls in all the ideas around civic participation and public discourse and speaking truth to power, loads of stuff we generally file under “democracy.” But the fourth estate today means “the press” and what I want to talk about is broader than journalism.

I’m just going to call this the “digital public sphere”, building on Jürgen Habermas’ idea of a place for the discussion of shared concerns, public yet apart from the state. Maybe that’s not a great name — it’s a bit dry for my taste — but perhaps it’s the best that can be done in three words, and it’s already in use as a phrase to refer to many of the sorts of things I want to talk about. “Public sphere” captures something important, something about the societal goals of the system, and “digital” is a modifier that means we have to account for interactivity, networks, and computation. Taking inspiration from Michael Schudson’s essay “Six or seven things that news can do for democracy,” I want to ask what the digital public sphere can do for us. I think I see three broad categories, which are also three goals to keep in mind as we build our institutions and systems.

1. Information. It should be possible for people to find things out, whatever they want to know. Our institutions should help people organize to produce valuable new knowledge. And important information should automatically reach each person at just the right moment.

2. Empathy. The vast majority of people in the world, we will only know through media. We must strive to represent the “other” to each other with compassion and reality. We can’t forget that there are people on the other end of the wire.

3. Collective action. What good is public deliberation if we can’t eventually come to a decision and act? But truly enabling the formation of broad agreement also requires that our information systems support conflict resolution. In this age of complex overlapping communities, this role spans everything from the local to the global.

Each of these is its own rich area, and each of these roles already cuts across many different forms and institutions of media.

Information
I’d like to live in a world where it’s cheap and easy for anyone to satisfy the following desires:

“I want to learn about X.”
“How do we know that about X?”
“What are the most interesting things we don’t know about X?”
“Please keep me informed about X.”
“I think we should know more about X.”
“I know something about X and want to tell others.”

These desires span everything from mundane queries (“what time does the store close?”) to complex questions of fact (“what will be the effects of global climate change?”). And they apply at all scales; I might have a burning desire to know how the city government is going to deal with bike lanes, or I might be curious about the sum total of humanity’s knowledge of breast cancer — everything we know today, plus all the good questions we can’t yet answer. Different institutions exist to address each of these needs in various ways. Libraries have historically served the need to answer specific questions, desires #1 and #2, but search engines also do this. Journalism strives to keep people abreast of current events, the essence of #4. Academia has focused on how we know and what we don’t yet know, which is #2 and #3.

This list includes two functions related to the production of new knowledge, because it seems to me that the public information ecosystem should support people working together to become collectively smarter. That’s why I’ve included #5, which is something like casting a vote for an unanswered question, and #6, the peer-to-peer ability to provide an answer. These seem like key elements in the democratic production of knowledge, because the resources which can be devoted to investigating answers are limited. There will always be a finite number of people well placed to answer any particular question, whether those people are researchers, reporters, subject matter experts, or simply well-informed. I like to imagine that their collective output is dwarfed by human curiosity. So efficiency matters, and we need to find ways to aggregate the questions of a community, and route each question to the person or people best positioned to find out the answer.

In the context of professional journalism, this amounts to asking what unanswered questions are most pressing to the community served by a newsroom. One could devise systems of asking the audience (like Quora and StackExchange) or analyze search logs (à la Demand Media). That newsrooms don’t frequently do these things is, I think, an artifact of industrial history — and an unfilled niche in the current ecosystem. Search engines know where the gaps between supply and demand lie, but they’re not in the business of researching new answers. Newsrooms can produce the supply, but they don’t have an understanding of the demand. Today, these two sides of the industry do not work together to close this loop. Some symbiotic hybrid of Google and The Associated Press might be an uncannily good system for answering civic questions.

When new information does become available, there’s the issue of timing and routing. This is #4 again, “please keep me informed.” Traditionally, journalism has answered the question “who should know when?” with “everyone everything as fast as possible” but this is ridiculous today. I really don’t want my phone to vibrate for every news article ever written, which is why only “important” stories generate alerts. But taste and specialization dictate different definitions of “important” for each person, and old answers delivered when I need them might be just as valuable as new information delivered hot and fresh. Google is far down this track with its thinking on knowing what I want before I search for it.

Empathy
There is no better way to show one person to another, across a distance, than the human story. These stories about other people may be informative, sure, but maybe their real purpose is to help us feel what it is like to be someone else. This is an old art; one journalist friend credits Homer with the last major innovation in the form.

But we also have to show whole groups to each other, a very “mass media” goal. If I’ve never met a Cambodian or hung out with a union organizer, I only know what I see in the media. How can and should entire communities, groups, cultures, races, interests or nations be represented?

A good journalist, anthropologist, or writer can live with a community for a while, observing and learning, then articulate generalizations. This is important and useful. It’s also wildly subjective. But then, so is empathy. Curation and amplification can also be empathetic processes: someone can direct attention to the genuine voices of a community. This “don’t speak, point” role has been articulated by Ethan Zuckerman and practiced by Andy Carvin.

But these are still at the level of individual stories. Who is representative? If I can only talk to five people, which five people should I know? Maybe a human story, no matter how effective, is just a single sample in the sense of a tiny part standing for the whole. Turning this notion around, making it personal, I come to an ideal: If I am to be seen as part of some group, then I want representations of that group to include me in some way. This is an argument that mass media coverage of a community should try to account for every person in that community. This is absurd in practical terms, but it can serve as a signpost, a core idea, something to aim for.

Fortunately, more inclusive representations are getting easier. Most profoundly, the widespread availability of peer-to-peer communication networks makes it easier than ever for a single member of a community to speak and be heard widely.

We also have data. We can compile the demographics of social movements, or conduct polls to find “public opinion.” We can learn a lot from the numbers that describe a particular population, which is why surveys and censuses persist. But data are terrible at producing the emotional response at the core of empathy. For most people, learning that 23% of the children in some state live in poverty lacks the gut-punch of a story about a child who goes hungry at the end of every month. In fact there is evidence that making someone think analytically about an issue actually makes them less compassionate.

The best reporting might combine human stories with broader data. I am impressed by CNN’s interactive exploration of American casualties in Iraq, which links mass visualization with photographs and stories about each individual. But that piece covers a comparatively small population, only a few thousand people. There are emerging techniques to understand much larger groups, such as by visualizing the data trails of online life, all of the personal information that we leave behind. We can visualize communities, using aggregate information to see the patterns of human association at all scales. I suspect that mass data visualization represents a fundamentally new way of understanding large groups, a way that is perhaps more inclusive than anecdotes yet richer than demographics. Also, visualization forces us into conversations about who exactly is a member of the community in question, because each person is either included in a particular visualization or not. Drawing such a hard boundary is often difficult, but it’s good to talk about the meanings of our labels.

And yet, for all this new technology, empathy remains a deeply human pursuit. Do we really want statistically unbiased samples of a community? My friend Quinn Norton says that journalism should “strive to show us our better selves.” Sometimes, what we need is brutal honesty. At other times, what we need is kindness and inspiration.

Collective action

What a difficult challenge advances in communication have become in recent decades. On the one hand they are definitely bringing us closer to each other, but are they really bringing us together?

– Ryszard Kapuściński, The Other

I am sensitive to the idea of filter bubbles and concerns about the fragmentation of media, the worry that the personalization of information will create a series of insular and homogeneous communities, but I cannot abide the implied nostalgia for the broadcast era. I do not see how one-size-fits-all media can ever serve a diverse and specialized society, and so: let a million micro-cultures bloom! But I do see a need for powerful unifying forces within the public sphere, because everything from keeping a park clean to tackling global climate change requires the agreement and cooperation of a community.

We have long had decision making systems at all scales — from the neighborhood to the United Nations — and these mechanisms span a range from very lightweight and informal to global and ritualized. In many cases decision-making is built upon voting, with some majority required to pass, such as 51% or 66%. But is a vicious, hard-fought 51% in a polarized society really the best we can do? And what about all the issues that we will not be voting on — that is to say, most of them?

Unfortunately, getting agreement among even very moderate numbers of people seems phenomenally difficult. People disagree about methods, but in a pluralistic society they often disagree even more strongly about goals. Sometimes presenting all sides with credible information is enough, but strongly held disagreements usually cannot be resolved by shared facts; experimental work shows that, in many circumstances, polarization deepens with more information. This is the painful truth that blows a hole in ideas like “informed public” and “deliberative democracy.”

Something else is needed here. I want to bring the field of conflict resolution into the digital public sphere. As a named pursuit with its own literature and community, this is a young subject, really only begun after World War II. I love the field, but it’s in its infancy; I think it’s safe to say that we really don’t know very much about how to help groups with incompatible values find acceptable common solutions. We know even less about how to do this in an online setting.

But we can say for sure that “moderator” is an important role in the digital public sphere. This is old-school internet culture, dating back to the pre-web Usenet days, and we have evolved very many tools for keeping online discussions well-ordered, from classic comment moderation to collaborative filtering, reputation systems, online polls, and various other tricks. At the edges, moderation turns into conflict resolution, and there are tools for this too. I’m particularly intrigued by visualizations that show where a community agrees or disagrees along multiple axes, because the conceptually similar process of “peace polls” has had some success in real-world conflict situations such as Northern Ireland. I bet we could also learn from the arduously evolved dispute resolution processes of Wikipedia.

It seems to me that the ideal of legitimate community decision making is consensus, 100% agreement. This is very difficult, another unreachable goal, but we could define a scale from 51% agreement to 100%, and say that the goal is “as consensus as possible” decision making, which would also be “as legitimate as possible.” With this sort of metric — and always remembering that the goal is to reach a decision on a collective action, not to make people agree for the sake of it — we could undertake a systematic study of online consensus formation. For any given community, for any given issue, how fragmented is the discourse? Do people with different opinions hang out in different places online? Can we document examples of successful and unsuccessful online consensus formation, as has been done in the offline case? What role do human moderators play, and how can well-designed social software contribute? How do the processes of online agreement and disagreement play out at different scales and under different circumstances? How do we know when the process has converged to a “good” answer, and when it has degraded into hegemony or groupthink? These are mostly unexplored questions. Fortunately, there’s a huge amount of related work to draw on: voting systems and public choice theory, social network analysis, cognitive psychology, information flow and media ecosystems, social software design, issues of identity and culture, language and semiotics, epistemology…
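One way to make the 51%-to-100% scale concrete is a small sketch like the following. It is only an illustration under stated assumptions: the function name, the choice to measure the share held by the single most popular position, and the linear rescaling onto 0 to 1 are all hypothetical choices made here, not a metric proposed by any of the fields listed above.

```python
from collections import Counter


def consensus_score(positions):
    """Map the share held by the most popular position onto a 0-to-1 scale.

    0.0 means a bare majority or less (share <= 50%), 1.0 means unanimity.
    The linear rescaling is an illustrative assumption.
    """
    if not positions:
        raise ValueError("need at least one stated position")
    # Count how many participants hold the most common position.
    top_count = Counter(positions).most_common(1)[0][1]
    share = top_count / len(positions)
    # Rescale 50%..100% onto 0..1; anything below a majority scores 0.
    return max(0.0, (share - 0.5) / 0.5)


if __name__ == "__main__":
    hard_fought = ["yes"] * 51 + ["no"] * 49
    broad = ["yes"] * 90 + ["no"] * 10
    print(consensus_score(hard_fought))
    print(consensus_score(broad))
```

A score like this says nothing about how agreement was reached, so it could not by itself distinguish real consensus from groupthink; it is a starting point for measurement, nothing more.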

I would like conflict resolution to be an explicit goal of our media platforms and processes, because we cannot afford to be polarized and grid-locked while there are important collective problems to be solved. We may have lost the unifying narrative of the front page, but that narrative was neither comprehensive nor inclusive: it didn’t always address the problems of concern to me, nor did it ask me what I thought. Effective collective action, at all relevant scales, seems a better and more concrete goal than “shared narrative.” It is also an exceptionally hard problem — in some ways it is the problem of democracy itself — but there’s lots to try, and our public sphere must be designed to support this.

Why now?
I began writing this essay because I wanted to say something very simple: all of these things — journalism, search engines, Wikipedia, social media and the lot — have to work together to common ends. There is today no one profession which encompasses the entirety of the public sphere. Journalism used to be the primary bearer of these responsibilities — or perhaps that was a well-meaning illusion sprung from near monopolies on mass information distribution channels. Either way, that era is now approaching two decades gone. Now what we have is an ecosystem, and in true networked fashion there may not ever again be a central authority. From algorithm designers to dedicated curators to, yes, traditional on-the-scene pro journalists, a great many people in different fields now have a part in shaping the digital public sphere. I wanted to try to understand what all of us are working toward. I hope that I have at least articulated goals that we can agree are important.

A computational journalism reading list

Jonathan Stray — Tue, 01 Feb 2011 02:29:28 +0000

[Last updated: 18 April 2011 — added statistical NLP book link]

There is something extraordinarily rich in the intersection of computer science and journalism. It feels like there’s a nascent field in the making, tied to the rise of the internet. The last few years have seen calls for a new class of “programmer journalist” and the birth of a community of hacks and hackers. Meanwhile, several schools are now offering joint degrees. But we’ll need more than competent programmers in newsrooms. What are the key problems of computational journalism? What other fields can we draw upon for ideas and theory? For that matter, what is it?

I’d like to propose a working definition of computational journalism as the application of computer science to the problems of public information, knowledge, and belief, by practitioners who see their mission as outside of both commerce and government. This includes the journalistic mainstay of “reporting” — because information not published is information not known — but my definition is intentionally much broader than that. To succeed, this young discipline will need to draw heavily from social science, computer science, public communications, cognitive psychology and other fields, as well as the traditional values and practices of the journalism profession.

“Computational journalism” has no textbooks yet. In fact the term is barely recognized. The phrase seems to have emerged at Georgia Tech in 2006 or 2007. Nonetheless I feel like there are already important topics and key references.

Data journalism
Data journalism is obtaining, reporting on, curating and publishing data in the public interest. The practice is often more about spreadsheets than algorithms, so I’ll suggest that not all data journalism is “computational,” in the same way that a novel written on a word processor isn’t “computational.” But data journalism is interesting and important and dovetails with computational journalism in many ways.

The Nieman Journalism Lab’s interview with Guardian Data Blog editor Simon Rogers remains a solid introduction to (one kind of) contemporary practice.
The best practical guides I know are Rogers’ “How to: get to grips with data journalism” and Dan Nguyen’s series of data-scraping tutorials at ProPublica.
Stanford’s Journalism in the Age of Data is an hour-long documentary on data journalism and visualization.
The web is a linked system of human-readable documents. Now Tim Berners-Lee wants to create a web of machine-readable linked data. The full potential is unclear, but it’s a big idea that may come to be the backbone of semantic web visions. The New York Times, The Guardian, and others are experimenting with open data APIs.
Everyblock creator Adrian Holovaty seems to have been the first to suggest that reporters file structured data in his 2006 “A Fundamental Way Newspaper Websites Need to Change.” This idea is beautifully expanded in Stijn Debrouwere’s “Information Architecture for News Websites” series.

Visualization
Big data requires powerful exploration and storytelling tools, and increasingly that means visualization. But there’s good visualization and bad visualization, and the field has advanced tremendously since Tufte wrote The Visual Display of Quantitative Information. There is lots of good science that is too little known, and many open problems here.

Tamara Munzner’s chapter on visualization is the essential primer. She puts visualization on rigorous perceptual footing, and discusses all the major categories of practice. Absolutely required reading for anyone who works with pictures of data.
Ben Fry invented the Processing language and wrote his PhD thesis on “computational information design,” which is his powerful conception of the iterative, interactive practice of designing useful visualizations.
How do we make visualization statistically rigorous? How do we know we’re not just fooling ourselves when we see patterns in the pixels? This amazing paper by Wickham et al. has some answers.
Is a visualization a story? Segel and Heer explore this question in “Narrative Visualization: Telling Stories with Data.”

Computational linguistics
Data is more than numbers. Given that the web is designed to be read by humans, it makes heavy use of human language. And then there are all the world’s books, and the archival recordings of millions of speeches and interviews. Computers are slowly getting better at dealing with language.

Word frequency techniques like tf-idf and the vector space document model are very simple and very useful. See also stemming. Lots more in the wonderful (and free!) Introduction to Information Retrieval. This book explains how search engines are built, and discusses tf-idf etc. in great technical detail.
Statistical language models are increasingly important for all kinds of applications. Michael Nielsen has a great introduction to statistical machine translation. Google’s Peter Norvig discusses how he implemented statistical spelling correction on his laptop during a long plane flight. For the full deal, see the book Foundations of Statistical Natural Language Processing.
On a related note, Google N-gram viewer lets you look at the frequency of short phrases within 4% of all books published, ever. The excellent paper gives examples of how to use this for cultural research. Dan Cohen has important criticisms.
Speech-to-text algorithms enable automated transcription, and Matt Thompson explores the huge implications for journalism.
Reuters maintains the OpenCalais entity extraction service, which parses text to contextually determine who and what is referenced.
IBM’s Watson project built a question-answering system that reads reference books and wins at Jeopardy. Imagine how useful to journalists and curious readers this could be! This paper on the DeepQA system describes how they did it.
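To make the word-frequency idea from the first item above concrete, here is a minimal sketch of tf-idf weighting and cosine similarity in the vector space model. It uses only the Python standard library. The function names, the whitespace tokenizer, the particular tf and idf formulas (term frequency divided by document length, times log(N/df)), and the three toy documents are all assumptions made for illustration; production search engines use more careful tokenization, stemming, and weighting variants.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a real system would also stem."""
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = [
        "the council voted on bike lanes",
        "bike lanes divide the city council",
        "new study on breast cancer screening",
    ]
    vectors = tf_idf_vectors(docs)
    print(cosine(vectors[0], vectors[1]))  # the two bike-lane stories
    print(cosine(vectors[0], vectors[2]))  # bike lanes vs. cancer study
```

With this weighting, a word that appears in every document gets a weight of zero, which is the point: very common words tell you nothing about what distinguishes one document from another.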

Communications technology and free speech
Code is law. Because our communications systems use software, the underlying mathematics of communication lead to staggering political consequences — including whether or not it is possible for governments to verify online identity or remove things from the internet. The key topics here are networks, cryptography, and information theory.

The Handbook of Applied Cryptography is a classic, and free online. But despite the title it doesn’t really explain how crypto is used in the real world, like Wikipedia does.
It’s important to know how the internet routes information, using TCP/IP and BGP, or at a somewhat higher level, things like the BitTorrent protocol. The technical details determine how hard it is to do things like block websites, suppress the dissemination of a file, or remove entire countries from the internet.
Anonymity is deeply important to online free speech, and very hard. The Tor project is the outstanding leader in anonymity-related research.
Information theory is stunningly useful across almost every technical discipline. Pierce’s short textbook is the classic introduction, while Tom Schneider’s Information Theory Primer seems to be the best free online reference.

Tracking the spread of information (and misinformation)
What do we know about how information spreads through society? Very little. But one nice side effect of our increasingly digital public sphere is the ability to track such things, at least in principle.

Memetracker was (AFAIK) the first credible demonstration of whole-web information tracking, following quoted soundbites through blogs and mainstream news sites and everything in between. Zach Seward has cogent reflections on their findings.
The Truthy Project aims for automated detection of astro-turfing on Twitter. They specialize in covert political messaging, or as I like to call it, computational propaganda.
We badly need tools to help us determine the source of any given online “fact.” There are many existing techniques that could be applied to the problem, as I discussed in a previous post.
If we had information provenance tools that worked across a spectrum of media outlets and feed types (web, social media, etc.) it would be much cheaper to do the sort of information ecosystem studies that Pew and others occasionally undertake. This would lead to a much better understanding of who does original reporting.

Filtering and recommendation
With vastly more information than ever before available to us, attention becomes the scarcest resource. Algorithms are an essential tool in filtering the flood of information that reaches each person. (Social media networks also act as filters.)

The paper on preference networks by Truyen et al. is probably as good an introduction as anything to the state of the art in recommendation engines, those algorithms that tell you what articles you might like to read or what movies you might like to watch.
Before Google News there was Columbia News Blaster, which incorporated a number of interesting algorithms such as multi-lingual article clustering, automatic summarization, and more as described in this paper by McKeown et al.
Anyone playing with clustering algorithms needs to have a deep appreciation of the ugly duckling theorem, which says that there is no categorization without preconceptions. King and Grimmer explore this with their technique for visualizing the space of clusterings.
Any digital journalism product which involves the audience to any degree — that should be all digital journalism products — is a piece of social software, well defined by Clay Shirky in his classic essay, “A Group Is Its Own Worst Enemy.” It’s also a “collective knowledge system” as articulated by Chris Dixon.

Measuring public knowledge
If journalism is about “informing the public” then we must consider what happens to stories after publication — this is the “last mile” problem in journalism. There is almost none of this happening in professional journalism today, aside from basic traffic analytics. The key question here is, how does journalism change ideas and action? Can we apply computers to help answer this question empirically?

World Public Opinion’s recent survey of misinformation among American voters solves this problem in the classic way, by doing a randomly sampled opinion poll. I discuss their bleak results here.
Blogosphere maps and other kinds of visualizations can help us understand the public information ecosystem, such as this interactive visualization of Iranian blogs. I have previously suggested using such maps as a navigation tool that might broaden our information horizons.
UN Global Pulse is a serious attempt to create a real-time global monitoring system to detect humanitarian threats in crisis situations. They plan to do this by mining the “data exhaust” of entire societies — social media postings, online records, news reports, and whatever else they can get their hands on. Sounds like key technology for journalism.
Vox Civitas is an ambitious social media mining tool designed for journalists. Computational linguistics, visualization, and more.

Research agenda
I know of only one work which proposes a research agenda for computational journalism.

“Computational Journalism: A Call to Arms for Database Researchers” by Sarah Cohen et al. raises the very intriguing possibility of building systems that automatically or semi-automatically scan databases for stories, document the rationale for believing certain facts, etc.

This paper presents a broad vision and is really a must-read. However, it deals almost exclusively with reporting, that is, finding new knowledge and making it public. I’d like to suggest that the following unsolved problems are also important:

Tracing the source of any particular “fact” found online, and generally tracking the spread and mutation of information.
Cheap metrics for the state of the public information ecosystem. How accurate is the web? How accurate is a particular source?
Techniques for mapping public knowledge. What is it that people actually know and believe? How polarized is a population? What is under-reported? What is well reported but poorly appreciated?
Information routing and timing: how can we route each story to the set of people who might be most concerned about it, or best in a position to act, at the moment when it will be most relevant to them?

This sort of attention to the health of the public information ecosystem as a whole, beyond just the traditional surfacing of new stories, seems essential to the project of making journalism work.

Internet as information democracy, or new media news monopolies?

Jonathan Stray — Sun, 30 May 2010 08:42:27 +0000

There was a dream that the internet would mean the end of the media gatekeeper; that anyone could get their message out without having to get the attention and approval of the media powers that be. This turns out to be not quite the case.

I took data from the Project for Excellence in Journalism’s State of the News Media 2010 report to create this chart showing the market share of the top 20 news web sites. In theory, the internet busts media monopolies by allowing anyone to publish for free. And there’s no doubt it’s been disruptive. But according to data from Nielsen, the top 7% of 4600 news and information sites get 80% of traffic (from American viewers.) We see a big concentration of power, as the rapid falloff in the chart above shows, and much of it still belongs to “old media.”
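For a sense of scale, 7% of 4600 sites is 322 sites. A concentration figure like "the top 7% get 80% of traffic" can be computed from any list of per-site traffic counts with a few lines of code. The sketch below is only an illustration: the function name is hypothetical, and the traffic numbers are synthetic (a 1/rank falloff invented here), not the Nielsen or PEJ data.

```python
def top_share(traffic, top_fraction):
    """Fraction of total traffic captured by the top `top_fraction` of sites."""
    if not traffic:
        raise ValueError("need at least one site")
    ranked = sorted(traffic, reverse=True)
    # Number of sites in the top group, at least one.
    k = max(1, round(top_fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)


if __name__ == "__main__":
    # Synthetic stand-in for real audience data: site at rank r gets 1/r visits.
    synthetic = [1.0 / rank for rank in range(1, 4601)]
    print(round(0.07 * len(synthetic)))  # how many sites are in the top 7%
    print(top_share(synthetic, 0.07))    # their share of all traffic
```

Running the same function over real audience data, one fraction at a time, would trace out the falloff curve directly.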

Organizations such as CNN, Fox, the New York Times and USA Today rank in the top 20. But so do new media giants AOL, Google News, The Huffington Post and Yahoo.com, which is the biggest news site of all.

(It’s also interesting to note that many of the top 20 new media news sites produce little or none of their own news; in the extreme case Google News produces no stories at all of its own. While some see aggregation as parasitic, I think it’s obvious that it delivers a tremendously valuable service to readers.)

For better or worse, the ability to publish anything nearly for free hasn’t meant the end of big media monopolies. It’s simply shifted the landscape and the power balance.

The limiting factor to getting your message out is no longer having access to an expensive printing press or a TV station. It’s attention: how many minutes of time can you get from how many people? In this game, brand still matters hugely. There are only so many URLs a person can remember, only so many sites they can check in a day.

You have an audience, or you don’t. Mindshare is now the barrier to entry in the media world. Perhaps it always was, though I daresay it was easier to get viewers to check out your new television network when there were only 13 channels. Online, the number of channels is infinite for all intents and purposes; a single person will never exhaust them all.

Which is not to say that the internet has changed nothing. We have seen over and over that bottom-up effects can propel something to mass attention, with no big company behind them. This is often called “going viral,” but that’s not quite a broad enough description of the effect. In many cases, what happens is that something becomes just popular enough to get picked up by mainstream media, who then propel it into the spotlight.

And what this PEJ top 20 list doesn’t take into account is that people now get online news from lots and lots of sources other than news websites.

Facebook is now the most widely used news reading program. It’s also now the #1 site on the internet. Should it top this chart of news sources? Meanwhile, Twitter has become a primary news source for very many people. And then there are mobile news apps, some of which belong to old media news organizations and some of which don’t. The richness of news distribution systems today is well captured in another PEJ report on the “participatory news consumer.”

So has the internet made it easier to get non-mainstream messages out? I think the answer can only be yes. But don’t expect that anyone will be reading your alternative narratives just because you’ve put them online. Your best bet to be heard still lies with a small number of very large companies. And although the internet per se is relatively uncensored in many countries, commercial gatekeepers like Apple and Facebook own important dedicated channels, and both of them engage in censorship (1, 2).

Jürgen Habermas says he’s not on Twitter

Jonathan Stray — Mon, 01 Feb 2010 13:39:45 +0000

Over the last several days there has been considerable hubbub around the notion that pioneering media theorist Jürgen Habermas might have signed up for Twitter as @JHabermas. This would be “important if true”, as Jay Rosen put it. Intrigued, I tracked him down through the University of Frankfurt. I succeeded in getting him on the phone at his home in Starnberg, and asked him if he was on Twitter. He said,

No, no, no. This is somebody else. This is a mis-use of my name.

He added that “my email address is not publicly available,” which suggests that perhaps he didn’t quite understand what I was getting at. In fact, the father of the public sphere doesn’t seem to understand the internet very well at all, judging by his few previous references to the topic.

I know many people will be disappointed, especially @bitchphd who tweeted “JURGEN HABERMAS is on twitter. definitive response to all future articles about how stupid twitter is.” Personally I believe that Twitter is significant even without Habermas, but it’s clear that this is an issue for the next generation of theorists to decide.

UPDATE: here is an audio recording of my question and his answer.

Know Your Enemy

Jonathan Stray — Wed, 30 Sep 2009 19:08:24 +0000

In America, the enemy is Terrorism. It used to be the Russians, or more generically Communists. We discussed the history of this concept in class today. And then I asked: In the state-controlled Chinese media, who is the enemy today?

I got three immediate answers:

“The West.”

“Japan.”

“Separatists.” (E.g. Tibetans, Uighurs.)

There was instant consensus on this list, among the PRC students. Good to know.

We Have No Maps of The Web

Jonathan Stray — Mon, 04 May 2009 01:17:44 +0000

We dream the internet to be a great public meeting place where all the world’s cultures interact and learn from one another, but it is far less than that. We are separated from ourselves by language, culture and the normal tendency to seek out only what we already know. In reality the net is cliquish and insular. We each live in our own little corner, only dimly aware of the world of information just outside. In this the internet is no different from normal human life, where most people still die within a few kilometers of their birthplace. Nonetheless, we all know that there is something else out there: we have maps of the world. We do not have maps of the web.

I have met people who have never seen a world map. I once had a conversation with herders in the south Sahara who asked me if Canada was in Europe. As we talked I realized that the patriarch of the settlement couldn’t name more than half a dozen countries, and had no idea how long it might take to get to any of the ones he did know. He simply had no notion of how big the planet was. And to him, the world really is small: he lives in the desert, occasionally catches a ride to town for supplies, and will never leave the country in which he was born.

Online, we are all that man. Even the most global and sophisticated among us does not know the true scope of our informational world. Statistics on the “size” of the web are surprisingly hard to come by and even harder to grasp; learning that there are a trillion unique URLs is like being told that the land area of the Earth is 148 million square kilometers. We really have no idea what we’re missing, no visceral experience that teaches our ignorance.

We can remedy this.

First, language. When asked about the Chinese internet, the best most Westerners can manage is “here there be dragons.” Although machine translation is coming along and Google now includes it standard, we do not yet appreciate that the web in other languages could be important. In fact, unless you have twiddled your preferences, the multi-lingual web will not normally appear in your search results. There must have been a point in history when European maps did not show China, and Chinese maps did not show Europe; this is where we live today. The result is a strange sort of online invisibility between the major cultures of the world.

Another kind of invisibility results from gaps in media coverage. Even without the effects of censorship (of both press and internet varieties) there is the question of what counts as news; a famous example is the paucity of world events coverage in the American media. Although blogs can fill the reporting gap, a terrific story means nothing if no one knows where to read it.

Within the limitations of what we can view there are the limits of what we do view. A map of the Iranian blogosphere shows one cluster of sites frequented by reformists and expats, and another frequented by conservatives and religious youth. In the United States, Amazon book sales data shows that liberals and conservatives don’t read each other’s books. Ideology aside, each person has particular interests; not everyone can be concerned with colony collapse disorder, Polish cinema, or the oil pipelines of Turkmenistan.

It’s not that everyone should care about everything; that’s ridiculous and impossible. I am also not concerned about finding things specifically sought; we have search engines for that. Rather, the point of a map is to know that something is there at all. I want school-children to see the web from space. I want maps of the web and its various resources, online, up to date, for everyone.

We understand, in a general sense, how to make such maps. There have already been a number of large-scale maps of online information, such as the blogosphere visualizations of Matthew Hurst. In his images, each dot is a blog and each arc represents a hyperlink. Automatic layout minimizes the distance between clusters of interlinked blogs, translating nearness on the web into nearness on the map. Looking at these incredibly detailed images, where each tiny dot is a blog, I am overwhelmed by how big just this one corner of the internet can be, and how little of it I can ever perceive. I am also deeply impressed by the Places and Spaces charts of science and other fields, and the phenomenal Scientific Method: Relationships Among Scientific Paradigms. Browsing these maps, I am struck everywhere by the existence of large-scale patterns, the continents of a geography I didn’t know existed.

But these views are partial, specialized, and require enormous one-time resources to produce. They are curiosities, not navigation instruments. Until such maps exist in real-time in every browser they are just the toys of academics.

Imagine, then, an online newsreader (RSS reader, feed reader) with a map. I imagine all the world’s feeds drawn out in multiple colors, perhaps mapped out on a sphere. If each of your subscribed feeds was marked with a colored dot on the surface of this abstract Earth — which would include news and blogs from other cultures, ideologies, and languages — then it would be possible to see at a glance just where you stand in information space, and how wide or narrow your perspective. We would finally be able to put a finger down and say “you are here” in the world of what could be learned from the web.

The point is to engage curiosity, to encourage ourselves to leave the house online. In “Intelligent News Agents, With Real New” I envisioned a system that monitors what you read and automatically suggests topics that are as “different” as possible from your usual fare. This is a well-intended attempt to help you escape from the informational ghetto you grew up in, but I now think that such a system would be an utter failure. No one likes to be told what to read. Anyway, how is a programmer to decide what we “should” be viewing? Instead of trying to direct attention, let’s just make people aware of the geography.

There are many things that could be mapped. RSS feeds now include all the major news media, plus blogs, so they are an obvious place to start. A larger whole-web map seems essential for its sheer scope, and another “you are here” moment might arise from plotting personal browser history against such a map. All sorts of global patterns might also become apparent if we visually coded sites by language or topic, as I suggested in “How Many World Wide Webs are There?” Maps of academic publications or books, such as the maps of science discussed above, would reveal more slowly changing patterns in the world’s knowledge. Maps of corporate or political connections – something like a whole-world social network, or akin to the remarkable corporation browser of theyrule.net – would be difficult to generate, requiring considerable data-mining of public information, but could provide an up-to-date snapshot of global economic and power structures.

In all cases, our maps must be drawn very carefully, especially with regard to what counts as a link, because a map of something which is not fundamentally spatial can only be a metaphor. When well chosen, metaphors are powerful because they allow reasoning about one domain through the more familiar concepts of another; when poorly chosen, metaphors are unclear or deceptive. A map also engages our spatial reasoning faculties, the ability to grasp shape and structure at a glance. When we draw maps of information, we are seeking a visual representation of abstract properties such as the number of connecting links between blogs, co-authorship of books, or similarity of word vectors. This can be done well or poorly, as Edward Tufte has spent his life demonstrating.

Along this line, I feel that our web maps should be spheres and not planes. Not only does a sphere suggest the Earth, but there is no center on a sphere, no privileged continent. A sphere also provides the concept of an antipode, the point farthest away from wherever you stand. It is good to wonder what is on the other side of the world.
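The antipode idea is easy to make concrete. The following is a minimal sketch, assuming each feed or site has already been given a latitude and longitude on the abstract sphere by some layout step that is not shown here; the function names and the sample coordinates are hypothetical, chosen only for illustration.

```python
import math

def antipode(lat_deg, lon_deg):
    """Return the point on the sphere farthest from (lat_deg, lon_deg).

    Latitude flips sign; longitude shifts by 180 degrees and is kept
    within the range -180..180.
    """
    anti_lat = -lat_deg
    anti_lon = lon_deg + 180.0 if lon_deg <= 0 else lon_deg - 180.0
    return anti_lat, anti_lon

def central_angle(lat1, lon1, lat2, lon2):
    """Angle in radians between two points on a unit sphere."""
    def unit_vector(lat, lon):
        lat_r, lon_r = math.radians(lat), math.radians(lon)
        return (math.cos(lat_r) * math.cos(lon_r),
                math.cos(lat_r) * math.sin(lon_r),
                math.sin(lat_r))
    a = unit_vector(lat1, lon1)
    b = unit_vector(lat2, lon2)
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp to guard against floating-point drift just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot)))

# Hypothetical position of "where you stand" on the map.
here = (40.0, -74.0)
there = antipode(*here)
```

On a plane there is no equivalent: the farthest point depends on where the edges of the page happen to fall. On the sphere, `central_angle(*here, *there)` is half a turn (pi radians) wherever `here` is.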

The maps I want are also live. They are not snapshots, nothing like the “blogosphere as recorded by web crawl in August 2007” that we see in captions today. Instead, they must be continually updated, just as our search engines continually re-crawl the web. Our internet also needs history, as The Internet Archive and Google Trends know. I want a time slider on every map, a little widget that lets one scroll back and forth through history and actually watch new blogs rise to prominence, or see the polarization that occurred after 9/11. I want to see the continental drift.

Technologically, none of this is especially difficult, at least not in concept. A whole-web map of all accessible pages does require work with very large datasets, perhaps hundreds of terabytes, but there are many corporations that know how to do this, often under the label of cloud computing. It also requires whole-web indices, and this is a trickier problem because only the search engine companies currently have the required infrastructure (and are willing to pay for it). The sorts of maps I propose are fundamentally expensive to maintain, which is probably part of why they don’t already exist. This implies centralization, and Google could certainly do the job — if they wanted to, or if they were willing to let others access their data. (Update: more on the economics of web indices.) But details follow need; like Stewart Brand, maybe we first need to want to see the whole world from space.

I live with very idealistic hopes. I believe that being aware of our world truly enables us to live better at all scales, from where to brunch to national policy options for desertification. I also believe that communication can reduce bigotry, intolerance, and ultimately conflict, at least if the next generation is exposed young enough. But information that we do not even know exists cannot help us, and the ability to communicate with someone anywhere in the world means nothing if we are never tempted to do it. It is not our fault that we all live in informational ghettoes, but we need to make it obvious that we do.

Maine Man Tries to Build Dirty Bomb, No One Cares

Jonathan Stray — Wed, 11 Feb 2009 22:19:03 +0000

A leaked FBI report states that a man named James G. Cummings was trying to build a dirty bomb when he was shot and killed by his wife last December 9th in Belfast, Maine. He had plans, parts, explosive ingredients, and small quantities of radioactive material, though nothing that could not be purchased legally within the US. Cummings was a white supremacist who was reportedly very upset about Obama’s election.

The leaked document has been posted on Wikileaks since January 16th. While the material concerning Cummings was first noticed by the rumor site Unattributable.com on January 19th, only yesterday was there any sort of story about it in the mainstream media, in this case the local Bangor Daily News.

Although this dastardly plot was probably not much more dangerous to the public than a garden-variety bomb, this man would certainly qualify as a bona fide “terrorist” under Bush-regime logic. Or at least he would if he was Arab. In point of fact, he actually is a threat to the public, or was. So why haven’t we heard about it? Are crazy white supremacists somehow less of a threat than crazy fundamentalist Muslims?

The FBI report notes:

State authorities detected radiation emissions in four small jars in the residence labeled ‘uranium metal’, as well as one jar labeled ‘thorium.’ The four jars of uranium carried the label of an identified US company. Further preliminary analysis on 30 December 2008 indicated an unlabeled jar to be a second jar of thorium. Each bottle of uranium contained depleted uranium 238. Analysis also indicated the two jars of thorium held thorium 232.

Depleted uranium (DU), the by-product of uranium enrichment for use in nuclear power plants or weapons, is not terribly radioactive and is reportedly not very suitable for use in a dirty bomb. Thorium is similarly weakly radioactive, and can be purchased legally through chemical supply companies (such as Fisher Scientific). Dispersal of these isotopes wouldn’t exactly be healthy — they’re both considered carcinogens, and DU has been well documented to cause birth defects, which is why the US and Israeli armies really shouldn’t be spraying foreign countries with DU bullets. However, a depleted uranium/thorium bomb couldn’t really be considered a weapon of mass destruction.

Still, the man was on his way to building some sort of upsetting bomb. Aside from the nastiness of bombings of any sort, I am quite sure the headlines screaming “radioactivity” wouldn’t bother with the scientific subtleties I just covered. I for one am glad that the FBI finally clued in — though only because these materials were found after Cummings was shot and killed by his wife, who claimed she was defending herself after years of physical and sexual abuse.

This is all very strange, and I am left with questions.

Given this foiled plot, the sadly successful Oklahoma City bombing of 1995, and other deranged loners such as the Unabomber, what is the actual risk to the public from foreign jihadists versus homegrown wackjobs, of which there are apparently plenty? [UPDATE: See also the Texas militia with a sodium cyanide bomb in 2003]
Do the DHS and the FBI know the true answer to this question? Are they allocating their resources appropriately? How come we only found out about this plot accidentally?
Again, the mainstream media still haven’t touched the story. Would this have been an instant headline if the guy was Muslim?
If domestic terrorists don’t count, why not? Is it because they’re useless in justifying foreign wars? Or is mostly ignoring them the right response, implying that we are far too jumpy about terrorism in general?
This is completely ridiculous in so many ways. When do we, as a culture, decide to think rationally about terrorism?

And what would a rational approach to terrorism be? I suggest public health as a model, which would doubtless show that if saving lives and property is the aim, we are wasting our time and money with “terrorism” as compared to, oh, I don’t know, obesity, car accidents, and global climate change.

Intelligent News Agents, With Real New

Jonathan Stray — Thu, 04 Sep 2008 23:17:34 +0000

You cannot read all of the news, every day. There is simply too much information for even a dedicated and specialized observer to consume it all, so someone or something has to make choices. Traditionally, we rely on some other person to tell us what to see: the editor of a newspaper decides what goes on the front page, the reviewer tells us what movies are worth it. Recently, we have been able to distribute this mediation process across wider communities: sites like Digg, StumbleUpon, or Slashdot all represent the collective opinions of thousands of people.

The next step is intelligent news agents. Google (search, news, reader, etc.) can already be configured to deliver to us only that information we think we might want to see. It’s not hard to imagine much more sophisticated agents that would scour the internet for items of interest.

In today’s context, it’s easy to see how such agents could actually be implemented. Sophisticated customer preference engines are already capable of telling us what products we might like to consume — the best example is Amazon’s recommendation engine. It’s not a big leap to imagine using the same sort of algorithms to model the kinds of blog articles, web pages, YouTube videos, etc. that we might enjoy consuming, and then deliver these things to us.

There is a serious problem with this. You’re going to get exactly what you ask for, and only that.

True, we all do this already. We read books and consume media which more or less confirm our existing opinions. This effect is visible as clustering in what we consume, as in this example of Amazon sales data for political books in 2008.

This image is from a beautiful analysis by orgnet.com. Basically, people buy either the red books or the blue books, but usually not both. The same sorts of patterns hold for movies, blogs, newspapers, ideologies, religions, and human beliefs of all kinds. This is a problem; but at least you can usually see the other color of books when you walk into Borders. If we end up relying on trainable agents for all of our information, we risk completely blacking out anything that disagrees with what we already believe.

I propose a simple solution. Automatic network analyses like the one above — of books, or articles, or web pages — could easily pinpoint the information sources that would expose me to the maximum novelty in the minimum time. If my goal is to gain a deep understanding of the entire scope of human discourse, rather than just the parts of it I already agree with, then it would be very simple to program my agent to bring to me exactly those things that would most rapidly give me insight into those regions of information space which are most vital and least known to me. I imagine some metric like “highest degree node most distant from the nodes I’ve already visited” would work handily.
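To make that metric a little more concrete, here is a minimal sketch under stated assumptions. The link graph is a plain dictionary of neighbor sets, distance is the hop count from the nearest already-visited node, and the two criteria are combined by simply multiplying distance by degree. That product is only one assumed way to trade them off, and the node names and function names are hypothetical.

```python
from collections import deque

def distances_from_visited(graph, visited):
    """Multi-source breadth-first search.

    Returns the hop count from the nearest visited node to every
    node reachable from the visited set.
    """
    dist = {node: 0 for node in visited if node in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def suggest_next(graph, visited):
    """Pick the unvisited node with the largest distance * degree score.

    Ties are broken alphabetically so the result is deterministic.
    Returns None when nothing unvisited is reachable.
    """
    dist = distances_from_visited(graph, visited)
    candidates = [node for node, hops in dist.items() if hops > 0]
    if not candidates:
        return None

    def score(node):
        degree = len(graph.get(node, ()))
        return (-dist[node] * degree, node)

    return min(candidates, key=score)

# Hypothetical link graph: two clusters joined by one bridging site.
graph = {
    "liberal_blog_a": {"liberal_blog_b", "centrist_daily"},
    "liberal_blog_b": {"liberal_blog_a"},
    "centrist_daily": {"liberal_blog_a", "conservative_hub"},
    "conservative_hub": {"centrist_daily", "conservative_blog_x",
                         "conservative_blog_y"},
    "conservative_blog_x": {"conservative_hub"},
    "conservative_blog_y": {"conservative_hub"},
}
visited = {"liberal_blog_a", "liberal_blog_b"}
suggestion = suggest_next(graph, visited)
```

For this toy graph the suggestion is the well-connected hub of the other cluster, not one of its outlying leaves. Picking strictly the farthest node first would favor the leaves instead, which is why the way the two criteria are combined matters.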

You can infer a lot about someone from the information they currently consume. If my agent noticed that I was a liberal, it could make me understand the conservative world-view, and vice-versa. If my agent detected that I was ignorant of certain crucial aspects of Chinese culture and politics, it could recommend a primer article. Or it might deduce that I needed to understand just slightly more physics to participate meaningfully in the climate change debate, or decide (based on my movie viewing habits) that it was high time I review the influential films of Orson Welles. Of course, I might in turn decide that I actually, truly, don’t care about film at all; but the very act of excluding specific subjects or categories of thought would force us, consciously, to admit to the boundaries of our mental worlds.

We could program our information gathering systems to challenge us, concisely and effectively, if we so want. Intelligent agents could be mere sycophants, or they could be teachers.

Americans Have Only Their Own Culture

Jonathan Stray — Wed, 16 Jul 2008 01:25:10 +0000

The whole world watches Hollywood movies. I once found X-Men 2 on cable in Oman, the sex and violence airing between the preaching Imams. The whole world reads Western books, either in English or translation. The Da Vinci Code graces the dirty blankets of sidewalk booksellers in Mumbai, and Harry Potter is truly global.

Those who don’t live in America are lucky. They have at least two cultures: their own, and the American imports. Those who live within America are impoverished by comparison. Americans have to go well out of their way to consume media made by people who aren’t like them. We have to go to the “Foreign” section of the video store. We have to suffer through languages we don’t understand, because we are taught only English in schools.

This same effect is repeated on a smaller scale with regional cultural capitals. In Southeast Asia, all the good movies come from Thailand. In Nepal, everything is from India. South Africa produces most of the African media, while Qatar and Egypt supply the Arab world. In every case, media in the minority countries is often much more diverse, drawing from many sources.

Maybe this is imperialism. Maybe this is a bad thing. Maybe every people should be producing their own entertainments just as furiously as Hollywood. Maybe. My point is only this: if you live outside of the Empire, the Empire comes to you. But if you live inside, you have to look to find the rest of the world.