What should the digital public sphere do?

Earlier this year, I discovered there wasn’t really a name for the thing I wanted to talk about. I wanted a word or phrase that includes journalism, social media, search engines, libraries, Wikipedia, and parts of academia, the idea of all these things as a system for knowledge and communication. But there is no such word. Nonetheless, this is an essay asking what all this stuff should do together.

What I see here is an ecosystem. There are narrow real-time feeds such as expertly curated Twitter accounts, and big general reference works like Wikipedia. There are armies of reporters working in their niches, but also colonies of computer scientists. There are curators both human and algorithmic. And I have no problem imagining that this ecosystem includes certain kinds of artists and artworks. Let’s say it includes all public acts and systems which come down to one person trying to tell another, “I didn’t just make this up. There’s something here of the world we share.”

I asked people what to call it. Some said “media.” That captures a lot of it, but I’m not really talking about the art or entertainment aspects of media. Also I wanted to include something of where ideas come from, something about discussions, collaborative investigation, and the generation of new knowledge. Other people said “information” but there is much more here than being informed. Information alone doesn’t make us care or act. It is part of, but only part of, what it means to connect to another human being at a distance.  Someone else said “the fourth estate” and this is much closer, because it pulls in all the ideas around civic participation and public discourse and speaking truth to power, loads of stuff we generally file under “democracy.” But the fourth estate today means “the press” and what I want to talk about is broader than journalism.

I’m just going to call this the “digital public sphere”, building on Jürgen Habermas’ idea of a place for the discussion of shared concerns, public yet apart from the state. Maybe that’s not a great name — it’s a bit dry for my taste — but perhaps it’s the best that can be done in three words, and it’s already in use as a phrase to refer to many of the sorts of things I want to talk about. “Public sphere” captures something important, something about the societal goals of the system, and “digital” is a modifier that means we have to account for interactivity, networks, and computation. Taking inspiration from Michael Schudson’s essay “Six or seven things that news can do for democracy,” I want to ask what the digital public sphere can do for us. I think I see three broad categories, which are also three goals to keep in mind as we build our institutions and systems.

1. Information. It should be possible for people to find things out, whatever they want to know. Our institutions should help people organize to produce valuable new knowledge. And important information should automatically reach each person at just the right moment.

2. Empathy. We will only ever know the vast majority of people in the world through media. We must strive to represent the “other” to each other with compassion and reality. We can’t forget that there are people on the other end of the wire.

3. Collective action. What good is public deliberation if we can’t eventually come to a decision and act? But truly enabling the formation of broad agreement also requires that our information systems support conflict resolution. In this age of complex overlapping communities, this role spans everything from the local to the global.

Each of these is its own rich area, and each of these roles already cuts across many different forms and institutions of media.

Information
I’d like to live in a world where it’s cheap and easy for anyone to satisfy the following desires:

  1. “I want to learn about X.”
  2. “How do we know that about X?”
  3. “What are the most interesting things we don’t know about X?”
  4. “Please keep me informed about X.”
  5. “I think we should know more about X.”
  6. “I know something about X and want to tell others.”

These desires span everything from mundane queries (“what time does the store close?”) to complex questions of fact (“what will be the effects of global climate change?”). And they apply at all scales; I might have a burning desire to know how the city government is going to deal with bike lanes, or I might be curious about the sum total of humanity’s knowledge of breast cancer — everything we know today, plus all the good questions we can’t yet answer. Different institutions exist to address each of these needs in various ways. Libraries have historically served the need to answer specific questions, desires #1 and #2, but search engines also do this. Journalism strives to keep people abreast of current events, the essence of #4. Academia has focused on how we know and what we don’t yet know, which is #2 and #3.

This list includes two functions related to the production of new knowledge, because it seems to me that the public information ecosystem should support people working together to become collectively smarter. That’s why I’ve included #5, which is something like casting a vote for an unanswered question, and #6, the peer-to-peer ability to provide an answer. These seem like key elements in the democratic production of knowledge, because the resources which can be devoted to investigating answers are limited. There will always be a finite number of people well placed to answer any particular question, whether those people are researchers, reporters, subject matter experts, or simply well-informed. I like to imagine that their collective output is dwarfed by human curiosity. So efficiency matters, and we need to find ways to aggregate the questions of a community, and route each question to the person or people best positioned to find out the answer.
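To make the routing idea a bit more concrete, here is a minimal sketch (entirely a toy of my own, not any existing newsroom system) that sends each incoming question to whichever contributor's self-described beat overlaps with it most, using nothing fancier than shared words:

```python
# Toy question router: send each question to the contributor whose
# self-described beat shares the most words with it. The names and beats
# below are hypothetical; a real system would use better text matching.

def tokenize(text):
    return {word.strip(".,?!\"'").lower() for word in text.split()}

contributors = {
    "city_hall_reporter": "city council budget bike lanes zoning transit",
    "health_reporter": "breast cancer screening clinical trials hospitals",
    "climate_reporter": "global climate change emissions sea level rise",
}

def route(question):
    q_words = tokenize(question)
    # Score each contributor by word overlap with the question.
    scores = {name: len(q_words & tokenize(beat))
              for name, beat in contributors.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None means no good match

print(route("How is the city government going to deal with bike lanes?"))
# -> city_hall_reporter
```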

In the context of professional journalism, this amounts to asking what unanswered questions are most pressing to the community served by a newsroom. One could devise systems for asking the audience (like Quora and StackExchange) or analyze search logs (à la Demand Media). That newsrooms don’t frequently do these things is, I think, an artifact of industrial history — and an unfilled niche in the current ecosystem. Search engines know where the gaps between supply and demand lie, but they’re not in the business of researching new answers. Newsrooms can produce the supply, but they don’t have an understanding of the demand. Today, these two sides of the industry do not work together to close this loop. Some symbiotic hybrid of Google and The Associated Press might be an uncannily good system for answering civic questions.

When new information does become available, there’s the issue of timing and routing. This is #4 again, “please keep me informed.” Traditionally, journalism has answered the question “who should know when?” with “everyone, everything, as fast as possible,” but this is ridiculous today. I really don’t want my phone to vibrate for every news article ever written, which is why only “important” stories generate alerts. But taste and specialization dictate different definitions of “important” for each person, and old answers delivered when I need them might be just as valuable as new information delivered hot and fresh. Google is far down this track with its thinking on knowing what I want before I search for it.
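As a toy illustration of what per-person “importance” might mean (my own invented scoring, not how any real alert system works), imagine each reader carrying a few weighted interests and a personal threshold, and a story only buzzing the phones it clears:

```python
# Hypothetical per-user alerting: a story tagged with topics only reaches
# readers whose weighted interest in those topics clears their own threshold.

users = {
    "ana": {"interests": {"bike lanes": 0.9, "city budget": 0.4}, "threshold": 0.5},
    "raj": {"interests": {"breast cancer": 0.8}, "threshold": 0.7},
}

def should_alert(user, story_topics):
    score = sum(users[user]["interests"].get(topic, 0.0) for topic in story_topics)
    return score >= users[user]["threshold"]

story_topics = ["bike lanes", "city budget"]
print([u for u in users if should_alert(u, story_topics)])  # ['ana']
```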

Empathy 
There is no better way to show one person to another, across a distance, than the human story. These stories about other people may be informative, sure, but maybe their real purpose is to help us feel what it is like to be someone else. This is an old art; one journalist friend credits Homer with the last major innovation in the form.

But we also have to show whole groups to each other, a very “mass media” goal. If I’ve never met a Cambodian or hung out with a union organizer, I only know what I see in the media. How can and should entire communities, groups, cultures, races, interests or nations be represented?

A good journalist, anthropologist, or writer can live with a community for a while, observing and learning, then articulate generalizations. This is important and useful. It’s also wildly subjective. But then, so is empathy. Curation and amplification can also be empathetic processes: someone can direct attention to the genuine voices of a community. This “don’t speak, point” role has been articulated by Ethan Zuckerman and practiced by Andy Carvin.

But these are still at the level of individual stories. Who is representative? If I can only talk to five people, which five people should I know? Maybe a human story, no matter how effective, is just a single sample in the sense of a tiny part standing for the whole. Turning this notion around, making it personal, I come to an ideal: If I am to be seen as part of some group, then I want representations of that group to include me in some way. This is an argument that mass media coverage of a community should try to account for every person in that community. This is absurd in practical terms, but it can serve as a signpost, a core idea, something to aim for.

Fortunately, more inclusive representations are getting easier. Most profoundly, the widespread availability of peer-to-peer communication networks makes it easier than ever for a single member of a community to speak and be heard widely.

We also have data. We can compile the demographics of social movements, or conduct polls to find “public opinion.” We can learn a lot from the numbers that describe a particular population, which is why surveys and censuses persist. But data are terrible at producing the emotional response at the core of empathy. For most people, learning that 23% of the children in some state live in poverty lacks the gut-punch of a story about a child who goes hungry at the end of every month. In fact there is evidence that making someone think analytically about an issue actually makes them less compassionate.

The best reporting might combine human stories with broader data. I am impressed by CNN’s interactive exploration of American casualties in Iraq, which links mass visualization with photographs and stories about each individual. But that piece covers a comparatively small population, only a few thousand people. There are emerging techniques to understand much larger groups, such as by visualizing the data trails of online life, all of the personal information that we leave behind. We can visualize communities, using aggregate information to see the patterns of human association at all scales. I suspect that mass data visualization represents a fundamentally new way of understanding large groups, a way that is perhaps more inclusive than anecdotes yet richer than demographics. Also, visualization forces us into conversations about who exactly is a member of the community in question, because each person is either included in a particular visualization or not. Drawing such a hard boundary is often difficult, but it’s good to talk about the meanings of our labels.

And yet, for all this new technology, empathy remains a deeply human pursuit. Do we really want statistically unbiased samples of a community? My friend Quinn Norton says that journalism should “strive to show us our better selves.” Sometimes, what we need is brutal honesty. At other times, what we need is kindness and inspiration.

Collective action

What a difficult challenge advances in communication have become in recent decades. On the one hand they are definitely bringing us closer to each other, but are they really bringing us together?

– Ryszard Kapuściński, The Other

I am sensitive to the idea of filter bubbles and concerns about the fragmentation of media, the worry that the personalization of information will create a series of insular and homogenous communities, but I cannot abide the implied nostalgia for the broadcast era. I do not see how one-size-fits-all media can ever serve a diverse and specialized society, and so: let a million micro-cultures bloom! But I do see a need for powerful unifying forces within the public sphere, because everything from keeping a park clean to tackling global climate change requires the agreement and cooperation of a community.

We have long had decision making systems at all scales — from the neighborhood to the United Nations — and these mechanisms span a range from very lightweight and informal to global and ritualized. In many cases decision-making is built upon voting, with some majority required to pass, such as 51% or 66%. But is a vicious, hard-fought 51% in a polarized society really the best we can do? And what about all the issues that we will not be voting on — that is to say, most of them?

Unfortunately, getting agreement among even very moderate numbers of people seems phenomenally difficult. People disagree about methods, but in a pluralistic society they often disagree even more strongly about goals. Sometimes presenting all sides with credible information is enough, but strongly held disagreements usually cannot be resolved by shared facts; experimental work shows that, in many circumstances, polarization deepens with more information. This is the painful truth that blows a hole in ideas like “informed public” and “deliberative democracy.”

Something else is needed here. I want to bring the field of conflict resolution into the digital public sphere. As a named pursuit with its own literature and community, this is a young subject, really only begun after World War II. I love the field, but it’s in its infancy; I think it’s safe to say that we really don’t know very much about how to help groups with incompatible values find acceptable common solutions. We know even less about how to do this in an online setting.

But we can say for sure that “moderator” is an important role in the digital public sphere. This is old-school internet culture, dating back to the pre-web Usenet days, and we have evolved very many tools for keeping online discussions well-ordered, from classic comment moderation to collaborative filtering, reputation systems, online polls, and various other tricks. At the edges, moderation turns into conflict resolution, and there are tools for this too. I’m particularly intrigued by visualizations that show where a community agrees or disagrees along multiple axes, because the conceptually similar process of “peace polls” has had some success in real-world conflict situations such as Northern Ireland. I bet we could also learn from the arduously evolved dispute resolution processes of Wikipedia.

It seems to me that the ideal of legitimate community decision making is consensus, 100% agreement. This is very difficult, another unreachable goal, but we could define a scale from 51% agreement to 100%, and say that the goal is “as consensus as possible” decision making, which would also be “as legitimate as possible.” With this sort of metric — and always remembering that the goal is to reach a decision on a collective action, not to make people agree for the sake of it — we could undertake a systematic study of online consensus formation. For any given community, for any given issue, how fragmented is the discourse? Do people with different opinions hang out in different places online? Can we document examples of successful and unsuccessful online consensus formation, as has been done in the offline case? What role do human moderators play, and how can well-designed social software contribute? How do the processes of online agreement and disagreement play out at different scales and under different circumstances? How do we know when the process has converged to a “good” answer, and when it has degraded into hegemony or groupthink? These are mostly unexplored questions. Fortunately, there’s a huge amount of related work to draw on: voting systems and public choice theory, social network analysis, cognitive psychology, information flow and media ecosystems, social software design, issues of identity and culture, language and semiotics, epistemology…
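To show how crude such a metric could be and still be useful, here is one possible formulation (my own toy definition, not an established measure from the conflict-resolution literature): score a decision by the share of participants behind its most widely supported option.

```python
from collections import Counter

def agreement(votes):
    """Share of participants behind the most popular option.
    1.0 means full consensus; lower values mean a more fragmented group."""
    counts = Counter(votes)
    return counts.most_common(1)[0][1] / len(votes)

print(agreement(["A", "A", "B", "A", "A"]))  # 0.8, fairly close to consensus
print(agreement(["A", "B", "C", "D"]))       # 0.25, deeply fragmented
```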

I would like conflict resolution to be an explicit goal of our media platforms and processes, because we cannot afford to be polarized and grid-locked while there are important collective problems to be solved. We may have lost the unifying narrative of the front page, but that narrative was neither comprehensive nor inclusive: it didn’t always address the problems of concern to me, nor did it ask me what I thought. Effective collective action, at all relevant scales, seems a better and more concrete goal than “shared narrative.” It is also an exceptionally hard problem — in some ways it is the problem of democracy itself — but there’s lots to try, and our public sphere must be designed to support this.

Why now?
I began writing this essay because I wanted to say something very simple: all of these things — journalism, search engines, Wikipedia, social media and the lot — have to work together to common ends. There is today no one profession which encompasses the entirety of the public sphere. Journalism used to be the primary bearer of these responsibilities — or perhaps that was a well-meaning illusion sprung from near monopolies on mass information distribution channels. Either way, that era is now approaching two decades gone. Now what we have is an ecosystem, and in true networked fashion there may never again be a central authority. From algorithm designers to dedicated curators to, yes, traditional on-the-scene pro journalists, a great many people in different fields now have a part in shaping the digital public sphere. I wanted to try to understand what all of us are working toward. I hope that I have at least articulated goals that we can agree are important.

 

A computational journalism reading list

[Last updated: 18 April 2011 — added statistical NLP book link]

There is something extraordinarily rich in the intersection of computer science and journalism. It feels like there’s a nascent field in the making, tied to the rise of the internet. The last few years have seen calls for a new class of  “programmer journalist” and the birth of a community of hacks and hackers. Meanwhile, several schools are now offering joint degrees. But we’ll need more than competent programmers in newsrooms. What are the key problems of computational journalism? What other fields can we draw upon for ideas and theory? For that matter, what is it?

I’d like to propose a working definition of computational journalism as the application of computer science to the problems of public information, knowledge, and belief, by practitioners who see their mission as outside of both commerce and government. This includes the journalistic mainstay of “reporting” — because information not published is information not known — but my definition is intentionally much broader than that. To succeed, this young discipline will need to draw heavily from social science, computer science, public communications, cognitive psychology and other fields, as well as the traditional values and practices of the journalism profession.

“Computational journalism” has no textbooks yet. In fact the term is barely recognized. The phrase seems to have emerged at Georgia Tech in 2006 or 2007. Nonetheless I feel like there are already important topics and key references.

Data journalism
Data journalism is obtaining, reporting on, curating and publishing data in the public interest. The practice is often more about spreadsheets than algorithms, so I’ll suggest that not all data journalism is “computational,” in the same way that a novel written on a word processor isn’t “computational.” But data journalism is interesting and important and dovetails with computational journalism in many ways.

Visualization
Big data requires powerful exploration and storytelling tools, and increasingly that means visualization. But there’s good visualization and bad visualization, and the field has advanced tremendously since Tufte wrote The Visual Display of Quantitative Information. There is lots of good science that is too little known, and many open problems here.

  • Tamara Munzner’s chapter on visualization is the essential primer. She puts visualization on rigorous perceptual footing, and discusses all the major categories of practice. Absolutely required reading for anyone who works with pictures of data.
  • Ben Fry invented the Processing language and wrote his PhD thesis on “computational information design,” which is his powerful conception of the iterative, interactive practice of designing useful visualizations.
  • How do we make visualization statistically rigorous? How do we know we’re not just fooling ourselves when we see patterns in the pixels? This amazing paper by Wickham et al. has some answers.
  • Is a visualization a story? Segel and Heer explore this question in “Narrative Visualization: Telling Stories with Data.”

Computational linguistics
Data is more than numbers. Given that the web is designed to be read by humans, it makes heavy use of human language. And then there are all the world’s books, and the archival recordings of millions of speeches and interviews. Computers are slowly getting better at dealing with language.

Communications technology and free speech
Code is law. Because our communications systems use software, the underlying mathematics of communication lead to staggering political consequences — including whether or not it is possible for governments to verify online identity or remove things from the internet. The key topics here are networks, cryptography, and information theory.

  • The Handbook of Applied Cryptography is a classic, and free online. But despite the title it doesn’t really explain how crypto is used in the real world, the way Wikipedia’s articles do.
  • It’s important to know how the internet routes information, using TCP/IP and BGP, or at a somewhat higher level, things like the BitTorrent protocol. The technical details determine how hard it is to do things like block websites, suppress the dissemination of a file, or remove entire countries from the internet.
  • Anonymity is deeply important to online free speech, and very hard. The Tor project is the outstanding leader in anonymity-related research.
  • Information theory is stunningly useful across almost every technical discipline. Pierce’s short textbook is the classic introduction, while Tom Schneider’s Information Theory Primer seems to be the best free online reference.
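To give a taste of what information theory buys you, here is the standard Shannon entropy of a message's symbol distribution, i.e. how many bits each symbol carries on average (a generic textbook calculation, not tied to any one reference above):

```python
import math
from collections import Counter

def entropy_bits(message):
    """Shannon entropy of the symbol distribution in `message`, in bits per symbol."""
    counts = Counter(message)
    total = len(message)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

print(entropy_bits("aaaaaaaa"))    # 0.0: perfectly predictable, carries no information
print(entropy_bits("abcabcabcd"))  # ~1.9 bits per symbol: far less compressible
```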

Tracking the spread of information (and misinformation)
What do we know about how information spreads through society? Very little. But one nice side effect of our increasingly digital public sphere is the ability to track such things, at least in principle.

  • Memetracker was (AFAIK) the first credible demonstration of whole-web information tracking, following quoted soundbites through blogs and mainstream news sites and everything in between. Zach Seward has cogent reflections on their findings.
  • The Truthy Project aims for automated detection of astro-turfing on Twitter. They specialize in covert political messaging, or as I like to call it, computational propaganda.
  • We badly need tools to help us determine the source of any given online “fact.” There are many existing techniques that could be applied to the problem, as I discussed in a previous post; a toy sketch of the quote-tracking idea appears after this list.
  • If we had information provenance tools that worked across a spectrum of media outlets and feed types (web, social media, etc.) it would be much cheaper to do the sort of information ecosystem studies that Pew and others occasionally undertake. This would lead to a much better understanding of who does original reporting.
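Here is a back-of-the-envelope version of this kind of tracking (a sketch of the general idea only, far simpler than Memetracker's actual phrase clustering; the articles and outlets are made up): pull quoted soundbites out of a small corpus and report where each one appeared first.

```python
import re
from collections import defaultdict

# Hypothetical corpus: (date, outlet, article text).
articles = [
    ("2011-03-01", "wire_service", 'The mayor said "we will fix the bike lanes" at a press event.'),
    ("2011-03-02", "local_blog", 'Quote of the week: "we will fix the bike lanes".'),
    ("2011-03-03", "tv_station", '"We will fix the bike lanes," the mayor promised.'),
]

def normalize(phrase):
    # Collapse quote variants to a shared key: lowercase, letters and spaces only.
    return re.sub(r"[^a-z ]", "", phrase.lower()).strip()

first_seen = {}
appearances = defaultdict(list)
for date, outlet, text in sorted(articles):
    for phrase in re.findall(r'"([^"]+)"', text):
        key = normalize(phrase)
        appearances[key].append((date, outlet))
        first_seen.setdefault(key, (date, outlet))

for phrase, (date, outlet) in first_seen.items():
    print(f'"{phrase}": first seen {date} at {outlet}, '
          f'{len(appearances[phrase])} appearances in the corpus')
```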

Filtering and recommendation
With vastly more information than ever before available to us, attention becomes the scarcest resource. Algorithms are an essential tool in filtering the flood of information that reaches each person. (Social media networks also act as filters.)

  • The paper on preference networks by Turyen et al. is probably as good an introduction as anything to the state of the art in recommendation engines, those algorithms that tell you what articles you might like to read or what movies you might like to watch; a toy sketch of the basic idea follows this list.
  • Before Google News there was Columbia Newsblaster, which incorporated a number of interesting algorithms such as multi-lingual article clustering, automatic summarization, and more as described in this paper by McKeown et al.
  • Anyone playing with clustering algorithms needs to have a deep appreciation of the ugly duckling theorem, which says that there is no categorization without preconceptions. King and Grimmer explore this with their technique for visualizing the space of clusterings.
  • Any digital journalism product which involves the audience to any degree — that should be all digital journalism products — is a piece of social software, well defined by Clay Shirky in his classic essay, “A Group Is Its Own Worst Enemy.” It’s also a “collective knowledge system” as articulated by Chris Dixon.
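For readers who have never looked inside a recommendation engine, the smallest possible item-to-item collaborative filter looks something like the sketch below (a toy version of the general approach, not the algorithm from any particular paper above; the articles and readers are invented): recommend whatever was read by the same people who read what you just read.

```python
import math

# Hypothetical reading history: article -> set of readers who clicked it.
readers = {
    "council_budget": {"ann", "bo", "cat", "dee"},
    "bike_lane_vote": {"ann", "bo", "cat"},
    "school_lunches": {"dee", "eli"},
    "stadium_deal": {"ann", "bo"},
}

def cosine(a, b):
    """Cosine similarity between two sets of readers."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

def recommend(article, k=2):
    """The k articles most similar to `article`, by overlap in who read them."""
    scores = {other: cosine(readers[article], rs)
              for other, rs in readers.items() if other != article}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("bike_lane_vote"))  # ['council_budget', 'stadium_deal']
```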

Measuring public knowledge
If journalism is about “informing the public” then we must consider what happens to stories after publication — this is the “last mile” problem in journalism. There is almost none of this happening in professional journalism today, aside from basic traffic analytics. The key question here is, how does journalism change ideas and action? Can we apply computers to help answer this question empirically?

  • World Public Opinion’s recent survey of misinformation among American voters solves this problem in the classic way, by doing a randomly sampled opinion poll. I discuss their bleak results here.
  • Blogosphere maps and other kinds of visualizations can help us understand the public information ecosystem, such as this interactive visualization of Iranian blogs. I have previously suggested using such maps as a navigation tool that might broaden our information horizons.
  • UN Global Pulse is a serious attempt to create a real-time global monitoring system to detect humanitarian threats in crisis situations. They plan to do this by mining the “data exhaust” of entire societies — social media postings, online records, news reports, and whatever else they can get their hands on. Sounds like key technology for journalism.
  • Vox Civitas is an ambitious social media mining tool designed for journalists. Computational linguistics, visualization, and more.

Research agenda
I know of only one work which proposes a research agenda for computational journalism.

This paper presents a broad vision and is really a must-read. However, it deals almost exclusively with reporting, that is, finding new knowledge and making it public. I’d like to suggest that the following unsolved problems are also important:

  • Tracing the source of any particular “fact” found online, and generally tracking the spread and mutation of information.
  • Cheap metrics for the state of the public information ecosystem. How accurate is the web? How accurate is a particular source?
  • Techniques for mapping public knowledge. What is it that people actually know and believe? How polarized is a population? What is under-reported? What is well reported but poorly appreciated?
  • Information routing and timing: how can we route each story to the set of people who might be most concerned about it, or best in a position to act, at the moment when it will be most relevant to them?

This sort of attention to the health of the public information ecosystem as a whole, beyond just the traditional surfacing of new stories, seems essential to the project of making journalism work.

Internet as information democracy, or new media news monopolies?

There was a dream that the internet would mean the end of the media gatekeeper; that anyone could get their message out without having to get the attention and approval of the media powers that be. This turns out to be not quite the case.

I took data from the Project for Excellence in Journalism’s State of the News Media 2010 report to create this chart showing the market share of the top 20 news web sites. In theory, the internet busts media monopolies by allowing anyone to publish for free. And there’s no doubt it’s been disruptive. But according to data from Nielsen, the top 7% of 4,600 news and information sites get 80% of traffic (from American viewers). We see a big concentration of power, as the rapid falloff in the chart above shows, and much of it still belongs to “old media.”

Organizations such as CNN, Fox, the New York Times and USA Today rank in the top 20. But so do new media giants AOL, Google News, The Huffington Post and Yahoo.com, which is the biggest news site of all.

(It’s also interesting to note that many of the top 20 new media news sites produce little or none of their own news; in the extreme case Google News produces no stories at all of its own. While some see aggregation as parasitic, I think it’s obvious that it delivers a tremendously valuable service to readers.)

For better or worse, the ability to publish anything nearly for free hasn’t meant the end of big media monopolies. It’s simply shifted the landscape and the power balance.

The limiting factor to getting your message out is no longer having access to an expensive printing press or a TV station. It’s attention: how many minutes of time can you get from how many people? In this game, brand still matters hugely. There are only so many URLs a person can remember, only so many sites they can check in a day.

You have an audience, or you don’t. Mindshare is now the barrier to entry in the media world. Perhaps it always was, though I daresay it was easier to get viewers to check out your new television network when there were only 13 channels. Online, the number of channels is infinite for all intents and purposes; a single person will never exhaust them all.

Which is not to say that the internet has changed nothing. We have seen over and over that bottom-up effects can propel something to mass attention, with no big company behind them. This is often called “going viral,” but that’s not quite a broad enough description of the effect. In many cases, what happens is that something becomes just popular enough to get picked up by mainstream media, who then propel it into the spotlight.

And what this PEJ top 20 list doesn’t take into account is that people now get online news from lots and lots of sources other than news websites.

Facebook is now the most widely used news reading program. It’s also now the #1 site on the internet. Should it top this chart of news sources? Meanwhile, Twitter has become a primary news source for very many people. And then there are mobile news apps, some of which belong to old media news organizations and some of which don’t. The richness of news distribution systems today is well captured in another PEJ report on the “participatory news consumer.”

So has the internet made it easier to get non-mainstream messages out? I think the answer can only be yes. But don’t expect that anyone will be reading your alternative narratives just because you’ve put them online. Your best bet to be heard still lies with a small number of very large companies. And although the internet per se is relatively uncensored in many countries, commercial gatekeepers like Apple and Facebook own important dedicated channels, and both of them engage in censorship (1, 2).

Jürgen Habermas says he’s not on Twitter


Over the last several days there has been considerable hubbub around the notion that pioneering media theorist Jürgen Habermas might have signed up for Twitter as @JHabermas. This would be “important if true”, as Jay Rosen put it. Intrigued, I tracked him down through the University of Frankfurt. I succeeded in getting him on the phone at his home in Starnberg, and asked him if he was on Twitter. He said,

No, no, no. This is somebody else. This is a mis-use of my name.

He added that “my email address is not publicly available,” which suggests that perhaps he didn’t quite understand what I was getting at. In fact, the father of the public sphere doesn’t seem to understand the internet very well at all, judging by his few previous references to the topic.

I know many people will be disappointed, especially @bitchphd who tweeted “JURGEN HABERMAS is on twitter. definitive response to all future articles about how stupid twitter is.” Personally I believe that Twitter is significant even without Habermas, but it’s clear that this is an issue for the next generation of theorists to decide.

UPDATE: here is an audio recording of my question and his answer.

Know Your Enemy

In America, the enemy is Terrorism. It used to be the Russians, or more generically Communists. We discussed the history of this concept in class today. And then I asked: In the state-controlled Chinese media, who is the enemy today?

I got three immediate answers:

“The West.”

“Japan.”

“Separatists.” (E.g. Tibetans, Uighurs.)

There was instant consensus on this list, among the PRC students. Good to know.

We Have No Maps of The Web


We dream the internet to be a great public meeting place where all the world’s cultures interact and learn from one another, but it is far less than that. We are separated from ourselves by language, culture and the normal tendency to seek out only what we already know. In reality the net is cliquish and insular. We each live in our own little corner, only dimly aware of the world of information just outside. In this the internet is no different from normal human life, where most people still die within a few kilometers of their birthplace. Nonetheless, we all know that there is something else out there: we have maps of the world. We do not have maps of the web.

I have met people who have never seen a world map. I once had a conversation with herders in the south Sahara who asked me if Canada was in Europe. As we talked I realized that the patriarch of the settlement couldn’t name more than half a dozen countries, and had no idea how long it might take to get to any of the ones he did know. He simply had no notion of how big the planet was. And to him, the world really is small: he lives in the desert, occasionally catches a ride to town for supplies, and will never leave the country in which he was born.

Online, we are all that man. Even the most global and sophisticated among us does not know the true scope of our informational world. Statistics on the “size” of the web are surprisingly hard to come by and even harder to grasp; learning that there are a trillion unique URLs is like being told that the land area of the Earth is 148 million square kilometers. We really have no idea what we’re missing, no visceral experience that teaches our ignorance.

We can remedy this.


Maine Man Tries to Build Dirty Bomb, No One Cares

A leaked FBI report states that a man named James G. Cummings was trying to build a dirty bomb when he was shot and killed by his wife last December 9th in Belfast, Maine. He had plans, parts, explosive ingredients, and small quantities of radioactive material, though nothing that could not be purchased legally within the US. Cummings was a white supremacist who was reportedly very upset about Obama’s election.

The leaked document has been posted on Wikileaks since January 16th. While the material concerning Cummings was first noticed by the rumor site Unattributable.com on January 19th, only yesterday was there any sort of story about it in the mainstream media, in this case the local Bangor Daily News.

Although this dastardly plot was probably not much more dangerous to the public than a garden-variety bomb, this man would certainly qualify as a bona fide “terrorist” under Bush-regime logic. Or at least he would if he were Arab. In point of fact, he actually is a threat to the public, or was. So why haven’t we heard about it? Are crazy white supremacists somehow less of a threat than crazy fundamentalist Muslims?


Intelligent News Agents, With Real News

You cannot read all of the news, every day. There is simply too much information for even a dedicated and specialized observer to consume it all, so someone or something has to make choices. Traditionally, we rely on some other person to tell us what to see: the editor of a newspaper decides what goes on the front page, the reviewer tells us what movies are worth it. Recently, we have been able to distribute this mediation process across wider communities: sites like Digg, StumbleUpon, or Slashdot all represent the collective opinions of thousands of people.

The next step is intelligent news agents. Google (search, news, reader, etc.) can already be configured to deliver to us only that information we think we might want to see. It’s not hard to imagine much more sophisticated agents that would scour the internet for items of interest.

In today’s context, it’s easy to see how such agents could actually be implemented. Sophisticated customer preference engines are already capable of telling us what products we might like to consume — the best example is Amazon’s recommendation engine. It’s not a big leap to imagine using the same sort of algorithms to model the kinds of blog articles, web pages, YouTube videos, etc. that we might enjoy consuming, and then deliver these things to us.

There is a serious problem with this. You’re going to get exactly what you ask for, and only that.

True, we all do this already. We read books and consume media which more or less confirm our existing opinions. This effect is visible as clustering in what we consume, as in this example of Amazon sales data for political books in 2008.

Social network graph of Amazon sales of political books, 2008

This image is from a beautiful analysis by orgnet.com. Basically, people buy either the red books or the blue books, but usually not both. The same sorts of patterns hold for movies, blogs, newspapers, ideologies, religions, and human beliefs of all kinds. This is a problem; but at least you can usually see the other color of books when you walk into Borders. If we end up relying on trainable agents for all of our information, we risk completely blacking out anything that disagrees with what we already believe.

I propose a simple solution. Automatic network analyses like the one above — of books, or articles, or web pages — could easily pinpoint the information sources that would expose me to the maximum novelty in the minimum time. If my goal is to gain a deep understanding of the entire scope of human discourse, rather than just the parts of it I already agree with, then it would be very simple to program my agent to bring me exactly those things that would most rapidly give me insight into those regions of information space which are most vital and least known to me. I imagine some metric like “highest degree node most distant from the nodes I’ve already visited” would work handily.
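That heuristic is easy to prototype. Below is a rough sketch (the graph, node names, and tie-breaking rule are all my own toy assumptions) that picks, from a small graph of information sources, a well-connected node as far as possible from everything already visited:

```python
import networkx as nx

# Toy graph of information sources; an edge means "frequently linked or co-read".
G = nx.Graph()
G.add_edges_from([
    ("lib_blog_a", "lib_blog_b"), ("lib_blog_b", "lib_news"),
    ("con_blog_a", "con_blog_b"), ("con_blog_b", "con_news"),
    ("lib_news", "wire_service"), ("con_news", "wire_service"),
])

visited = {"lib_blog_a", "lib_blog_b", "lib_news"}  # what I've already read

def most_novel_source(graph, visited):
    """Unvisited node farthest (in hops) from anything visited; ties broken by degree."""
    best, best_key = None, None
    for node in graph.nodes:
        if node in visited:
            continue
        dist = min(nx.shortest_path_length(graph, node, v) for v in visited)
        key = (dist, graph.degree(node))
        if best_key is None or key > best_key:
            best, best_key = node, key
    return best

print(most_novel_source(G, visited))  # 'con_blog_a': deep inside the unread cluster
```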

You can infer a lot about someone from the information they currently consume. If my agent noticed that I was a liberal, it could help me understand the conservative world-view, and vice-versa. If my agent detected that I was ignorant of certain crucial aspects of Chinese culture and politics, it could recommend a primer article. Or it might deduce that I needed to understand just slightly more physics to participate meaningfully in the climate change debate, or decide (based on my movie viewing habits) that it was high time I review the influential films of Orson Welles. Of course, I might in turn decide that I actually, truly, don’t care about film at all; but the very act of excluding specific subjects or categories of thought would force us, consciously, to admit to the boundaries of our mental worlds.

We could program our information gathering systems to challenge us, concisely and effectively, if we so want. Intelligent agents could be mere sycophants, or they could be teachers.

Americans Have Only Their Own Culture

The whole world watches Hollywood movies. I once found X-Men 2 on cable in Oman, the sex and violence airing between the preaching Imams. The whole world reads Western books, either in English or translation. The Da Vinci Code graces the dirty blankets of sidewalk booksellers in Mumbai, and Harry Potter is truly global.

Those who don’t live in America are lucky. They have at least two cultures: their own, and the American imports. Those who live within America are impoverished by comparison. Americans have to go well out of their way to consume media made by people who aren’t like them. We have to go to the “Foreign” section of the video store. We have to suffer through languages we don’t understand, because we are taught only English in schools.

This same effect is repeated on a smaller scale with regional cultural capitals. In Southeast Asia, all the good movies come from Thailand. In Nepal, everything is from India. South Africa produces most of the African media, while Qatar and Egypt supply the Arab world. In every case, media in the minority countries is often much more diverse, drawing from many sources.

Maybe this is imperialism. Maybe this is a bad thing. Maybe every people should be producing their own entertainments just as furiously as Hollywood. Maybe. My point is only this: if you live outside of the Empire, the Empire comes to you. But if you live inside, you have to look to find the rest of the world.