Visualizing communities

There are in fact no masses; there are only ways of seeing people as masses.
Raymond Williams

Who are the masses that the “mass media” speaks to? What can it mean to ask what “teachers” or “blacks” or “the people” of a country think? These words are all fiction, a shorthand which covers over our inability to understand large groups of unique individuals. Real people don’t move in homogeneous herds, nor can any one person be neatly assigned to a single category. Someone might view themselves simultaneously as the inhabitant of a town, a new parent, and an active amateur astronomer. Now multiply this by a million, and imagine trying to describe the overlapping patchwork of beliefs and allegiances.

But patterns of association leave digital traces. Blogs link to each other, we have “friends” and “followers” and “circles,” we share interesting tidbits on social networks, we write emails, and we read or buy things. We can visualize this data, and each type of visualization gives us a different answer to the question “what is a community?” This is different from the other ways we know how to describe groups. Anecdotes are tiny slices of life that may or may not be representative of the whole, while statistics are often so general as to obscure important distinctions. Visualizations are unique in being both universal and granular: they have detail at all levels, from the broadest patterns right down to individuals. Large scale visualizations of the commonalities between people are, potentially, a new way to represent and understand the public — that is, ourselves.

I’m going to go through the major types of community visualizations that I’ve seen, and then talk about what I’d like to do with them. Like most powerful technologies, large scale visualization is a capability that can also be used to oppress and to sell. But I imagine social ends, worthwhile ways of using visualization to understand the “public” not as we imagine it, but as something closer to how we really exist.

A computational journalism reading list

[Last updated: 18 April 2011 — added statistical NLP book link]

There is something extraordinarily rich in the intersection of computer science and journalism. It feels like there’s a nascent field in the making, tied to the rise of the internet. The last few years have seen calls for a new class of “programmer journalist” and the birth of a community of hacks and hackers. Meanwhile, several schools are now offering joint degrees. But we’ll need more than competent programmers in newsrooms. What are the key problems of computational journalism? What other fields can we draw upon for ideas and theory? For that matter, what is it?

I’d like to propose a working definition of computational journalism as the application of computer science to the problems of public information, knowledge, and belief, by practitioners who see their mission as outside of both commerce and government. This includes the journalistic mainstay of “reporting” — because information not published is information not known — but my definition is intentionally much broader than that. To succeed, this young discipline will need to draw heavily from social science, computer science, public communications, cognitive psychology and other fields, as well as the traditional values and practices of the journalism profession.

“Computational journalism” has no textbooks yet. In fact the term is barely recognized. The phrase seems to have emerged at Georgia Tech in 2006 or 2007. Nonetheless I feel like there are already important topics and key references.

Data journalism
Data journalism is obtaining, reporting on, curating and publishing data in the public interest. The practice is often more about spreadsheets than algorithms, so I’ll suggest that not all data journalism is “computational,” in the same way that a novel written on a word processor isn’t “computational.” But data journalism is interesting and important and dovetails with computational journalism in many ways.

Visualization
Big data requires powerful exploration and storytelling tools, and increasingly that means visualization. But there’s good visualization and bad visualization, and the field has advanced tremendously since Tufte wrote The Visual Display of Quantitative Information. There is lots of good science that is too little known, and many open problems here.

  • Tamara Munzner’s chapter on visualization is the essential primer. She puts visualization on rigorous perceptual footing, and discusses all the major categories of practice. Absolutely required reading for anyone who works with pictures of data.
  • Ben Fry invented the Processing language and wrote his PhD thesis on “computational information design,” which is his powerful conception of the iterative, interactive practice of designing useful visualizations.
  • How do we make visualization statistically rigorous? How do we know we’re not just fooling ourselves when we see patterns in the pixels? This amazing paper by Wickham et al. has some answers.
  • Is a visualization a story? Segel and Heer explore this question in “Narrative Visualization: Telling Stories with Data.”

Computational linguistics
Data is more than numbers. Given that the web is designed to be read by humans, it makes heavy use of human language. And then there are all the world’s books, and the archival recordings of millions of speeches and interviews. Computers are slowly getting better at dealing with language.

Communications technology and free speech
Code is law. Because our communications systems use software, the underlying mathematics of communication lead to staggering political consequences — including whether or not it is possible for governments to verify online identity or remove things from the internet. The key topics here are networks, cryptography, and information theory.

  • The Handbook of Applied Cryptography is a classic, and free online. But despite the title it doesn’t really explain how crypto is used in the real world, like Wikipedia does.
  • It’s important to know how the internet routes information, using TCP/IP and BGP, or at a somewhat higher level, things like the BitTorrent protocol. The technical details determine how hard it is to do things like block websites, suppress the dissemination of a file, or remove entire countries from the internet.
  • Anonymity is deeply important to online free speech, and very hard. The Tor project is the outstanding leader in anonymity-related research.
  • Information theory is stunningly useful across almost every technical discipline. Pierce’s short textbook is the classic introduction, while Tom Schneider’s Information Theory Primer seems to be the best free online reference.

Tracking the spread of information (and misinformation)
What do we know about how information spreads through society? Very little. But one nice side effect of our increasingly digital public sphere is the ability to track such things, at least in principle.

  • Memetracker was (AFAIK) the first credible demonstration of whole-web information tracking, following quoted soundbites through blogs and mainstream news sites and everything in between. Zach Seward has cogent reflections on their findings.
  • The Truthy Project aims for automated detection of astro-turfing on Twitter. They specialize in covert political messaging, or as I like to call it, computational propaganda.
  • We badly need tools to help us determine the source of any given online “fact.” There are many existing techniques that could be applied to the problem, as I discussed in a previous post.
  • If we had information provenance tools that worked across a spectrum of media outlets and feed types (web, social media, etc.) it would be much cheaper to do the sort of information ecosystem studies that Pew and others occasionally undertake. This would lead to a much better understanding of who does original reporting.

Filtering and recommendation
With vastly more information than ever before available to us, attention becomes the scarcest resource. Algorithms are an essential tool in filtering the flood of information that reaches each person. (Social media networks also act as filters.)

  • The paper on preference networks by Turyen et al. is probably as good an introduction as anything to the state of the art in recommendation engines, those algorithms that tell you what articles you might like to read or what movies you might like to watch.
  • Before Google News there was Columbia News Blaster, which incorporated a number of interesting algorithms such as multi-lingual article clustering, automatic summarization, and more as described in this paper by McKeown et al.
  • Anyone playing with clustering algorithms needs to have a deep appreciation of the ugly duckling theorem, which says that there is no categorization without preconceptions. King and Grimmer explore this with their technique for visualizing the space of clusterings.
  • Any digital journalism product which involves the audience to any degree — that should be all digital journalism products — is a piece of social software, well defined by Clay Shirky in his classic essay, “A Group Is Its Own Worst Enemy.” It’s also a “collective knowledge system” as articulated by Chris Dixon.
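To make the idea of a recommendation engine concrete, here is a deliberately tiny sketch. It is not the method from any of the papers above, just the simplest overlap-based approach I can think of: score each unread article by how much the people who read it have in common with you. The reader names, article labels, and the `recommend` function are all made up for illustration.

```python
from collections import Counter

# Hypothetical reading histories: reader -> set of article labels (invented data).
histories = {
    "ana": {"budget", "climate", "stimulus"},
    "ben": {"budget", "stimulus"},
    "cy": {"climate", "movies"},
    "dee": {"budget", "stimulus", "healthcare"},
}

def recommend(reader, histories, k=2):
    """Suggest up to k unread articles, scored by overlap with other readers."""
    seen = histories[reader]
    scores = Counter()
    for other, articles in histories.items():
        if other == reader:
            continue
        # Number of shared articles serves as a crude similarity measure.
        overlap = len(seen & articles)
        for article in articles - seen:
            scores[article] += overlap
    # Highest score first; ties broken alphabetically so the order is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    # Drop articles that only came from readers with nothing in common.
    return [article for article, score in ranked if score > 0][:k]

print(recommend("ben", histories))
```

Even this toy version shows the filtering problem in miniature: what you are shown depends entirely on who the system decides you resemble.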

Measuring public knowledge
If journalism is about “informing the public” then we must consider what happens to stories after publication — this is the “last mile” problem in journalism. There is almost none of this happening in professional journalism today, aside from basic traffic analytics. The key question here is, how does journalism change ideas and action? Can we apply computers to help answer this question empirically?

  • World Public Opinion’s recent survey of misinformation among American voters solves this problem in the classic way, by doing a randomly sampled opinion poll. I discuss their bleak results here.
  • Blogosphere maps and other kinds of visualizations can help us understand the public information ecosystem, such as this interactive visualization of Iranian blogs. I have previously suggested using such maps as a navigation tool that might broaden our information horizons.
  • UN Global Pulse is a serious attempt to create a real-time global monitoring system to detect humanitarian threats in crisis situations. They plan to do this by mining the “data exhaust” of entire societies — social media postings, online records, news reports, and whatever else they can get their hands on. Sounds like key technology for journalism.
  • Vox Civitas is an ambitious social media mining tool designed for journalists. Computational linguistics, visualization, and more.

Research agenda
I know of only one work which proposes a research agenda for computational journalism.

This paper presents a broad vision and is really a must-read. However, it deals almost exclusively with reporting, that is, finding new knowledge and making it public. I’d like to suggest that the following unsolved problems are also important:

  • Tracing the source of any particular “fact” found online, and generally tracking the spread and mutation of information.
  • Cheap metrics for the state of the public information ecosystem. How accurate is the web? How accurate is a particular source?
  • Techniques for mapping public knowledge. What is it that people actually know and believe? How polarized is a population? What is under-reported? What is well reported but poorly appreciated?
  • Information routing and timing: how can we route each story to the set of people who might be most concerned about it, or best in a position to act, at the moment when it will be most relevant to them?

This sort of attention to the health of the public information ecosystem as a whole, beyond just the traditional surfacing of new stories, seems essential to the project of making journalism work.

By the numbers, American journalism failed to inform voters

A recent study by WorldPublicOpinion.org shows that the majority of the American population believed false things about basic national issues, right before the 2010 mid-term elections. I don’t know how to interpret this as anything other than a catastrophic failure of American journalism, in its most fundamental, clichéd, “inform the public” role.

The most damning section of the report (PDF) is titled “Evidence of Misinformation Among Voters.”

The poll found strong evidence that voters were substantially misinformed on many of the issues prominent in the election campaign, including the stimulus legislation, the healthcare reform law, TARP, the state of the economy, climate change, campaign contributions by the US Chamber of Commerce and President Obama’s birthplace. In particular, voters had perceptions about the expert opinion of economists and other scientists that were quite different from actual expert opinion.

This study also found that Fox viewers were significantly more misinformed than average on many issues, which is mostly how this survey was covered in the blogosphere and mainstream news outlets. I think this Fox thing is a terrible diversion from the core problem: the American press did not succeed in informing the public. Not even right before an election, not even on the narrow set of issues that, by survey, voters cared to base their votes on.

The travesty here is that the relevant facts were instantly available from primary sources, such as the Congressional Budget Office and the Intergovernmental Panel on Climate Change. I interpret this failure in the following way: for many kinds of issues, the web makes it easy to find true information. But it doesn’t solve the problem of making people go look. That, perhaps, is a key role for modern journalism. Unfortunately, modern American journalism seems to be very bad at it. I imagine the same problem exists in the journalism of many other countries.

What the study actually says
The study compares what voters think experts believe with what those experts actually believe. This is a bit tricky, and the study isn’t saying that the experts are necessarily right, but we’ll get to that. First, some example findings:

  • 68% of voters thought that “most economists” believe that the stimulus package “saved or created a few jobs” and 20% thought most economists believe that the stimulus caused job losses, whereas only 8% correctly said that most economists think it “saved or created several million jobs.” (The Congressional Budget Office estimates that the stimulus saved several million jobs, as do 75% of economists interviewed by the Wall Street Journal.)
  • 53% of voters thought that economists believe that Obama’s health care reform plan will increase the deficit, while 29% said that economists were evenly divided on this issue. Only 13% said correctly that a majority of economists think that health care reform will not increase the deficit. (The Congressional Budget Office estimates a net reduction in deficits of $143 billion over 2010-2019, and the Boards of Trustees of the Medicare Fund also believe that the Affordable Care Act will “postpone the exhaustion of … trust fund assets.”)
  • 12% of voters thought that “most scientists believe” that climate change is not occurring, while 33% thought scientists were evenly divided on the issue. That’s 45% with an incorrect perception, as opposed to the 54% who said, correctly, that most scientists think climate change is occurring. (Aside from the IPCC reports and virtually every governmental study of the issue worldwide, an April 2010 survey of climate scientists showed that 97% believe that human-caused climate change is occurring.)
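The arithmetic behind these findings is simple enough to tabulate. The sketch below uses only the percentages quoted in the list above; the `summarize` function and the "unaccounted" label are my own, and I am assuming that the remainder in each case is respondents who gave no answer, plus rounding.

```python
# Percent of voters, as quoted in the findings above.
findings = {
    "stimulus": {"correct": 8, "incorrect": [68, 20]},
    "health care and the deficit": {"correct": 13, "incorrect": [53, 29]},
    "climate change": {"correct": 54, "incorrect": [12, 33]},
}

def summarize(findings):
    """Total the incorrect responses per issue and report what is left over."""
    rows = {}
    for issue, f in findings.items():
        wrong = sum(f["incorrect"])
        rows[issue] = {
            "correct": f["correct"],
            "incorrect": wrong,
            # Assumed to be no-answer responses and rounding, not stated in the post.
            "unaccounted": 100 - f["correct"] - wrong,
        }
    return rows

for issue, row in summarize(findings).items():
    print(issue, row)
```

On the two economic questions, the voters with an incorrect picture of expert opinion outnumber those with a correct one by more than six to one.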

A fussy but necessary digression: all of this rests on the reliability of the WorldPublicOpinion.org survey results. The survey was conducted by Knowledge Networks, Inc. using an online response panel randomly selected from the US population. Those without internet access were apparently provided it for free. I have been unable to find any serious independent evaluation of Knowledge Networks’ methodology, but their many research papers on sample design certainly talk the talk. All of the basic sampling errors, such as self-selection and language bias (what about Hispanics?) are at least addressed on paper. The margin of error is reported as 3.9%.
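For a rough sense of what a 3.9% margin of error means, here is the textbook calculation for a simple random sample. This is only a sketch under assumptions the report may not share: I am assuming a 95% confidence level and the worst case of a 50/50 split, and ignoring the weighting and design effects that a real panel survey would involve. The function names are mine.

```python
import math

Z_95 = 1.96  # z-score for 95% confidence (assumed; the level is not stated above)

def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

def implied_sample_size(moe, p=0.5, z=Z_95):
    """Simple-random-sample size that would produce the given margin of error."""
    return (z ** 2) * p * (1 - p) / moe ** 2

# Sample size implied by the reported 3.9% margin, under the assumptions above.
n = implied_sample_size(0.039)
print(round(n))
```

Under those assumptions the reported margin corresponds to a simple random sample of a bit over six hundred people. A weighted panel would typically need more respondents than that to reach the same margin, so this says nothing firm about the survey's actual sample size.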

So let’s take these survey results as accurate, for the moment. This means that the majority of the American public had an incorrect conception of expert opinion on the issues that they voted on. That’s a mouthful. It’s not the same as “believed false things,” and in fact asking “what do you think experts believe” deliberately dodges the tricky question of what is true. If there is some misperception of expert belief, then in the strictest terms the public is misinformed. The study addresses this point as follows:

In most cases we inquired about respondents’ views of expert opinion, as well as the respondents’ own views. While one may argue that a respondent who had a belief that is at odds with expert opinion is misinformed, in designing this study we took the position that some respondents may have had correct information about prevailing expert opinion but nonetheless came to a contrary conclusion, and thus should not be regarded as ‘misinformed.’

So this study does not say “the American public are wrong about the economy and climate change.” It says that they haven’t really looked into it. I’m all for questioning authority’s claim to truth — anyone who follows my work knows that I’m generally a fan of Wikipedia, for example — but I believe we must take lifelong study and rigorous methodology seriously. To put it another way: voting contrary to the opinions of economists may be a fine thing, but voting without any awareness of their work is just silly. Yet that seems to be exactly what happened in the last election.

The role of the press, then and now
Of course, voting is hard and stuff is complex, which is why we rely on the media to break it all down for us. The sad part is that economics and climate change are familiar ground for journalists. It’s not like the facts of these issues were not published in mainstream news outlets. For that matter, journalists were not even necessary here. Any citizen with a web browser could have found out exactly what the Affordable Care Act was predicted to do to the deficit. The Congressional Budget Office published their report and then blogged about it in plain language.

Maybe publishing the truth was never enough. Maybe journalism never actually “informed the public,” but merely created conditions where the curious could get themselves informed by diligently reading the news. But on big issues like whether a piece of national legislation will affect the deficit, we no longer need professionals to enable this kind of self-motivated discovery. The sources go direct in such cases, as the Congressional Budget Office did. And do we really expect that the social media sphere — that’s all of us — will remain silent about the next big global warming study? We’re all going to use Facebook etc. to share links to the next IPCC report when it comes out.

If the problem of having access to true information about these sorts of “votable issues” is solved by the web, what isn’t solved by the web is getting every voter to go look at least once. That might be a job for informed professionals at the helm of big media channels. This is a big responsibility for a news organization to try to take, but I don’t see how it’s anything but the corollary to the responsibility to only publish true information. Presumably some of that information is important enough to know, so consumers would probably appreciate the idea that your mission is to ensure they are informed.

I suspect that paper-based habits are holding journalism back here. There is a deeply ingrained newsroom emphasis on reporting only what’s “new.” A budget report only gets to be news once, even if what it says is relevant for years. But there are no “editions” online; the same headline can float on the hot topics list for as long as it’s relevant. There is even more reason to keep directing attention to an issue if people are actively discussing it, if it is greatly polarized, or if there’s a lot of spin around it (see: the rise of fact-check journalism). In any case, journalists have long been good at keeping an issue in the news, by advancing the story daily in one way or another. But first they have to know what the public doesn’t know.

So the burning question that the World Public Opinion study leaves me with is just this: why wasn’t it a news organization that commissioned this survey?

See also: Does journalism work?

Does journalism work?

How do we know that the work that journalists do accomplishes anything at all? And what does journalism do, exactly, beyond vague statements like “supports democracy” and trivial ones like “gives me movie reviews”?

I made this image a couple months ago to introduce the question at a conference. A reporter researches and writes a story. The first arrow represents the process that gets that story published. We understand that process quite well, and the internet makes publishing really cheap and easy. Then there’s a process that takes published, accurate information and turns it into truth and justice for all. That’s the part that’s fuzzy. In fact I don’t think we understand it at all. I call this “the last mile problem” in journalism — how does journalism actually reach people?

Journalists occasionally claim a scalp, such as by embarrassing a politician enough to force them to resign, or focussing attention on some issue long enough to get legislation passed. Journalism also theoretically informs citizens so they can vote responsibly, in the elections which happen every few years. As I’ve argued before, these are weak levers by which to shift society. I’m less interested in what journalism does in extraordinary times, and more interested in how the journalist’s work improves the day-to-day operation of a society, and the experiences of the people living in it.

It’s possible that much of the journalism we have is effective. Maybe the mere existence of consistent reporting on the machinations of the powerful keeps them in line, and we’ll only know what journalism really gave us when it disappears and civilization collapses into a mire of secrecy and corruption. Or maybe that’s already happened. How would we know? How can we tell whether journalism, as a local or a global endeavor, is doing better this year than last?

Other fields have goals
I like to hang around the international development community, and those people have real problems. People working in public health are charged with improving access to clean water or preventing the spread of HIV. Others try to get more girls into school, or to raise entire communities out of poverty.

There are lots of ways to attack such complex social problems. An NGO or a foundation or a UN organ could lobby local politicians, produce research reports, provide services directly to affected populations, or launch a public awareness campaign. The way in which an organization proposes to have an effect is called their “theory of change.” This is a term I hear frequently at gatherings of development workers, and from the staff of NGOs and international organizations. Such organizations must continually develop and articulate their theory of change in order to secure philanthropic funding.

Journalism has no theory of change — at least not at the level of practice.

I’ve taken to asking editors, “what do you want your work to change in society?” The answer is generally along the lines of, “we aren’t here to change things. We are only here to publish information.” I don’t think that’s an acceptable answer. Journalism without effect does not deserve the special place in democracy that it tries to claim.

The question of “what change should journalism produce” is hard because it is unavoidably a normative question, a question about how journalists envision a “better” world. At the moment, the field of professional journalism is mired in intense confusion about its role and the meaning of classic standards such as “objectivity.” This has obscured discussion of the field’s goals at a moment of great transition brought on by new communications technology, precisely the time when clarity is most needed.

It’s telling that discussions of journalism’s fundamentals frequently harken back to the great debate of Lippmann vs. Dewey. That happened in the 1920s. This was not only before live television and before the internet, it was before bastions of modern reasoning such as statistical inference, the study of cognitive biases, and the social construction of knowledge were fully developed. Other fields have done much better in adapting to the philosophical and technological revolutions of the last century.

Medicine in general and public health in particular have become relentlessly evidence-based. It’s no longer enough to run anti-smoking ads; we now require those responsible for public health to show that their preferred method of behavior modification actually reduces disease. Meanwhile, marketers have rallied around the idea that the purpose of their work is to get targeted individuals to do something, whether that’s purchasing a product or voting for a particular candidate. That may not be an appropriate goal for non-advocacy journalism, but marketing and public relations researchers have made very careful studies of communication, recall, and belief.

Similar concerns over how messages are received arise in many fields, from crisis communications to public diplomacy. But not in journalism. If journalism does not change action it must change minds, but the tools and language of belief change seem to be entirely missing from the profession.

Journalism as surveillance of ignorance
It used to be the job of an editor to decide what to publish. Maybe it is now the job of an editor to decide what needs to be known. These are not at all the same thing. They used to be, when nothing could be done with a story after the ink hit paper. The internet allows so much more — promotion within specific communities, feedback on readership and reception, conversation as opposed to oratory. And potentially, cheap techniques to determine what people already believe.

We should expect that users will largely be choosing for themselves what to read and view. That’s reality, and that’s fine, and systems that make it easy to satisfy curiosity are systems that will make us smarter (even though we’ll mostly use them for entertainment.) But I believe there will still be an identifiable set of common content, the few things that the public — or some targeted fraction of it — absolutely has to know to participate meaningfully in the civic issues of the day. This is more or less what editors put on the front page today. But rather than the headlines reflecting the most important events, perhaps they should reflect the most pernicious misconceptions. Good journalists already have some sense of this, and every so often we learn of an alarming gap in public knowledge. A majority of Americans believed for years that Saddam Hussein was linked to 9/11, for example. Today, most Americans don’t know what’s actually in Obama’s new health care laws. (I apologize again to my international readers for the US-centric examples; I’d love to hear of similarly woeful tales from other countries.)

Combatting ignorance is harder than publishing. It’s my best guess for the second, mysterious arrow in the diagram above. Fortunately we also have new tools. We have reams and reams of data that people voluntarily put online, the “data exhaust” of entire societies. We also have old-fashioned public opinion polls, and their lightweight cousin online polls (though self-selection bias may render online surveys useless for all but the most casual work.) Somewhere in all this data and all this communication, it must be possible to figure out what it is that people actually believe — and where those beliefs are factually wrong in an uncomplicated way, precisely the way that an editor would say “that’s not true, we can’t print it.”

There are many possibilities for understanding the beliefs of an audience. I am particularly intrigued by opinion mapping, deliberative polling, and the attempts of UN Global Pulse to create data-driven societal monitoring systems. It may actually be possible to cheaply measure the state of public knowledge, which would also give us concrete metrics for improvement. We need new ways of thinking about the surveillance of ignorance, and we need software to implement them. But more than anything else, we need journalists attuned to what it is that people don’t know. Good journalists already are; they can see what is missing from discussion — whether that’s a question that no one has answered or a challenge to a prevalent belief — and do the hard work of adding it.

This effort applies at all scales. Each journalist has an audience or audiences, their communities of concern. Each could track what their audience already knows and believes. The job of the journalist, so conceived, is not merely to report the happenings, but to ensure that the audience is aware of and understands the most crucial of them. That won’t be easy. Aside from the challenges of determining what an audience already knows, people don’t like to be told they’re uninformed or wrong. This is why I believe a journalist needs to learn everything there is to know about public communication, borrowing and adapting from marketing experts and public health planners. Genuine honesty and humility seem to me the ethical core, and newsroom transparency is a critical check on this power.

Of course, decisions would have to be made about what are misconceptions and which of them are important enough to combat. Decisions have to be made already about what to cover and promote with limited resources, and these hard choices are the iceberg that sinks any hope of a truly “impartial” journalism. It’s a reality that the profession has to deal with every day, and I wish we would get on with the work of crafting and communicating our normative stance, rather than insisting that “objectivity” means we don’t have one. (Even Wikipedia explains its norms in great detail.) I’d like to start with a list of things that journalists wish were better known. Be honest. I know you’ve already thought about this.

But if we can get over that hurdle — if we can admit that journalism needs concrete goals — then we stand a chance of doing better journalism, and knowing when we’re doing it. For me, the insane possibility of new communications technology carries with it the obligation to do better than we ever have before.

UPDATE: As if on cue, a major study was released four days after I published this, showing that a majority of American voters were misinformed about the issues they voted on in the recent mid-term elections. I discuss what that means here.

Countries Seen Through Comments

The comments on the news are more revealing of a culture than the news itself. Journalism too often has a commitment to a sort of sanitized neutrality, and certainly tries for clarity, smoothing away complex disagreements. That has its uses, but the comments are a much messier, more divided, more personal look at a culture. They have the texture of life at street level.

First, America. On January 30 the New York Times ran an article headlined “U.S. Suspends Haitian Airlift in Cost Dispute” which described how the US had stopped medical evacuations to Miami because of state vs. federal arguments over who would pay for the patients’ medical care. The comments reveal a deeply divided country:

The richest country in the world bickering about who is going to pay before it treats patients who need critical care from the poorest country in the Western hemisphere! We Americans should be very proud of ourselves!

Another wave of third world, uneducated people of an alien culture is about to hit our shores, helped this time by the Obama administration’s desire to show compassion. Unfortunately, the tax-paying citizens of this country will have to pay, in more ways than one.

Meanwhile, Nigerians are talking about the Underwear Bomber. Global Voices has helpfully collected some of the blogger reactions. For America, the story was about fear and terrorism and security. For Nigeria, it was about reputation and identity:

Be honest, when you heard a Nigerian man tried to commit a terrorist act in America, how many of you immediately thought ‘Please don’t let him be [insert your ethnic group]’?

There’s an Igbo proverb that says, “If one finger touches palm oil, it spreads to all the other fingers.” This is indicative of how Nigerians the world over felt when they heard the news of a young man who attempted to detonate a bomb on U.S. soil in the name of Al Qaeda. Many of us worried that the actions of this one finger would spread to cover the entire 150 million of us.

How does disowning him help Nigerians understand what role extreme Islamic ideology played in causing him to attempt detonating an explosive device on board a US-bound airliner? How does it help Nigerians understand the complex interplay of religious faith, access to extremist religious groups and ideological brainwashing?

Meanwhile, the ever-wonderful ChinaSMACK took a break from pop-culture scandal (on the site right now: Hong Kong Girl Shows Off C Cup Breasts To Ex-Boyfriend) to translate online Chinese reactions to news of US sales of arms to Taiwan. And the Chinese, often portrayed as uniformly nationalist, are just as diverse and divided as the people of any other country:

We don’t need to fear America selling arms to Taiwan, as soon as a war started these advanced weapons would be quickly consumed by our lower-quality but numerous weapons and many soldiers.

I only know that without America, the whole world would be chaotic.

As long as America exists, the world cannot be peaceful.

What I want to know is, where are the foreign voices in these conversations? Right now each culture is talking about the others like they’re not in the room. And they’re right. Our global conversation is fragmented and unmapped. It’s not a small world after all.

Escaping the News Hall of Mirrors

We live in a cacophony of news, but most of it is just echoes. Generating news is expensive; collecting it is not. This is the central insight of the news aggregator business model, be it a local paper that runs AP Wire and Reuters stories between ads, or web sites like Topix, Newser, and Memeorandum, or for that matter Google News. None of these sites actually pay reporters to research and write stories, and professional journalism is in financial crisis. Meanwhile there are more bloggers, but even more re-blogging. Is there more or less original information entering the web this year than last year? No one knows.

A computer could answer this question. A computer could trace the first, original source of any particular article or statement. The effect would be like donning special glasses in the hall of mirrors that is current news coverage, being able to spot the true sources without distraction from reflections. The required technology is nearly here.

This is more than geekery if you’re in a position of needing to know the truth of something. Last week I was researching a man named Michael D. Steele, after reading a newly leaked document containing his name. Steele gained fame as one of the stranded commanders in Black Hawk Down, but several of his soldiers later killed three unarmed Iraqi men. I rapidly discovered many news stories (1, 2, 3, 4, 5, 6, 7, etc.) claiming that Steele had ordered his men to “kill all military-age males.” This is a serious accusation, and widely reprinted — but no number of news articles, blog posts, and reblogs can make a false statement more true. I needed to know who first reported this statement, and its original source.
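Here is a minimal sketch of what that tracing might look like, assuming we already had a collection of articles with trustworthy publication times (which is the hard part). The Article fields, the earliest_mention function, and the sample URLs and dates are all hypothetical, made up for illustration. It only finds the earliest copy in the collection, which is not necessarily the true original source.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Article:
    # Hypothetical record; a real crawler would supply these fields.
    url: str
    published: datetime
    text: str


def normalize(text):
    """Lowercase, replace anything but letters and digits with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def earliest_mention(quote, articles):
    """Return the earliest-published article containing the quote, or None."""
    needle = f" {normalize(quote)} "  # pad so we only match whole words
    hits = [a for a in articles if needle in f" {normalize(a.text)} "]
    return min(hits, key=lambda a: a.published, default=None)


# Invented sample data, not real news stories.
articles = [
    Article("http://example.com/reblog", datetime(2006, 8, 3),
            "Reports say he told troops to 'Kill all military-age males.'"),
    Article("http://example.com/first", datetime(2006, 7, 21),
            "Soldiers testified the order was to kill all military age males."),
    Article("http://example.com/unrelated", datetime(2006, 7, 1),
            "A story about something else entirely."),
]

first = earliest_mention("kill all military-age males", articles)
print(first.url if first else "no match")
```

Exact matching after normalization will miss paraphrases and translations, so a real system would likely need fuzzier comparison, and it would still need a human to check whether the earliest copy cites anything.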

Continue reading Escaping the News Hall of Mirrors

How Many World Wide Webs Are There?

[Image: blogosphere map by Matthew Hurst]

How much overlap is there between the web in different languages, and what sites act as gateways for information between them? Many people have constructed partial maps of the web (such as the blogosphere map by Matthew Hurst, above) but as far as I know, the entire web has never been systematically mapped in terms of language.

Of course, what I actually want to know is, how connected are the different cultures of the world, really? We live in an age where the world seems small, and in a strictly technological sense it is. I have at my command this very instant not one but several enormous international communications networks; I could email, IM, text message, or call someone in any country in the world. And yet I very rarely do.

Similarly, it’s easy to feel like we’re surrounded by all the international information we could possibly want, including direct access to foreign news services, but I can only read articles and watch reports in English. As a result, information is firewalled between cultures; there are questions that could very easily be answered by any one of tens or hundreds of millions of native speakers, yet are very difficult for me to answer personally. For example, what is the journalistic slant of al-Jazeera, the original one in Arabic, not the English version which is produced by a completely different staff? Or, suppose I wanted to know what the average citizen of Indonesia thinks of the sweatshops there, or what is on the front page of the Shanghai Times today (and does such a newspaper even exist)? What is written on the 70% of web pages that are not in English?

Continue reading How Many World Wide Webs Are There?

Are They Right?

I’ve been reading StopTheACLU.com, because I want to get into their heads, because I want to avoid the classic mistake of intellectual isolation, and because I want to be challenged. Sure, they’re weirdos, but that doesn’t mean they don’t make sense. But there’s at least one thing in the StopTheACLU worldview that I find very hard to method-act: in their universe, global warming is a myth.

Okay, but how did I end up on this side and not that side?

Continue reading Are They Right?