<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jonathan Stray &#187; belief</title>
	<atom:link href="http://jonathanstray.com/tag/belief/feed" rel="self" type="application/rss+xml" />
	<link>http://jonathanstray.com</link>
	<description>Information, Culture, and Belief</description>
	<lastBuildDate>Fri, 27 Jan 2012 18:21:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Visualizing communities</title>
		<link>http://jonathanstray.com/visualizing-communities</link>
		<comments>http://jonathanstray.com/visualizing-communities#comments</comments>
		<pubDate>Mon, 01 Aug 2011 21:54:02 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[belief]]></category>
		<category><![CDATA[community]]></category>
		<category><![CDATA[knowledge]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=2964</guid>
		<description><![CDATA[There are in fact no masses; there are only ways of seeing people as masses. –Raymond Williams Who are the masses that the &#8220;mass media&#8221; speaks to? What can it mean to ask what &#8220;teachers&#8221; or &#8220;blacks&#8221; or &#8220;the people&#8221; of a country think? These words are all fiction, a shorthand which covers over our [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>There are in fact no masses; there are only ways of seeing people as masses.<br />
<em>–<a href="http://books.google.com/books?id=pkWbhOwxt30C&amp;lpg=PA46&amp;dq=%22there%20are%20in%20fact%20no%20masses%22&amp;pg=PA46#v=onepage&amp;q=%22there%20are%20in%20fact%20no%20masses%22&amp;f=false">Raymond Williams</a></em></p></blockquote>
<p>Who are the masses that the &#8220;mass media&#8221; speaks to? What can it mean to ask what &#8220;teachers&#8221; or &#8220;blacks&#8221; or &#8220;the people&#8221; of a country think? These words are all fiction, a shorthand which covers over our inability to understand large groups of unique individuals. Real people don&#8217;t move in homogeneous herds, nor can any one person be neatly assigned to a single category. Someone might view themselves simultaneously as the inhabitant of a town, a new parent, and an active amateur astronomer. Now multiply this by a million, and imagine trying to describe the overlapping patchwork of beliefs and allegiances.</p>
<p>But patterns of association leave digital traces. Blogs link to each other, we have &#8220;friends&#8221; and &#8220;followers&#8221; and &#8220;circles,&#8221; we share interesting tidbits on social networks, we write emails, and we read or buy things. We can visualize this data, and each type of visualization gives us a different answer to the question &#8220;what is a community?&#8221; This is different from the other ways we know how to describe groups. Anecdotes are tiny slices of life that may or may not be representative of the whole, while statistics are often so general as to obscure important distinctions. Visualizations are unique in being both universal and granular: they have detail at all levels, from the broadest patterns right down to individuals. Large scale visualizations of the commonalities between people are, potentially, a new way to represent and understand the public &#8212; that is, ourselves.</p>
<p>I&#8217;m going to go through the major types of community visualizations that I&#8217;ve seen, and then talk about what I&#8217;d like to do with them. Like most powerful technologies, large scale visualization is a capability that can also be used to oppress and to sell. But I imagine social ends, worthwhile ways of using visualization to understand the &#8220;public&#8221; not as we imagine it, but as something closer to how we really exist.</p>
<p><span id="more-2964"></span></p>
<p><strong>Social networks</strong><br />
Social networking services seem like an obvious place to go looking for communities, and I&#8217;m sure everyone has seen a social network visualization by now; they&#8217;re great eye candy. There are a lot of problems with social network visualizations &#8212; for example, what does it really mean to say that two people are &#8220;connected&#8221;? But let&#8217;s dive right in and see what we can see.</p>
<p><a href="http://jonathanstray.com/wp-content/uploads/2011/07/fb-social-graph.png"><img class="size-full wp-image-3108 aligncenter" title="fb-social-graph" src="http://jonathanstray.com/wp-content/uploads/2011/07/fb-social-graph.png" alt="" width="478" height="465" /></a></p>
<p>Here&#8217;s a visualization of the connections between my Facebook friends, which I created with the &#8220;<a href="http://www.facebook.com/apps/application.php?id=67692068407">social graph</a>&#8221; Facebook application. Every person I am friends with is included in this visualization. The layout algorithm tries to put people with lots of mutual friends close together; otherwise, the positions are random. Nothing can be learned from the fact that &#8220;Amy&#8221; is to the left or the right of &#8220;Ramone,&#8221; but clusters of people are reliable structures.</p>
<p>On this diagram I can see the following clusters: San Francisco personal friends, Hong Kong personal friends, Toronto personal friends, University of Hong Kong classmates, SF circus people, HK circus people, former Adobe colleagues, and a few others. The independent nodes floating around are mostly people I met traveling but never got to know too well, while clusters form when lots of people know their friends&#8217; friends. Clusters are so fundamental to this type of analysis that this Facebook app has tried to identify them by overlaying colored circles. I can see a lot more here than the algorithm can, which is a warning about the limitations of blind, acontextual analysis. Nonetheless, several major aspects of my life and personal history are immediately apparent. When you think about it, this is pretty amazing.</p>
<p>But this is a tiny little world. Rather than centering the visualization on a single person, you can make up some other sort of rule that determines which nodes are included. Here&#8217;s part of a lovely <a href="http://www.visualizing.org/full-screen/29391">visualization of the visualization community</a> on Twitter:</p>
<p><a href="http://www.visualizing.org/full-screen/29391"><img class="size-full wp-image-3110 aligncenter" title="Visualization of the visualization community" src="http://jonathanstray.com/wp-content/uploads/2011/07/Picture-1.png" alt="" width="492" height="285" /></a></p>
<p>Creator Moritz Stefaner chose who appears on this graph with a simple algorithm: starting with a small list of people he considered central to the visualization community, he included every person who followed or was followed by at least five of those people to produce a larger set. Within this limited network, the size of each node represents the number of followers. Which shows, again, the importance of context. <a href="http://www.youtube.com/watch?v=jbkSRLYSojo">Hans Rosling</a> may not be a big fish in the larger Twitter universe &#8212; he&#8217;s no <a href="http://mashable.com/2009/04/16/ashton-twitter-million/">Ashton Kutcher</a> &#8212; but he&#8217;s a superstar in the visualization community.</p>
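<p>The selection rule is simple enough to sketch in a few lines of Python. This is only an illustration of the rule as described, not the code used for the actual graph. The function name, the shape of the follows data, and the choice to count distinct seed accounts are all assumptions of mine.</p>

```python
def expand_from_seeds(seeds, follows, min_links=5):
    """Return the seeds plus every account linked to at least min_links seeds.

    seeds     -- hand-picked starting accounts (hypothetical names)
    follows   -- iterable of (follower, followed) pairs
    min_links -- how many distinct seeds an account must follow or be followed by
    """
    seeds = set(seeds)
    linked_seeds = {}  # non-seed account -> set of seeds it is connected to
    for follower, followed in follows:
        # A link counts in either direction: following a seed, or being followed by one.
        if followed in seeds and follower not in seeds:
            linked_seeds.setdefault(follower, set()).add(followed)
        if follower in seeds and followed not in seeds:
            linked_seeds.setdefault(followed, set()).add(follower)

    members = set(seeds)
    for account, touching in linked_seeds.items():
        if len(touching) >= min_links:
            members.add(account)
    return members
```

<p>Changing the seed list or the threshold changes who counts as part of the community, so the inclusion rule is itself an editorial decision.</p>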
<p>But is there really one &#8220;visualization community&#8221;? I&#8217;m involved in visualization and know many of the folks on this map, and it looks like a pretty good map to me, but it seems to skew heavily toward the design, art, and infographics world. That&#8217;s probably because of the seed accounts chosen, and this chart misses a number of folks coming at visualization from the open government, journalism, scientific, and academic points of view. It also certainly excludes many prominent visualizers who don&#8217;t use Twitter. This is a universal problem: a visualization must either include or exclude each node; it&#8217;s a binary, black-and-white sort of decision process about a fixed set of nodes drawn from available data, but reality isn&#8217;t like that. Real communities are porous and overlapping and span multiple communication networks.</p>
<p><strong>Co-consumption</strong><br />
We can also map &#8220;communities&#8221; by what they read or view or buy. This was first done by large online merchants, such as Amazon. Their famous &#8220;customers who bought this also bought that&#8221; feature, and indeed all automated recommendation engines, can be viewed as cluster detection algorithms. In this case, people are clustered by the books they bought or the movies they watched. Your personal recommendations are nothing more than the patterns of the cluster you fall into. Google News&#8217; personalization system represents these clusters explicitly in its <a href="http://www.ra.ethz.ch/CDstore/www2007/www2007.org/papers/paper570.pdf">core algorithm</a>.</p>
<p>To make this a little more concrete, here&#8217;s an <a href="http://www.orgnet.com/divided.html">analysis of US political booksales</a> on Amazon during the 2008 presidential election, as plotted by Orgnet. Rather than representing people, each node is now a book, and the arrows represent Amazon&#8217;s &#8220;customers who bought A also bought B&#8221; recommendations. The striking finding is that people bought red books or blue books, but not both.<br />
<a href="http://www.orgnet.com/divided.html"><img class="aligncenter" title="PoliticalBooksales" src="http://jonathanstray.com/wp-content/uploads/2011/04/PoliticalBooksales.gif" alt="" width="600" height="400" /></a><br />
This amazing visualization is political polarization made manifest. There is little overlap in the networks of political books, so the &#8220;left&#8221; and the &#8220;right&#8221; emerge as features of reality in this context, which is fascinating. But a word of caution: this chart actually shows three clusters, two of which are assigned to the &#8220;left.&#8221; What do we make of that? Are there actually three &#8220;sides&#8221; here? Also, the visualization includes only books deemed &#8220;political&#8221; from the outset. This looks at the world through a very narrow lens because it ignores all other books &#8212; and therefore the rest of the network structure around the books shown here, which is presumably dense and interesting. But how do we decide what is &#8220;political?&#8221; And what about every other way we could examine the relationships between books and people? Is this kind of polarization apparent and important in broader contexts? We need to be very wary of projecting our preconceptions onto the interpretation of a visualization.</p>
<p>Also note that this map doesn&#8217;t depend at all on the &#8220;content&#8221; of books or blogs or articles &#8212; there&#8217;s no text processing or semantic analysis here. Amazon infers similarity in an entirely social fashion, based on how groups of people show similar buying behavior. iTunes&#8217; Genius playlists and <a href="http://en.wikipedia.org/wiki/Netflix_Prize">Netflix&#8217;s movie recommendations</a> work the same way &#8212; but we can&#8217;t see the structures of any of these data sets, because they aren&#8217;t visualized.</p>
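<p>To show how little is needed to get &#8220;also bought&#8221; links out of purchase records alone, here is a minimal sketch. It is not Amazon&#8217;s algorithm, which I have not seen. It simply counts how many customers bought each pair of items, and the names, the basket format, and the threshold are assumptions made for illustration.</p>

```python
from collections import Counter
from itertools import combinations


def also_bought(baskets, min_shared=2):
    """Map each item to the items that the same customers bought.

    baskets maps a (hypothetical) customer id to the items they bought.
    Two items count as related once min_shared customers bought both.
    """
    pair_counts = Counter()
    for items in baskets.values():
        # Count every unordered pair of distinct items in one customer's basket.
        for a, b in combinations(sorted(set(items)), 2):
            pair_counts[(a, b)] += 1

    related = {}
    for (a, b), shared in pair_counts.items():
        if shared >= min_shared:
            related.setdefault(a, []).append((shared, b))
            related.setdefault(b, []).append((shared, a))

    # Most frequently co-purchased items first; ties broken alphabetically.
    return {
        item: [other for shared, other in sorted(pairs, key=lambda p: (-p[0], p[1]))]
        for item, pairs in related.items()
    }
```

<p>Nothing in this sketch looks at a title or a word of text. If customers rarely buy from both sides, the &#8220;red&#8221; and &#8220;blue&#8221; items never link to each other, and the two clusters fall out of the counts.</p>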
<p><strong>Communication networks</strong><br />
There&#8217;s often a difference between what people say and what they do. Looking at social network connections is a little like asking someone who their friends are &#8212; relevant, but subject to little white lies, perceptual biases, the limitations of memory, and complicated personal judgements. Better, perhaps, to look at the data streams generated by online activity. For example, email.</p>
<p>Email network analysis seems to have become popular with the Enron <a href="http://www.cs.cmu.edu/~enron/">emails</a> released in 2003. The simplest way to visualize a huge pile of emails is to plot each email address as a node and draw edges when one person emailed another. Here&#8217;s such an image from Jeffrey Heer&#8217;s <a href="http://hci.stanford.edu/jheer/projects/enron/v1/">Exploring Enron</a> project:</p>
<p><a href="http://jonathanstray.com/wp-content/uploads/2011/07/enron_title.png"><img class="size-full wp-image-3117 aligncenter" title="enron_title" src="http://jonathanstray.com/wp-content/uploads/2011/07/enron_title.png" alt="" width="478" height="417" /></a></p>
<p>There&#8217;s more going on in this picture, such as some color coding via the <a href="http://aps.arxiv.org/abs/cond-mat/0309508/">modularity algorithm</a>, which claims to be about &#8220;detecting community structure&#8221; but is actually about detecting clusters. But no matter how you visualize it, there&#8217;s something interesting here. Analysis of email networks within an organization can reveal organizational structure that varies significantly from formal hierarchies, and there&#8217;s at least one <a href="http://books.google.com/books/about/The_hidden_power_of_social_networks.html?id=vQ3mM4Vpix8C">book</a> which claims that this informal structure is how things actually get done.</p>
<p>The main email analysis techniques are all based around counting the number of emails exchanged by each pair of people. This is a powerful idea, even if it&#8217;s not necessarily a very clear one. We don&#8217;t really know how to interpret facts such as &#8220;Joanna emails Hugo more than anyone else.&#8221; Are they colleagues, or lovers, or does Hugo work in tech support? But again, in almost every visualization of this type we get clusters, more or less tight groups of people who talk or act more with each other than they do with others. There is at least one <a href="http://faculty.chicagobooth.edu/workshops/orgs-markets/archive/pdf/Aven.pdf">research technique</a> which attempts to detect conspiracies based partially on this type of network structure. There has also been some interesting <a href="http://repository.cmu.edu/cgi/viewcontent.cgi?article=1015&amp;context=isr&amp;sei-redir=1#search=%22Communication%20Networks%20from%20Enron%20Email%20Corpus%20%C3%A2%C2%80%C2%9CIts%20Always%20About%20People.%20Enron%20no%20Different%C3%A2%C2%80%C2%9D%22">analysis of the dynamic structure</a> of the network &#8212; how people&#8217;s communication patterns changed as the crisis deepened. I like that, because time is so often overlooked in network analysis. Ideally, every network visualization would include a time slider that allows the user to scrub back and forth to see how things evolved.</p>
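<p>The pair-counting step can be sketched as follows. This is a generic illustration rather than the method of any of the projects linked above. The message format (a sender plus a list of recipients) and both function names are assumptions.</p>

```python
from collections import Counter


def email_edge_weights(messages):
    """Count emails exchanged by each unordered pair of addresses.

    messages is an iterable of (sender, recipients) tuples (hypothetical format).
    The result maps a sorted (address, address) tuple to a message count.
    """
    weights = Counter()
    for sender, recipients in messages:
        # A message to several people adds one to each sender-recipient pair.
        for recipient in set(recipients):
            if recipient != sender:
                weights[tuple(sorted((sender, recipient)))] += 1
    return weights


def top_correspondent(weights, person):
    """Return the address that exchanged the most email with person, or None."""
    totals = Counter()
    for (a, b), count in weights.items():
        if person == a:
            totals[b] += count
        elif person == b:
            totals[a] += count
    return totals.most_common(1)[0][0] if totals else None
```

<p>The weights are exactly the edge strengths a network layout would draw. They say how often Joanna and Hugo write to each other, and nothing about why.</p>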
<p><strong>Web structure</strong><br />
What if we take &#8220;website&#8221; instead of &#8220;person&#8221; as the atomic component of a community? The first maps of the web were made in the late 1990s by spidering the links between pages. My favorite modern example is the 2008 <a href="http://cyber.law.harvard.edu/publications/2008/Mapping_Irans_Online_Public/Iranian_blogosphere_map">map of the Persian-language blogosphere</a> by John Kelly and Bruce Etling of the Berkman Center. Every node is a blog. The size represents the number of other blogs that link to it. The color shows the subject of the blog, as categorized by a Persian-speaking researcher. Again, the visualization algorithm places blogs that frequently link to each other closer together. And like people, blogs tend to form clusters.</p>
<p><a href="http://cyber.law.harvard.edu/publications/2008/Mapping_Irans_Online_Public/Iranian_blogosphere_map"><img class="aligncenter" title="Iran_blogosphere_map" src="http://jonathanstray.com/wp-content/uploads/2011/04/Iran_blogosphere_map.jpg" alt="" width="525" height="480" /></a></p>
<p>In this map, humans chose the color for each dot &#8212; each blog &#8212; by manually reading the blog and coding the topic. The researchers didn&#8217;t know that the blogs on similar topics would be in the same cluster when they were reading them, and the computer didn&#8217;t know the assigned topics when clustering them. In other words, there is an amazing discovery here: an algorithm that can tell that two blogs have a different perspective &#8212; say, secular vs. religious politics, or perhaps poetry instead &#8212; just by looking at where these two blogs sit in the web of links. Link structure is here a proxy for worldview. It may also be a proxy for information flow, which must be closely related.</p>
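<p>One simple way to put a number on how well link-based clusters line up with human-assigned topics is cluster purity: the fraction of blogs whose topic matches the most common topic in their cluster. I don&#8217;t know that the researchers computed anything like this. The sketch below, with its hypothetical input dictionaries, only illustrates the kind of check the comparison implies.</p>

```python
from collections import Counter


def cluster_purity(cluster_of, label_of):
    """Fraction of items whose label is the majority label of their cluster.

    cluster_of -- maps each blog to a cluster id found from links alone
    label_of   -- maps each blog to the topic a human reader assigned
    Both dictionaries are hypothetical inputs and must cover the same blogs.
    """
    if not cluster_of:
        return 0.0
    labels_by_cluster = {}
    for blog, cluster in cluster_of.items():
        labels_by_cluster.setdefault(cluster, []).append(label_of[blog])
    # For each cluster, count how many blogs carry its most common label.
    matched = sum(
        Counter(labels).most_common(1)[0][1]
        for labels in labels_by_cluster.values()
    )
    return matched / len(cluster_of)
```

<p>A purity of 1.0 means every cluster contains a single topic. Values near the share of the largest topic mean the clusters say little about topic at all.</p>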
<p>It would also be possible to visualize the web in terms of language. I imagine that this would reveal a geography of continent-clusters separated not by oceans but by language, so that Spain and Mexico would be neighbors, somewhat apart from the United States. As far as I know, no one has done this yet. It might tell us something about <a href="http://jonathanstray.com/how-many-webs">how information flows between cultures</a>, or reveal useful <a href="http://globalvoicesonline.org/2005/07/07/seeking-bridge-bloggers/">bridge-bloggers</a>.</p>
<p><strong>Location-based community</strong><br />
By this I don&#8217;t mean where you live, though that&#8217;s part of it. Rather, I mean what can be inferred by analyzing people&#8217;s real-time location history. There are many sources for this information: check-in apps like FourSquare, tracking services like Google Latitude, geo-Tweets, or just the location recorded by mobile phone companies and individual phones. Suppose you had millions of these person-at-location-at-time data points. Could you segment users into different groups based on, say, the bars they hang out in? There&#8217;s money betting that the answer is yes, because Sense Networks is aiming to <a href="http://radar.oreilly.com/2008/06/citysense-reality-mining-iphone.html">do this commercially</a>. For more, see this <a href="http://www.freepatentsonline.com/20100079336.pdf">patent</a>. In 2008 they released the <a href="http://techcrunch.com/2008/06/09/location-tracking-startup-sense-networks-emerges-from-stealth-to-answer-the-question-where-is-everybody/">CitySense app</a> showing, collectively, where everyone is within a city:</p>
<p><img class="aligncenter" title="citysense" src="http://jonathanstray.com/wp-content/uploads/2011/07/citysense.jpg" alt="" width="382" height="292" /></p>
<p>But this little phone app is just a demonstration, a toy. The point of this work isn&#8217;t to say where people are, but how their patterns of movement relate over time. This is another type of clustering, of understanding who people are and how they are the same or different. I bet you could locate the members of, say, an underground party community by looking for a cluster of people who frequently gathered together in supposedly abandoned warehouses in industrial areas. Sense Networks CTO Tony Jebara has <a href="http://www.cs.columbia.edu/~jebara/research.html">written</a> about visualizing these path clusters directly, but I haven&#8217;t been able to find any examples.</p>
<p><strong>What&#8217;s a community?</strong><br />
I believe that I am part of many communities, and that each of these communities has enriched my life tremendously. But there&#8217;s no simple definition. Very often when someone says &#8220;community&#8221; what they mean is &#8220;geographic community,&#8221; the set of people who live in the same town or polity. I&#8217;d like to include this definition, because face-to-face contact is very important, and many of the most pressing collective action problems are local. But community must now mean something more. Consider Clay Shirky&#8217;s <a href="http://www.niemanlab.org/2010/04/clay-shirky-on-the-necessity-of-waste-the-power-of-institutions-and-the-safety-of-the-infinite-time-horizon/">anecdote</a> about the Boston Globe&#8217;s coverage of sexual abuse within the Catholic Church, in 1992 versus 2002:</p>
<blockquote><p>In April of 2002 &#8230; this Spotlight story was most largest, most global thing that ever came out of Boston.com, the Boston Globe website, and the circulation for that one story was larger than the nominal circulation of The Boston Globe. Because [when] the stories in the ’90s had come out, the audience of the story was Bostonians, whether Catholic or no. But in 2002, the audience was Catholics whether Bostonian or no.</p></blockquote>
<p>The only definition of &#8220;community&#8221; that makes any sense to me is &#8220;a group of people who think or act collectively.&#8221; This is the central theme of these visualizations. People don&#8217;t act truly independently, randomly spreading themselves out across geography and belief and behavior. Our lives are clustered along many disparate dimensions, which is just another way of saying that humans are social creatures. There must be as many different ways to visualize communities as there are types of human action. Each is an answer to &#8220;what is a community?&#8221; How these different answers relate, and how they relate to our intuitive, experiential understanding of face-to-face communities, I don&#8217;t think anyone really knows. Many people are trying to understand this right now, from <a href="http://mashable.com/2010/07/13/google-social-slide-deck/">industry</a> to <a href="http://storify.com/tcarmody/the-day-zeynep-tufecki-dropped-a-bundle-of-knowled">academia</a>, and no doubt intelligence and law enforcement.</p>
<p>Note that many of these types of visualizations can group people who are not in contact with one another. This is particularly true of co-consumption and co-location visualizations. Maybe everyone has read the same books, or they hang out at the same coffee shop but have never met. There is some similarity between people that the algorithm reveals, a pattern we can see, but these people don&#8217;t necessarily know that they&#8217;re similar. We might call this a &#8220;latent community,&#8221; a group of like-minded folks who might act together if they were to come into communication &#8212; and the internet is great at allowing people to self-organize and define a common identity.</p>
<p><strong>What do we do with this?</strong><br />
The list of people working on identifying communities through data is long. Finance, intelligence, law enforcement, politics, and especially marketing are already hard at work in these areas. Marketing is starting to <a href="http://mashable.com/2011/06/30/psychographics-marketing">turn away</a> from classic measures like age, location, and gender because they are not terribly good predictors of purchasing, and there are already social-network based <a href="http://klout.com">influence predictors</a>. Advertising based on personal data is powerful, but imagine advertising based on an analysis of where you fit into the broader fabric of society. I imagine scarily good predictions of what your friends will find cool. Or your colleagues, or your family.</p>
<p>But I&#8217;m more concerned with the public-interest applications, which I see all over the social sciences: in journalism, sociology, conflict resolution, representative governance, urban planning, epidemiology. This is especially true when the clusters in these visualizations are good proxies for belief or worldview, as they seemed to be in the Iran blogosphere map. Knowing who believes what seems a critical building block for collective action of all types.</p>
<p>Being a journalist I&#8217;ve thought about journalism most, and I&#8217;d like to use community visualization to target journalism to actual people. If I had a live map of the web broken down by interests in some way, there are all sorts of ways I could focus my reporting. I could look at the map to see where the people affected by the story congregate online, and find sources there. When I had something to publish, I would know where to post the link. And I could discover who I&#8217;m missing, who I&#8217;m not thinking of and not serving in my reporting, and challenge the categories by which I group people. I tend to think of journalism in terms of empowerment; it&#8217;s a service we perform for members of a community. Community used to mean &#8220;town&#8221; but the definition is and must be more complex now. I want to get closer to the audience and farther from &#8220;mass media.&#8221; Media became mass during the broadcast era because of technology and economics, not because it was the right way to do journalism.</p>
<p>In the most general sense, I am concerned with community visualization because I am concerned with representation. That is why I want these maps of the masses to be available to all. It is vital to represent the public to itself, and mapping how people are already acting together, out there in the world, seems like a critical feature for anyone who wants to participate broadly in society. It is especially critical because we can expect that various interests will expend huge sums pursuing this mapping for their own ends; in these maps there is the ability to influence, and to divide or unite, and I don&#8217;t think we want that entirely in a few powerful hands. But there is also the ability to understand who we, collectively, are. It&#8217;s easy to toss around labels like &#8220;left&#8221; and &#8220;right&#8221; or &#8220;Hispanic&#8221; or &#8220;drug abuser&#8221; but who are these people, actually, and what other identities do they have? And who are we not thinking of at all?</p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/visualizing-communities/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A computational journalism reading list</title>
		<link>http://jonathanstray.com/a-computational-journalism-reading-list</link>
		<comments>http://jonathanstray.com/a-computational-journalism-reading-list#comments</comments>
		<pubDate>Tue, 01 Feb 2011 02:29:28 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[belief]]></category>
		<category><![CDATA[computational journalism]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[knowledge]]></category>
		<category><![CDATA[media]]></category>
		<category><![CDATA[minds]]></category>
		<category><![CDATA[misinformation]]></category>
		<category><![CDATA[politics]]></category>
		<category><![CDATA[social media]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=2596</guid>
		<description><![CDATA[[Last updated: 18 April 2011 -- added statistical NLP book link] There is something extraordinarily rich in the intersection of computer science and journalism. It feels like there&#8217;s a nascent field in the making, tied to the rise of the internet. The last few years have seen calls for a new class of  &#8220;programmer journalist&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p><em>[Last updated: 18 April 2011 -- added statistical NLP book link]</em></p>
<p>There is something extraordinarily rich in the intersection of computer science and journalism. It feels like there&#8217;s a nascent field in the making, tied to the rise of the internet. The last few years have seen calls for a new class of  &#8220;<a href="http://www.niemanlab.org/2011/01/dave-winer-how-can-universities-educate-journo-programmers/">programmer journalist</a>&#8221; and the birth of a community of <a href="http://hackshackers.com/">hacks and hackers</a>. Meanwhile, several schools are now <a href="http://www.wired.com/epicenter/2010/04/will-columbia-trained-code-savvy-journalists-bridge-the-mediatech-divide/">offering joint degrees</a>. But we&#8217;ll need more than competent programmers in newsrooms. What are the key problems of computational journalism? What other fields can we draw upon for ideas and theory? For that matter, what is it?</p>
<p>I&#8217;d like to propose a working definition of computational journalism as the application of computer science to the problems of public information, knowledge, and belief, by practitioners who see their mission as outside of both commerce and government. This includes the journalistic mainstay of &#8220;reporting&#8221; &#8212; because information not published is information not known &#8212; but my definition is intentionally much broader than that. To succeed, this young discipline will need to draw heavily from social science, computer science, public communications, cognitive psychology and other fields, as well as the traditional values and practices of the journalism profession.</p>
<p>&#8220;Computational journalism&#8221; has no textbooks yet. In fact the term is barely recognized. The phrase seems to have emerged at Georgia Tech in 2006 or <a href="http://www.cc.gatech.edu/classes/AY2007/cs4803cj_spring/">2007</a>. Nonetheless I feel like there are already important topics and key references.</p>
<p><strong>Data journalism</strong><br />
Data journalism is obtaining, reporting on, curating and publishing data in the public interest. The practice is often more about spreadsheets than algorithms, so I&#8217;ll suggest that not all data journalism is &#8220;computational,&#8221; in the same way that a novel written on a word processor isn&#8217;t &#8220;computational.&#8221; But data journalism is interesting and important and dovetails with computational journalism in many ways.</p>
<ul>
<li>The Nieman Journalism Lab&#8217;s <a href="http://www.niemanlab.org/2010/08/how-the-guardian-is-pioneering-data-journalism-with-free-tools/">interview with Guardian Data Blog editor Simon Rogers</a> remains a solid introduction to (one kind of) contemporary practice.</li>
<li>The best practical guides I know are Rogers&#8217; &#8220;<a href="http://www.journalism.co.uk/skills/how-to-get-to-grips-with-data-journalism/s7/a542402/">How to: get to grips with data journalism</a>&#8221; and Dan Nguyen&#8217;s <a href="http://www.propublica.org/nerds/item/doc-dollars-guides-collecting-the-data">series of data-scraping tutorials at ProPublica</a>.</li>
<li>Stanford&#8217;s <a href="http://datajournalism.stanford.edu/">Journalism in the Age of Data</a> is an hour-long documentary on data journalism and visualization.</li>
<li>The web is a linked system of human-readable documents. Now Tim Berners-Lee wants to create a web of machine-readable <a href="http://blog.ted.com/2009/03/13/tim_berners_lee_web/">linked data</a>. The full potential is unclear, but it&#8217;s a big idea that may come to be the backbone of <a href="http://en.wikipedia.org/wiki/Semantic_Web">semantic web</a> visions. The <a href="http://data.nytimes.com/">New York Times</a>, <a href="http://www.guardian.co.uk/open-platform">The Guardian</a>, and others are experimenting with open data APIs.</li>
<li>Everyblock creator Adrian Holovaty seems to have been the first to suggest that reporters file structured data in his 2006 &#8220;<a href="http://www.holovaty.com/writing/fundamental-change/">A Fundamental Way Newspaper Websites Need to Change</a>.&#8221; This idea is beautifully expanded in Stijn Debrouwere&#8217;s &#8220;<a href="http://stdout.be/2010/information-architecture-for-news-websites/">Information Architecture for News Websites</a>&#8221; series.</li>
</ul>
<p><strong>Visualization</strong><br />
Big data requires powerful exploration and storytelling tools, and increasingly that means visualization. But there&#8217;s good visualization and bad visualization, and the field has advanced tremendously since Tufte wrote <a href="http://www.edwardtufte.com/tufte/books_vdqi">The Visual Display of Quantitative Information</a>. There is lots of good science that is too little known, and many open problems here.</p>
<ul>
<li>Tamara Munzner&#8217;s <a href="http://www.cs.ubc.ca/labs/imager/tr/2009/VisChapter/">chapter on visualization</a> is the essential primer. She puts visualization on rigorous perceptual footing, and discusses all the major categories of practice. Absolutely required reading for anyone who works with pictures of data.</li>
<li>Ben Fry invented the Processing language and wrote his PhD thesis on &#8220;<a href="http://benfry.com/phd/">computational information design</a>,&#8221; which is his powerful conception of the iterative, interactive practice of designing useful visualizations.</li>
<li>How do we make visualization statistically rigorous? How do we know we&#8217;re not just fooling ourselves when we see patterns in the pixels? This <a href="http://jonathanstray.com/papers/wickham.pdf">amazing paper by Wickham</a> et al. has some answers.</li>
<li>Is a visualization a story? Segel and Heer explore this question in &#8220;<a href="http://vis.stanford.edu/files/2010-Narrative-InfoVis.pdf">Narrative Visualization: Telling Stories with Data</a>.&#8221;</li>
</ul>
<p><strong>Computational linguistics</strong><br />
Data is more than numbers. Given that the web is designed to be read by humans, it makes heavy use of human language. And then there are all the world&#8217;s books, and the archival recordings of millions of speeches and interviews. Computers are slowly getting better at dealing with language.</p>
<ul>
<li>Word frequency techniques like <a href="http://en.wikipedia.org/wiki/Tfidf">tf-idf</a> and the <a href="http://en.wikipedia.org/wiki/Vector_space_model">vector space document model</a> are very simple and very useful. See also <a href="http://en.wikipedia.org/wiki/Stemming">stemming</a>. Lots more in the wonderful (and free!) <em><a href="http://nlp.stanford.edu/IR-book/information-retrieval-book.html">Introduction to Information Retrieval</a></em>. This book explains how search engines are built, and discusses tf-idf etc. in great technical detail.</li>
<li>Statistical language models are increasingly important for all kinds of applications. Michael Nielsen has a great <a href="http://michaelnielsen.org/blog/introduction-to-statistical-machine-translation/">introduction to statistical machine translation</a>. Google&#8217;s Peter Norvig discusses how he implemented <a href="http://norvig.com/spell-correct.html">statistical spelling correction</a> on his laptop during a long plane flight. For the full deal, see the book <em><a href="http://books.google.com/books?id=YiFDxbEX3SUC&amp;lpg=PP1&amp;dq=Foundations%20of%20statistical%20language%20processing%22&amp;pg=PP1#v=onepage&amp;q&amp;f=false">Foundations of Statistical Natural Language Processing</a></em>.</li>
<li>On a related note, <a href="http://ngrams.googlelabs.com/">Google N-gram viewer</a> lets you look at the frequency of short phrases within 4% of all books published, ever. The <a href="http://mfi.uchicago.edu/publications/papers/Science_Culturomics.pdf">excellent paper</a> gives examples of how to use this for cultural research. Dan Cohen has <a href="http://www.dancohen.org/2010/12/19/initial-thoughts-on-the-google-books-ngram-viewer-and-datasets/">important criticisms</a>.</li>
<li>Speech-to-text algorithms enable automated transcription, and Matt Thompson explores the <a href="http://www.niemanlab.org/2010/12/coming-soon-to-journalism-matt-thompson-sees-the-speakularity-and-universal-instant-transcription/">huge implications for journalism</a>.</li>
<li>Reuters maintains the <a href="http://www.opencalais.com/">OpenCalais</a> entity extraction service, which parses text to contextually determine who and what is referenced.</li>
<li>IBM&#8217;s Watson project built a question-answering system that reads reference books and wins at Jeopardy. Imagine how useful to journalists and curious readers this could be! This <a href="http://www.stanford.edu/class/cs124/AIMagzine-DeepQA.pdf">paper on the DeepQA system</a> describes how they did it.</li>
</ul>
<p><strong>Communications technology and free speech</strong><br />
<a href="http://harvardmagazine.com/2000/01/code-is-law.html">Code is law</a>. Because our communications systems use software, the underlying mathematics of communication lead to staggering political consequences &#8212; including whether or not it is possible for governments to verify online identity or remove things from the internet. The key topics here are networks, cryptography, and information theory.</p>
<ul>
<li>The <a href="http://www.cacr.math.uwaterloo.ca/hac/index.html">Handbook of Applied Cryptography</a> is a classic, and free online. But despite the title it doesn&#8217;t really explain how crypto is used in the real world, <a href="http://en.wikipedia.org/wiki/Cryptography">like Wikipedia does</a>.</li>
<li>It&#8217;s important to know how the internet routes information, using <a href="http://en.wikipedia.org/wiki/Transmission_Control_Protocol">TCP/IP</a> and <a href="http://en.wikipedia.org/wiki/Border_Gateway_Protocol">BGP</a>, or at a somewhat higher level, things like the <a href="http://www.ittc.ku.edu/~niehaus/classes/750-s06/documents/BT-description.pdf">BitTorrent protocol</a>. The technical details determine how hard it is to do things like block websites, suppress the dissemination of a file, or <a href="http://blog.torproject.org/blog/recent-events-egypt">remove entire countries from the internet</a>.</li>
<li>Anonymity is deeply important to online free speech, and very hard. The <a href="http://www.torproject.org/">Tor project</a> is the outstanding leader in anonymity-related research.</li>
<li>Information theory is stunningly useful across almost every technical discipline. Pierce&#8217;s <a href="http://www.amazon.com/Introduction-Information-Theory-Symbols-Signals/dp/0486240614/ref=pd_rhf_p_t_1">short textbook</a> is the classic introduction, while Tom Schneider&#8217;s <a href="http://www-lmmb.ncifcrf.gov/~toms/paper/primer/">Information Theory Primer</a> seems to be the best free online reference.</li>
</ul>
<p><strong>Tracking the spread of information (and misinformation)</strong><br />
What do we know about how information spreads through society? Very little. But one nice side effect of our increasingly digital public sphere is the ability to track such things, at least in principle.</p>
<ul>
<li><a href="http://memetracker.org/">Memetracker</a> was (AFAIK) the first credible demonstration of whole-web information tracking, following quoted soundbites through blogs and mainstream news sites and everything in between. Zach Seward has cogent <a href="http://www.niemanlab.org/2009/07/in-the-news-cycle-memes-spread-more-like-a-heartbeat-than-a-virus/">reflections on their findings</a>.</li>
<li>The <a href="http://truthy.indiana.edu/">Truthy Project</a> aims for automated detection of astro-turfing on Twitter. They specialize in covert political messaging, or as I like to call it, computational propaganda.</li>
<li>We badly need tools to help us determine the source of any given online &#8220;fact.&#8221; There are many existing techniques that could be applied to the problem, as I discussed in a <a href="http://jonathanstray.com/escaping-the-news-hall-of-mirrors">previous post</a>.</li>
<li>If we had information provenance tools that worked across a spectrum of media outlets and feed types (web, social media, etc.) it would be much cheaper to do the sort of <a href="http://www.journalism.org/analysis_report/how_news_happens">information ecosystem studies</a> that Pew and others occasionally undertake. This would lead to a much better understanding of <a href="http://www.niemanlab.org/2010/02/the-googlechina-hacking-case-how-many-news-outlets-do-the-original-reporting-on-a-big-story/">who does original reporting</a>.</li>
</ul>
<p><strong>Filtering and recommendation</strong><br />
With <a href="http://techcrunch.com/2010/08/04/schmidt-data/">vastly more information than ever before</a> available to us, attention becomes the scarcest resource. Algorithms are an essential tool in filtering the flood of information that reaches each person. (Social media networks also <a href="http://jonathanstray.com/whats-the-point-of-social-news">act as filters</a>.)</p>
<ul>
<li>The paper on <a href="http://crpit.com/confpapers/CRPITV70Truyen.pdf">preference networks</a> by Truyen et al. is probably as good an introduction as anything to the state of the art in recommendation engines, those algorithms that tell you what articles you might like to read or what <a href="http://en.wikipedia.org/wiki/Netflix_Prize">movies you might like to watch</a>.</li>
<li>Before Google News there was Columbia News Blaster, which incorporated a number of interesting algorithms such as multi-lingual article clustering, automatic summarization, and more as described in <a href="http://www.cs.columbia.edu/~sable/research/hlt-blaster.pdf">this paper</a> by McKeown et al.</li>
<li>Anyone playing with clustering algorithms needs to have a deep appreciation of the <a href="http://en.wikipedia.org/wiki/Ugly_duckling_theorem">ugly duckling theorem</a>, which says that there is no categorization without preconceptions. King and Grimmer explore this with their technique for <a href="http://gking.harvard.edu/files/abs/discov-abs.shtml">visualizing the space of clusterings</a>.</li>
<li>Any digital journalism product which involves the audience to any degree &#8212; that should be all digital journalism products &#8212; is a piece of social software, well defined by Clay Shirky in his classic essay, &#8220;<a href="http://www.shirky.com/writings/group_enemy.html">A Group Is Its Own Worst Enemy</a>.&#8221; It&#8217;s also a &#8220;<a href="http://cdixon.org/2010/01/17/collective-knowledge-systems/">collective knowledge system</a>&#8221; as articulated by Chris Dixon.</li>
</ul>
<p><strong>Measuring public knowledge</strong><br />
If journalism is about &#8220;informing the public&#8221; then we must consider what happens to stories after publication &#8212; this is the <a href="http://jonathanstray.com/does-journalism-work">&#8220;last mile&#8221; problem in journalism</a>. There is almost none of this happening in professional journalism today, aside from basic traffic analytics. The key question here is, how does journalism change ideas and action? Can we apply computers to help answer this question empirically?</p>
<ul>
<li>World Public Opinion&#8217;s recent <a href="http://www.worldpublicopinion.org/pipa/articles/brunitedstatescanadara/671.php?nid=&amp;id=&amp;pnt=671&amp;lb=">survey of misinformation among American voters</a> solves this problem in the classic way, by doing a randomly sampled opinion poll. I discuss their bleak results <a href="http://jonathanstray.com/american-journalism-failed-to-inform-voters">here</a>.</li>
<li>Blogosphere maps and other kinds of visualizations can help us understand the public information ecosystem, such as this <a href="http://cyber.law.harvard.edu/publications/2008/Mapping_Irans_Online_Public/interactive_blogosphere_map">interactive visualization of Iranian blogs</a>. I have previously suggested using such maps as a navigation tool that might <a href="http://jonathanstray.com/mapping-the-daily-me">broaden our information horizons</a>.</li>
<li><a href="http://www.unglobalpulse.org/">UN Global Pulse</a> is a serious attempt to create a real-time global monitoring system to detect humanitarian threats in crisis situations. They plan to do this by mining the &#8220;data exhaust&#8221; of entire societies &#8212; social media postings, online records, news reports, and whatever else they can get their hands on. Sounds like <a href="http://www.unglobalpulse.org/blog/real-time-information-everyone-journalists-perspective-un-global-pulse">key technology for journalism</a>.</li>
<li><a href="http://sm.rutgers.edu/vox/event/">Vox Civitas</a> is an ambitious social media mining tool designed for journalists. Computational linguistics, visualization, and more.</li>
</ul>
<p><strong>Research agenda</strong><br />
I know of only one work which proposes a research agenda for computational journalism.</p>
<ul>
<li>&#8220;<a href="http://www.eecs.umich.edu/~congy/work/cidr11.pdf">Computational Journalism: A Call to Arms for Database Researchers</a>&#8221; by Sarah Cohen et al. raises the very intriguing possibility of building systems that automatically or semi-automatically scan databases for stories, document the rationale for believing certain facts, etc.</li>
</ul>
<p>This paper presents a broad vision and is really a must-read. However, it deals almost exclusively with reporting, that is, finding new knowledge and making it public. I&#8217;d like to suggest that the following unsolved problems are also important:</p>
<ul>
<li>Tracing the source of any particular &#8220;fact&#8221; found online, and generally tracking the spread and mutation of information.</li>
<li>Cheap metrics for the state of the public information ecosystem. How accurate is the web? How accurate is a particular source?</li>
<li>Techniques for mapping public knowledge. What is it that people actually know and believe? How polarized is a population? What is under-reported? What is well reported but poorly appreciated?</li>
<li>Information routing and timing: how can we route each story to the set of people who might be most concerned about it, or best in a position to act, at the moment when it will be most relevant to them?</li>
</ul>
<p>This sort of attention to the health of the public information ecosystem as a whole, beyond just the traditional surfacing of new stories, seems essential to the project of <a href="http://jonathanstray.com/does-journalism-work">making journalism work</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/a-computational-journalism-reading-list/feed</wfw:commentRss>
		<slash:comments>44</slash:comments>
		</item>
		<item>
		<title>By the numbers, American journalism failed to inform voters</title>
		<link>http://jonathanstray.com/american-journalism-failed-to-inform-voters</link>
		<comments>http://jonathanstray.com/american-journalism-failed-to-inform-voters#comments</comments>
		<pubDate>Thu, 30 Dec 2010 00:40:01 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[belief]]></category>
		<category><![CDATA[fact checking]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[knowledge]]></category>
		<category><![CDATA[polarization]]></category>
		<category><![CDATA[politics]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=2439</guid>
		<description><![CDATA[A recent study by World Public Opinion.org shows that the majority of the American population believed false things about basic national issues, right before the 2010 mid-term elections. I don&#8217;t know how to interpret this as anything other than a catastrophic failure of American journalism, in its most fundamental, clichéd, &#8220;inform the public&#8221; role. The [...]]]></description>
			<content:encoded><![CDATA[<p>A recent <a href="http://www.worldpublicopinion.org/pipa/articles/brunitedstatescanadara/671.php?nid=&amp;id=&amp;pnt=671&amp;lb=">study</a> by World Public Opinion.org shows that the majority of the American population believed false things about basic national issues, right before the 2010 mid-term elections. I don&#8217;t know how to interpret this as anything other than a catastrophic failure of American journalism, in its most fundamental, clichéd, &#8220;inform the public&#8221; role.</p>
<p>The most damning section of the report (<a href="http://www.worldpublicopinion.org/pipa/pdf/dec10/Misinformation_Dec10_rpt.pdf">PDF</a>) is titled &#8220;Evidence of Misinformation Among Voters.&#8221;</p>
<blockquote><p>The poll found strong evidence that voters were substantially misinformed on many of the issues prominent in the election campaign, including the stimulus legislation, the healthcare reform law, TARP, the state of the economy, climate change, campaign contributions by the US Chamber of Commerce and President Obama’s birthplace.  In particular, voters had perceptions about the expert opinion of economists and other scientists that were quite different from actual expert opinion.</p></blockquote>
<p>This study also found that Fox viewers were significantly more misinformed than average on many issues, which is mostly how this survey was covered in the blogosphere and mainstream news outlets. I think this Fox thing is a terrible diversion from the core problem: the American press did not succeed in informing the public. Not even right before an election, not even on the narrow set of issues that, by survey, voters cared to base their votes on.</p>
<p>The travesty here is that the relevant facts were instantly available from primary sources, such as the Congressional Budget Office and the Intergovernmental Panel on Climate Change. I interpret this failure in the following way: for many kinds of issues, the web makes it easy to find true information. But it doesn&#8217;t solve the problem of making people go look. That, perhaps, is a key role for modern journalism. Unfortunately, modern American journalism seems to be very bad at it. I imagine the same problem exists in the journalism of many other countries.</p>
<p><strong>What the study actually says</strong><br />
The study compares what voters think experts believe with what those experts actually believe. This is a bit tricky, and the study isn&#8217;t saying that the experts are necessarily right, but we&#8217;ll get to that. First, some example findings:</p>
<ul>
<li>68% of voters thought that &#8220;most economists&#8221; believe that the stimulus package &#8220;saved or created a few jobs&#8221; and 20% thought most economists believe that the stimulus caused job losses, whereas only 8% correctly said that most economists think it &#8220;saved or created several million jobs.&#8221; (The Congressional Budget Office estimates that <a href="http://cboblog.cbo.gov/?p=1617">the stimulus saved several million jobs,</a> as do 75% of <a href="http://online.wsj.com/article/SB10001424052748703625304575115674057260664.html">economists interviewed by the Wall Street Journal</a>.)</li>
<li>53% of voters thought that economists believe that Obama&#8217;s health care reform plan will increase the deficit, while 29% said that economists were evenly divided on this issue. Only 13% said correctly that a majority of economists think that health care reform will not increase the deficit. (The Congressional Budget Office <a href="http://cboblog.cbo.gov/?p=546">estimates</a> a net reduction in deficits of $143 billion over 2010-2019, and the Boards of Trustees of the Medicare Fund also <a href="http://www.washingtontimes.com/news/2010/aug/5/social-security-red-first-time-ever/">believe</a> that the Affordable Care Act will &#8220;postpone the exhaustion of &#8230; trust fund assets.&#8221;)</li>
<li>12% of voters thought that &#8220;most scientists believe&#8221; that climate change is not occurring, while 33% thought scientists were evenly divided on the issue. That&#8217;s 45% with an incorrect perception, as opposed to the 54% who said, correctly, that most scientists think climate change is occurring. (Aside from the <a href="http://en.wikipedia.org/wiki/IPCC_Fourth_Assessment_Report">IPCC reports</a> and virtually every governmental study of the issue worldwide, an April 2010 <a href="http://www.pnas.org/content/107/27/12107.full">survey of climate scientists</a> showed that 97% believe that human-caused climate change is occurring.)</li>
</ul>
<p>A fussy but necessary digression: all of this rests on the reliability of the WorldPublicOpinion.org survey results. The survey was conducted by Knowledge Networks, Inc. using an online response panel randomly selected from the US population. Those without internet access were apparently provided it for free. I have been unable to find any serious independent evaluation of Knowledge Networks&#8217; methodology, but their <a href="http://www.knowledgenetworks.com/ganp/reviewer-info.html">many research papers</a> on sample design certainly talk the talk. All of the basic sampling errors, such as self-selection and language bias (what about Hispanics?) are at least addressed on paper. The margin of error is reported as 3.9%.</p>
<p>So let&#8217;s take these survey results as accurate, for the moment. This means that the majority of the American public had an incorrect conception of expert opinion on the issues that they voted on. That&#8217;s a mouthful. It&#8217;s not the same as &#8220;believed false things,&#8221; and in fact asking &#8220;what do you think experts believe&#8221; deliberately dodges the tricky question of what is true. If there is some misperception of expert belief, then in the strictest terms the public is misinformed. The study addresses this point as follows:</p>
<blockquote><p>In most cases we inquired about respondents’ views of expert opinion, as well as the respondents’  own views. While one may argue that a respondent who had a belief that is at odds with expert opinion is misinformed, in designing this study we took the position that some respondents may have had correct information about prevailing expert opinion but nonetheless came to a contrary conclusion, and thus should not be regarded as ‘misinformed.’</p></blockquote>
<p>So this study does not say &#8220;the American public are wrong about the economy and climate change.&#8221; It says that they haven&#8217;t really looked into it. I&#8217;m all for questioning authority&#8217;s claim to truth &#8211; anyone who follows my work knows that I&#8217;m generally a fan of Wikipedia, for example &#8212; but I believe we must take lifelong study and rigorous methodology seriously. To put it another way: voting contrary to the opinions of economists may be a fine thing, but voting without any awareness of their work is just silly. Yet that seems to be exactly what happened in the last election.</p>
<p><strong>The role of the press, then and now</strong><br />
Of course, voting is hard and stuff is complex, which is why we rely on the media to break it all down for us. The sad part is that economics and climate change are familiar ground for journalists. It&#8217;s not like the facts of these issues were not published in mainstream news outlets. For that matter, journalists were not even necessary here. Any citizen with a web browser could have found out exactly what the Affordable Care Act was predicted to do to the deficit. The Congressional Budget Office published their report and then <a href="http://cboblog.cbo.gov/?p=546">blogged about it</a> in plain language.</p>
<p>Maybe publishing the truth was never enough. Maybe journalism never actually &#8220;informed the public,&#8221; but merely created conditions where the curious could get themselves informed by diligently reading the news. But on big issues like whether a piece of national legislation will affect the deficit, we no longer need professionals to enable this kind of self-motivated discovery. The <a href="http://scripting.com/stories/2009/03/19/theRebootOfJournalism.html">sources go direct</a> in such cases, as the Congressional Budget Office did. And do we really expect that the social media sphere &#8212; that&#8217;s all of us &#8212; will remain silent about the next big global warming study? We&#8217;re all going to use Facebook etc. to share links to the next IPCC report when it comes out.</p>
<p>If the problem of having access to true information about these sorts of &#8220;votable issues&#8221; is solved by the web, what isn&#8217;t solved by the web is getting every voter to go look at least once. <em>That</em> might be a job for informed professionals at the helm of big media channels. This is a big responsibility for a news organization to try to take, but I don&#8217;t see how it&#8217;s anything but the corollary to the responsibility to only publish true information. Presumably some of that information is important enough to know, so consumers would probably appreciate the idea that your mission is to ensure they are informed.</p>
<p>I suspect that paper-based habits are holding journalism back here. There is a deeply ingrained newsroom emphasis on reporting only what&#8217;s &#8220;new.&#8221; A budget report only gets to be news once, even if what it says is relevant for years. But there are no &#8220;editions&#8221; online; the same headline can float on the hot topics list for as long as it&#8217;s relevant. There is even more reason to keep directing attention to an issue if people are actively discussing it, if it is greatly polarized, or if there&#8217;s a lot of spin around it (see: the <a href="http://www.ajr.org/Article.asp?id=4980">rise of fact-check journalism</a>). In any case, journalists have long been good at keeping an issue in the news, by advancing the story daily in one way or another. But first they have to know what the public doesn&#8217;t know.</p>
<p>So the burning question that the World Public Opinion study leaves me with is just this: why wasn&#8217;t it a news organization that commissioned this survey?</p>
<p><em>See also: </em><em><a href="http://jonathanstray.com/does-journalism-work">Does journalism work?</a></em></p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/american-journalism-failed-to-inform-voters/feed</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>Does journalism work?</title>
		<link>http://jonathanstray.com/does-journalism-work</link>
		<comments>http://jonathanstray.com/does-journalism-work#comments</comments>
		<pubDate>Tue, 14 Dec 2010 16:08:26 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[belief]]></category>
		<category><![CDATA[information]]></category>
		<category><![CDATA[journalism]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=2305</guid>
		<description><![CDATA[How do we know that the work that journalists do accomplishes anything at all? And what does journalism do, exactly, beyond vague statements like &#8220;supports democracy&#8221; and trivial ones like &#8220;gives me movie reviews&#8221;? I made this image a couple months ago to introduce the question at a conference. A reporter researches and writes a [...]]]></description>
			<content:encoded><![CDATA[<p>How do we know that the work that journalists do accomplishes anything at all? And what does journalism do, exactly, beyond vague statements like &#8220;supports democracy&#8221; and trivial ones like &#8220;gives me movie reviews&#8221;?</p>
<p style="text-align: center;"><a href="http://jonathanstray.com/wp-content/uploads/2010/12/Picture-2.png"><img class="aligncenter" title="Last Mile" src="http://jonathanstray.com/wp-content/uploads/2010/12/Picture-2-1024x306.png" alt="" width="614" height="184" /></a></p>
<p>I made this image a couple months ago to introduce the question at a conference. A reporter researches and writes a story. The first arrow represents the process that gets that story published. We understand that process quite well, and the internet makes publishing really cheap and easy. Then there&#8217;s a process that takes published, accurate information and turns it into truth and justice for all. That&#8217;s the part that&#8217;s fuzzy. In fact I don&#8217;t think we understand it at all. I call this &#8220;the <a href="http://en.wikipedia.org/wiki/Last_mile_problem">last mile</a> problem&#8221; in journalism &#8212; how does journalism actually reach people?</p>
<p>Journalists occasionally claim a scalp, such as by embarrassing a politician enough to force them to resign, or focussing attention on some issue long enough to get legislation passed. Journalism also theoretically informs citizens so they can vote responsibly, in the elections which happen every few years. As I&#8217;ve <a href="http://jonathanstray.com/designing-journalism-to-be-used">argued before</a>, these are weak levers by which to shift society. I&#8217;m less interested in what journalism does in extraordinary times, and more interested in how the journalist&#8217;s work improves the day-to-day operation of a society, and the experiences of the people living in it.</p>
<p>It&#8217;s possible that much of the journalism we have is effective. Maybe the mere existence of consistent reporting on the machinations of the powerful keeps them in line, and we&#8217;ll only know what journalism really gave us when it disappears and civilization collapses into a mire of secrecy and corruption. Or maybe that&#8217;s already happened. How would we know? How can we tell whether journalism, as a local or a global endeavor, is doing better this year than last?</p>
<p><strong>Other fields have goals</strong><br />
I like to hang around the international development community, and those people have real problems. People working in public health are charged with improving access to clean water or preventing the spread of HIV. Others try to get more girls into school, or to raise entire communities out of poverty.</p>
<p>There are lots of ways to attack such complex social problems. An NGO or a foundation or a UN organ could lobby local politicians, produce research reports, provide services directly to affected populations, or launch a public awareness campaign. The way in which an organization proposes to have an effect is called their &#8220;theory of change.&#8221; This is a term I hear frequently at gatherings of development workers, and from the staff of NGOs and international organizations. Such organizations must continually develop and articulate their theory of change in order to secure philanthropic funding.</p>
<p>Journalism has no theory of change &#8212; at least not at the level of practice.</p>
<p>I&#8217;ve taken to asking editors, &#8220;what do you want your work to change in society?&#8221; The answer is generally along the lines of, &#8220;we aren&#8217;t here to change things. We are only here to publish information.&#8221; I don&#8217;t think that&#8217;s an acceptable answer. Journalism without effect does not deserve the special place in democracy that it tries to claim.</p>
<p>The question of &#8220;what change should journalism produce&#8221; is hard because it is unavoidably a <a href="http://en.wikipedia.org/wiki/Norm_(philosophy)">normative</a> question, a question about how journalists envision a &#8220;better&#8221; world. At the moment, the field of professional journalism is mired in <a href="http://www.buzzmachine.com/2007/09/14/objectivityimpartiality-cowardice-boredom-obsolescence/">intense confusion</a> about its role and the meaning of classic standards such as &#8220;objectivity.&#8221; This has obscured discussion of the field&#8217;s goals at a moment of <a href="http://archive.pressthink.org/2008/06/26/pdf.html">great transition</a> brought on by new communications technology, precisely the time when clarity is most needed.</p>
<p>It&#8217;s telling that discussions of journalism&#8217;s fundamentals frequently harken back to the great debate of <a href="http://en.wikipedia.org/wiki/Journalism#Role_of_journalism">Lippmann vs. Dewey</a>. That happened in the 1920s. This was not only before live television and before the internet, it was before bastions of modern reasoning such as <a href="http://en.wikipedia.org/wiki/Statistical_inference">statistical inference</a>, the study of <a href="http://en.wikipedia.org/wiki/Cognitive_bias">cognitive biases</a>, and the <a href="http://en.wikipedia.org/wiki/Social_constructionism">social construction of knowledge</a> were fully developed. Other fields have done much better in adapting to the philosophical and technological revolutions of the last century.</p>
<p>Medicine in general and public health in particular have become relentlessly <a href="http://en.wikipedia.org/wiki/Evidence-based_medicine">evidence-based</a>. It&#8217;s no longer enough to run anti-smoking ads; we now require those responsible for public health to show that their preferred method of behavior modification actually reduces disease. Meanwhile, marketers have rallied around the idea that the purpose of their work is to get targeted individuals to <em>do something</em>, whether that&#8217;s purchasing a product or voting for a particular candidate. That may not be an appropriate goal for non-advocacy journalism, but marketing and public relations researchers have made very careful <a href="http://www.amazon.com/gp/product/1572307269/">studies</a> of communication, recall, and belief.</p>
<p>Similar concerns over how messages are received arise in many fields, from crisis communications to public diplomacy. But not in journalism. If journalism does not change action it must change minds, but the tools and language of belief change seem to be entirely missing from the profession.</p>
<p><strong>Journalism as surveillance of ignorance</strong><br />
It used to be the job of an editor to decide what to publish. Maybe it is now the job of an editor to decide what needs to be known. These are not at all the same thing. They used to be, when nothing could be done with a story after the ink hit paper. The internet allows so much more &#8212; promotion within specific communities, feedback on readership and reception, conversation as opposed to oratory. And potentially, cheap techniques to determine what people already believe.</p>
<p>We should expect that users will largely be choosing for themselves what to read and view. That&#8217;s reality, and that&#8217;s fine, and systems that make it easy to satisfy curiosity are systems that will make us smarter (even though we&#8217;ll mostly use them for entertainment.) But I believe there will still be an identifiable set of common content, the few things that the public &#8212; or some targeted fraction of it &#8212; absolutely has to know to participate meaningfully in the civic issues of the day. This is more or less what editors put on the front page today. But rather than the headlines reflecting the most important events, perhaps they should reflect the most pernicious misconceptions. Good journalists already have some sense of this, and every so often we learn of an alarming gap in public knowledge. A majority of Americans <a href="http://www.usatoday.com/news/washington/2003-09-06-poll-iraq_x.htm">believed for years</a> that Saddam Hussein was linked to 9/11, for example. Today, most Americans <a href="http://www.msnbc.msn.com/id/39294608/ns/health-health_care/">don&#8217;t know</a> what&#8217;s actually in Obama&#8217;s new health care laws. (I apologize again to my international readers for the US-centric examples; I&#8217;d love to hear of similarly woeful tales from other countries.)</p>
<p>Combatting ignorance is harder than publishing. It&#8217;s my best guess for the second, mysterious arrow in the diagram above. Fortunately we also have new tools. We have reams and reams of data that people voluntarily put online, the &#8220;<a href="http://www.vlab.org/article.html?aid=304">data exhaust</a>&#8221; of entire societies. We also have old-fashioned public opinion polls, and their lightweight cousin online polls (though self-selection bias may render online surveys <a href="http://onlinelibrary.wiley.com/doi/10.1111/j.1751-5823.2010.00112.x/full">useless</a> for all but the most casual work.) Somewhere in all this data and all this communication, it must be possible to figure out what it is that people actually believe &#8212; and where those beliefs are factually wrong in an uncomplicated way, precisely the way that an editor would say &#8220;that&#8217;s not true, we can&#8217;t print it.&#8221;</p>
<p>There are many possibilities for understanding the beliefs of an audience. I am particularly intrigued by <a href="http://opinion.berkeley.edu/landing/">opinion mapping</a>, <a href="http://cdd.stanford.edu/polls/docs/summary/">deliberative polling</a>, and the attempts of <a href="http://www.unglobalpulse.org/">UN Global Pulse</a> to create data-driven societal monitoring systems. It may actually be possible to cheaply measure the state of public knowledge, which would also give us concrete metrics for improvement. We need new ways of thinking about the surveillance of ignorance, and we need software to implement them. But more than anything else, we need journalists attuned to what it is that people don&#8217;t know. Good journalists already are; they can see what is missing from discussion &#8212; whether that&#8217;s a question that no one has answered or a challenge to a prevalent belief &#8212; and do the hard work of adding it.</p>
<p>This effort applies at all scales. Each journalist has an audience or audiences, their communities of concern. Each could track what their audience already knows and believes. The job of the journalist, so conceived, is not merely to report the happenings, but to ensure that the audience is aware of and understands the most crucial of them. That won&#8217;t be easy. Aside from the challenges of determining what an audience already knows, people don&#8217;t like to be told they&#8217;re uninformed or wrong. This is why I believe a journalist needs to learn everything there is to know about public communication, borrowing and adapting from marketing experts and public health planners. Genuine honesty and humility seem to me the ethical core, and newsroom transparency is a critical check on this power.</p>
<p>Of course, decisions would have to be made about what are misconceptions and which of them are important enough to combat. Decisions have to be made already about what to cover and promote with limited resources, and these hard choices are the iceberg that sinks any hope of a truly &#8220;impartial&#8221; journalism. It&#8217;s a reality that the profession has to deal with every day, and I wish we would get on with the work of crafting and communicating our normative stance, rather than insisting that &#8220;objectivity&#8221; means we don&#8217;t have one. (Even Wikipedia <a href="http://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view">explains its norms</a> in great detail.) I&#8217;d like to start with a list of things that journalists wish were better known. Be honest. I know you&#8217;ve already thought about this.</p>
<p>But if we can get over that hurdle &#8212; if we can admit that journalism needs concrete goals &#8212; then we stand a chance of doing better journalism, and knowing when we&#8217;re doing it. For me, the insane possibility of new communications technology carries with it the obligation to do better than we ever have before.</p>
<p><em><strong>UPDATE:</strong> As if on cue, a major study was released four days after I published this, showing that a majority of American voters were misinformed about the issues they voted on in the recent mid-term elections. I discuss what that means <a href="http://jonathanstray.com/american-journalism-failed-to-inform-voters">here</a>.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/does-journalism-work/feed</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Countries Seen Through Comments</title>
		<link>http://jonathanstray.com/countries-seen-through-comments</link>
		<comments>http://jonathanstray.com/countries-seen-through-comments#comments</comments>
		<pubDate>Wed, 03 Mar 2010 16:08:08 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[belief]]></category>
		<category><![CDATA[comments]]></category>
		<category><![CDATA[journalism]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=1635</guid>
		<description><![CDATA[The comments on the news are more revealing of a culture than the news itself. Journalism too often has a commitment to a sort of sanitized neutrality, and certainly tries for clarity, smoothing away complex disagreements. That has its uses, but the comments are a much messier, more divided, more personal look at a culture. [...]]]></description>
			<content:encoded><![CDATA[<p>The comments on the news are more revealing of a culture than the news itself. Journalism too often has a commitment to a sort of sanitized neutrality, and certainly tries for clarity, smoothing away complex disagreements. That has its uses, but the comments are a much messier, more divided, more personal look at a culture. They have the texture of life at street level.</p>
<p>First, America. On January 30 the New York Times ran an article headlined &#8220;<a href="http://www.nytimes.com/2010/01/30/us/30airlift.html">U.S. Suspends Haitian Airlift in Cost Dispute</a>&#8221; which described how the US had stopped medical evacuations to Miami because of a state vs. federal argument over who would pay for the evacuees&#8217; medical care. The <a href="http://community.nytimes.com/comments/www.nytimes.com/2010/01/30/us/30airlift.html">comments</a> reveal a deeply divided country:</p>
<blockquote><p>The richest country in the world bickering about who is going to pay before it treats patients who need critical care from the poorest country in the Western hemisphere! We Americans should be very proud of ourselves!</p></blockquote>
<blockquote><p>Another wave of third world, uneducated people of an alien culture is about to hit our shores, helped this time by the Obama administration&#8217;s desire to show compassion. Unfortunately, the tax-paying citizens of this country will have to pay, in more ways than one.</p></blockquote>
<p>Meanwhile, Nigerians are talking about the Underwear Bomber. <a href="http://globalvoicesonline.org/">Global Voices</a> has helpfully collected <a href="http://globalvoicesonline.org/2010/01/15/nigeria-nigerian-bloggers-take-on-would-be-bomber-umar-abdulmutallab/">some of the blogger reactions</a>. For America, the story was about fear and terrorism and security. For Nigeria, it was about reputation and identity:</p>
<blockquote><p>Be honest, when you heard a Nigerian man tried to commit a terrorist act in America, how many of you immediately thought ‘Please don&#8217;t let him be [insert your ethnic group]?</p></blockquote>
<blockquote><p>There’s an Igbo proverb that says, “If one finger touches palm oil, it spreads to all the other fingers.” This is indicative of how Nigerians the world over felt when they heard the news of a young man who attempted to detonate a bomb on U.S. soil in the name of Al Qaeda. Many of us worried that the actions of this one finger would spread to cover the entire 150 million of us.</p></blockquote>
<blockquote><p>How does disowning him help Nigerians understand what role extreme Islamic ideology played in causing him to attempt detonating an explosive device on board a US-bound airliner? How does it help Nigerians understand the complex interplay of religious faith, access to extremist religious groups and ideological brainwashing?</p></blockquote>
<p>Meanwhile, the ever-wonderful ChinaSMACK took a break from pop-culture scandal (on the site right now: <a href="http://www.chinasmack.com/videos/hong-kong-girl-c-cup-breasts/">Hong Kong Girl Shows Off C Cup Breasts To Ex-Boyfriend</a>) to <a href="http://www.chinasmack.com/stories/america-taiwan-arms-sales-chinese-netizen-reactions/">translate online Chinese reactions</a> to news of US sales of arms to Taiwan. And the Chinese, often portrayed as uniformly nationalist, are just as diverse and divided as any other country:</p>
<blockquote><p>We don’t need to fear America selling arms to Taiwan, as soon as a war started these advanced weapons would be quickly consumed by our lower-quality but numerous weapons and many soldiers.</p></blockquote>
<blockquote><p>I only know that without America, the whole world would be chaotic.</p></blockquote>
<blockquote><p>As long as America exists, the world cannot be peaceful.</p></blockquote>
<p>What I want to know is, where are the foreign voices in these conversations? Right now each culture is talking about the others like they&#8217;re not in the room. And they&#8217;re right. Our global conversation is fragmented and <a href="http://jonathanstray.com/we-have-no-maps-of-the-web">unmapped</a>. It&#8217;s not a small world after all.</p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/countries-seen-through-comments/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Escaping the News Hall of Mirrors</title>
		<link>http://jonathanstray.com/escaping-the-news-hall-of-mirrors</link>
		<comments>http://jonathanstray.com/escaping-the-news-hall-of-mirrors#comments</comments>
		<pubDate>Thu, 05 Mar 2009 07:40:35 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[belief]]></category>
		<category><![CDATA[information]]></category>
		<category><![CDATA[information visualization]]></category>
		<category><![CDATA[journalism]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=412</guid>
		<description><![CDATA[We live in a cacophony of news, but most of it is just echoes. Generating news is expensive; collecting it is not. This is the central insight of the news aggregator business model, be it a local paper that runs AP Wire and Reuters stories between ads, or web sites like Topix, Newser, and Memeorandum, [...]]]></description>
			<content:encoded><![CDATA[<p>We live in a cacophony of news, but most of it is just echoes. Generating news is expensive; collecting it is not. This is the central insight of the news aggregator business model, be it a local paper that runs <a href="http://www.ap.org/">AP Wire</a> and <a href="http://reuters.com">Reuters</a> stories between ads, or web sites like <a href="http://topix.com">Topix</a>, <a href="http://Newser.com">Newser</a>, and <a href="http://memeorandum.com">Memeorandum</a>, or for that matter <a href="http://news.google.com">Google News</a>. None of these sites actually pay reporters to research and write stories, and <a title="Not a good time for traditional media" href="http://www.csmonitor.com/2008/1210/p02s01-usgn.html">professional journalism is in financial crisis</a>. Meanwhile there are more bloggers, but even more re-blogging. Is there more or less original information entering the web this year than last year? No one knows.</p>
<p>A computer could answer this question. A computer could trace the first, original source of any particular article or statement. The effect would be like donning special glasses in the hall of mirrors that is current news coverage, being able to spot the true sources without distraction from reflections. The required technology is nearly here.</p>
<p>This is more than geekery if you&#8217;re in a position of needing to know the truth of something. Last week I was researching a man named <a href="http://en.wikipedia.org/wiki/Michael_D._Steele">Michael D. Steele</a>, after reading a newly <a href="http://wikileaks.org/wiki/DOD:_Appeal_to_Evaluation_Report_of_US_Department_of_the_Army%2C_Feb_11_2008">leaked</a> document containing his name. Steele gained fame as one of the stranded commanders in Black Hawk Down, but several of his soldiers later killed three unarmed Iraqi men. I rapidly discovered many news stories (<a href="http://www.editorandpublisher.com/eandp/news/article_display.jsp?vnu_content_id=1002877916">1</a>, <a href="http://blogs.abcnews.com/theworldnewser/2006/08/army_murder_cas.html">2</a>, <a href="http://www.nytimes.com/2007/01/21/world/middleeast/21abuse.html?_r=2">3</a>, <a href="http://www.msnbc.msn.com/id/13974639/">4</a>, <a href="http://abcnews.go.com/Nightline/IraqCoverage/Story?id=2265742&amp;page=1">5</a>, <a href="http://abcnews.go.com/Nightline/IraqCoverage/Story?id=2265742&amp;page=1">6</a>, <a href="http://counterpunch.org/leupp08052006.html">7</a>, etc.) claiming that Steele had ordered his men to &#8220;kill all military-age males.&#8221; This is a serious accusation, and widely reprinted &#8212; but no number of news articles, blog posts, and reblogs can make a false statement more true. I needed to know who first reported this statement, and its original source.</p>
<p><span id="more-412"></span></p>
<p>First I had to deal with straight-up duplication of stories. The <a href="http://www.editorandpublisher.com/eandp/news/article_display.jsp?vnu_content_id=1002877916">first reference</a> above is an Associated Press (AP) story which included the quote, saying it was from &#8220;sworn statements obtained by the Associated Press.&#8221; The subsequent MSNBC <a href="http://www.msnbc.msn.com/id/13974639/">article</a> is in fact just a reprint of the AP story. There were other reprints, each on a different outlet but credited to AP in the standard practice of newswire syndication. I can&#8217;t argue with the ethics or legality of the practice, but this type of mirroring does amplify a story&#8217;s apparent significance on the web.</p>
<p>The second level of indirection is the hyperlink. One of the references above is an <a href="http://blogs.abcnews.com/theworldnewser/2006/08/army_murder_cas.html">ABC News Blog story</a> which refers to the AP article, linking to it and one other related story. Although the text is new, this article is nothing more than a rehashing of facts presented elsewhere. For research or authentication purposes, it&#8217;s basically worthless.</p>
<p>Finally, there is the uncredited reblog, exemplified by a <a href="http://dekerivers.wordpress.com/2007/01/22/army-colonel-said-kill-all-iraqi-military-age-males/">post</a> on the blog Caffeinated Politics where the key phrase is repeated without links or attribution. Even the <a href="http://counterpunch.org/leupp08052006.html">article</a> on CounterPunch &#8212; headlined &#8220;Kill All Military Age Men&#8221; &#8212; does not provide any sources at all.</p>
<p>In my manual analysis, only the <a href="http://www.editorandpublisher.com/eandp/news/article_display.jsp?vnu_content_id=1002877916">AP article</a> and a <a href="http://query.nytimes.com/gst/fullpage.html?res=9A07E5D8103FF930A3575BC0A9609C8B63&amp;sec=&amp;spon=&amp;pagewanted=all">piece</a> in the New York Times were original research. Out of the dozens (or hundreds?) of articles, blog posts, and screaming headlines, only two people/organizations had actually bothered to obtain original information. This doesn&#8217;t mean that Michael D. Steele did not, in fact, order his troops to &#8220;kill all military-age males.&#8221; In fact, the NYT article names four soldiers under his command who testified, on August 2nd in a military <a href="http://en.wikipedia.org/wiki/Article_32_hearing">Article 32 hearing</a>, that he did. This is what makes the statement reliable, not the ten thousand reblogs.</p>
<p>I shouldn&#8217;t have to do this sort of analysis by hand.</p>
<p>We&#8217;re getting there. In 2007, Google News <a href="http://googlenewsblog.blogspot.com/2007/08/original-stories-from-source.html">introduced a feature</a> that eliminates duplicated stories from its default results display. This is simple elimination of textual duplicates, a reaction to newswire syndication. Slightly more advanced <a href="http://www.google.com/url?sa=t&amp;source=web&amp;ct=res&amp;cd=1&amp;url=http%3A%2F%2Fportal.acm.org%2Fcitation.cfm%3Fid%3D956946&amp;ei=AiOvScXVBoGEsQOR-53XAQ&amp;usg=AFQjCNEynopNiBiOr_zdkHw96gIgOQgR7Q&amp;sig2=vxTdEBg-AToebtj3tKX0OQ">algorithms</a> can be used to detect and cull <em>near</em> duplicates, such as the techniques Google has long used for web pages (near duplicates shouldn&#8217;t count as more than one item in the results list.)</p>
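<p>To make the idea of culling near duplicates concrete, here is a minimal sketch of one common approach: break each text into overlapping word triples ("shingles") and compare the sets with Jaccard similarity. This is an illustration only, not a description of what Google actually runs; the function names, the shingle size, and the 0.5 threshold are all assumptions chosen for the example.</p>

```python
import re

def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in text (hypothetical helper)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Texts shorter than k words become a single shingle (or nothing).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Shared shingles divided by total distinct shingles, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_near_duplicate(text_a, text_b, threshold=0.5, k=3):
    """True when the two texts share at least `threshold` of their shingles.

    The threshold is an assumed value for illustration, not a tuned one.
    """
    return jaccard(shingles(text_a, k), shingles(text_b, k)) >= threshold

wire = "The army colonel told his men to kill all military age males"
reprint = "The army colonel told his men to kill all military age males today"
other = "Stocks fell sharply on Wall Street after the announcement"

print(is_near_duplicate(wire, reprint))
print(is_near_duplicate(wire, other))
```

<p>A syndicated reprint with a word or two changed scores close to 1.0, while an unrelated story scores 0.0. Comparing every pair of documents this way does not scale to the whole web, which is why production systems rely on hashing tricks on top of the same basic idea.</p>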
<p>I want more. For any particular paragraph, phrase or statement, I want to know exactly who said it first and where they got it from. I want  automatic culling of cut-and-paste &#8220;reporting&#8221; and unattributed quotations (and plagiarism.) I want my computer to automatically track back through hyperlinks when they&#8217;re present, and do deep textual analysis to determine who references whom even when the content is unattributed. The software should also analyze publication dates, where available, to see who said what first.</p>
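<p>The simplest piece of this wish list, using publication dates to see who said what first, can be sketched in a few lines. This is only a toy: it assumes we already have the article texts and trustworthy dates, it matches only near-verbatim phrases, and the <code>first_source</code> function and the example.com URLs are hypothetical.</p>

```python
from datetime import date

def normalize(text):
    """Lowercase and replace punctuation with spaces so trivial edits don't hide a match."""
    return " ".join("".join(c.lower() if c.isalnum() else " " for c in text).split())

def first_source(phrase, articles):
    """Return the earliest-dated article containing the phrase, or None.

    articles: iterable of (url, published, text) tuples.
    Articles with no publication date (published is None) are skipped.
    """
    needle = f" {normalize(phrase)} "
    matches = [
        art for art in articles
        if art[1] is not None and needle in f" {normalize(art[2])} "
    ]
    return min(matches, key=lambda art: art[1]) if matches else None

# Hypothetical data for illustration only.
articles = [
    ("http://example.com/reblog", date(2006, 8, 5),
     "He reportedly said: Kill all military-age males!"),
    ("http://example.com/wire", date(2006, 7, 21),
     "Soldiers testified they were told to kill all military age males."),
    ("http://example.com/unrelated", date(2006, 7, 1),
     "A story about something else entirely."),
    ("http://example.com/undated", None,
     "kill all military-age males"),
]

earliest = first_source("kill all military-age males", articles)
print(earliest[0])
```

<p>Real coverage is messier: dates are missing or wrong, and paraphrases will not match at all, which is where the deeper textual analysis comes in.</p>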
<p>What I want is the <a href="http://en.wikipedia.org/wiki/Phylogenetic_tree">phylogenetic tree</a> of any particular story or post, a graph which shows which articles &#8220;evolved&#8221; from which ancestors, and therefore which article or articles constitute the originals, the raw input of real-world information into the &#8216;net. In fact, phylogenetic trees have already been applied to documents. In an <a title="This article is cool!" href="http://www.ceng.metu.edu.tr/~tcan/ceng465/Spring2006/Schedule/chainLetters.pdf">article</a> published in Scientific American in 2003, the authors analyzed 33 different versions of a chain letter with algorithms originally designed to track evolutionary changes in genetic sequences, and were able to deduce which was the original version.</p>
<p style="text-align: center;"><a href="http://jonathanstray.com/wp-content/uploads/2009/03/chain-letter-tree.png"><img class="size-medium wp-image-426 aligncenter" title="chain-letter-tree" src="http://jonathanstray.com/wp-content/uploads/2009/03/chain-letter-tree-300x197.png" alt="chain-letter-tree" width="300" height="197" /></a></p>
<p>This type of analysis only works with identical snippets of text &#8212; copied articles that are modified, paragraphs cut and pasted, quotations. More sophisticated text analysis algorithms will be able to handle paraphrased reports, where an article is rewritten without adding substantial new information. General semantic analysis of news stories is coming, even <a title="whoa!" href="http://prestospace.org/training/images/WWW05.pdf">for audio and video</a>, at which point it will be possible to track a single statement through all its rewritings and rewordings as it passes from article to article to blog. Combined with information from hyperlinks and posting times, we will be able to construct a &#8220;source tree&#8221; like the one above for any given story.</p>
<p>We&#8217;ll finally be able to tell how much content we actually have, and where it came from. (We could even track the evolution of memes.)</p>
<p>It&#8217;s not that repeated coverage and discussion of the same story adds nothing. A major story <em>should</em> be covered by multiple outlets, and quotation, paraphrasing, and reblogging is how interesting or important stories spread; telling others about what we know is fundamentally how societal awareness comes to be. However, yelling something louder doesn&#8217;t make it more significant, or more true. In the balance between awareness and vacuous repetition, I refer to Ethan Zuckerman&#8217;s web 2.0 maxim: <a href="http://www.ethanzuckerman.com/blog/2006/05/30/my-talk/">don&#8217;t speak, point.</a></p>
<p>But I don&#8217;t want to set guidelines for authors. I want software that is smart enough to parse the anarchy of the web and tell me what is a reflection and what is not, and I want everyone else to have this software too. I want to be able to see the source tree for every article or fact of interest to me, and I want filtered views on my news aggregators that show only the primary reports. It&#8217;s not important (or remotely realistic) that every reader scrutinize the sources for every article, but it is important that it is <em>possible</em> to do so easily. The interested amateur should be able to trace statements in a few clicks; this should be a deterrent to the spreading of un-sourced lies as truth, and a stumbling block for would-be propaganda campaigns. In traditional journalism, the tracking and validation of sources was the responsibility of the media monopolies. If we are witnessing the dawning of the era where we all get to have our say &#8212; if the infosphere is going to be radically democratized and expanded a millionfold &#8212; then it is suddenly the responsibility of all of us in general to monitor the quality of our information. For this we need tools.</p>
<p><strong>UPDATE (October 2010):</strong> Since I wrote this, the <a href="http://memetracker.org">Memetracker</a> project demonstrated a whole-web news tracking service that has much of the capability I wished for. It even works by building text mutation trees. More on Memetracker and what it means for news at the <a href="http://www.niemanlab.org/2009/07/in-the-news-cycle-memes-spread-more-like-a-heartbeat-than-a-virus/">Nieman Journalism Lab</a>. My original post also missed the significance of social networking tools for the spread of news. There is now a fascinating project that aims to detect and track the source of political smear campaigns on Twitter, the <a href="http://truthy.indiana.edu/">Truthy</a> project. We&#8217;re getting there technologically speaking. Now we just need to get the technology into our everyday news reading apps.</p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/escaping-the-news-hall-of-mirrors/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>How Many World Wide Webs Are There?</title>
		<link>http://jonathanstray.com/how-many-webs</link>
		<comments>http://jonathanstray.com/how-many-webs#comments</comments>
		<pubDate>Wed, 04 Feb 2009 23:53:51 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[belief]]></category>
		<category><![CDATA[information]]></category>
		<category><![CDATA[information visualization]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[translation]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=257</guid>
		<description><![CDATA[How much overlap is there between the web in different languages, and what sites act as gateways for information between them? Many people have constructed partial maps of the web (such as the  blogosphere map by Matthew Hurst, above) but as far as I know, the entire web has never been systematically mapped in terms [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><a href="http://datamining.typepad.com/gallery/blog-map-gallery.html"></a><a href="http://datamining.typepad.com/gallery/blog-map-gallery.html"><img class="alignnone size-medium wp-image-328" title="newblog-crop" src="http://jonathanstray.com/wp-content/uploads/2009/02/newblog-crop-300x274.png" alt="newblog-crop" width="300" height="274" /></a></p>
<p>How much overlap is there between the web in different languages, and what sites act as gateways for information between them? Many people have constructed partial maps of the web (such as the  <a href="http://datamining.typepad.com/gallery/blog-map-gallery.html">blogosphere map</a> by Matthew Hurst, above) but as far as I know, the entire web has never been systematically mapped in terms of language.</p>
<p>Of course, what I actually want to know is, how connected are the different cultures of the world, really? We live in an age where the world seems small, and in a strictly technological sense it is. I have at my command this very instant not one but several enormous international communications networks; I could email, IM, text message, or call someone in any country in the world. And yet I very rarely do.</p>
<p>Similarly, it&#8217;s easy to feel like we&#8217;re surrounded by all the international information we could possibly want, including direct access to foreign news services, but I can only read articles and watch reports in English. As a result, information is firewalled between cultures; there are questions that could very easily be answered by any one of tens or hundreds of millions of native speakers, yet are very difficult for me to answer personally. For example, what is the journalistic slant of <a href="http://www.aljazeera.net/portal">al-Jazeera</a>, the original one in Arabic, not the <a href="http://www.aljazeera.com/index.html">English version</a> which is produced by a completely different staff?  Or, suppose I wanted to know what the average citizen of Indonesia thinks of the sweatshops there, or what is on the front page of the Shanghai Times today&#8211; and does such a newspaper even exist? What is written on the 70% of web pages that are not in English?</p>
<p><span id="more-257"></span>We all live on the same physical planet, but the information worlds we inhabit must be vastly different. There are many reasons for this other than language, but language alone is enough to isolate humanity from itself.</p>
<p>And so, my question: how many islands are there in our multi-cultural information space, and how are they connected? I am willing to bet that a full-scale web map would show several large networks in the <a title="What languages are web pages in?" href="http://www.internetworldstats.com/stats7.htm">main languages of the web</a> &#8212; English, Chinese, Spanish, Japanese, German, etc. &#8212; but few connections between them, web sites frequented by bilingual or bi-cultural individuals, who after all are the true gateways between cultures. The structure of the interconnections might tell us something about the relationships between cultures, and the actual number of links might provide some measure of how close or how far apart we actually are. The individual URLs themselves would also be extremely valuable information, representing high-bandwidth links between cultures, the trans-oceanic fiber between continents in the infosphere.</p>
<p>There is a second geography to the world that we&#8217;ve never seen. I don&#8217;t even know what I&#8217;m missing.</p>
<p>Creating such a map would be a trick, but by no means out of the reach of an academic project or a small company. <a href="http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html">Google</a> says there are currently over one trillion (10^12) unique web pages (for their particular definition of &#8220;unique&#8221;, which is more complex than it might seem.) Unlike a search engine, a language-based web map does not require the full contents of every page, merely the outgoing URLs and a discrete categorization of the language (which can be <a title="TextCat, mmrow!" href="http://odur.let.rug.nl/~vannoord/TextCat/">automatically determined</a> even without any document meta-data.) Assuming that each URL is assigned a unique 32 bit ID, another 32 bits for language and other info, and then links to an average of 20 other pages (<a href="http://uclue.com/index.php?xq=1015">estimates vary</a>), this is about 100 terabytes of data &#8212; or perhaps $15,000 worth of storage at current prices. This index could be created from a fresh crawl, or by parsing an existing one, such as from the folks at the brand new and very awesome <a title="Sooo Awesome!" href="http://www.dotnetdotcom.org/">DotBot open index of the web</a>.</p>
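<p>The storage estimate above is easy to check with a few lines of arithmetic. The sketch below uses the same rough inputs (a trillion pages, 20 outgoing links each, 32 bits of language and other info per page). One caveat it makes visible: a 32 bit ID can only distinguish about 4.3 billion pages, so a trillion pages would really need at least 40 bits per ID. The function names are mine, and the 5-byte ID is an assumption added for comparison; the total stays in the neighborhood of 100 terabytes either way.</p>

```python
import math

PAGES = 10**12        # unique pages, per the Google figure cited above
LINKS_PER_PAGE = 20   # assumed average number of outgoing links
INFO_BYTES = 4        # 32 bits for language and other info

def map_size_bytes(pages, links_per_page, id_bytes, info_bytes):
    """Each record stores the page's own ID, its info word, and one ID per link."""
    record = id_bytes + info_bytes + links_per_page * id_bytes
    return pages * record

def min_id_bits(pages):
    """Smallest ID width, in bits, that gives every page a distinct ID."""
    return math.ceil(math.log2(pages))

tb_32bit = map_size_bytes(PAGES, LINKS_PER_PAGE, 4, INFO_BYTES) / 10**12
tb_40bit = map_size_bytes(PAGES, LINKS_PER_PAGE, 5, INFO_BYTES) / 10**12

print(f"{tb_32bit:.0f} TB with 4-byte IDs")   # 88 TB
print(f"{min_id_bits(PAGES)} bits needed")    # 40 bits
print(f"{tb_40bit:.0f} TB with 5-byte IDs")   # 109 TB
```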
<p>The next step would be to generate the visualization of such a massive data set. The complete graph could be laid out in two or three dimensions using existing <a href="http://www.informatik.tu-cottbus.de/~an/GD/">clustering methods</a>. The resulting map could be traversed using GPU-accelerated rendering techniques for very large data sets, probably after some sort of hierarchical pre-processing that produces proxies for zoomed-out views of the network. A usable UI would be crucial; the entire map needs to be navigable at multiple scales and composed of live, hyperlinked objects. The right visualization also depends on what you are trying to discover; ultimately, there can be no single map because the choice of visualization is dependent upon usability and aesthetics, as the huge variety of beautiful maps at <a href="http://www.visualcomplexity.com/vc/">Visual Complexity</a> demonstrates.</p>
<p>The analysis could go much deeper with more computing power. Machine translation is currently poor, but it is probably good enough to detect whether one document is a translation of another. With this capability, we would actually be able to quantify the percentage of (public) textual information that makes it from one language into another and identify the key organizations that act as conduits. Further study might reveal fascinating things, such as selection biases in the types of news or information that get translated. The implications for differences in belief between cultures are obvious.</p>
<p>Yet even  a &#8220;links only&#8221; data set could still answer some highly revealing questions,  such as &#8220;what percentage of web sites are visited by people from multiple cultures?&#8221; or even &#8220;what is the best gateway between Polish and English film reviews?&#8221; This could be done without visualization, but it would be a mistake not to draw the actual maps.  Not only do pictures engage our spatial reasoning in a way that raw bits never can, but such a map would re-make an obvious point that is too often lost: in terms of communication between cultures, the world is not nearly as small or interconnected as we&#8217;d like to think it is.</p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/how-many-webs/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Knowing is Not Enough</title>
		<link>http://jonathanstray.com/knowing-is-not-enough</link>
		<comments>http://jonathanstray.com/knowing-is-not-enough#comments</comments>
		<pubDate>Wed, 17 Dec 2008 02:05:32 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[belief]]></category>
		<category><![CDATA[internet]]></category>
		<category><![CDATA[world peace]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=159</guid>
		<description><![CDATA[Wikipedia will save the world. Information is tolerance. When the internet succeeds and all humanity finally has egalitarian access to all information everywhere, a new era of enlightenment will dawn. Oh really? We&#8217;ll just weigh all available evidence and come to reasonable conclusions about how the world should be, right? Or, I&#8217;ve got it this [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><a href="http://jonathanstray.com/wp-content/uploads/2008/12/stringtelephone.jpg"><img class="alignnone size-medium wp-image-160 aligncenter" title="stringtelephone" src="http://jonathanstray.com/wp-content/uploads/2008/12/stringtelephone.jpg" alt="" width="278" height="154" /></a></p>
<p>Wikipedia will save the world. Information is tolerance. When the internet succeeds and all humanity finally has egalitarian access to all information everywhere, a new era of enlightenment will dawn.</p>
<p>Oh really?</p>
<p><span id="more-159"></span>We&#8217;ll just weigh all available evidence and come to reasonable conclusions about how the world should be, right? Or, I&#8217;ve got it this time: the real problem is that we all grew up in isolation. We never met a Muslim until we were seventeen, we never saw a picture of the whale that&#8217;s going extinct five thousand miles from our home. The internet will fix this. When we can get the true numbers on starving African children with a flick of the wrist, we will suddenly care. On the day that Israelis and Palestinians begin to IM each other about the new coolness on YouTube, there will be peace.</p>
<p>Humans do not seem to be naturally talented at bridging disagreements. Suppose you put a bunch of people with diverse opinions into a room. They discuss. When they walk out, instead of converging towards some sort of moderate position, the individuals often come out with more extreme views. This is called <a title="Betcha you wouldn't have expected this" href="http://en.wikipedia.org/wiki/Group_polarization">group polarization</a>. Or, take someone who already believes in capital punishment and show them supporting evidence. Their conviction strengthens. Fair enough. Show them contradictory evidence. Their conviction still <a title="Biased Assimilation and Attitude Polarization: The Effects of Prior Theories on Subsequently Considered Evidence" href="http://www.psych.umn.edu/courses/spring07/borgidae/psy5202/readings/lord,%20ross%20&amp;%20lepper%20(1979).pdf">strengthens</a>. What the fuck?</p>
<p>We are not built for reaching consensus, and probably a lot of what we hold dear is arbitrary anyway, which is to say that no principle of nature will ever really referee our disagreements over what is right. What we <em>are</em> built for is unclear, but pattern recognition seems to be important &#8212; we more readily see the patterns we&#8217;ve already recognized, which makes us much more likely to see evidence that already supports our beliefs. We also respond to peer pressure, because we have to live with the people that we have to live with. And we like to divide us from them at all scales, maybe so that we can hog all the good bits for &#8220;us&#8221;, but maybe just because it&#8217;s more fun to believe we&#8217;re better. Those guys are dweebs.</p>
<p>This is why I believe that mere communication &#8212; up to and including the global awesomeness of the internet &#8212; is not enough. Talking about it, getting everyone to the table, the deliberation of <a title="This would be a nice system of government" href="http://en.wikipedia.org/wiki/Deliberative_democracy">deliberative democracy</a> &#8212; well, when we run the <a title="Why not just test this?" href="http://cdd.stanford.edu/research/papers/2003/experimenting.pdf">experiments</a>, good discussions aren&#8217;t enough. They do seem to get everyone thinking about things in the same underlying framework, but usually still disagreeing. That is, talking about it lets us agree on the terms, such as &#8220;left&#8221; versus &#8220;right&#8221; in Western politics or even &#8220;Muslim versus West&#8221; in global politics, but it doesn&#8217;t often produce a consensus on <em>what the world should look like</em>.</p>
<p>It is interesting to me that there are people whose entire life is brokering agreements: moderators, diplomats, those who work in conflict resolution of all kinds. It makes me wonder how successful they are, and what they know that I don&#8217;t. Turns out you can get an entire degree on this subject, for example. (Or at least read a <a title="A book about conflict resolution" href="http://books.google.com/books?hl=en&amp;lr=&amp;id=dZ6au557QYEC&amp;oi=fnd&amp;pg=PR9&amp;dq=%22Miall%22+%22Contemporary+conflict+resolution%22+&amp;ots=cqoLqWH2ec&amp;sig=AocodHS7f7cEtz-zpxXHdAIc594">book</a>?) Anyway, I think this is probably an important thing to know. The internet is so new and so exciting, it&#8217;s easy to forget that we don&#8217;t actually know how to use it.</p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/knowing-is-not-enough/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Are They Right?</title>
		<link>http://jonathanstray.com/are-they-right</link>
		<comments>http://jonathanstray.com/are-they-right#comments</comments>
		<pubDate>Wed, 10 Dec 2008 07:22:52 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[belief]]></category>
		<category><![CDATA[climate change]]></category>
		<category><![CDATA[economics]]></category>
		<category><![CDATA[knowledge]]></category>
		<category><![CDATA[politics]]></category>
		<category><![CDATA[science]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=152</guid>
		<description><![CDATA[I&#8217;ve been reading StopTheACLU.com, because I want to get into their heads, because I want to avoid the classic mistake of intellectual isolation, and because I want to be challenged. Sure, they&#8217;re weirdos, but that doesn&#8217;t mean they don&#8217;t make sense. But there&#8217;s at least one thing in the StopTheACLU worldview that I find very [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been reading <a href="http://www.stoptheaclu.com/">StopTheACLU.com</a>, because I want to get into their heads, because I want to avoid the classic mistake of intellectual isolation, and because I want to be challenged. Sure, they&#8217;re weirdos, but that doesn&#8217;t mean they don&#8217;t make sense. But there&#8217;s at least one thing in the StopTheACLU worldview that I find very hard to method-act: in their universe, global warming is a myth.</p>
<p>Okay, but how did I end up on this side and not that side?</p>
<p><span id="more-152"></span></p>
<p>I went through this in Russia last year, when I was hosted in Moscow by a global warming skeptic; apparently it&#8217;s politically popular there to deny global warming, which sounds like a slight to Russia except when you remember that it&#8217;s politically popular here, too. But anyway, I was plunged headfirst into the debate with an ambitious little snot of a web-startup wannabe millionaire (&#8220;You should see our new offices! The Mafia used to operate out of there! They still visit sometimes.&#8221;) Running through the arguments in great detail (as I previously reported <a href="http://www.equivocality.net/why-do-i-believe-this/">here</a>) I was forced to ask the very pertinent question, why do <em>I</em> believe that global warming is real, and man-made, and a serious problem?</p>
<p>The quick answer is that I believe the <a title="Intergovernmental Panel on Climate Change" href="http://en.wikipedia.org/wiki/Ipcc">Intergovernmental Panel on Climate Change</a> reports, but that&#8217;s also just ink on paper. Why do I trust them?</p>
<p>It has to do with process. To begin with, I know what the IPCC process actually is. They have devoted almost as much dead tree to <a title="IPCC process" href="http://www.ipcc.ch/about/how-the-ipcc-is-organized.htm">how they reached their conclusions</a> as to the conclusions themselves. In short, they collected something like 600 climate scientists from 40 countries, locked them in a library with a complete and current set of all relevant academic and scientific publications, and threw raw meat at them through the bars until they reached consensus. Actually, that&#8217;s not quite how it happened. Some of them were vegetarians.</p>
<p>The 2007 <a title="It's a big document" href="http://en.wikipedia.org/wiki/IPCC_Fourth_Assessment_Report">Fourth Assessment Report</a> was then further reviewed by another 600-odd people, corrected, argued over, politicized, and finally published. Although there is no way to guarantee against bias in the author and reviewer selection process, at least a very diverse range of viewpoints could be expected to be represented, and at least the people involved have some reason to know what they&#8217;re talking about, having spent significant chunks of their lives asking questions about the ecosystem. This is as global and as sincere an effort to answer a question as humanity has ever seen, and it was all meticulously open and transparent.</p>
<p>There is a moral here.</p>
<p>One of the great things that thorough education teaches you &#8212; any education &#8212; is just how deep the rabbit hole of knowledge goes. It&#8217;s a smart person who realizes how big and complex and subtle any real discipline is; and I am absolutely at a loss to answer the tricky questions of someone else&#8217;s field, be they about global warming, the effectiveness of acupuncture, or whether cutting taxes will really help with unemployment (or not.) The only truly universal approach, our only hope for living in a world too big for reason, is to learn to evaluate how any given body of knowledge decides what is true and what is not true. In painful depth and detail.</p>
<p>The method, philosophy, and process of coming to believe: that is everything. I can&#8217;t say I even understand this process in myself, let alone an entire civilization, but I can say with conviction that it&#8217;s my favorite field of study.</p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/are-they-right/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Two Sages</title>
		<link>http://jonathanstray.com/two-sages</link>
		<comments>http://jonathanstray.com/two-sages#comments</comments>
		<pubDate>Sun, 30 Nov 2008 20:09:40 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[belief]]></category>
		<category><![CDATA[knowledge]]></category>
		<category><![CDATA[philosophy]]></category>
		<category><![CDATA[religion]]></category>
		<category><![CDATA[science]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=148</guid>
		<description><![CDATA[The North Sage and the South Sage met at the crossroads. Or on, let&#8217;s say, a mountaintop. They began to discuss what they knew about the world, in the hopes of becoming wiser. Neither would call what they believed a religion. The North Sage said that he had learned through meditation that each person was [...]]]></description>
			<content:encoded><![CDATA[<p>The North Sage and the South Sage met at the crossroads. Or on, let&#8217;s say, a mountaintop. They began to discuss what they knew about the world, in the hopes of becoming wiser. Neither would call what they believed a religion.</p>
<p><span id="more-148"></span></p>
<p>The North Sage said that he had learned through meditation that each person was connected to the cosmos. The South Sage said that his people had developed powerful tools that could penetrate the heart of the invisible. The North Sage insisted that all knowledge would come from within. The South Sage asked how that could be possible, and claimed that one could only truly learn from observing nature.</p>
<p>Neither was stupid enough to insist that the other was wrong. The South Sage understood that if he had been born in the North, he would have learned all that the North Sage had. And the North Sage could imagine forgetting everything he knew; he saw that only with a beginner&#8217;s mind would he ever be able to comprehend the wisdom of the South.</p>
<p>They stayed many days at this crossroads, on the mountaintop. Each day they sat in the shade of an enormous old tree which was neither the native Willow of the North nor the sweeping Banyan of the South. They talked all day, then each returned to their own camp at nightfall. The North Sage watched the stars and meditated. He sought not understanding but clarity. The South Sage wrote by moonlight in a huge old book. He wrote not what he had learned, but questions he had discovered.</p>
<p>After many months, it came to this.</p>
<p>&#8220;You,&#8221; said the South Sage, &#8220;you believe that all answers come from within. In your life you have found this method a far more reliable guide than mere tabulation of nature. But you cannot convince me that this method is better, because all your proofs come from where I cannot see them.&#8221;</p>
<p>&#8220;And you, my friend,&#8221; said the North Sage, &#8220;everything you have learned shows you that reason and study can unravel any mystery. But reason alone cannot show me that reason is all-encompassing, and so I must look elsewhere for deeper truths.&#8221;</p>
<p>&#8220;We cannot ever convince the other of our truths,&#8221; nodded the South Sage, &#8220;because we each ask for a type of proof that the other does not believe in.&#8221;</p>
<p>&#8220;What shall we do?&#8221; moaned the North Sage. &#8220;How can we resolve this dilemma? How can we learn from each other? Must we stay at this crossroads forever?&#8221;</p>
<p>&#8220;No,&#8221; said the South Sage. &#8220;We are free to leave. We may each return to our homes. Or, you may continue walking South, and I may continue walking North.&#8221;</p>
<p>&#8220;Alternatively,&#8221; said the North Sage, &#8220;we could each walk in any direction we pleased, and hope to discover new lands to the East and West.&#8221;</p>
<p>&#8220;That is a good idea,&#8221; said the South Sage. &#8220;We will never know all the places in between, but it would help us both to understand the boundaries of the map. Do you think we will ever meet again, my wise friend?&#8221;</p>
<p>The North Sage blinked. &#8220;Of course,&#8221; he said, &#8220;for neither of us can escape this world.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/two-sages/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

