<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jonathan Stray &#187; journalism</title>
	<atom:link href="http://jonathanstray.com/tag/journalism/feed" rel="self" type="application/rss+xml" />
	<link>http://jonathanstray.com</link>
	<description>Information, Culture, and Belief</description>
	<lastBuildDate>Tue, 15 May 2012 20:13:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>What should the digital public sphere do?</title>
		<link>http://jonathanstray.com/what-should-the-digital-public-sphere-do</link>
		<comments>http://jonathanstray.com/what-should-the-digital-public-sphere-do#comments</comments>
		<pubDate>Wed, 30 Nov 2011 01:12:46 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[knowledge]]></category>
		<category><![CDATA[media]]></category>
		<category><![CDATA[politics]]></category>
		<category><![CDATA[public sphere]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=2740</guid>
		<description><![CDATA[Earlier this year, I discovered there wasn&#8217;t really a name for the thing I wanted to talk about. I wanted a word or phrase that includes journalism, social media, search engines, libraries, Wikipedia, and parts of academia, the idea of all these things as a system for knowledge and communication. But there is no such word. [...]]]></description>
			<content:encoded><![CDATA[<p>Earlier this year, I discovered there wasn&#8217;t really a name for the thing I wanted to talk about. I wanted a word or phrase that includes journalism, social media, search engines, libraries, Wikipedia, and parts of academia, the idea of all these things as a system for knowledge and communication. But there is no such word. Nonetheless, this is an essay asking what all this stuff should do together.</p>
<p>What I see here is an ecosystem. There are narrow real-time feeds such as expertly curated Twitter accounts, and big general reference works like Wikipedia. There are armies of reporters working in their niches, but also colonies of computer scientists. There are curators both human and algorithmic. And I have no problem imagining that this ecosystem includes certain kinds of artists and artworks. Let&#8217;s say it includes all public acts and systems which come down to one person trying to tell another, &#8220;I didn&#8217;t just make this up. There&#8217;s something here of the world we share.&#8221;</p>
<p>I asked people what to call it. Some said &#8220;media.&#8221; That captures a lot of it, but I&#8217;m not really talking about the art or entertainment aspects of media. Also I wanted to include something of where ideas come from, something about discussions, collaborative investigation, and the generation of new knowledge. Other people said &#8220;information&#8221; but there is much more here than being informed. Information alone doesn&#8217;t make us care or act. It is part of, but only part of, what it means to connect to another human being at a distance.  Someone else said &#8220;the fourth estate&#8221; and this is much closer, because it pulls in all the ideas around civic participation and public discourse and speaking truth to power, loads of stuff we generally file under &#8220;democracy.&#8221; But the fourth estate today means &#8220;the press&#8221; and what I want to talk about is broader than journalism.</p>
<p>I&#8217;m just going to call this the &#8220;digital public sphere&#8221;, building on Jürgen Habermas&#8217; <a href="http://en.wikipedia.org/wiki/Public_sphere">idea</a> of a place for the discussion of shared concerns, public yet apart from the state. Maybe that&#8217;s not a great name &#8212; it&#8217;s a bit dry for my taste &#8212; but perhaps it&#8217;s the best that can be done in three words, and it&#8217;s already in use as a phrase to refer to many of the sorts of things I want to talk about. &#8220;Public sphere&#8221; captures something important, something about the societal goals of the system, and &#8220;digital&#8221; is a modifier that means we have to account for interactivity, networks, and computation. Taking inspiration from Michael Schudson&#8217;s <a href="http://books.google.com/books?id=Q2Dg55cxgfUC&amp;lpg=PA11&amp;ots=8b9gP-33fs&amp;dq=Six%20or%20seven%20things%20that%20journalism%20can%20do%20for%20democracy%20schudson&amp;pg=PA11#v=onepage&amp;q&amp;f=false">essay</a> &#8220;Six or seven things that news can do for democracy,&#8221; I want to ask what the digital public sphere can do for us. I think I see three broad categories, which are also three goals to keep in mind as we build our institutions and systems.</p>
<p>1. Information. It should be possible for people to find things out, whatever they want to know. Our institutions should help people organize to produce valuable new knowledge. And important information should automatically reach each person at just the right moment.</p>
<p>2. Empathy. The vast majority of people in the world, we will only know through media. We must strive to represent the &#8220;other&#8221; to each other with compassion and reality. We can&#8217;t forget that there are <em>people</em> on the other end of the wire.</p>
<p>3. Collective action. What good is public deliberation if we can&#8217;t eventually come to a decision and act? But truly enabling the formation of broad agreement also requires that our information systems support conflict resolution. In this age of <a href="http://jonathanstray.com/visualizing-communities">complex overlapping communities</a>, this role spans everything from the local to the global.</p>
<p>Each of these is its own rich area, and each of these roles already cuts across many different forms and institutions of media.</p>
<p><strong>Information</strong><br />
I&#8217;d like to live in a world where it&#8217;s cheap and easy for anyone to satisfy the following desires:</p>
<ol>
<li>&#8220;I want to learn about X.&#8221;</li>
<li>&#8220;How do we know that about X?&#8221;</li>
<li>&#8220;What are the most interesting things we <em>don&#8217;t</em> know about X?&#8221;</li>
<li>&#8220;Please keep me informed about X.&#8221;</li>
<li>&#8220;I think we should know more about X.&#8221;</li>
<li>&#8220;I know something about X and want to tell others.&#8221;</li>
</ol>
<p>These desires span everything from mundane queries (&#8220;what time does the store close?&#8221;) to complex questions of fact (&#8220;what will be the effects of global climate change?&#8221;). And they apply at all scales; I might have a burning desire to know how the city government is going to deal with bike lanes, or I might be curious about the sum total of humanity&#8217;s knowledge of breast cancer &#8212; everything we know today, plus all the good questions we can&#8217;t yet answer. Different institutions exist to address each of these needs in various ways. Libraries have historically served the need to answer specific questions, desires #1 and #2, but search engines also do this. Journalism strives to keep people abreast of current events, the essence of #4. Academia has focused on how we know and what we don&#8217;t yet know, which is #2 and #3.</p>
<p>This list includes two functions related to the production of new knowledge, because it seems to me that the public information ecosystem should support people working together to become collectively smarter. That&#8217;s why I&#8217;ve included #5, which is something like casting a vote for an unanswered question, and #6, the peer-to-peer ability to provide an answer. These seem like key elements in the democratic production of knowledge, because the resources which can be devoted to investigating answers are limited. There will always be a finite number of people well placed to answer any particular question, whether those people are researchers, reporters, subject matter experts, or simply well-informed. I like to imagine that their collective output is dwarfed by human curiosity. So efficiency matters, and we need to find ways to aggregate the questions of a community, and route each question to the person or people best positioned to find out the answer.</p>
<p>In the context of professional journalism, this amounts to asking what unanswered questions are most pressing to the community served by a newsroom. One could devise systems of asking the audience (like Quora and StackExchange) or analyze search logs (à la <a href="http://www.wired.com/magazine/2009/10/ff_demandmedia/all/1">Demand Media</a>.) That newsrooms don&#8217;t frequently do these things is, I think, an artifact of industrial history &#8212; and an unfilled niche in the current ecosystem. Search engines know where the gaps between supply and demand lie, but they&#8217;re not in the business of researching new answers. Newsrooms can produce the supply, but they don&#8217;t have an understanding of the demand. Today, these two sides of the industry do not work together to close this loop. Some symbiotic hybrid of Google and The Associated Press might be an uncannily good system for answering civic questions.</p>
<p>When new information does become available, there&#8217;s the issue of timing and routing. This is #4 again, &#8220;please keep me informed.&#8221; Traditionally, journalism has answered the question &#8220;who should know when?&#8221; with &#8220;everyone everything as fast as possible&#8221; but this is ridiculous today. I really don&#8217;t want my phone to vibrate for every news article ever written, which is why only &#8220;important&#8221; stories generate alerts. But taste and specialization dictate different definitions of &#8220;important&#8221; for each person, and old answers delivered when I need them might be just as valuable as new information delivered hot and fresh. Google is far down this track with its thinking on <a href="http://www.telegraph.co.uk/technology/google/8606477/Soon-Google-will-know-what-you-want-before-you-do.html">knowing what I want</a> before I search for it.</p>
<p><strong>Empathy</strong><br />
There is no better way to show one person to another, across a distance, than the human story. These stories about other people may be informative, sure, but maybe their real purpose is to help us feel what it is like to be someone else. This is an old art; one journalist friend credits Homer with the last major innovation in the form.</p>
<p>But we also have to show whole groups to each other, a very &#8220;mass media&#8221; goal. If I&#8217;ve never met a Cambodian or hung out with a union organizer, I only know what I see in the media. How can and should entire communities, groups, cultures, races, interests or nations be represented?</p>
<p>A good journalist, anthropologist, or writer can live with a community for a while, observing and learning, then articulate generalizations. This is important and useful. It&#8217;s also wildly subjective. But then, so is empathy. Curation and amplification can also be empathetic processes: someone can direct attention to the genuine voices of a community. This &#8220;don&#8217;t speak, point&#8221; role has been articulated by <a href="http://www.ethanzuckerman.com/blog/2006/05/30/my-talk/">Ethan Zuckerman</a> and practiced by <a href="http://www.guardian.co.uk/technology/2011/mar/14/andy-carvin-tunisia-libya-egypt-sxsw-2011">Andy Carvin</a>.</p>
<p>But these are still at the level of individual stories. Who is representative? If I can only talk to five people, which five people should I know? Maybe a human story, no matter how effective, is just a single <a href="http://stats.stackexchange.com/questions/269/what-is-the-difference-between-a-population-and-a-sample">sample</a> in the sense of a tiny part standing for the whole. Turning this notion around, making it personal, I come to an ideal: If I am to be seen as part of some group, then I want representations of that group to include me in some way. This is an argument that mass media coverage of a community should try to account for every person in that community. This is absurd in practical terms, but it can serve as a signpost, a core idea, something to aim for.</p>
<p>Fortunately, more inclusive representations are getting easier. Most profoundly, the widespread availability of peer-to-peer communication networks makes it easier than ever for a single member of a community to speak and be heard widely.</p>
<p>We also have data. We can compile the demographics of social movements, or conduct polls to find &#8220;public opinion.&#8221; We can learn a lot from the numbers that describe a particular population, which is why surveys and censuses persist. But data are terrible at producing the emotional response at the core of empathy. For most people, learning that 23% of the children in some state live in poverty lacks the gut-punch of a story about a child who goes hungry at the end of every month. In fact there is <a href="http://csi.gsb.stanford.edu/increase-charitable-donations-appeal-heart">evidence</a> that making someone think analytically about an issue actually makes them less compassionate.</p>
<p>The best reporting might combine human stories with broader data. I am impressed by CNN&#8217;s <a href="http://www.cnn.com/SPECIALS/war.casualties/index.html">interactive exploration</a> of American casualties in Iraq, which links mass visualization with photographs and stories about each individual. But that piece covers a comparatively small population, only a few thousand people. There are emerging techniques to understand much larger groups, such as by visualizing the data trails of online life, all of the <a href="http://www.forbes.com/sites/kashmirhill/2010/12/01/brief-takeaways-and-a-pretty-diagram-from-the-ftcs-online-privacy-recommendations/">personal information</a> that we leave behind. We can <a href="http://jonathanstray.com/visualizing-communities">visualize communities</a>, using aggregate information to see the patterns of human association at all scales. I suspect that mass data visualization represents a fundamentally new way of understanding large groups, a way that is perhaps more inclusive than anecdotes yet richer than demographics. Also, visualization forces us into conversations about who exactly is a member of the community in question, because each person is either included in a particular visualization or not. Drawing such a hard boundary is often difficult, but it&#8217;s good to talk about the meanings of our labels.</p>
<p>And yet, for all this new technology, empathy remains a deeply human pursuit. Do we really want statistically unbiased samples of a community? My friend Quinn Norton says that journalism should &#8220;strive to show us our better selves.&#8221; Sometimes, what we need is brutal honesty. At other times, what we need is kindness and inspiration.</p>
<p><strong>Collective action</strong></p>
<blockquote><p>What a difficult challenge advances in communication have become in recent decades. On the one hand they are definitely bringing us closer to each other, but are they really bringing us <em>together</em>?</p>
<p>- Ryszard Kapuściński, <em><a href="http://www.guardian.co.uk/books/2008/nov/16/ryszard-kapuscinski-review-books">The Other</a></em></p></blockquote>
<p>I am sensitive to the idea of <a href="http://www.ted.com/talks/eli_pariser_beware_online_filter_bubbles.html">filter bubbles</a> and <a href="http://gnovisjournal.org/2009/12/22/does-habermas-understand-internet-algorithmic-construction-blogopublic-sphere/">concerns</a> about the fragmentation of media, the worry that the personalization of information will create a series of insular and homogenous communities, but I cannot abide the implied nostalgia for the broadcast era. I do not see how one-size-fits-all media can ever serve a diverse and specialized society, and so: let a million micro-cultures bloom! But I do see a need for powerful unifying forces within the public sphere, because everything from keeping a park clean to tackling global climate change requires the agreement and cooperation of a community.</p>
<p>We have long had decision making systems at all scales &#8212; from the neighborhood to the United Nations &#8212; and these mechanisms span a range from very lightweight and informal to global and ritualized. In many cases decision-making is built upon voting, with some majority required to pass, such as 51% or 66%. But is a vicious, hard-fought 51% in a <a href="http://www.csupomona.edu/~smemerson/business318/AbramCulWarMythVSFIORINA.pdf">polarized</a> society really the best we can do? And what about all the issues that we will not be voting on &#8212; that is to say, most of them?</p>
<p>Unfortunately, getting agreement among even very moderate numbers of people seems phenomenally difficult. People disagree about methods, but in a pluralistic society they often disagree even more strongly about goals. Sometimes presenting all sides with credible information is enough, but strongly held disagreements usually cannot be resolved by shared facts; experimental work shows that, in many circumstances, <a href="http://en.wikipedia.org/wiki/Attitude_polarization">polarization deepens with more information</a>. This is the painful truth that blows a hole in ideas like &#8220;informed public&#8221; and &#8220;deliberative democracy.&#8221;</p>
<p>Something else is needed here. I want to bring the field of <a href="http://en.wikipedia.org/wiki/Conflict_resolution">conflict resolution</a> into the digital public sphere. As a named pursuit with its own literature and community, this is a young subject, really only begun after World War II. I love the field, but it&#8217;s in its infancy; I think it&#8217;s safe to say that we really don&#8217;t know very much about how to help groups with incompatible values find acceptable common solutions. We know even less about how to do this in an online setting.</p>
<p>But we can say for sure that &#8220;moderator&#8221; is an important role in the digital public sphere. This is old-school internet culture, dating back to the pre-web Usenet days, and we have evolved very many tools for keeping online discussions well-ordered, from classic comment moderation to collaborative filtering, reputation systems, online polls, and various other <a href="http://www.codinghorror.com/blog/2011/06/suspension-ban-or-hellban.html">tricks</a>. At the edges, moderation turns into conflict resolution, and there are tools for this too. I&#8217;m particularly intrigued by <a href="http://www.nytimes.com/interactive/2011/11/09/us/ows-grid.html">visualizations</a> that show where a community agrees or disagrees along multiple axes, because the conceptually similar process of &#8220;<a href="http://www.peacepolls.org/cgi-bin/default?section=about">peace polls</a>&#8221; has had some success in real-world conflict situations such as Northern Ireland. I bet we could also learn from the arduously evolved <a href="http://en.wikipedia.org/wiki/Wikipedia:Dispute_resolution">dispute resolution</a> processes of Wikipedia.</p>
<p>It seems to me that the ideal of legitimate community decision making is consensus, 100% agreement. This is very difficult, another unreachable goal, but we could define a scale from 51% agreement to 100%, and say that the goal is &#8220;as consensus as possible&#8221; decision making, which would also be &#8220;as legitimate as possible.&#8221; With this sort of metric &#8212; and always remembering that the goal is to reach a decision on a collective action, not to make people agree for the sake of it &#8212; we could undertake a systematic study of online consensus formation. For any given community, for any given issue, how fragmented is the discourse? Do people with different opinions hang out in different places online? Can we document examples of successful and unsuccessful online consensus formation, as has been done in the <a href="http://rhizomenetwork.wordpress.com/2011/06/18/a-brief-history-of-consenus-decision-making/">offline case</a>? What role do human moderators play, and how can well-designed social software contribute? How do the processes of online agreement and disagreement play out at different <a href="http://en.wikipedia.org/wiki/Dunbar's_number">scales</a> and under different circumstances? How do we know when the process has converged to a &#8220;good&#8221; answer, and when it has degraded into hegemony or <a href="http://en.wikipedia.org/wiki/Groupthink">groupthink</a>? These are mostly unexplored questions. Fortunately, there&#8217;s a huge amount of related work to draw on: voting systems and <a href="http://en.wikipedia.org/wiki/Public_choice_theory">public choice theory</a>, social network analysis, cognitive psychology, information flow and <a href="http://www.ethanzuckerman.com/blog/2011/11/07/mapping-media-ecosystems-at-center-for-civic-media/">media ecosystems</a>, social software design, issues of identity and culture, language and semiotics, epistemology&#8230;</p>
<p>I would like conflict resolution to be an explicit goal of our media platforms and processes, because we cannot afford to be polarized and grid-locked while there are important collective problems to be solved. We may have lost the unifying narrative of the front page, but that narrative was neither comprehensive nor inclusive: it didn&#8217;t always address the problems of concern to me, nor did it ask me what I thought. Effective collective action, at all relevant scales, seems a better and more concrete goal than &#8220;shared narrative.&#8221; It is also an exceptionally hard problem &#8212; in some ways it is the problem of democracy itself &#8212; but there&#8217;s lots to try, and our public sphere must be designed to support this.</p>
<p><strong>Why now?</strong><br />
I began writing this essay because I wanted to say something very simple: all of these things &#8212; journalism, search engines, Wikipedia, social media and the lot &#8212; have to work together to common ends. There is today no one profession which encompasses the entirety of the public sphere. Journalism used to be the primary bearer of these responsibilities &#8212; or perhaps that was a well-meaning illusion sprung from near monopolies on mass information distribution channels. Either way, that era is now approaching two decades gone. Now what we have is an ecosystem, and in true networked fashion there may not ever again be a central authority. From algorithm designers to dedicated curators to, yes, traditional on-the-scene pro journalists, a great many people in different fields now have a part in shaping the digital public sphere. I wanted to try to understand what all of us are working toward. I hope that I have at least articulated goals that we can agree are important.</p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/what-should-the-digital-public-sphere-do/feed</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Journalism for makers</title>
		<link>http://jonathanstray.com/journalism-for-makers</link>
		<comments>http://jonathanstray.com/journalism-for-makers#comments</comments>
		<pubDate>Thu, 22 Sep 2011 20:56:59 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[financial crisis]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[making]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=3197</guid>
		<description><![CDATA[I find myself wondering what it would take to fix the global financial system, but most financial journalism doesn&#8217;t help me to answer this question. Something seems wrong here. The modern world is built on a series of vast systems, intricate combinations of people and machines, but our journalism isn&#8217;t really built to help us [...]]]></description>
			<content:encoded><![CDATA[<p>I find myself wondering what it would take to fix the global financial system, but most financial journalism doesn&#8217;t help me to answer this question. Something seems wrong here. The modern world is built on a series of vast systems, intricate combinations of people and machines, but our journalism isn&#8217;t really built to help us understand them. It&#8217;s not a journalism for the people who will put together the next generation of civic institutions.</p>
<p>My friend Maha Atal &#8211; whose <a href="http://www.forbes.com/sites/mahaatal/2011/08/24/it-takes-courage-christine-lagarde/">profile</a> of incoming IMF chief Christine Lagarde recently graced the cover of Forbes &#8211; tells me there are two histories of financial journalism, two types that have been practiced since the dawn of newsprint. One tradition began with lists of market prices and insurance rates and evolved into the financial data services and newswires we have today, a journalism of utility for people who want to make money. The other tradition she called &#8220;muckraking,&#8221; a journalism which interests itself in shady deals, insider trading, and undue influence. It looks for hypocrisy and offenses against the interests of the broader public.</p>
<p>Service to the status quo, and zealous suspicion of power. Are these really the only two stands that a journalist can take? When I imagine the global financial system improving, actually improving in the sense of changing in a way that makes the lives of very many people better &#8212; say, becoming less prone to the sort of systemic collapse that puts <a href="http://www.ilo.org/wcmsp5/groups/public/---dgreports/---dcomm/---publ/documents/publication/wcms_150440.pdf">tens of millions out of work</a> &#8212; I don&#8217;t see it much assisted by either of these approaches to reporting, necessary though they might be.</p>
<p>The financial system is just that: a system, sprawling, messy, very complex, built of people and laws and machines. It serves a great many ends, both humane and monstrously avaricious. It won&#8217;t be much improved by forcing a few traders to resign in disgrace, or focusing public fury on the bonuses of bank executives (which, obscene though they may be, remain just a <a href="http://xkcd.com/558/">drop in the bucket</a>.) It seems rather that improvement will require international agreement on arcane solutions both political and technical, things like risk models, capital reserve requirements, and trading platforms. This is regulation both in the sense of law and in the sense of <a href="http://harvardmagazine.com/2000/01/code-is-law.html">code-as-law</a>, because software is a deep part of the infrastructure of modern finance. Markets don&#8217;t just happen; they are human desires channeled by what we have agreed to allow, and by what our technology has been built to support. Markets are designed things.</p>
<p>So maybe what we need are designers. Geeks who like to understand very complex systems, and tinker with them. I want to borrow from the culture of &#8220;makers,&#8221; because maker culture plants a flag on this idea. It draws on the hacker tradition of technical mastery, the DIY aesthetic perfected by the punks, and the best disruptive tendencies of global counter-culture. It lives in online forums and nerdy meetups and on the dingy couches of hack spaces. This is the chaotic ecosystem that powers Silicon Valley, and I bet it&#8217;s the secret ingredient that government planners miss when they build huge technology parks that <a href="http://www.lasvegassun.com/news/2009/jan/25/research-park-running-through-cash-still-empty/">end</a> <a href="http://www.technologyreview.com/biomedicine/13953/">up</a> <a href="http://www.nctimes.com/news/local/escondido/article_f6868c7b-1f2e-5e71-9428-5e8b2b538189.html">empty</a>.</p>
<p>But most of all, makers are deeply participatory. Where the political activist sees persuasion as the ultimate goal, the maker wants to personally rewire the system. This requires a deep love of the inner workings of things, the finicky, empirical details of how the real world is put together. A maker combines the democratic instinct with the technologist&#8217;s hands-on ability. And increasingly, makers are directing their attention to social problems. Efforts such as <a href="http://crisismappers.net/">crisis mapping</a> and <a href="http://codeforamerica.org/">Code For America</a> and the whole information and communication technologies for development (<a href="http://en.wikipedia.org/wiki/Information_and_communication_technologies_for_development">ICT4D</a>) movement are evidence of this. Maker language has recently been spotted at the <a href="http://radar.oreilly.com/2010/10/innovation-education-and-the-m.html">White House</a> and the <a href="http://www.unglobalpulse.org/blog/un-secretary-general-appeals-global-open-source-community">United Nations</a>.</p>
<p>The global financial system is just the sort of complex, intricate, part technical and part social system that makers would love, if only they could open it up and look inside. There are textbooks, but you can&#8217;t learn how the world actually works from textbooks. What would it take to open the global financial system to independent minds? Because it will be these independent minds &#8212; smart, deeply informed, creative &#8212; who will pore over the arcana of today in order to conceive of the better worlds to come.</p>
<p>Consider the latest draft of the <a href="http://en.wikipedia.org/wiki/Basel_III">Basel III standards</a> for international banking. Who reads such dense and technical stuff? The professional regulator is obliged to sit at their desk and study this document. The financier wants only to understand how these rules will make or cost them money. The muckraker might ask who is making the rules and why. Another journalist will look for headlines of broad interest, but almost certainly won&#8217;t have the technical background to trace the subtle implications. But a maker would read these standards because they are changes in the operating system of global finance. And of these, it might be the maker, the specialized outsider, who is most qualified to understand the detailed, systemic effects on <em>everyone else</em>. The systems that underlie finance have become so fast and so complex that <a href="http://www.wired.com/magazine/2010/12/ff_ai_flashtrading/all/1">we don&#8217;t really understand the interactions</a>. The people who know it best are mostly too busy making money to explain it to the rest of us. The public interest is in dire need of geeks who are not on the payroll.</p>
<p>There is a journalism to be done here, but it&#8217;s not the journalism of making people money, penning morality tales, or writing interesting articles for the Sunday paper. It&#8217;s a techno-social investigative journalism for those who have chosen to use their specialized knowledge in the interests of the rest of us. It&#8217;s a journalism that generalist reporters may be ill equipped to do.</p>
<p>We already have models for this. <a href="http://dowser.org">Dowser.org</a> practices &#8220;solutions journalism,&#8221; writing about how best to solve societal problems. I appreciate that, but I don&#8217;t think they&#8217;ve conceived of their audience as the policy and technology geeks who will one day flesh out and implement those solutions. The contemporary science journalism ecosystem might be a better example. There are science reporters at news organizations, but the best science reporting now tends to come from elsewhere. Science, like finance, is absurdly specialized, and so its chronicling has been taken over by networks of specialists &#8212; very often scientists themselves, the ones who have learned to write. Science blogging is <a href="http://www.cjr.org/the_observatory/the_hottest_thing_in_science_b.php?page=all">thriving</a>. Its audience is the general public, yes, but also other scientists, because it&#8217;s the real thing. Even better, science writing exists in a web of knowledge: you can follow the links and go arbitrarily deep into the original research papers. And if you still have questions, the experts are already active online. Compare this to the experience of reading an economics article in the paper.</p>
<p>We don&#8217;t have much truly excellent journalism on deep, intricate topics, issues with enormous technical and institutional complexity. There&#8217;s some, but it&#8217;s mostly in trade publications with little sense of the social good, or tucked away in expensive journals which speak to us in grown-up tones and don&#8217;t know how to listen for the questions of the uninitiated. And yet our world abounds in complex problems! Sustainability, climate change, and energy production. Security, justice, and the delicate tradeoffs of civic freedoms. Health care for the individual, and for entire countries. The policies of government from the international to the municipal. And governments themselves, in all their gruesome operational detail. These things are not toys. But when journalists write about such issues, they satisfy themselves with discovering some flavor of corruption, or they end up removing so much of the substance that readers cannot hope to make a meaningful contribution. Perhaps this is because it has always been assumed that there is no audience for wonkish depth. And perhaps that&#8217;s true. Perhaps there won&#8217;t ever be a &#8220;mainstream&#8221; audience for this type of reporting, because the journalism of makers is directed to those who have some strange, burning desire to know the gory details, and are willing to invest years of their life acquiring background knowledge and building relationships. Can we not help these people? Could we encourage more of them to exist, if we served them better?</p>
<p>This is a departure from the broadcast-era idea of &#8220;the public.&#8221; It gives up on the <a href="http://www.nytimes.com/2010/04/25/books/review/Keller-t.html?pagewanted=all">romantic notion</a> of great common narratives and tries instead to serve particular pieces of the vast <a href="http://jonathanstray.com/visualizing-communities">mosaic of communities</a> that comprise a society. But we are learning that when done well, this kind of deep, specialist journalism can strike surprising chords in a global population that is more educated than it has ever been. And the internet is very, very good at routing niche information to communities of interest. We have the data to show this. As Atlantic editor Alexis Madrigal <a href="http://www.theatlantic.com/technology/archive/2011/08/the-impact-of-next-generation-data-on-the-practice-of-journalism/242870/">put it</a>, &#8220;I love analytics because I owe them my ability to write weird stories on the Internet.&#8221;</p>
<p>Where is the journalism for the idealist doer with a burning curiosity? I don&#8217;t think we have much right now, but we can imagine what it could be. The journalism of makers aligns itself with the tiny hotbeds of knowledge and practice where great things emerge, the nascent communities of change. Its aim is a deep understanding of the complex systems of the real world, so that plans for a better world may be constructed one piece at a time by people who really know what they&#8217;re talking about. It never takes itself too seriously, because it knows that play is necessary for exploration and that a better understanding will come along tomorrow. It serves the talent pools that give rise to the people who are going to do the work of bringing us a potentially better world &#8212; regardless of where in society these people may be found, and whether or not they are already within existing systems of power. This is a theory of civic participation based on empowering the people who like to get their hands dirty tinkering with the future. Maybe that&#8217;s every bit as important as informing voters or getting politicians fired.</p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/journalism-for-makers/feed</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
		<item>
		<title>The new structure of stories: a reading list</title>
		<link>http://jonathanstray.com/the-new-structure-of-stories-a-reading-list</link>
		<comments>http://jonathanstray.com/the-new-structure-of-stories-a-reading-list#comments</comments>
		<pubDate>Tue, 26 Jul 2011 17:50:29 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[inter]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[linked data]]></category>
		<category><![CDATA[storytelling]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=3091</guid>
		<description><![CDATA[Different medium, different story form. It&#8217;s clear that each new technology &#8212; photography, radio, television &#8212; has brought with it different ways of constructing a narrative, and different ways those narratives fit into the audience&#8217;s lives. Online media are no different, and the step between analog and digital is in many ways much larger than any [...]]]></description>
			<content:encoded><![CDATA[<div>
<p>Different medium, different story form. It&#8217;s clear that each new technology &#8212; photography, radio, television &#8212; has brought with it different ways of constructing a narrative, and different ways those narratives fit into the audience&#8217;s lives. Online media are no different, and the step between analog and digital is in many ways much larger than any that has come before, because the internet connects the audience to each other as well as to the newsroom.</p>
<p>Here&#8217;s my attempt at a little reading list of recent work on the structure of stories. Pointers to additional material are welcome!</p>
<p><strong>The Big Picture</strong><br />
What&#8217;s wrong with the article, anyway? Jeff Jarvis explores this question in &#8220;<a href="http://www.buzzmachine.com/2011/05/28/the-article-as-luxury-or-byproduct/">The article as luxury or by-product</a>.&#8221; This essay provoked lots of interesting reaction, such as from <a href="http://gigaom.com/2011/05/29/no-twitter-is-not-a-replacement-for-journalism/">Mathew Ingram</a>.</p>
<p>So how do we understand ways to fix this? Vadim Lavrusik <a href="http://www.niemanlab.org/2011/07/vadim-lavrusik-five-key-building-blocks-to-incorporate-as-were-rethinking-the-structure-of-stories/">takes a shot</a> at this question and comes up with the building blocks of context, social, personalization, mobile, participation. It&#8217;s a good taxonomy so I&#8217;m going to partially steal it for this post.</p>
<p>At the NYC Hacks Hackers <a href="http://meetupnyc.hackshackers.com/events/25021511/">meetup</a> last week, Trei Brundrett took us through SB Nation&#8217;s &#8220;story stream&#8221; product, and Gideon Lichfield of The Economist gave a really nice run-through of the &#8220;news thing&#8221; concept that was fleshed out collaboratively last month at Spark Camp by Gideon, Matt Thompson, and a room full of others. Very meaty, detailed, up-to-the-minute discussions, for serious news nerds. Video <a rel="nofollow" href="http://www.ustream.tv/recorded/16134902">here</a>.</p>
<p><strong>Context</strong><br />
You just can&#8217;t do better than Matt Thompson&#8217;s &#8220;<a href="http://www.nieman.harvard.edu/reportsitem.aspx?id=101886">An antidote for web overload</a>.&#8221; I also recommend Matt&#8217;s wonderful &#8220;<a href="http://www.poynter.org/latest-news/top-stories/97913/the-three-key-parts-of-news-stories-that-are-usually-missing/">The three key parts of news stories that are usually missing</a>.&#8221; Another good primer is Jay Rosen&#8217;s &#8220;<a href="http://pressthink.org/2010/03/news-without-the-narrative-needed-to-make-sense-of-the-news-what-i-will-say-at-south-by-southwest/">Future of Context</a>&#8221; talk at SXSW.</p>
<p>See also my &#8220;<a href="/short-doesnt-mean-shallow">Short doesn&#8217;t mean shallow</a>,&#8221; about hyperlinks as a contextual storytelling form.</p>
<p>For an example of these ideas in action, consider Mother Jones&#8217;s <a href="http://www.niemanlab.org/2011/01/mojos-egypt-explainer-future-of-context-ideas-in-action/">Egypt Explainer page</a> &#8212; which Gideon Lichfield critiques in the video linked above.</p>
<p><strong>Social</strong><br />
What does it mean for news to be social anyway? Henry Jenkins argues for the power of &#8220;<a href="http://www.niemanlab.org/2010/11/why-spreadable-doesnt-equal-viral-a-conversation-with-henry-jenkins/">spreadable media</a>&#8221; as a new distribution model.</p>
<p>In &#8220;<a href="http://jonathanstray.com/whats-the-point-of-social-news">What&#8217;s the point of social news</a>?&#8221; I discuss two areas where social media have a huge impact on news: the use of social networks as a personalized filter, and distributed sourcing of tips and material.</p>
<p><strong>Personalization</strong><br />
News is now personalized by a variety of filters, both social and algorithmic. Eli Pariser argues this puts us in a &#8220;<a href="http://www.niemanlab.org/2011/06/eli-pariser-how-do-we-recreate-a-front-page-ethos-for-a-digital-world/">filter bubble</a>.&#8221; He may be right, but research by <a href="http://www.journalism.org/node/7493">Pew</a> and others [<a title="1" href="http://jonathanstray.com/what-is-news-when-the-audience-is-editor">1</a>,<a href="http://jonathanstray.com/papers/divergent%20online%20news%20preferences%20of%20journalists%20and%20readers.pdf">2</a>] consistently shows that when users are allowed to recommend any URL to one another, the &#8220;news agenda&#8221; that the audience constructs has only 5%-30% of stories in common with mainstream media.</p>
<p>A <a href="http://www.boston.com/news/politics/specials/tweets_for_obama/">comparison</a> of questions asked of the White House by a Twitter audience vs. by journalists shows a remarkable difference in focus. All of this suggests to me that whatever else is happening, personalization meets an audience need that traditional broadcast journalism does not.</p>
<p>Besides, maybe not every person needs to see every story, if we view the goal of <a href="http://jonathanstray.com/designing-journalism-to-be-used">journalism as empowerment</a>.</p>
<p><strong>Participation</strong><br />
What do we know and what don&#8217;t we know about public participation in the journalism project, and what has worked or failed so far? Jay Rosen has an <a href="http://pressthink.org/2011/06/from-write-us-a-post-to-fill-out-this-form-progress-in-pro-am-journalism/">invaluable summary</a>.</p>
<p>I also recommend <a href="http://www.niemanlab.org/2009/03/five-tips-for-citizen-journalism-from-propublicas-new-crowdsorcerer/">the work of Amanda Michel</a> as someone who does crowd-based reporting every day, and my own speculations on <a href="http://jonathanstray.com/the-challenges-of-distributed-investigative-journalism">distributed investigative reporting</a>.</p>
<p><strong>Structured information</strong><br />
Is the product of journalism narratives or (potentially machine-readable) facts? Adrian Holovaty seems to be the first to have explored this in his 2006 essay &#8220;<a href="http://www.holovaty.com/writing/fundamental-change/">A fundamental way newspaper websites need to change</a>.&#8221; This mantle has been more recently taken up by Stijn Debrouwere in his “<a href="http://stdout.be/2010/information-architecture-for-news-websites/">Information Architecture for News Websites</a>” series, and in Reg Chua&#8217;s &#8220;<a href="http://structureofnews.wordpress.com/structured-journalism/">structured journalism</a>,&#8221; and in a wide-ranging series at <a href="http://xark.typepad.com/my_weblog/2011/01/standards-based-journalism-in-a-semantic-economy.html">Xark</a>.</p>
<p>There are close connections here to semantic web efforts, and occasional <a href="http://dev.iptc.org/rNews-Video-Tutorials">overlap</a> between the semweb and journalism communities.</p>
<p><strong>Mobile</strong><br />
I haven&#8217;t seen any truly good roundup posts on what mobile will mean for news story form, but there are some bits and pieces. Mobile is by definition location-aware, and Mathew Ingram examines <a href="http://gigaom.com/2011/05/13/google-adds-news-near-you-newspapers-still-nowhere/">how location is well used by Google News</a> (and not by newsrooms).</p>
<p>Meanwhile, Zach Seward of the Wall Street Journal has done some interesting <a href="http://www.niemanlab.org/2010/05/location-location-etc-what-does-the-wsj%E2%80%99s-foursquare-check-in-say-about-the-future-of-location-in-news/">news-related things with Foursquare</a>.</p>
<p><strong>Real time</strong><br />
Emily Bell, formerly of the Guardian and now at Columbia, explains why <a href="http://emilybellwether.wordpress.com/2011/05/02/real-time-all-the-time-why-every-news-organisation-has-to-be-live/">every news organization needs to be real-time</a>.</p>
<p>For a granular look at how information spreads in real time, consider Mathew Ingram on &#8220;<a href="http://gigaom.com/2011/05/02/osama-bin-laden-and-the-new-ecosystem-of-news/">Osama bin Laden and the new ecosystem of news</a>.&#8221; For a case study of real-time mobile reporting, we have Brian Stelter&#8217;s &#8220;<a href="http://thedeadline.tumblr.com/post/5904630983/what-i-learned-in-joplin">What I learned in Joplin</a>.&#8221;</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/the-new-structure-of-stories-a-reading-list/feed</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>A job posting that really doesn&#8217;t suck</title>
		<link>http://jonathanstray.com/a-job-posting-that-really-doesnt-suck</link>
		<comments>http://jonathanstray.com/a-job-posting-that-really-doesnt-suck#comments</comments>
		<pubDate>Fri, 08 Jul 2011 21:04:57 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[computational journalism]]></category>
		<category><![CDATA[computational linguistics]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[overview]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=3070</guid>
		<description><![CDATA[I just got a pile of money to build a piece of state-of-the-art open-source visualization software, to allow journalists and curious people everywhere to make sense of enormous document dumps, leaked or otherwise. Huzzah! Now I am looking for a pair of professional developers to make it a reality. It won&#8217;t be hard for the [...]]]></description>
			<content:encoded><![CDATA[<p>I just got a pile of money to build a piece of state-of-the-art open-source visualization software, to allow journalists and curious people everywhere to make sense of enormous document dumps, leaked or otherwise.</p>
<p>Huzzah!</p>
<p>Now I am looking for a pair of professional developers to make it a reality. It won&#8217;t be hard for the calibre of person I&#8217;m trying to find to get <em>some</em> job, but I&#8217;m going to try to convince you that this is the <em>best</em> job.</p>
<p>The project is called Overview. You can read about it at <a href="http://overview.ap.org">overview.ap.org</a>. It&#8217;s going to be a system for the exploration of large to very large collections of unstructured text documents. We&#8217;re building it in New York in the main newsroom of The Associated Press, the original all-formats global news network. The AP has to deal with document dumps constantly. We download them from government sites. We file over 1000 freedom of information requests each year. We look at <em>every single leak</em> from Wikileaks, Anonymous, Lulzsec. We&#8217;re drowning in this stuff. We need better tools. So does everyone else.</p>
<p>So we&#8217;re going to make the killer app for document set analysis. Overview will start with a visual programming language for computational linguistics algorithms. Like Max/MSP for text. The output of that will be connected to some large-scale visualization. All of this will be backed by a distributed file store and computed through map-reduce. Our target document set size is 10 million. The goal is to design a sort of visualization sketching system for large unstructured text document sets. Kinda like <a href="http://processing.org/">Processing</a>, maybe, but data-flow instead of procedural.</p>
<p>We&#8217;ve already got a prototype working, which we pointed at the Wikileaks Iraq and Afghanistan data sets and learned some <a href="http://overview.ap.org/blog/2010/12/a-full-text-visualization-of-the-iraq-war-logs/">interesting things</a>. Now we have to engineer an industrial-strength open-source product. It&#8217;s a challenging project, because it requires production implementation of state-of-the-art, research-level algorithms for distributed computing, statistical natural language processing, and high-throughput visualization. And, oh yeah, a web interface. So people can use it anywhere, to understand their world.</p>
<p>Because that&#8217;s what this is about: a step in the direction of applied transparency. Journalists badly need this tool. But everyone else needs it too. Transparency is not an end in itself &#8212; it&#8217;s what you can do with the data that counts. And right now, we suck at making sense of piles of documents. Have you ever looked at what comes back from a FOIA request? It&#8217;s not pretty. Governments have to give you the documents, but they don&#8217;t have to organize them. What you typically get is a 10,000-page PDF. Emails mixed in with meeting minutes and financial statements and god knows what else. It&#8217;s like being let into a decrepit warehouse with paper stacked floor to ceiling. No boxes. No files. Good luck, kiddo.</p>
<p>Intelligence agencies have the necessary technology, but you can&#8217;t have it. The legal profession has some pretty good &#8220;e-discovery&#8221; software, but it&#8217;s wildly expensive. Law enforcement won&#8217;t share either. There are a few cheapish commercial products but they all choke above 10,000 documents because they&#8217;re not written with scalable, distributed algorithms. (Ask me how I know.) There simply isn&#8217;t an open, extensible tool for making sense of huge quantities of unstructured text. Not <em>searching</em> it, but finding the patterns you didn&#8217;t know you were looking for. The big picture. The Overview.</p>
<p>So we&#8217;re making one. Here are the buzzwords we are looking for in potential hires:</p>
<ul>
<li>We&#8217;re writing this in Java or maybe Scala. Plus JavaScript/WebGL on the client side.</li>
<li>Be a genuine computer scientist, or at least be able to act like one. Know the technologies above, and know your math.</li>
<li>But it&#8217;s not just research. We have to ship production software. So be someone who has done that, on a big project.</li>
<li>This stuff is complicated! The UX has to make it simple for the user. Design, design, design!</li>
<li>We&#8217;re open-source. I know you&#8217;re cool with that, but are you good at leading a distributed development community?</li>
</ul>
<p>And that&#8217;s pretty much it. We&#8217;re hiring immediately. We need two. It&#8217;s a two-year contract to start. We&#8217;ve got a pair of desks in the newsroom in New York, with really nice views of the Hudson River. Yeah, you could write high-frequency trading software for a hedge fund. Or you could spend your time analyzing consumer data and trying to get people to click on ads. You could code any of a thousand other sophisticated projects. But I bet you&#8217;d rather work on Overview, because what we&#8217;re making has never been done before. And it will make the world a better place.</p>
<p>For more information, see:</p>
<ul>
<li>Writeups in <a href="http://www.niemanlab.org/2011/06/the-drupal-of-dataviz-overview-aps-news-challenge-winner-wants-to-make-sense-of-big-document-sets/">Nieman Journalism Lab</a>, <a href="http://radar.oreilly.com/2011/07/data-journalism-tools-newsroom-stack.html">O&#8217;Reilly Radar</a>, <a href="http://www.journalism.co.uk/news-features/-knc-q-a-with-ap-s-interactive-technology-editor-on-data-journalism-tool/s5/a544887/">Journalism.co.uk</a></li>
<li>Video of a talk and <a href="http://overview.ap.org/blog/2011/06/investigating-thousands-or-millions-of-documents-by-visualizing-clusters/">live demo</a> of the prototype.</li>
<li>The official <a href="http://overview.ap.org/blog/2011/06/overview-is-hiring/">job posting</a>.</li>
</ul>
<p>Thanks for your time. Please contact jstray@ap.org if you&#8217;d like to work on this.</p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/a-job-posting-that-really-doesnt-suck/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>The challenges of distributed investigative journalism</title>
		<link>http://jonathanstray.com/the-challenges-of-distributed-investigative-journalism</link>
		<comments>http://jonathanstray.com/the-challenges-of-distributed-investigative-journalism#comments</comments>
		<pubDate>Wed, 25 May 2011 20:26:05 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[intelligence]]></category>
		<category><![CDATA[investigation]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[social news]]></category>
		<category><![CDATA[social software]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=3017</guid>
		<description><![CDATA[One of the clearest ideas to emerge from the excitement around the new media transformation of journalism is the notion that the audience should participate in the process. This two way street has been nicely described by Guardian editor Alan Rusbridger as the &#8220;mutualization of journalism.&#8221; But how to do it? What&#8217;s missing from what has been [...]]]></description>
			<content:encoded><![CDATA[<p>One of the clearest ideas to emerge from the excitement around the new media transformation of journalism is the notion that the audience should participate in the process. This two way street has been nicely described by Guardian editor Alan Rusbridger as the &#8220;<a href="http://www.guardian.co.uk/sustainability/report-mutualisation-citizen-journalism">mutualization of journalism</a>.&#8221; But how to do it? What&#8217;s missing from what has been tried so far? Despite many experiments, the territory is still so unexplored that it&#8217;s almost impossible to say what will work without trying it. With that caveat, here are some more or less wild speculations about the sorts of tools that &#8220;open&#8221; investigative journalism might need to work.</p>
<p>There have been many <a href="http://jmsc.hku.hk/blogs/internetstrategy/archives/45">collaborative journalism projects</a>, from the Huffington Post&#8217;s landmark &#8220;<a href="http://www.huffingtonpost.com/arianna-huffington/offthebus-huffposts-citiz_b_52712.html">Off The Bus</a>&#8221; election campaign coverage to the BBC&#8217;s sophisticated &#8220;<a href="http://www.niemanlab.org/2010/05/drawing-out-the-audience-inside-bbc%E2%80%99s-user-generated-content-hub/">user-generated content hub</a>&#8221; to CNN&#8217;s <a href="http://www.niemanlab.org/2011/03/how-cnns-ireport-enhanced-the-networks-coverage-of-the-japan-earthquake-and-its-aftermath/">iReport</a>. One lesson in all of this is that form matters. Take the lowly comment section. News site owners have long <a href="http://www.mediabistro.com/10000words/a-reporters-view-on-the-news-industrys-broken-commenting-system_b4097">complained</a>, often with good reason, that comments are a mess of trolls and flame wars. But the <a href="http://www.poynter.org/latest-news/top-stories/106766/5-ways-to-get-people-to-contribute-good-content-for-your-site/">prompt is supremely important</a> in asking for online collaboration. Do journalists really want &#8220;comments&#8221;? Or do they want <a href="http://jonathanstray.com/measuring-and-increasing-accuracy-in-journalism">error corrections</a>, smart additions, leads, and evidence that furthers the story?</p>
<p>Which leads me to investigative reporting. It&#8217;s considered a specialty within professional journalism, dedicated to getting answers to difficult questions &#8212; often answers that are embarrassing to those in power. I don&#8217;t claim to be very good at journalistic investigations, but I&#8217;ve done enough reporting to understand the basics. Investigative reporting is as much about convincing a source to talk as it is about filing a FOIA request, or running a statistical analysis on a government data feed. At heart, it seems to be a process of assembling widely dispersed pieces of information &#8212; connecting the distributed dots. Sounds like a perfect opportunity for collaborative work. How could we support that?</p>
<p><strong>A system for tracking what&#8217;s already known</strong><br />
Reporters keep notes. They have files. They write down what was said in conversations, or make recordings. They collect documents. All of this material is typically somewhere on or around a reporter&#8217;s desk or sitting on their computer. That means it&#8217;s not online, which means no one else can build on it. Even within the same newsroom, notes and source materials are seldom shared. We have long had customer relationship management systems that track every contact with a customer. Why not a &#8220;source relationship management&#8221; system that tracks every contact with every source by every reporter in the newsroom? Ideally, such a system would be integrated into the reporter&#8217;s communications tools: when I make a phone call and hit record (after getting the source&#8217;s permission of course) that recording could be automatically entered into the system&#8217;s files, stamped by time, date, and source, then <a href="http://www.niemanlab.org/2011/05/pbs-plays-googles-word-game-transcribing-thousands-of-hours-of-video-into-crawler-friendly-text/">transcribed by machine</a> to make it searchable. Primary documents would also be filed in the system, along with notes and links and comments from everyone working on the story. The entire story of the story could be in one place.</p>
<p>There have been experiments in collaborative journalistic files, such as <a href="http://www.niemanlab.org/2010/05/always-collaborate-say-hello-to-openfile-the-local-news-site-putting-those-new-media-maxims-to-the-test/">OpenFile.ca</a> or even <a href="http://www.nytimes.com/2010/06/26/us/26crying.html?_r=1">good local wikis</a>. But I don&#8217;t believe there has yet been a major professional newsroom which operated with open files. For that matter, I am not aware of this type of information filing system in existence anywhere in journalism, though I suspect it&#8217;s what intelligence services <a href="http://www.iarpa.gov/offices.html">do</a>.</p>
<p><strong>Public verification processes</strong><br />
Journalism aims to be &#8220;true,&#8221; a goal which requires <a href="http://stevebuttry.wordpress.com/2011/01/04/my-version-of-craig-silvermans-accuracy-checklist/">elaborate</a> <a href="http://www.bbc.co.uk/journalism/blog/2011/05/bbcsms-bbc-procedures-for-veri.shtml">verification processes</a>. But in every newsroom I&#8217;ve worked with, essential parts of the verification standards are not codified. &#8220;At least two sources&#8221; is a common maxim, but are there any situations where one is enough? For that matter, who counts as a definitive source? When is a conflict of interest serious enough to disqualify what someone is telling you? The answers to these questions and many more are a matter of professional practice and culture. This is confusing enough for a new reporter joining staff, let alone outsiders who might want to help.</p>
<p>Verification is necessarily contextual. Both the costs of verification and the consequences of being in error vary widely with circumstance, so journalists must make situational choices. How sure do we have to be before we say something is true, how do we measure that certainty, and what would it take to be more sure? Until this sort of nuanced guidance is made public, and the public is provided with experienced support to encourage good calls in complex or borderline cases, it won&#8217;t be possible to bring enthusiastic outsiders fully into the reporting process. They simply won&#8217;t know what&#8217;s expected of them, and so won&#8217;t be able to participate in the production of a product to certain standards. Those standards depend on what accuracy/cost/speed tradeoffs best serve the communities that a newsroom writes for, which means that there is audience input here too.</p>
<p><strong>What is secret, or, who gets to participate?</strong><br />
Traditionally, a big investigative story is kept completely secret until it&#8217;s published. This is shifting, as <a href="http://www.propublica.org/article/editors-note-dollars-for-docs/single">some journalists</a> begin to view investigation as more of a <a href="http://www.buzzmachine.com/2006/07/05/networked-journalism/">process than a product</a>. However, you may not want the subject of an investigation to know what you already know. It might, for example, make your interview with a bank CEO tricky if they know you&#8217;ve already got the goods on them from a former employee. There are also off-the-record interviews, embargoed material, documents which cannot legally be published, and a multitude of concerns around the privacy rights of individuals. I agree with Jay Rosen when he <a href="http://pressthink.org/2010/12/from-judith-miller-to-julian-assange/">says</a> that &#8220;everything a journalist learns that he cannot tell the public alienates him from the public,&#8221; but that doesn&#8217;t mean that complete openness is the solution in all cases. There are complex tradeoffs here.</p>
<p>So access to at least some files must be controlled, for at least some period of time. Ok then &#8212; who gets to see what, when? Is there a private section that only staff can see and a public section for everyone else? Or, what about opening some files up to trusted outsiders? That might be a powerful way to extend investigations outside the boundaries of the newsroom, but it brings in all the classic problems of <a href="http://gigaom.com/2010/03/18/craig-newmark-on-the-webs-next-big-problem/">distributed trust</a>, and more generally, all the <a href="http://www.shirky.com/writings/group_enemy.html">issues</a> of &#8220;membership&#8221; in online communities. I can&#8217;t say I know any good answers. But because the open flow of information can be so dramatically productive, I&#8217;d prefer to start open and close down only where needed. In other words, probably the fastest way to learn what <em>truly</em> needs to be secret is to blow a few investigations when someone says something they shouldn&#8217;t have, then design processes and policies to minimize those failure modes.</p>
<p>There is also a professional cultural shift required here, towards <a href="http://en.wikipedia.org/wiki/Open_innovation">open collaboration</a>. Newsrooms don&#8217;t like to get scooped. Fair enough, but my answer to this is to ask what&#8217;s more important: being first, or collectively getting as much journalism done as possible?</p>
<p><strong>Safe places for dangerous hypotheses</strong><br />
Investigative journalism requires speculation. &#8220;What if?&#8221; the reporter must say, then go looking for evidence. (And equally, &#8220;what if not?&#8221; so as not to fall prey to <a href="http://en.wikipedia.org/wiki/Confirmation_bias">confirmation bias</a>.) Unfortunately, &#8220;what if the district attorney is a child molester?&#8221; is not a question that most news organizations can tolerate on their web site. In the worst case, the news organization could be sued for libel. How can we make a safe and civil space &#8212; both legally and culturally &#8212; for following speculative trains of thought about the wrongdoings of the powerful? One idea, which is probably a good idea for many reasons, is to have very explicit marking of what material is considered &#8220;confirmed,&#8221; &#8220;vetted,&#8221; &#8220;verified,&#8221; etc. and what material is not. For example, iReport has such an endorsement system. A report marked &#8220;verified&#8221; would of course have been vetted according to the public verification process. In the US, that marking plus <a href="http://www.niemanlab.org/2009/01/david-ardia-why-news-orgs-can-police-comments-and-not-get-sued/">CDA section 230</a> might solve the legal issues.</p>
<p><strong>A proposed design goal: maximum amplification of staff effort</strong><br />
There are very many possible stories, and very few paid journalists. The massive amplification of staff effort that community involvement can provide may be our only hope for getting the quantity and quality of journalism that we want. Consider, for example, Wikipedia. With a paid staff of about 35 they produce millions of near-real time topic pages in dozens of languages.</p>
<p>But this is also about the usability of the social software designed to facilitate collaborative investigations. We&#8217;ll know we have the design right when lots of people want to use it. Also: just how much and what types of journalism could volunteers produce collaboratively? To find out, we could try to get the audience to scale faster than newsroom staff size. To make that happen, communities of all descriptions would need to find the newsroom&#8217;s public interface a useful tool for uncovering new information about themselves even when very little staff time is available to help them. Perhaps the best way to design a platform for collaborative investigation would be to imagine it as encouraging and coordinating as many people as possible in the production of journalism in the broader society, with as few full time staff as possible. These staff would be experts in community management and information curation. I don&#8217;t believe that all types of journalism can be produced this way or that anything like a majority of people will contribute to the process of journalism. Likely, only a <a href="http://en.wikipedia.org/wiki/1%25_rule_(Internet_culture)">few percent</a> will. But helping the audience to inform itself on the topics of its choice on a mass scale sounds like civic empowerment to me, which I believe to be a fundamental goal of journalism.</p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/the-challenges-of-distributed-investigative-journalism/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Measuring and improving accuracy in journalism</title>
		<link>http://jonathanstray.com/measuring-and-increasing-accuracy-in-journalism</link>
		<comments>http://jonathanstray.com/measuring-and-increasing-accuracy-in-journalism#comments</comments>
		<pubDate>Wed, 20 Apr 2011 20:40:53 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[accuracy]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[metrics]]></category>
		<category><![CDATA[truth]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=2659</guid>
		<description><![CDATA[Professional journalism is supposed to be &#8220;factual,&#8221; &#8220;accurate,&#8221; or just plain true. Is it? Has news accuracy been getting better or worse in the last decade? How does it vary between news organizations, and how do other information sources rate? Is professional journalism more or less accurate than everything else on the internet? These all seem [...]]]></description>
			<content:encoded><![CDATA[<p>Professional journalism is supposed to be &#8220;factual,&#8221; &#8220;accurate,&#8221; or just plain true. Is it? Has news accuracy been getting better or worse in the last decade? How does it vary between news organizations, and how do other information sources rate? Is professional journalism more or less accurate than everything else on the internet? These all seem like important questions, so I&#8217;ve been poking around, trying to figure out what we know and don&#8217;t know about the accuracy of our news sources. Meanwhile, the online news corrections process continues to evolve, which gives us hope that the news will become more accurate in the future.</p>
<p>Accuracy is a hard thing to measure because it&#8217;s a hard thing to define. There are subjective and objective errors, and no standard way of determining whether a reported fact is true or false. But a small group of academics has been grappling with these questions since the early 20th century, and undertaking periodic news accuracy surveys. The results aren&#8217;t encouraging. The <a href="http://www.nieman.harvard.edu/reportsitem.aspx?id=101903">last big study</a> of mainstream reporting accuracy found errors (defined below) in 59% of 4,800 stories across 14 metro newspapers. This level of inaccuracy &#8212; where about one in every two articles contains an error &#8212; has persisted for as long as news accuracy has been studied, over seven decades now.</p>
<p>With the explosion of available information, more than ever it&#8217;s time to get serious about accuracy, about knowing which sources can be trusted. Fortunately, there are emerging techniques that might help us to measure media accuracy cheaply, and then increase it. We could continuously sample a news source&#8217;s output to produce ongoing accuracy estimates, and build social software to help the audience report and filter errors. Meticulously applied, this approach would give a measure of the error rate of each information source, and a measure of the efficiency of their corrections process (currently only <a href="http://www.nieman.harvard.edu/reportsitem.aspx?id=101903">about 3% of all errors are corrected</a>.) The goal of any newsroom is to get the first number down and the second number up. I am tired of editorials proclaiming that a news organization is dedicated to the truth. That&#8217;s so easy to say that it&#8217;s meaningless. I want an accuracy process that gives us something more than a rosy feeling.</p>
<p>This is a long post, but there are lots of pretty pictures. Let&#8217;s begin with what we know about the problem.</p>
<p><span id="more-2659"></span></p>
<p><strong>An error in every other story</strong><br />
To measure news accuracy, you need a process for counting errors in published stories. This process has to be independent of the original reporting, otherwise you can&#8217;t learn anything about the story that the reporter didn&#8217;t already know. Real world reporting isn&#8217;t always clearly &#8220;right&#8221; or &#8220;wrong,&#8221; so it will often be hard to decide whether something is an error or not. But we&#8217;re not going for ultimate Truth here, just a general way of measuring some important aspect of the idea we call &#8220;accuracy.&#8221; In practice it&#8217;s important that the error counting method is simple, clear and repeatable, so that you can compare error rates across different times and sources.</p>
<p>The first systematic measurements of media accuracy approached these problems by asking the story&#8217;s sources to find errors in finished articles. In 1936 <a href="http://www.regrettheerror.com/2008/04/28/in-a-way-it-is-surprising-that-we-do-not-make-more-mistakes/">Mitchell V. Charnley</a> of the University of Minnesota &#8220;mailed 1,000 news items clipped from three Minneapolis dailies to persons named in the stories, asking for their perceptions of inaccuracies,&#8221; according to a description in a later <a href="http://jonathanstray.com/papers/blankenburg.pdf">similar study</a> (Charnley&#8217;s original paper isn&#8217;t online, boo.) This method of asking the sources whether the story is correct has some obvious shortcomings, and sources have their own agendas. However, it&#8217;s good at detecting basic errors of fact, such as incorrect names, dates, places, figures, occupations, etc. And because (roughly) this same methodology has been used in media accuracy studies ever since then, it&#8217;s possible to produce (rough) comparisons of accuracy over time.</p>
<div style="margin: 20px;">
<div style="margin: auto; width: 80%;">
<table style="text-align: center;" border="1" cellspacing="0">
<tbody>
<tr>
<td><strong>Year</strong></td>
<td><strong>Investigator</strong></td>
<td><strong>Number of stories</strong></td>
<td><strong>Errors per story</strong></td>
<td><strong>Percent with errors</strong></td>
</tr>
<tr>
<td>1936</td>
<td><a href="http://www.regrettheerror.com/2008/04/28/in-a-way-it-is-surprising-that-we-do-not-make-more-mistakes/">Charnley</a></td>
<td>591</td>
<td>.77</td>
<td>46%</td>
</tr>
<tr>
<td>1965</td>
<td>Brown</td>
<td>143</td>
<td>.86</td>
<td>41%</td>
</tr>
<tr>
<td>1967</td>
<td>Berry</td>
<td>270</td>
<td>1.52</td>
<td>54%</td>
</tr>
<tr>
<td>1968</td>
<td><a href="http://jonathanstray.com/papers/blankenburg.pdf">Blankenburg</a></td>
<td>332</td>
<td>1.17</td>
<td>60%</td>
</tr>
<tr>
<td>1974</td>
<td>Marshall</td>
<td>267</td>
<td>1.12</td>
<td>52%</td>
</tr>
<tr>
<td>1980</td>
<td>Tillinghast</td>
<td>270</td>
<td>.91</td>
<td>47%</td>
</tr>
<tr>
<td>1999</td>
<td>Maier</td>
<td>286</td>
<td>1.13</td>
<td>55%</td>
</tr>
<tr>
<td>2005</td>
<td><a href="http://www.aejmc.org/_scholarship/research_use/jmcq/05fall/maier.pdf">Maier/Meyer</a></td>
<td>3,287</td>
<td>1.36</td>
<td>61%</td>
</tr>
</tbody>
</table>
</div>
</div>
<p>This table is from Scott R. Maier&#8217;s 2005 <a href="http://www.aejmc.org/_scholarship/research_use/jmcq/05fall/maier.pdf">paper</a> &#8220;Accuracy Matters: A Cross-Market Assessment of Newspaper Error and Credibility,&#8221; which is both the largest and most recent news accuracy survey, and absolutely required reading for anyone with an interest in the subject. Maier checked 4,800 consecutive articles across 14 American newspapers, and,</p>
<blockquote><p>This study’s central finding is sobering: More than 60% of local news and news feature stories in a cross-section of American daily newspapers were found in error by news sources, an inaccuracy rate among the highest reported in nearly seventy years of research, and empirical evidence corroborating the public’s impression that mistakes pervade the press. In about every other article, sources identified “hard” objective errors.</p></blockquote>
<p>Maier and others counted both simple &#8220;objective&#8221; errors of fact and &#8220;subjective&#8221; errors such as over-emphasis or omission of an aspect of the story. Subjective errors are a much fuzzier category and newsmakers are not necessarily neutral. But, says Maier,</p>
<blockquote><p>Subjective errors, though by definition involving judgment, should not be dismissed as merely differences in opinion. Sources found such errors to be about as common as factual errors and often more egregious [as rated by the sources.]</p></blockquote>
<p>But subjective errors are a very complex category, so for today let&#8217;s not count them at all. Purely &#8220;objective,&#8221; straightforward errors of fact were found in 48% of the stories Maier checked, whereas 61% of stories had errors of either type. There seems to be no escaping the conclusion that, according to the newsmakers, about half of all American newspaper stories contained a simple factual error in 2005. And this rate has held about steady since we started measuring it seven decades ago.</p>
<p>Maier&#8217;s work is amazing, and you should <a href="http://www.aejmc.org/_scholarship/research_use/jmcq/05fall/maier.pdf">go read it</a>. He also investigates which types of errors are most common (top three: misquotation, inaccurate headline, numbers wrong) and how error rate affects perceived story and newspaper credibility, and includes a fantastic bibliography of previous accuracy work. But all of this is still a very limited glimpse. The figures in the table above cover only newspapers, not online news sources, or television and radio. And they look only at &#8220;old media&#8221; news organizations, not digital-only newsrooms, social media, and blogs. In the end, as extensive as academic accuracy research is, we only have accuracy measurements for a handful of sources at a few points in time. This type of work is not useful for ongoing evaluation of accuracy strategies within a newsroom, and it doesn&#8217;t provide consumers with broad enough information to make good choices about where they get their news.</p>
<p><strong>Continuous error sampling</strong><br />
One of the major problems with previous news accuracy metrics is the effort and time required to produce them. In short, existing accuracy measurement methods are expensive and slow. I&#8217;ve been wondering if we can do better, and a simple idea comes to mind: sampling.</p>
<p>The core idea is this: news sources could take an ongoing random sample of their output and check it for accuracy &#8212; a fact check spot check. Stories could be checked for errors by asking sources in the traditional manner, or through independent verification by a reporter who did not originally work on the story. I&#8217;m imagining that one person could check a couple stories per day in this fashion. Although this isn&#8217;t much, over time these samples will add up to an ongoing estimate of overall newsroom accuracy.</p>
<p>Standard statistical theory tells us what the error on that estimate will be for any given number of samples (If I&#8217;ve got this right, the relevant formula is <a href="http://stattrek.com/Lesson6/SRS.aspx">standard error of a population proportion estimate without replacement</a>.) At a sample rate of a few stories per day, daily estimates of error rate won&#8217;t be worth much. But weekly and monthly aggregates will start to produce useful accuracy estimates. For example, if a newsroom produces 1,000 stories per month and checks 60 of those &#8212; two per day &#8212; the error on the monthly estimate will be about ±12% (95% <a href="http://stattrek.com/AP-Statistics-4/Confidence-Interval.aspx?Tutorial=Stat">CI</a>, taking the true error rate to be near 50%.) Or you could average three months of data and get an accuracy estimate to within ±7%. The result would be a graph like this:</p>
<p style="text-align: center;"><img class="aligncenter" title="NewsCoAccuracy" src="http://jonathanstray.com/wp-content/uploads/2011/04/NewsCoAccuracy.png" alt="" width="500" height="349" /></p>
<p>This graph shows that NewsCo&#8217;s ongoing accuracy efforts are effective: the estimated error rate has decreased from about 50% to 35% over the course of a year. Given the margins of error on the estimates, this is a statistically significant difference indicating a real improvement, not just sampling noise. I would really like to see a news source that displayed an accuracy graph like this on their site. Otherwise, how can anyone really claim that they&#8217;re getting the facts right? Or even claim that they&#8217;re more accurate than a random blogger?</p>
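<p>For anyone who wants to check the arithmetic behind these margins, here is a minimal Python sketch. It assumes a simple random sample drawn without replacement, the worst-case error rate of 50%, and the hypothetical NewsCo numbers used above (1,000 stories per month, 60 checked). The function names are mine, and the significance check assumes two quarterly samples of 180 stories each, which is an illustration rather than the data behind the graph.</p>

```python
import math

def margin_of_error(population, sample, p=0.5, z=1.96):
    """Half-width of the confidence interval for a proportion estimated
    from a simple random sample drawn without replacement."""
    standard_error = math.sqrt(p * (1 - p) / sample)
    # Finite population correction: sampling a real fraction of all
    # stories shrinks the uncertainty a little.
    correction = math.sqrt((population - sample) / (population - 1))
    return z * standard_error * correction

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two sampled proportions
    (pooled estimate, ignoring the finite population correction)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

monthly = margin_of_error(population=1000, sample=60)
quarterly = margin_of_error(population=3000, sample=180)
print(f"monthly: +/-{monthly:.1%}  quarterly: +/-{quarterly:.1%}")

# Hypothetical: 50% error rate in one quarter, 35% in a later one.
z_score = two_proportion_z(0.50, 180, 0.35, 180)
print(f"z = {z_score:.2f}")
```

<p>Under these assumptions the monthly margin comes out near ±12% at 95% confidence (about ±10% at 90%), and the three-month margin near ±7% (about ±6% at 90%). A drop from 50% to 35% between two such quarterly samples gives z of roughly 2.9, comfortably past the usual 1.96 threshold.</p>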
<p><strong>Fixing as many errors as possible</strong><br />
Meanwhile, the online corrections process is evolving. I now know of four different news outlets that have a &#8220;report an error&#8221; link or form on every story: <a href="http://www.editorsweblog.org/newsrooms_and_journalism/2011/04/correcting_errors_shows_that_you_care.php">The Washington Post</a>, the Huffington Post, the <a href="http://www.cjr.org/behind_the_news/a_fact_check_box_on_every_page.php">Register Citizen</a> of Litchfield County, CT, and the <a href="http://www.dailylocal.com/articles/2011/03/28/business/doc4d90fe303f5dd194172301.txt">Daily Local</a> of Chester County, PA. The <a href="http://mediabugs.org/">MediaBugs project</a> can also be used to report errors on any other news site.</p>
<p style="text-align: center;"><a href="http://www.registercitizen.com/articles/2011/04/20/news/doc4dae5daa53eb4510708049.txt"><img class="aligncenter" title="FactCheckgraphic" src="http://jonathanstray.com/wp-content/uploads/2011/04/FactCheckgraphic.jpg" alt="" width="379" height="138" /></a></p>
<p>Asking your users to report inaccuracies strikes me as a fabulous idea, and likely very productive (see: &#8220;<a href="http://xkcd.com/386/">someone is wrong on the internet!</a>&#8221;) I have no knowledge of the quantity of errors submitted using these forms, or how the corrections process works. My suspicion is that each submitted correction sends an email to some hapless Engagement Editor who then has to cull the reports and route each plausible error to the story&#8217;s original reporter, if that reporter can be bothered to deal with audience feedback. (Excuse my snark, but more than one seasoned hack has told me how much they hate the idea that responding to users might be part of their job. Thankfully this attitude does seem to be on the way out.)</p>
<p>This very manual error correction system won&#8217;t scale. We can do better by asking the users to help filter error reports. It would be straightforward to implement a user-viewable queue of submitted errors for each story. Then the user who spots an error could a) first check to see if it&#8217;s already been reported, which would cut down duplicate reports, and b) vote on the severity of existing error reports. The idea is to do <a href="http://en.wikipedia.org/wiki/Collaborative_filtering">collaborative filtering</a> on error reports, so that the most serious and plausible come to the attention of the corrections editor first. Users could be encouraged to submit supporting evidence of the error, in the form of URLs to primary sources, by automatically giving precedence to items which include links.</p>
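<p>To make the queue idea concrete, here is a rough Python sketch. Everything in it is hypothetical: the <code>ErrorReport</code> fields, the 1-to-5 severity votes, and the size of the bonus for reports that include links are my own assumptions, and the scoring is plain vote counting rather than a real collaborative filtering algorithm.</p>

```python
from dataclasses import dataclass, field

@dataclass
class ErrorReport:
    description: str
    # Each vote is a severity rating from 1 (minor) to 5 (serious).
    severity_votes: list = field(default_factory=list)
    # Links to primary sources supporting the claim that this is an error.
    evidence_urls: list = field(default_factory=list)

def priority(report, link_bonus=2.0):
    """Total severity votes, plus a fixed bonus when evidence is attached."""
    score = float(sum(report.severity_votes))
    if report.evidence_urls:
        score += link_bonus
    return score

def corrections_queue(reports):
    """Order reports so the most serious, best-supported ones come first."""
    return sorted(reports, key=priority, reverse=True)

reports = [
    ErrorReport("Typo in the second paragraph", [1, 1]),
    ErrorReport("Mayor is misquoted", [4, 5, 3], ["http://example.com/transcript"]),
    ErrorReport("Budget figure is wrong", [4], ["http://example.com/budget.pdf"]),
]

for report in corrections_queue(reports):
    print(priority(report), report.description)
```

<p>Showing this same ordered list to readers before they file a report is what would let them spot duplicates and vote instead of resubmitting.</p>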
<p>I would really like to see the day when every news story on every device includes a &#8220;submit addition or correction&#8221; button. And once the corrections process exists, we can start looking at the data it generates. One goal would be to increase the efficiency of the corrections process in catching errors. That will drive the number of detected errors up, which might be very scary for newsrooms which are in the habit of pretending that every story is perfect. No newsroom wants to be the first to let the audience see that half of their stories contain a factual error, even if most of those errors are going to be minor. And yet, if decades of news accuracy research are to be believed, this is inevitable, because those errors are already there across the industry, silent. This makes me suspect that good corrections processes &#8212; real, web-native, efficient crowd-sourced corrections &#8212; will not be quickly adopted.</p>
<p>But even if you were excited to get real corrections going, how would you know if your process was really catching all of the errors?</p>
<p><strong>Combining accuracy and corrections measures</strong><br />
Suppose a newsroom was doing random samples of accuracy, and monitoring the number of errors corrected through the user-submission process and all other correction routes. Then we could plot errors corrected versus (estimated) errors found, like this:</p>
<p style="text-align: center;"><a href="http://jonathanstray.com/wp-content/uploads/2011/04/ErrorsAndCorrections.png"><img class="alignnone size-full wp-image-2949" title="ErrorsAndCorrections" src="http://jonathanstray.com/wp-content/uploads/2011/04/ErrorsAndCorrections.png" alt="" width="500" height="355" /></a></p>
<p>A newsroom monitoring its accuracy and corrections processes in this way would have two goals. First, drive errors down, perhaps by instituting an accuracy checklist, like <a href="http://stevebuttry.wordpress.com/2011/01/04/my-version-of-craig-silvermans-accuracy-checklist/">this one</a> from Steve Buttry. Second, drive corrections up by making it easy and rewarding for users to submit corrections, building better error queuing and filtering systems, assigning more staff resources to corrections, etc. The goal is to have the blue line and the red line meet, meaning that all errors are rapidly corrected.</p>
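<p>The two quantities on such a chart could be derived along these lines. This is only a sketch in Python: the function name and all of the figures are made up for illustration, and it counts stories containing at least one error rather than individual errors.</p>

```python
def accuracy_snapshot(stories_published, sample_size, sample_with_errors,
                      stories_corrected):
    """Estimate how many published stories contain an error, based on a
    random sample, and what fraction of those were actually corrected."""
    if sample_size == 0:
        raise ValueError("need at least one sampled story")
    error_rate = sample_with_errors / sample_size
    estimated_with_errors = error_rate * stories_published
    correction_rate = (stories_corrected / estimated_with_errors
                       if estimated_with_errors else 0.0)
    return error_rate, estimated_with_errors, correction_rate

# Hypothetical month: 1,000 stories published, 60 spot-checked,
# 27 of those found to contain an error, 18 corrections published.
error_rate, estimated, correction_rate = accuracy_snapshot(1000, 60, 27, 18)
print(f"error rate {error_rate:.0%}, "
      f"about {estimated:.0f} stories with errors, "
      f"{correction_rate:.0%} corrected")
```

<p>Tracked month by month, the first number should fall and the last should rise.</p>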
<p>Used internally, this sort of chart could give a quantitative understanding of how well a newsroom is doing in terms of accuracy, or at least those aspects of the concept of &#8220;accuracy&#8221; that are well-captured by the metrics. One could even break it out by desk to see if certain parts of the newsroom are over- or under-performing &#8212; or perhaps we&#8217;ll learn that certain types of reporting are just harder to get right. Proudly posted externally, this sort of chart would demonstrate a tangible commitment to accuracy, far better than any values statement. And if the metric could be standardized across the industry &#8212; in much the same way that, e.g. accountants have standardized their reporting &#8212; then we would finally be able to compare the ongoing accuracy of two news sources in an empirical way. I think there might be some real hope for standardized, fair accuracy metrics, because news organizations are obliged to fix the errors they find on an audit, and this will leave a trail in the revision history that serves as a public record of what a newsroom considers an &#8220;error&#8221; to be.</p>
<p><strong>Accuracy is not what it used to be</strong><br />
The explosion of information sources combined with the <a href="http://people-press.org/2009/09/13/press-accuracy-rating-hits-two-decade-low/">decades-long decline in media trust</a> makes this a great time to dream up ways to increase accuracy in journalism. One approach goes like this: the first step would be admitting how inaccurate journalism has historically been. Then we have to come up with standardized accuracy evaluation procedures, in pursuit of metrics that capture enough of what we mean by &#8220;true&#8221; to be worth optimizing. Meanwhile, we can ramp up the efficiency of our online corrections processes until we find as many useful, legitimate errors as possible with as little staff time as possible. It might also be possible to do data mining on types of errors and types of stories to figure out if there are patterns in how an organization fails to get facts right.</p>
<p>There are some serious practical problems and gnarly missing details here. But think of the prize! I&#8217;d love to live in a world where I could compare the accuracy of information sources, where errors got found and fixed with crowd-sourced ease, and where news organizations weren&#8217;t shy about telling me what they did and did not know. Basic factual accuracy is far from the only measure of good journalism, but perhaps it&#8217;s an improvement over the current <a href="http://www.cjr.org/the_audit/the_pulitzers_and_the_wsj.php">sad state of affairs</a>: &#8220;aside from prizes, there really aren’t any other metrics for journalism quality.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/measuring-and-increasing-accuracy-in-journalism/feed</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>On &#8220;Balance&#8221;</title>
		<link>http://jonathanstray.com/on-balance</link>
		<comments>http://jonathanstray.com/on-balance#comments</comments>
		<pubDate>Wed, 13 Apr 2011 02:12:44 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[editorial judgement]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[polarization]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=2877</guid>
		<description><![CDATA[What does it mean to say that someone&#8217;s reporting is “balanced”? I think it&#8217;s supposed to suggest something like &#8220;not one-sided,&#8221; which is really supposed to imply &#8220;fair.&#8221; I really do believe in fairness in journalism, but the whole &#8220;balance&#8221; metaphor seems completely wrong to me. Anyway it&#8217;s become clear to me that this word [...]]]></description>
			<content:encoded><![CDATA[<p>What does it mean to say that someone&#8217;s reporting is “balanced”? I think it&#8217;s supposed to suggest something like &#8220;not one-sided,&#8221; which is really supposed to imply &#8220;fair.&#8221; I really do believe in fairness in journalism, but the whole &#8220;balance&#8221; metaphor seems completely wrong to me. Anyway it&#8217;s become clear to me that this word means different things to different people.<span id="more-2877"></span></p>
<p>I do not think that giving equal time or credence to “left” and “right” points of view, or “pro” and “con”, or any other such pair, has anything to do with “balance.” First of all, there may be more enlightening ways to view the issue &#8212; not every story is best understood as a conflict. But more fundamentally, I think this word implies that journalists should strive to be equally acceptable (or equally unacceptable) to all sides. The <a href="http://www.ajr.org/article.asp?id=5032">often-used</a> <a href="http://blogs.forbes.com/jeffbercovici/2011/03/22/science-settles-it-nprs-liberal-but-not-very/">phrase</a> &#8220;straight down the middle&#8221; has become a synonym for &#8220;fair,&#8221; but this metaphor gives me hives. What does being exactly between two poles have to do with truth? A judge doesn&#8217;t define fairness in this way, and neither should journalists. Rather, I believe that our job is to represent reality as best as we can discern it, with humility, intelligence, sensitivity, and transparency.</p>
<p>A better definition of balance might mean: the story leaves the viewer with an impression that matches what is actually “out there” in the world, plus a detailed critique (links are really helpful for that part.) There is room in this for pluralistic truths, because we live in a pluralistic world, but that doesn&#8217;t mean all truths are equal. It is surely important to acknowledge the existence of and fully understand the points of view of the various “sides” in a dispute, but I feel we would be failing significantly if we did not also convey:</p>
<ul>
<li>whether each viewpoint is in the majority or the minority, and by whom it is held.</li>
<li>an evaluation of the evidence supporting each position, if the issue involves testable claims of fact.</li>
<li>a thorough sense of who is affected by the events in the story, even if those people are unpopular or invisible.</li>
<li>what is still uncertain, either because the question is inherently difficult to answer or because we have not yet completed deeper reporting.</li>
</ul>
<p>In other words, I don’t think it’s enough to say “this person disagrees” without some comment on whether or not their account holds up to the truth, as best we can determine it. &#8220;<a href="http://www.huffingtonpost.com/jay-rosen/he-said-she-said-journali_b_187682.html">He said, she said</a>&#8221; is right out. In fact, forget &#8220;balance.” It&#8217;s just not the right word at all, because it implies an equivalence that may not exist. Yes, the journalist needs to be dispassionate with the facts, but the facts might not favor all parties equally. I’d much rather think about reporting in terms of completeness, representativeness, accuracy, context, accountability, and transparency. That’s what fairness is to me.</p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/on-balance/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A computational journalism reading list</title>
		<link>http://jonathanstray.com/a-computational-journalism-reading-list</link>
		<comments>http://jonathanstray.com/a-computational-journalism-reading-list#comments</comments>
		<pubDate>Tue, 01 Feb 2011 02:29:28 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[belief]]></category>
		<category><![CDATA[computational journalism]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[knowledge]]></category>
		<category><![CDATA[media]]></category>
		<category><![CDATA[minds]]></category>
		<category><![CDATA[misinformation]]></category>
		<category><![CDATA[politics]]></category>
		<category><![CDATA[social media]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=2596</guid>
		<description><![CDATA[[Last updated: 18 April 2011 -- added statistical NLP book link] There is something extraordinarily rich in the intersection of computer science and journalism. It feels like there&#8217;s a nascent field in the making, tied to the rise of the internet. The last few years have seen calls for a new class of  &#8220;programmer journalist&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p><em>[Last updated: 18 April 2011 -- added statistical NLP book link]</em></p>
<p>There is something extraordinarily rich in the intersection of computer science and journalism. It feels like there&#8217;s a nascent field in the making, tied to the rise of the internet. The last few years have seen calls for a new class of  &#8220;<a href="http://www.niemanlab.org/2011/01/dave-winer-how-can-universities-educate-journo-programmers/">programmer journalist</a>&#8221; and the birth of a community of <a href="http://hackshackers.com/">hacks and hackers</a>. Meanwhile, several schools are now <a href="http://www.wired.com/epicenter/2010/04/will-columbia-trained-code-savvy-journalists-bridge-the-mediatech-divide/">offering joint degrees</a>. But we&#8217;ll need more than competent programmers in newsrooms. What are the key problems of computational journalism? What other fields can we draw upon for ideas and theory? For that matter, what is it?</p>
<p>I&#8217;d like to propose a working definition of computational journalism as the application of computer science to the problems of public information, knowledge, and belief, by practitioners who see their mission as outside of both commerce and government. This includes the journalistic mainstay of &#8220;reporting&#8221; &#8212; because information not published is information not known &#8212; but my definition is intentionally much broader than that. To succeed, this young discipline will need to draw heavily from social science, computer science, public communications, cognitive psychology and other fields, as well as the traditional values and practices of the journalism profession.</p>
<p>&#8220;Computational journalism&#8221; has no textbooks yet. In fact the term is barely recognized. The phrase seems to have emerged at Georgia Tech in 2006 or <a href="http://www.cc.gatech.edu/classes/AY2007/cs4803cj_spring/">2007</a>. Nonetheless I feel like there are already important topics and key references.</p>
<p><strong>Data journalism</strong><br />
Data journalism is obtaining, reporting on, curating and publishing data in the public interest. The practice is often more about spreadsheets than algorithms, so I&#8217;ll suggest that not all data journalism is &#8220;computational,&#8221; in the same way that a novel written on a word processor isn&#8217;t &#8220;computational.&#8221; But data journalism is interesting and important and dovetails with computational journalism in many ways.</p>
<ul>
<li>The Nieman Journalism Lab&#8217;s <a href="http://www.niemanlab.org/2010/08/how-the-guardian-is-pioneering-data-journalism-with-free-tools/">interview with Guardian Data Blog editor Simon Rogers</a> remains a solid introduction to (one kind of) contemporary practice.</li>
<li>The best practical guides I know are Rogers&#8217; &#8220;<a href="http://www.journalism.co.uk/skills/how-to-get-to-grips-with-data-journalism/s7/a542402/">How to: get to grips with data journalism</a>&#8221; and Dan Nguyen&#8217;s <a href="http://www.propublica.org/nerds/item/doc-dollars-guides-collecting-the-data">series of data-scraping tutorials at ProPublica</a>.</li>
<li>Stanford&#8217;s <a href="http://datajournalism.stanford.edu/">Journalism in the Age of Data</a> is an hour-long documentary on data journalism and visualization.</li>
<li>The web is a linked system of human-readable documents. Now Tim Berners-Lee wants to create a web of machine-readable <a href="http://blog.ted.com/2009/03/13/tim_berners_lee_web/">linked data</a>. The full potential is unclear, but it&#8217;s a big idea that may come to be the backbone of <a href="http://en.wikipedia.org/wiki/Semantic_Web">semantic web</a> visions. The <a href="http://data.nytimes.com/">New York Times</a>, <a href="http://www.guardian.co.uk/open-platform">The Guardian</a>, and others are experimenting with open data APIs.</li>
<li>Everyblock creator Adrian Holovaty seems to have been the first to suggest that reporters file structured data in his 2006 &#8220;<a href="http://www.holovaty.com/writing/fundamental-change/">A Fundamental Way Newspaper Websites Need to Change</a>.&#8221; This idea is beautifully expanded in Stijn Debrouwere&#8217;s &#8220;<a href="http://stdout.be/2010/information-architecture-for-news-websites/">Information Architecture for News Websites</a>&#8221; series.</li>
</ul>
<p><strong>Visualization</strong><br />
Big data requires powerful exploration and storytelling tools, and increasingly that means visualization. But there&#8217;s good visualization and bad visualization, and the field has advanced tremendously since Tufte wrote <a href="http://www.edwardtufte.com/tufte/books_vdqi">The Visual Display of Quantitative Information</a>. There is lots of good science that is too little known, and many open problems here.</p>
<ul>
<li>Tamara Munzner&#8217;s <a href="http://www.cs.ubc.ca/labs/imager/tr/2009/VisChapter/">chapter on visualization</a> is the essential primer. She puts visualization on rigorous perceptual footing, and discusses all the major categories of practice. Absolutely required reading for anyone who works with pictures of data.</li>
<li>Ben Fry invented the Processing language and wrote his <a href="http://benfry.com/phd/">PhD thesis on &#8220;computational information design</a>,&#8221; which is his powerful conception of the iterative, interactive practice of designing useful visualizations.</li>
<li>How do we make visualization statistically rigorous? How do we know we&#8217;re not just fooling ourselves when we see patterns in the pixels? This <a href="http://jonathanstray.com/papers/wickham.pdf">amazing paper by Wickham</a> et al. has some answers.</li>
<li>Is a visualization a story? Segel and Heer explore this question in &#8220;<a href="http://vis.stanford.edu/files/2010-Narrative-InfoVis.pdf">Narrative Visualization: Telling Stories with Data</a>.&#8221;</li>
</ul>
<p><strong>Computational linguistics</strong><br />
Data is more than numbers. Given that the web is designed to be read by humans, it makes heavy use of human language. And then there are all the world&#8217;s books, and the archival recordings of millions of speeches and interviews. Computers are slowly getting better at dealing with language.</p>
<ul>
<li>Word frequency techniques like <a href="http://en.wikipedia.org/wiki/Tfidf">tf-idf</a> and the <a href="http://en.wikipedia.org/wiki/Vector_space_model">vector space document model</a> are very simple and very useful. See also <a href="http://en.wikipedia.org/wiki/Stemming">stemming</a>. Lots more in the wonderful (and free!) <em><a href="http://nlp.stanford.edu/IR-book/information-retrieval-book.html">Introduction to Information Retrieval</a></em>. This book explains how search engines are built, and  discusses tf-idf etc. in great technical detail.</li>
<li>Statistical language models are increasingly important for all kinds of applications. Michael Nielsen has a great <a href="http://michaelnielsen.org/blog/introduction-to-statistical-machine-translation/">introduction to statistical machine translation</a>. Google&#8217;s Peter Norvig discusses how he implemented <a href="http://norvig.com/spell-correct.html">statistical spelling correction</a> on his laptop during a long plane flight. For the full deal, see the book <em><a href="http://books.google.com/books?id=YiFDxbEX3SUC&amp;lpg=PP1&amp;dq=Foundations%20of%20statistical%20language%20processing%22&amp;pg=PP1#v=onepage&amp;q&amp;f=false">Foundations of Statistical Natural Language Processing</a></em>.</li>
<li>On a related note, <a href="http://ngrams.googlelabs.com/">Google N-gram viewer</a> lets you look at the frequency of short phrases within 4% of all books published, ever. The <a href="http://mfi.uchicago.edu/publications/papers/Science_Culturomics.pdf">excellent paper</a> gives examples of how to use this for cultural research. Dan Cohen has <a href="http://www.dancohen.org/2010/12/19/initial-thoughts-on-the-google-books-ngram-viewer-and-datasets/">important criticisms</a>.</li>
<li>Speech-to-text algorithms enable automated transcription, and Matt Thompson explores the <a href="http://www.niemanlab.org/2010/12/coming-soon-to-journalism-matt-thompson-sees-the-speakularity-and-universal-instant-transcription/">huge implications for journalism</a>.</li>
<li>Reuters maintains the <a href="http://www.opencalais.com/">OpenCalais</a> entity extraction service, which parses text to contextually determine who and what is referenced.</li>
<li>IBM&#8217;s Watson project built a question-answering system that reads reference books and wins at Jeopardy. Imagine how useful to journalists and curious readers this could be! This <a href="http://www.stanford.edu/class/cs124/AIMagzine-DeepQA.pdf">paper on the DeepQA system</a> describes how they did it.</li>
</ul>
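<p>To make the word frequency idea concrete, here is a minimal sketch of tf-idf weighting and vector space comparison. It assumes whitespace tokenization, raw counts for term frequency, and log(N/df) for inverse document frequency; the sample headlines and function names are made up for illustration, and real systems use smoothed variants plus stemming.</p>

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumes tf = raw count and idf = log(N / df); many variants exist.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical headlines, not real data.
docs = [
    "the senate passed the budget bill",
    "the house debated the budget",
    "the team won the championship game",
]
vectors = tf_idf(docs)
```

<p>A word like &#8220;the&#8221; that appears in every document gets weight zero, so the two budget stories come out similar to each other while the sports story matches neither.</p>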
<p><strong>Communications technology and free speech</strong><br />
<a href="http://harvardmagazine.com/2000/01/code-is-law.html">Code is law</a>. Because our communications systems use software, the underlying mathematics of communication lead to staggering political consequences &#8212; including whether or not it is possible for governments to verify online identity or remove things from the internet. The key topics here are networks, cryptography, and information theory.</p>
<ul>
<li>The <a href="http://www.cacr.math.uwaterloo.ca/hac/index.html">Handbook of Applied Cryptography</a> is a classic, and free online. But despite the title it doesn&#8217;t really explain how crypto is used in the real world, <a href="http://en.wikipedia.org/wiki/Cryptography">like Wikipedia does</a>.</li>
<li>It&#8217;s important to know how the internet routes information, using <a href="http://en.wikipedia.org/wiki/Transmission_Control_Protocol">TCP/IP</a> and <a href="http://en.wikipedia.org/wiki/Border_Gateway_Protocol">BGP</a>, or at a somewhat higher level, things like the <a href="http://www.ittc.ku.edu/~niehaus/classes/750-s06/documents/BT-description.pdf">BitTorrent protocol</a>. The technical details determine how hard it is to do things like block websites, suppress the dissemination of a file, or <a href="http://blog.torproject.org/blog/recent-events-egypt">remove entire countries from the internet</a>.</li>
<li>Anonymity is deeply important to online free speech, and very hard. The <a href="http://www.torproject.org/">Tor project</a> is the outstanding leader in anonymity-related research.</li>
<li>Information theory is stunningly useful across almost every technical discipline. Pierce&#8217;s <a href="http://www.amazon.com/Introduction-Information-Theory-Symbols-Signals/dp/0486240614/ref=pd_rhf_p_t_1">short textbook</a> is the classic introduction, while Tom Schneider&#8217;s <a href="http://www-lmmb.ncifcrf.gov/~toms/paper/primer/">Information Theory Primer</a> seems to be the best free online reference.</li>
</ul>
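<p>As a taste of what information theory measures, here is a small sketch of Shannon entropy, H = sum of p * log2(1/p) over the symbols in a message. Estimating the probabilities from the symbol counts of a single string is my simplifying assumption, and the function name is hypothetical.</p>

```python
import math
from collections import Counter

def entropy_bits(message):
    """Shannon entropy of the symbol distribution in `message`, in bits per symbol.

    Probabilities are estimated from the observed symbol counts.
    """
    counts = Counter(message)
    total = len(message)
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

<p>A message that repeats one symbol carries zero bits per symbol, two equally likely symbols carry one bit, and four equally likely symbols carry two.</p>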
<p><strong>Tracking the spread of information (and misinformation)</strong><br />
What do we know about how information spreads through society? Very little. But one nice side effect of our increasingly digital public sphere is the ability to track such things, at least in principle.</p>
<ul>
<li><a href="http://memetracker.org/">Memetracker</a> was (AFAIK) the first credible demonstration of whole-web information tracking, following quoted soundbites through blogs and mainstream news sites and everything in between. Zach Seward has cogent <a href="http://www.niemanlab.org/2009/07/in-the-news-cycle-memes-spread-more-like-a-heartbeat-than-a-virus/">reflections on their findings</a>.</li>
<li>The <a href="http://truthy.indiana.edu/">Truthy Project</a> aims for automated detection of astro-turfing on Twitter. They specialize in covert political messaging, or as I like to call it, computational propaganda.</li>
<li>We badly need tools to help us determine the source of any given online &#8220;fact.&#8221; There are many existing techniques that could be applied to the problem, as I discussed in a <a href="http://jonathanstray.com/escaping-the-news-hall-of-mirrors">previous post</a>.</li>
<li>If we had information provenance tools that worked across a spectrum of media outlets and feed types (web, social media, etc.) it would be much cheaper to do the sort of <a href="http://www.journalism.org/analysis_report/how_news_happens">information ecosystem studies</a> that Pew and others occasionally undertake. This would lead to a much better understanding of <a href="http://www.niemanlab.org/2010/02/the-googlechina-hacking-case-how-many-news-outlets-do-the-original-reporting-on-a-big-story/">who does original reporting</a>.</li>
</ul>
<p><strong>Filtering and recommendation</strong><br />
With <a href="http://techcrunch.com/2010/08/04/schmidt-data/">vastly more information than ever before</a> available to us, attention becomes the scarcest resource. Algorithms are an essential tool in filtering the flood of information that reaches each person. (Social media networks also <a href="http://jonathanstray.com/whats-the-point-of-social-news">act as filters</a>.)</p>
<ul>
<li>The paper on <a href="http://crpit.com/confpapers/CRPITV70Truyen.pdf">preference networks</a> by Truyen et al. is probably as good an introduction as anything to the state of the art in recommendation engines, those algorithms that tell you what articles you might like to read or what <a href="http://en.wikipedia.org/wiki/Netflix_Prize">movies you might like to watch</a>.</li>
<li>Before Google News there was Columbia News Blaster, which incorporated a number of interesting algorithms such as multi-lingual article clustering, automatic summarization, and more, as described in <a href="http://www.cs.columbia.edu/~sable/research/hlt-blaster.pdf">this paper</a> by McKeown et al.</li>
<li>Anyone playing with clustering algorithms needs to have a deep appreciation of the <a href="http://en.wikipedia.org/wiki/Ugly_duckling_theorem">ugly duckling theorem</a>, which says that there is no categorization without preconceptions. King and Grimmer explore this with their technique for <a href="http://gking.harvard.edu/files/abs/discov-abs.shtml">visualizing the space of clusterings</a>.</li>
<li>Any digital journalism product which involves the audience to any degree &#8212; that should be all digital journalism products &#8212; is a piece of social software, well defined by Clay Shirky in his classic essay, &#8220;<a href="http://www.shirky.com/writings/group_enemy.html">A Group Is Its Own Worst Enemy</a>.&#8221; It&#8217;s also a &#8220;<a href="http://cdixon.org/2010/01/17/collective-knowledge-systems/">collective knowledge system</a>&#8221; as articulated by Chris Dixon.</li>
</ul>
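<p>For a feel of what a recommendation engine does, here is a deliberately simple sketch of user-based collaborative filtering. This is not the preference network method from the paper above; it assumes we only know which articles each reader opened, and it scores unread articles by the Jaccard similarity of the readers who did open them. The readers and article names are hypothetical.</p>

```python
# Hypothetical reading histories: reader -> set of articles opened.
reads = {
    "ana": {"budget", "election", "court"},
    "ben": {"budget", "election"},
    "cruz": {"playoffs", "transfer"},
    "dee": {"budget", "court", "playoffs"},
}

def jaccard(a, b):
    """Size of the overlap divided by size of the union of two sets."""
    union = a | b
    return len(a.intersection(b)) / len(union) if union else 0.0

def recommend(user, reads, k=3):
    """Rank articles the user has not read by the similarity of their readers."""
    mine = reads[user]
    scores = {}
    for other, theirs in reads.items():
        if other == user:
            continue
        sim = jaccard(mine, theirs)
        for item in theirs - mine:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return [item for item in ranked if scores[item] > 0][:k]
```

<p>Here <code>recommend("ben", reads)</code> puts the court story first, because the reader most like Ben opened it. Notice that the similarity measure is itself a preconception about what makes two readers alike, which is the ugly duckling theorem at work.</p>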
<p><strong>Measuring public knowledge</strong><br />
If journalism is about &#8220;informing the public&#8221; then we must consider what happens to stories after publication &#8212; this is the <a href="http://jonathanstray.com/does-journalism-work">&#8220;last mile&#8221; problem in journalism</a>. There is almost none of this happening in professional journalism today, aside from basic traffic analytics. The key question here is, how does journalism change ideas and action? Can we apply computers to help answer this question empirically?</p>
<ul>
<li>World Public Opinion&#8217;s recent <a href="http://www.worldpublicopinion.org/pipa/articles/brunitedstatescanadara/671.php?nid=&amp;id=&amp;pnt=671&amp;lb=">survey of misinformation among American voters</a> solves this problem in the classic way, by doing a randomly sampled opinion poll. I discuss their bleak results <a href="http://jonathanstray.com/american-journalism-failed-to-inform-voters">here</a>.</li>
<li>Blogosphere maps and other kinds of visualizations can help us understand the public information ecosystem, such as this <a href="http://cyber.law.harvard.edu/publications/2008/Mapping_Irans_Online_Public/interactive_blogosphere_map">interactive visualization of Iranian blogs</a>. I have previously suggested using such maps as a navigation tool that might <a href="http://jonathanstray.com/mapping-the-daily-me">broaden our information horizons</a>.</li>
<li> <a href="http://www.unglobalpulse.org/">UN Global Pulse</a> is a serious attempt to create a real-time global monitoring system to detect humanitarian threats in crisis situations. They plan to do this by mining the &#8220;data exhaust&#8221; of entire societies &#8212; social media postings, online records, news reports, and whatever else they can get their hands on. Sounds like <a href="http://www.unglobalpulse.org/blog/real-time-information-everyone-journalists-perspective-un-global-pulse">key technology for journalism</a>.</li>
<li><a href="http://sm.rutgers.edu/vox/event/">Vox Civitas</a> is an ambitious social media mining tool designed for journalists. Computational linguistics, visualization, and more.</li>
</ul>
<p><strong>Research agenda</strong><br />
I know of only one work which proposes a research agenda for computational journalism.</p>
<ul>
<li>&#8220;<a href="http://www.eecs.umich.edu/~congy/work/cidr11.pdf">Computational Journalism: A Call to Arms for Database Researchers</a>&#8221; by Sarah Cohen et al. raises the very intriguing possibility of building systems that automatically or semi-automatically scan databases for stories, document the rationale for believing certain facts, etc.</li>
</ul>
<p>This paper presents a broad vision and is really a must-read. However, it deals almost exclusively with reporting, that is, finding new knowledge and making it public. I&#8217;d like to suggest that the following unsolved problems are also important:</p>
<ul>
<li>Tracing the source of any particular &#8220;fact&#8221; found online, and generally tracking the spread and mutation of information.</li>
<li>Cheap metrics for the state of the public information ecosystem. How accurate is the web? How accurate is a particular source?</li>
<li>Techniques for mapping public knowledge. What is it that people actually know and believe? How polarized is a population? What is under-reported? What is well reported but poorly appreciated?</li>
<li>Information routing and timing: how can we route each story to the set of people who might be most concerned about it, or best in a position to act, at the moment when it will be most relevant to them?</li>
</ul>
<p>This sort of attention to the health of the public information ecosystem as a whole, beyond just the traditional surfacing of new stories, seems essential to the project of <a href="http://jonathanstray.com/does-journalism-work">making journalism work</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/a-computational-journalism-reading-list/feed</wfw:commentRss>
		<slash:comments>45</slash:comments>
		</item>
		<item>
		<title>The state of The State of the Union coverage, online</title>
		<link>http://jonathanstray.com/the-state-of-the-state-of-the-union-coverage-online</link>
		<comments>http://jonathanstray.com/the-state-of-the-state-of-the-union-coverage-online#comments</comments>
		<pubDate>Sat, 29 Jan 2011 00:07:46 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[politics]]></category>
		<category><![CDATA[social media]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=2543</guid>
		<description><![CDATA[The state of the union is a big pre-planned event, so it&#8217;s a great place to showcase new approaches and techniques. What do digital news organizations do when they go all out? Here&#8217;s my roundup of online coverage Tuesday night. Live coverage The Huffington Post, the New York Times, the Wall Street Journal, ABC, CNN, Mashable, and [...]]]></description>
			<content:encoded><![CDATA[<p>The state of the union is a big pre-planned event, so it&#8217;s a great place to showcase new approaches and techniques. What do digital news organizations do when they go all out? Here&#8217;s my roundup of online coverage Tuesday night.</p>
<p><strong>Live coverage</strong></p>
<p>The <a href="http://www.huffingtonpost.com/2011/01/25/state-of-the-union-2011-live_n_814024.html">Huffington Post</a>, the <a href="http://twitter.com/#!/nytimes/status/30080180894572544">New York Times</a>, the <a href="http://twitter.com/#!/WSJ/status/30083697923334144">Wall Street Journal</a>, <a href="http://twitter.com/#!/ABC/status/30084370786164736">ABC</a>, <a href="http://live.cnn.com/?cnn=yes">CNN</a>, <a href="http://mashable.com/2011/01/25/watch-obamas-state-of-the-union-live-video/">Mashable</a>, and many others, including even <a href="http://motherjones.com/mojo/2011/01/state-union-live-video-coverage-and-commentary">Mother Jones</a> had live web video. But you can get live video on television, so perhaps the digitally native form of the live blog is more interesting. This can include commentary from multiple reporters, reactions from social media, link round-ups, etc. The <a href="http://thecaucus.blogs.nytimes.com/2011/01/25/live-blog-president-obama-delivers-state-of-the-union/">New York Times</a>, the <a href="http://boston.com/community/blogs/less_is_more/2011/01/state_of_the_union_live_blog.html">Boston Globe</a>, <a href="http://blogs.wsj.com/washwire/2010/01/27/analysis-of-president-obamas-state-of-the-union/tab/liveblog/">The Wall Street Journal</a>, <a href="http://politicalticker.blogs.cnn.com/2011/01/25/live-blog-from-state-of-the-union-address/">CNN</a>, <a href="http://firstread.msnbc.msn.com/_news/2011/01/25/5917940-live-blogging-the-state-of-the-union-">MSNBC</a>, and many others had a live blog. The Huffington Post&#8217;s <a href="http://www.huffingtonpost.com/2011/01/25/state-of-the-union-2011-s_n_813477.html">effort</a> was particularly comprehensive, continuing well into Wednesday afternoon.</p>
<p>Multi-format, socially-aware live coverage is now standard, and by my reckoning makes television look meagre. But the experience is not really available on tablet and mobile yet. For example, almost all of the live video feeds were in Flash and therefore unavailable on Apple devices, as CNET <a href="http://news.cnet.com/8301-13924_3-20029577-64.html">reports</a>.</p>
<p>As for tools, there was <a href="http://thehill.com/blogs/floor-action/floor-speeches/140065-liveblog-the-2011-state-of-the-union-address">some use of Coveritlive</a>, but most live blogs seemed to be using nondescript custom software.</p>
<p><strong>Visualizations</strong></p>
<p>Lots of visualization love this year. But visualizations take time to create, so most of them were rooted in previously available SOTU information. The Wall Street Journal did an interactive <a href="http://online.wsj.com/article/SB10001424052748704698004576104951112319700.html">topic and keyword breakdown</a> of Obama&#8217;s addresses to congress since 2009, which <a href="http://twitter.com/#!/WSJ/status/30113903820144641">moved</a> about an hour after Tuesday&#8217;s speech concluded.</p>
<p style="text-align: center;"><a href="http://online.wsj.com/article/SB10001424052748704698004576104951112319700.html"><img class="size-full wp-image-2556 aligncenter" title="WSJ SOTU breakdown" src="http://jonathanstray.com/wp-content/uploads/2011/01/Screen-shot-2011-01-27-at-3.03.21-PM.png" alt="" width="521" height="252" /></a></p>
<p>The New York Times had a <a href="http://www.nytimes.com/interactive/2011/01/25/us/politics/state-of-the-union-words-used.html">snazzy graphic</a> comparing the topics of 75 years of SOTU addresses,  by looking at the rates of certain carefully chosen words. Rollovers for individual counts, but mostly a flat thing.</p>
<p style="text-align: center;"><a href="http://www.nytimes.com/interactive/2011/01/25/us/politics/state-of-the-union-words-used.html"><img class="size-full wp-image-2557 aligncenter" title="Screen shot 2011-01-27 at 3.08.23 PM" src="http://jonathanstray.com/wp-content/uploads/2011/01/Screen-shot-2011-01-27-at-3.08.23-PM.png" alt="" width="517" height="347" /></a></p>
<p>The Guardian Data Blog took a similar historical approach, with <a href="http://www.guardian.co.uk/news/datablog/2011/jan/25/state-of-the-union-text-obama">Wordles for SOTU speeches</a> from Obama and seven other presidents back to Washington. Being the Data Blog, they also put the word frequencies for these speeches into a downloadable spreadsheet. It&#8217;s a huge image, definitely intended for big print pages.</p>
<p style="text-align: center;"><a href="http://www.guardian.co.uk/news/datablog/2011/jan/25/state-of-the-union-text-obama#zoomed-picture"><img class="size-medium wp-image-2568" title="State-of-the-union-wordle-001" src="http://jonathanstray.com/wp-content/uploads/2011/01/State-of-the-union-wordle-001-129x300.jpg" alt="" width="129" height="300" /></a></p>
<p>A shout-out to my AP colleagues for all their hard work on our <a href="http://hosted.ap.org/specials/interactives/wdc/sotu-2011/index.html?SITE=WIMIL&amp;SECTION=POLITICS">SOTU interactive</a>, which included the video, a fact-checked transcript, and an animated visualization of Twitter responses before, during, and after the State of the Union.</p>
<p style="text-align: center;"><a href="http://hosted.ap.org/specials/interactives/wdc/sotu-2011/index.html?SITE=WIMIL&amp;SECTION=POLITICS"><img class="size-full wp-image-2560 aligncenter" title="AP SOTU 2011 interactive" src="http://jonathanstray.com/wp-content/uploads/2011/01/Screen-shot-2011-01-27-at-4.38.21-PM.png" alt="" width="410" height="406" /></a></p>
<p style="text-align: left;">But it&#8217;s not clear what, if anything, we can actually learn from such visualizations. In terms of solid journalism content, possibly the best visualization came not from a news organization but from Nick Diakopoulos and co. at Rutgers University. Their <a href="http://sm.rutgers.edu/vox/event/?e=3">Vox Civitas</a> tool does filtering, search, and visualization of over  100,000 tweets captured during the address.</p>
<p style="text-align: center;"><a href="http://jonathanstray.com/wp-content/uploads/2011/01/Screen-shot-2011-01-28-at-6.24.48-PM.png"><img class="size-full wp-image-2574 aligncenter" title="Vox Civitas SOTU 2011" src="http://jonathanstray.com/wp-content/uploads/2011/01/Screen-shot-2011-01-28-at-6.24.48-PM.png" alt="" width="499" height="428" /></a></p>
<p style="text-align: left;">I find this interface a little too complex for general audience consumption &#8212; definitely a power user&#8217;s tool. But the <a href="http://www.nickdiakopoulos.com/wp-content/uploads/2007/05/diamonds-in-the-rough_VAST_cr1.pdf">algorithms</a> are second to none. For example, Vox Civitas compares tweets to the text of the speech within the previous two minutes to detect &#8220;relevance,&#8221; and the automated keyword extraction &#8212; you can see the keywords at the bottom of the interface above &#8212; is based on <a href="http://en.wikipedia.org/wiki/Tf%E2%80%93idf">tf-idf</a> and seems to choose really interesting and relevant words. The interactive graph of keyword frequency over time clearly shows the sort of information that I had hoped to reveal with the AP&#8217;s visualization.</p>
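<p style="text-align: left;">The windowed relevance idea can be sketched in a few lines. This is not the Vox Civitas implementation: I am assuming a transcript stored as (seconds, text) pairs, a tiny hand-picked stopword list, and plain word overlap as the relevance score, all of which are stand-ins for illustration.</p>

```python
# Hypothetical stopword list; a real system would use a much longer one.
STOPWORDS = {"the", "a", "an", "to", "in", "of", "and", "we", "our", "is"}

def content_words(text):
    """Lowercased words with surrounding punctuation and stopwords removed."""
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    return words - STOPWORDS - {""}

def relevance(tweet_text, tweet_time, transcript, window=120):
    """Fraction of the tweet's content words spoken in the speech during
    the `window` seconds before the tweet was posted.

    `transcript` is a list of (start_seconds, text) pairs.
    """
    recent = set()
    for start, line in transcript:
        elapsed = tweet_time - start
        if window >= elapsed >= 0:
            recent |= content_words(line)
    words = content_words(tweet_text)
    if not words:
        return 0.0
    return len(words.intersection(recent)) / len(words)

# Made-up transcript lines and timings, for illustration only.
transcript = [
    (0, "We need to invest in clean energy."),
    (200, "Our schools must reward good teachers."),
]
```

<p style="text-align: left;">With this toy transcript, a tweet reading &#8220;Reward good teachers!&#8221; posted at 260 seconds scores 1.0, while the same tweet at 100 seconds scores 0.0 because the speech had not reached that topic yet.</p>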
<p><strong>Fact Checking</strong></p>
<p>A number of organizations did real-time or near real-time fact checking, as Yahoo <a href="http://news.yahoo.com/s/yblog_thecutline/20110124/ts_yblog_thecutline/news-orgs-nonprofits-to-fact-check-obamas-sotu-address">reports</a>. The Sunlight Foundation used its <a href="http://sunlightfoundation.com/live/">Sunlight Live</a> system for real-time fact checks and commentary. This platform, incorporating live video, social media monitoring, and other components, is expected to be available as an <a href="http://sunlightfoundation.com/blog/2010/07/20/knight-batten-winners/">open-source web app</a>, for the use of other news organizations, by mid-2011.</p>
<p>The Associated Press published a long fact check <a href="http://news.yahoo.com/s/ap/us_state_of_union_fact_check">piece</a> (also integrated into the AP <a href="http://hosted.ap.org/specials/interactives/wdc/sotu-2011/index.html?SITE=WIMIL&amp;SECTION=POLITICS">interactive</a>), ABC had their own <a href="http://abcnews.go.com/Politics/State_of_the_Union/state-union-2011-fact-check-president-obamas-address/story?id=12760731">story</a>, and CNN took a <a href="http://articles.cnn.com/2011-01-25/politics/fact.check.obama.energy_1_oil-industry-oil-companies-undersea-gusher?_s=PM:POLITICS">stab at it</a>.</p>
<p>But the heaviest hitter was Politifact, who had a number of fact check rulings within hours and several more by Wednesday evening. These are together in a nice <a href="http://politifact.com/truth-o-meter/article/2010/jan/27/fact-checking-obamas-state-union-speech/">summary article</a>, but as is their custom the <a href="http://politifact.com/truth-o-meter/statements/2010/jan/28/barack-obama/tax-cut-95-percent-stimulus-made-it-so/">individual fact checks</a> are extensively documented and linked to primary sources.</p>
<p><strong>Audience engagement</strong></p>
<p>Pretty much every news organization had some SOTU action on social media, though with varying degrees of aggressiveness and creativity. Some of the more interesting efforts involved solicitation of audience responses of a specific kind. NPR asked people to describe their <a href="http://www.npr.org/2011/01/26/133211131/the-state-of-the-union-in-your-words">reaction to the state of the union in three words</a>. This was <a href="http://twitter.com/#!/nprnews/status/30103502969638913">promoted aggressively on Twitter</a> and Facebook. They also asked for political affiliation, and split out the 4000 responses into Democratic and Republican word clouds:</p>
<p style="text-align: center;"><a href="http://jonathanstray.com/wp-content/uploads/2011/01/npr_sotu_wordle.jpg"><img class="size-full wp-image-2544 aligncenter" title="wordle_final_all" src="http://jonathanstray.com/wp-content/uploads/2011/01/npr_sotu_wordle.jpg" alt="" width="499" height="305" /></a></p>
<p>Apparently, <a href="http://www.huffingtonpost.com/2011/01/26/obama-state-of-the-union-salmon-joke_n_814205.html">Obama&#8217;s salmon joke</a> went down well. The Wall Street Journal went live Tuesday morning with &#8220;<a href="http://online.wsj.com/article/SB10001424052748703555804576101851104651000.html">The State of the Union is&#8230;</a>&#8221; asking viewers to leave a one-word answer. This was also <a href="http://twitter.com/#!/WSJ/status/29913094041899008">promoted on Twitter</a>. Their results were presented in the same interactive, as a popularity-sorted list.</p>
<p style="text-align: center;"><a href="http://jonathanstray.com/wp-content/uploads/2011/01/Screen-shot-2011-01-27-at-1.43.49-PM2.png"><img class="size-full wp-image-2562 aligncenter" title="Screen shot 2011-01-27 at 1.43.49 PM" src="http://jonathanstray.com/wp-content/uploads/2011/01/Screen-shot-2011-01-27-at-1.43.49-PM2.png" alt="" width="519" height="381" /></a></p>
<p>Aside from this type of interactive, we saw lots of aggressive social media engagement in general. The more social-media savvy organizations were all over this, promoting their upcoming coverage and responding to their audiences. As usual, the Huffington Post was pretty seriously tweeting the event, posting about updates to their live blog, etc. and going well into Wednesday morning. Perhaps inspired by NPR, they encouraged people to tweet their <a href="http://www.huffingtonpost.com/2011/01/25/state-of-the-union-three-word-twitter-reactions_n_814061.html#s230045&amp;title=Juneau_Underwood">#3wordreaction</a> to the speech. They also collected and highlighted reaction from <a href="http://twitter.com/#!/HuffPostEdu/status/30418469711253505">teachers</a>, <a href="http://twitter.com/#!/HuffingtonPost/status/30641043472908289">Sarah Palin</a>, etc.</p>
<p>But as an AP colleague of mine asked, engagement to what end? Getting people&#8217;s attention is great, but then how do we, as journalists, focus that attention in a way that makes people think or act?</p>
<p><strong>The White House</strong></p>
<p>No online media roundup of the SOTU would be complete without a discussion of the White House&#8217;s own efforts, including <a href="http://www.whitehouse.gov/state-of-the-union-2011">web and mobile</a> app presences. Fortunately, Nieman Journalism Lab has <a href="http://www.niemanlab.org/2011/01/state-of-the-union-whitehouse-gov-as-a-media-outlet/">done this for us</a>. Here I&#8217;ll just add that the White House <a href="http://www.youtube.com/watch?v=cF1nE-JbJgw&amp;feature=player_embedded#">livestreamed a Q&amp;A session</a> in front of an audience immediately after the speech, in which White House Office of Public Engagement&#8217;s Kal Penn (aka <a href="http://en.wikipedia.org/wiki/Kal_Penn">Kumar</a>) read questions from social media. Then Obama himself did an <a href="http://www.youtube.com/watch?v=nqoeuIlaxRc&amp;feature=player_embedded#">interview</a> Thursday afternoon in which he answered questions submitted as videos on YouTube.</p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/the-state-of-the-state-of-the-union-coverage-online/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>What is news when the audience is editor?</title>
		<link>http://jonathanstray.com/what-is-news-when-the-audience-is-editor</link>
		<comments>http://jonathanstray.com/what-is-news-when-the-audience-is-editor#comments</comments>
		<pubDate>Sat, 15 Jan 2011 19:37:23 +0000</pubDate>
		<dc:creator>Jonathan Stray</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[collaborative filtering]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[news agenda]]></category>

		<guid isPermaLink="false">http://jonathanstray.com/?p=2491</guid>
		<description><![CDATA[This is a paper I wrote in December 2009. I&#8217;ve decided to post it now, partially because it contains a previously unreported 30-day content comparison of Digg versus the New York Times. Looking back on this work, I think that its greatest weakness is an under-appreciation of the importance of production processes in determining what [...]]]></description>
			<content:encoded><![CDATA[<p><em>This is a paper I wrote in December 2009. I&#8217;ve decided to post it now, partially because it contains a previously unreported 30-day content comparison of Digg versus the New York Times. Looking back on this work, I think that its greatest weakness is an under-appreciation of the importance of production processes in determining what gets reported and how. In other words, I believe now that the intense pressure of daily deadlines shapes the news far more than external influences such as political and commercial pressures &#8212; at least in countries where the press is relatively free. Also available as a <a href="http://jonathanstray.com/papers/What%20Is%20News%20When%20the%20Audience%20is%20Editor.pdf">pdf</a>.</em></p>
<p><strong>Abstract</strong><br />
There are now several websites which allow users to assemble news content from around the internet by means of voting systems. The result is a new kind of front page that directly reflects what the audience believes to be salient, as opposed to what the editorial staff of a newsroom believes the audience should know. Content analyses of such sites show that they have little overlap with mainstream media agendas (5% in a previous study). In fact, many of the items selected by users would not traditionally be considered &#8220;news&#8221; at all. This paper examines the shift from editor to audience agendas in the context of previous theories of news production, discusses existing content analysis work on the subject, and reports on a new 30-day study of Digg.com versus NYTimes.com.</p>
<p><strong>Introduction</strong><br />
No news organization can cover everything. Traditionally, it is ultimately the editor of a news publication who decides what is newsworthy: what stories reporters will follow, and what stories will be published. It has been considered part of the value of a news organization to determine what its audiences need to know about.</p>
<p>It&#8217;s never been entirely clear how professional journalists decide which events are worth reporting, out of all the events taking place in the world. Neither has it been obvious how editorial choices relate to the audience&#8217;s personal judgments about what is important, but  such questions were largely theoretical before the advent of the web. &#8220;I own a newspaper, you do not&#8221; was always the implicit end to discussions about who got to decide what was news.</p>
<p>Today, publishing is near-free and the news package has been disaggregated. An online audience member can select single stories that interest them, without reading or even really being aware of the traditional news package. Alongside this disaggregation we find a new class of online applications that re-aggregate content from multiple sources. Readers vote on pages from across the web, and the top-rated items are displayed on the aggregator&#8217;s home page.</p>
<p>News consumers are literally tearing the world&#8217;s newspapers apart and re-assembling them to fit their own agendas, including lots of content not traditionally considered news at all.</p>
<p>This paper examines what we can learn about the online audience&#8217;s judgment not only of what is important but of what is news at all, and how it differs from that of traditional newsrooms. I review previous work on &#8220;news values&#8221; and &#8220;news agenda&#8221; in professional journalism, look at measurements of what audiences view online, and report on my own 30-day quantitative study of Digg as compared to the New York Times.</p>
<p><strong>Features of the audience-generated agenda</strong><br />
<span id="more-2491"></span>Online audiences seem to be selecting for themselves a radically different set of stories and topics than that assembled for them by the mainstream media. The most relevant work on this topic is a 2007 study by the Pew Research Center&#8217;s Project for Excellence in Journalism [<a href="http://www.journalism.org/node/7493">1</a>]. The PEJ investigation followed Digg, Del.icio.us and Reddit for one week, as well as the more conventionally edited Yahoo News, collecting a total of 644 stories. It compared these to 1,395 stories from the same period published in the PEJ&#8217;s News Coverage Index, a collection of mainstream news sources in print, online, network TV, cable, and radio. The report&#8217;s &#8220;key findings&#8221; are worth excerpting:</p>
<blockquote style="text-align: left;">
<ul>
<li>The news agenda of the three user-sites that week was markedly different from that of the mainstream press. Many of the stories users selected did not appear anywhere among the top stories in the mainstream media coverage studied.</li>
<p>&#8230;</p>
<li>The sources user news sites draw on are strikingly different from the mainstream media. Seven in ten stories on the user sites come either from blogs or Web sites such as YouTube and WebMd that do not focus mostly on news.</li>
<p>&#8230;</p>
<li>Despite claims that the Web would internationalize consumers’ news diets, coverage across the three user-news sites focused more on domestic events and less on news from abroad than the mainstream media that week.</li>
</ul>
</blockquote>
<p>These points suggest the major themes that will recur in this paper. User news judgment is vastly different from editor news judgment. Users do not appear to care whether or not their stories are produced in traditional journalistic fashion. And &#8220;serious&#8221; journalism (e.g., on international topics) is unpopular.</p>
<p><strong>Theories of news agenda </strong><br />
How do professional journalists decide what stories to follow and publish?</p>
<p>Schudson [2] examines research into this question beginning with the &#8220;gatekeeper&#8221; model. In this framework, those who are in a position to decide what is published control what information may pass from the world into the news.  The notion is that the gatekeepers will inject their particular perspectives and biases into the news. And yet, early studies revealed that there is little variation in the wire stories chosen for publication by different local editors. Moreover, the &#8220;gatekeeper&#8221; approach doesn&#8217;t answer the question of what comes to the gate and how.</p>
<blockquote style="text-align: left;"><p>The term &#8220;gatekeeper&#8221; is still in use and provides a handy, if not altogether appropriate, metaphor for the relation of news organizations to news products. A problem with the metaphor is that it leaves &#8220;information&#8221; sociologically untouched, a pristine material that comes to the gate already prepared; the journalist as &#8220;gatekeeper&#8221; simply decides which pieces of prefabricated news will be allowed through the gate.</p></blockquote>
<p>In pursuit of a more sophisticated model of how the news content is decided, Schudson identifies three broad schools of theory.</p>
<p>In political economy theories of news production, structural constraints determine what it is possible to publish, regardless of the intentions of individual journalists. Chomsky&#8217;s &#8220;propaganda model&#8221; is the archetypal example. According to Chomsky the news is strictly limited to and complicit in reporting only what is favorable to maintaining the (unjust) status quo. Issues of political interest, commercial pressures and the capitalist structures of society are cogently discussed in this branch of theory.</p>
<p>&#8220;Organizational&#8221; theories are also structural, but don&#8217;t necessarily see collusion between newsrooms and elites. Instead, the focus is on how the individual journalist within these structures ends up having little choice in how they operate. Noting that the great majority of news stories are the result of official reports from government agencies, Schudson says that the reporter sees the world as &#8220;bureaucratically organized&#8221;:</p>
<blockquote style="text-align: left;"><p>One study after another comes up with essentially the same observation, and it matters not whether the study is at the national, state, or local level &#8212; the story of journalism, on a day-to-day basis, is the story of the interaction of reporters and officials. Some claim officials generally have the upper hand. Some media critics, including many government officials, say reporters do. But there is little doubt that the center of news generation is the link between reporter and official, the interaction of the representatives of the news bureaucracies and the government bureaucracies. This is clear especially when one examines the actual daily practices of journalists.</p></blockquote>
<p>Constructivist theories investigate how meaning is produced for journalists and audiences, taking into account the cultures in which they both live. This approach differs from the organizational perspective in that it examines the symbols and ideas available to journalists. This body of theory is the one best suited to deal with framing choices, and questions of what is and isn&#8217;t surprising (and therefore newsworthy) within a particular culture.</p>
<p>Each of these theories provides interesting analytical tools, yet there is no overall theory that reliably answers our basic question: what makes the news? Lacking a clear explanation of this point, it&#8217;s difficult to know how well the public is being served by traditional news sources.</p>
<p><strong>What is newsworthy?</strong><br />
If we ask journalists how they decide what beats to follow, what leads to investigate, and what stories to produce, we typically get answers involving the &#8220;newsworthiness&#8221; of various events. Yet journalists are at a loss to explain what this actually means. One veteran editor described news judgment to me as &#8220;tribal,&#8221; i.e. publication dependent and essentially arbitrary &#8212; which is of course at odds with theories of &#8220;objective&#8221; reporting. Hall writes about the difficulty of defining &#8220;newsworthy&#8221; in a 1973 essay on photojournalism [3]:</p>
<blockquote style="text-align: left;"><p>&#8220;News values&#8221; are one of the most opaque structures of meaning in modern society. All &#8220;true journalists&#8221; are supposed to possess it: few can or are willing to identify and define it. Journalists speak of &#8220;the news&#8221; as if events select themselves. Further, they speak as if which is the &#8220;most significant&#8221;  news story, and which &#8220;news angles&#8221; are the most salient are divinely inspired. Yet of the millions of events which occur every day in the world, only a tiny portion ever become visible as &#8220;potential news stories&#8221;; and of this proportion, only a small fraction are actually produced as the day&#8217;s news in the news media. We appear to be dealing, then, with a &#8220;deep structure&#8221; whose function as a selective device is un-transparent event to those who professionally most know how to operate it.</p></blockquote>
<p>Perhaps our most comprehensive understanding of what &#8220;newsworthiness&#8221; actually means comes from Shoemaker [4] and colleagues. They performed very diverse studies of what people consider to be newsworthy and found something surprising: the news agenda doesn&#8217;t reflect <em>anyone&#8217;s</em> personal judgment!</p>
<blockquote style="text-align: left;"><p>In our study of news in ten countries, Akiba Cohen and I (2006) discovered a disconnect between what people think is newsworthy and how prominently newspapers display the stories. People in four types of focus groups &#8212; journalists, public relations practitioners, low socio-economic audience, and high socio-economic audience &#8212; were asked to rank ten headlines according to their newsworthiness, each set being taken from their local newspapers several months earlier. The stories ranged (in percentiles) from the most prominent as displayed in the newspaper to the least prominent.</p>
<p>As expected, people within each focus group ranked the stories in much the same way, but we also found that journalists agreed with public relations practitioners, high SES audience members agreed with low SES audience members, journalists with audience members, and so on &#8212; no matter what their station in life, people agreed on how newsworthy the events were. This was true in each of the ten countries we studied.</p>
<p>But when we compared the peoples’ newsworthiness rankings to how prominently their local newspapers had displayed the stories, agreement was much lower. In some countries there was actually a negative relationship between how newsworthy people thought an event was and how prominently it was covered by the newspaper. In most countries, the relationship was positive, but much weaker than the relationships between the various groups of people.</p>
<p>The newsworthiness of an event is only one of many factors that determines how prominently the story will be covered. We cannot assume that the most prominently covered stories are the ones that people (whether editors, reporters, PR practitioners, doctors, mechanics, or teachers) think are most newsworthy, and we cannot reasonably expect people’s mental judgments about what is newsworthy to correlate highly with what actually becomes the social artifact news.</p></blockquote>
<p>In other words, it does not appear that the mainstream news agenda is representative of what even journalists think is newsworthy, let alone the audience.</p>
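<p>To make the comparison concrete, here is a minimal sketch of how agreement between two rankings of the same ten headlines might be measured. It uses Spearman&#8217;s rank correlation, chosen here purely for illustration and not necessarily the statistic Shoemaker and Cohen used. The function name <code>spearman_rho</code> and all three rankings are invented for the example and are not data from the study.</p>

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two rankings of the same items (no ties)."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Made-up ranks for ten headlines (1 = most newsworthy, or most prominent).
journalists = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
audience = [2, 1, 3, 5, 4, 6, 8, 7, 9, 10]
newspaper = [6, 3, 9, 1, 8, 2, 10, 4, 5, 7]

# Agreement between two groups of people.
group_agreement = spearman_rho(journalists, audience)
# Agreement between one group and the prominence the paper gave each story.
paper_agreement = spearman_rho(journalists, newspaper)

print(round(group_agreement, 2), round(paper_agreement, 2))
```

<p>With these invented numbers the two groups agree closely with each other, while the agreement between the journalists and the newspaper&#8217;s display order is positive but much weaker. That is the pattern the quoted passage reports for most countries.</p>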
<p>This finding demands an explanation. Does the news as produced not reflect individual judgment due to structural issues such as, say, political pressures? Might this be an example of groupthink, where each person produces what they imagine everyone else in their culture wants? It is also possible that the people in Shoemaker&#8217;s focus groups were mis-reporting their judgments in some way, whether through selective perception and cognitive bias, Hawthorne and other experimental effects, or social pressures.</p>
<p>But if we assume for the moment that Shoemaker&#8217;s finding is believable, we do not yet have any good explanations of why it could be that the &#8220;social artifact&#8221; of news represents the newsworthiness judgment of no one in particular. But we do get a prediction: given a free choice, audiences would construct a news agenda that is dramatically different from the mainstream status quo.</p>
<p><strong>What are Audiences Actually Reading?</strong><br />
Consumption is one form of audience judgment. Newspaper publishers have long known or assumed that their readers don&#8217;t have the same priorities as the newsroom. Stereotypically, it is sports and celebrity gossip that draw the most readers. This wasn&#8217;t necessarily an economic concern in the newsprint era given that the customer could not buy less than a whole paper. In any case, reader story choice was not immediately measurable. In some studies readers have been asked what they read, but this technique is subject to deep problems related to memory and perception. In another design, readers have been directly observed as they read the paper, but this is an artificial situation and subject to well-known distortions such as the Hawthorne effect (where the subject tries to please the researcher).</p>
<p>In contrast, online measurements can be completely unobtrusive. In fact the raw data is already routinely recorded in web server logs, and collected by companies such as Nielsen Ratings. Tewksbury [5] analyzed visits to 13 pre-selected news sites by 9,209 randomly-selected Americans in the months of March and May 2000.  From [5] table 2:</p>
<table style="text-align: left;" border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="144" valign="top"><strong>Story   Category</strong></td>
<td width="72" valign="top"><strong>%   views</strong></td>
</tr>
<tr>
<td width="144" valign="top">Sports</td>
<td width="72" valign="top">26.0</td>
</tr>
<tr>
<td width="144" valign="top">Business and money</td>
<td width="72" valign="top">13.4</td>
</tr>
<tr>
<td width="144" valign="top">Arts and entertainment</td>
<td width="72" valign="top">10.9</td>
</tr>
<tr>
<td width="144" valign="top">Features</td>
<td width="72" valign="top">10.7</td>
</tr>
<tr>
<td width="144" valign="top">U.S. national</td>
<td width="72" valign="top">10.2</td>
</tr>
<tr>
<td width="144" valign="top">Technology and science</td>
<td width="72" valign="top">7.0</td>
</tr>
<tr>
<td width="144" valign="top">Interactive elements</td>
<td width="72" valign="top">7.7</td>
</tr>
<tr>
<td width="144" valign="top">World</td>
<td width="72" valign="top">6.1</td>
</tr>
<tr>
<td width="144" valign="top">Politics</td>
<td width="72" valign="top">5.4</td>
</tr>
<tr>
<td width="144" valign="top">Weather</td>
<td width="72" valign="top">3.6</td>
</tr>
<tr>
<td width="144" valign="top">Health</td>
<td width="72" valign="top">1.5</td>
</tr>
<tr>
<td width="144" valign="top">Opinion and editorial</td>
<td width="72" valign="top">1.4</td>
</tr>
<tr>
<td width="144" valign="top">State and local</td>
<td width="72" valign="top">1.2</td>
</tr>
<tr>
<td width="144" valign="top">Obituary</td>
<td width="72" valign="top">.1</td>
</tr>
<tr>
<td width="144" valign="top">Other news</td>
<td width="72" valign="top">2.5</td>
</tr>
<tr>
<td width="144" valign="top"></td>
<td width="72" valign="top"></td>
</tr>
<tr>
<td width="144" valign="top">Advertising, index page,   etc.</td>
<td width="72" valign="top">19.1</td>
</tr>
</tbody>
</table>
<p>True to stereotype, sports lead with 26% of views, followed by business, entertainment, features, and national news. The political and international news considered so important by professional journalists together comprise just 11.5% of total page views.</p>
<p>Academic work on news agenda has taken a similarly narrow focus, in that it has not really come to grips with the implications of a readership that doesn&#8217;t much care for the news that journalists think is important. Media effects researchers have for decades used studies where subjects are asked &#8220;what is the most important problem facing the nation?&#8221; [6] Within this research paradigm, by definition only &#8220;problems facing the nation&#8221; can generate news. Where do sports and lifestyle reporting fit into this? Even before the era of user news sites, it seems that journalists and scholars alike had a skewed conception of how their audiences interacted with the news media.</p>
<p><strong>The Audience Agenda-Generation Process</strong><br />
Audience-driven content aggregation sites such as Digg, Reddit, etc. all work on similar principles. Users can submit arbitrary URLs into the system, either directly on the aggregation site or via submission buttons that content publishers make available to readers on their site in the hope that their content will be promoted, e.g. &#8220;Digg this&#8221; buttons. Similar voting buttons are provided on the aggregation site for each item displayed. Votes collected from all locations are tallied for each item, and the ranked results constitute the user-generated &#8220;front page.&#8221;</p>
<p>The resulting rankings are not equivalent to a poll of readers. To begin with, votes must be counted in a time-limited way, or such sites would rank the most popular content of all time, as opposed to popular recent content.  The number of votes for an item also depends greatly on the number of people exposed to that item, and this has much to do with factors that influence the extent of &#8220;viral&#8221; transmission of an item through social media, including emotional response (see e.g. [<a href="http://marketing.wharton.upenn.edu/documents/research/Virality.pdf">7</a>]) and the social network topology around the people who have an interest in the topic. Further, readers voting on the site are far more likely to vote for items that are more prominently displayed &#8212; and thus already more popular.</p>
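<p>The time-limited tally described above can be sketched in a few lines. This is only an illustration of the general idea, not the actual algorithm of Digg, Reddit, or any other site. The function name <code>rank_front_page</code>, the 24-hour window, and the example URLs are all assumptions made for the example.</p>

```python
from collections import Counter
from datetime import datetime, timedelta


def rank_front_page(votes, now, window=timedelta(hours=24), top_n=3):
    """Rank items by the number of votes cast within a recent time window.

    votes is an iterable of (item_url, cast_at) pairs.
    """
    cutoff = now - window
    # Only votes cast at or after the cutoff count, so old favourites drop off.
    recent = Counter(url for url, cast_at in votes if cast_at >= cutoff)
    # most_common sorts by descending vote count.
    return [url for url, _count in recent.most_common(top_n)]


now = datetime(2009, 11, 4, 0, 0)
votes = [
    # Hypothetical item with the most votes overall, all cast days ago.
    ("http://example.com/old-hit", now - timedelta(days=3)),
    ("http://example.com/old-hit", now - timedelta(days=2)),
    ("http://example.com/old-hit", now - timedelta(days=2)),
    # Hypothetical items with fewer votes, all cast recently.
    ("http://example.com/fresh-story", now - timedelta(hours=5)),
    ("http://example.com/fresh-story", now - timedelta(hours=1)),
    ("http://example.com/new-video", now - timedelta(hours=2)),
]

front_page = rank_front_page(votes, now)
print(front_page)
```

<p>The item with the most votes of all time does not appear on the front page, because none of its votes fall inside the window. The sketch deliberately ignores the exposure and prominence effects discussed above, which a real site would also be subject to.</p>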
<p>Nonetheless, this voting process produces some sort of snapshot of aggregate audience interest. It&#8217;s a relatively opaque sort of snapshot, and doesn&#8217;t clearly represent anything in particular. It will favor already popular items and items with emotional content &#8212; but so does pop culture. It&#8217;s not obvious that this user-generated news agenda is any &#8220;better&#8221; or &#8220;worse&#8221; than the agenda of a newsroom.</p>
<p>What we can say is that this type of audience-generated agenda clearly draws on a wider array of sources than a traditional news publication. Any web page can be voted upon, not just content from &#8220;news&#8221; sites. Videos are relevant, as are blog postings on arbitrary topics. Crucially, the audience agenda generation process seems to involve only very weak preconceptions about what is potentially newsworthy &#8212; &#8220;anything on the web&#8221; &#8212; as compared to journalists and academics. Because users vote individually, mostly in private, not for pay, and effectively anonymously, we might also expect user agendas to be free from the structural and sociological constraints acting on a newsroom. Is it possible that user-generated agendas are simply a more honest reflection of what all of us actually consider newsworthy?</p>
<p><strong>Audience-Generated Agendas in Detail</strong><br />
The PEJ study [<a href="http://www.journalism.org/node/7493">1</a>] examined user-edited sites in several ways. One of the most revealing is the (lack of) overlap between these sites and the stories in the Pew&#8217;s news coverage index of mainstream media outlets.</p>
<blockquote><p>In the user-generated sites, [mainstream media] stories were barely visible. Overall, just 5% of the stories captured on these three sites overlapped with the ten most widely-covered stories in the Index (13% for Reddit, 4% for Digg, and 0% for Del.icio.us).</p></blockquote>
<p>Again we see that there is very little overlap between what the mainstream media considers &#8220;news&#8221;, and the stories that users choose for themselves. Even accounting for the demographic skew of these sites &#8212; which are arguably still &#8220;early adopter&#8221; and over-represent tech geeks &#8212; the lack of agreement is astounding.</p>
<p>The PEJ authors further examine the content in terms of the top five subject categories for each site.</p>
<p><strong>News Index (Mainstream Media)</strong></p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="144" valign="top"><strong>Topic</strong></td>
<td width="72" valign="top"><strong>Story   % </strong></td>
</tr>
<tr>
<td width="144" valign="top">International (non-US)</td>
<td width="72" valign="top">15</td>
</tr>
<tr>
<td width="144" valign="top">Disasters/accidents</td>
<td width="72" valign="top">11</td>
</tr>
<tr>
<td width="144" valign="top">US Foreign Affairs</td>
<td width="72" valign="top">10</td>
</tr>
<tr>
<td width="144" valign="top">Immigration</td>
<td width="72" valign="top">8</td>
</tr>
<tr>
<td width="144" valign="top">Government</td>
<td width="72" valign="top">7</td>
</tr>
</tbody>
</table>
<p><strong>Digg</strong></p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="144" valign="top"><strong>Topic</strong></td>
<td width="72" valign="top"><strong>Story   % </strong></td>
</tr>
<tr>
<td width="144" valign="top">Technology and science</td>
<td width="72" valign="top">40</td>
</tr>
<tr>
<td width="144" valign="top">Lifestyle</td>
<td width="72" valign="top">11</td>
</tr>
<tr>
<td width="144" valign="top">International (non-US)</td>
<td width="72" valign="top">6</td>
</tr>
<tr>
<td width="144" valign="top">Business</td>
<td width="72" valign="top">6</td>
</tr>
<tr>
<td width="144" valign="top">Government</td>
<td width="72" valign="top">6</td>
</tr>
<tr>
<td width="144" valign="top">Celebrities</td>
<td width="72" valign="top">6</td>
</tr>
</tbody>
</table>
<p><strong>Reddit</strong></p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="144" valign="top"><strong>Topic</strong></td>
<td width="72" valign="top"><strong>Story   % </strong></td>
</tr>
<tr>
<td width="144" valign="top">Technology and science</td>
<td width="72" valign="top">22</td>
</tr>
<tr>
<td width="144" valign="top">Lifestyle</td>
<td width="72" valign="top">15</td>
</tr>
<tr>
<td width="144" valign="top">Government</td>
<td width="72" valign="top">13</td>
</tr>
<tr>
<td width="144" valign="top">International (non-US)</td>
<td width="72" valign="top">7</td>
</tr>
<tr>
<td width="144" valign="top">Crime</td>
<td width="72" valign="top">6</td>
</tr>
</tbody>
</table>
<p><strong>Del.icio.us</strong></p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="144" valign="top"><strong>Topic</strong></td>
<td width="72" valign="top"><strong>Story   % </strong></td>
</tr>
<tr>
<td width="144" valign="top">Technology and science</td>
<td width="72" valign="top">41</td>
</tr>
<tr>
<td width="144" valign="top">Lifestyle</td>
<td width="72" valign="top">20</td>
</tr>
<tr>
<td width="144" valign="top">International (non-US)</td>
<td width="72" valign="top">16</td>
</tr>
<tr>
<td width="144" valign="top">Business</td>
<td width="72" valign="top">6</td>
</tr>
<tr>
<td width="144" valign="top">Miscellaneous</td>
<td width="72" valign="top">6</td>
</tr>
</tbody>
</table>
<p>The difference of focus between editor- and audience-generated agenda stands out here. The audience sites had heavy coverage of technology and science issues, which again suggests demographic differences. Lifestyle was also much more popular. As in the online readership survey, what a traditional journalist would call &#8220;hard news&#8221; barely registers.</p>
<p>My own study of user vs. mainstream media agenda was a 30-day comparison of Digg and the New York Times, from 4 October 2009 to 4 November 2009. Each evening shortly after midnight, I took a screen capture of the front pages of both sites. With the browser window set to a full screen height of 1024 pixels, and ignoring the smallest sized links on the NYT page, both screenshots averaged about nine stories per day. I categorized the stories on each site and also tracked, for each day, how many stories appeared on both sites. In total, I collected 238 stories from Digg and 227 stories from the New York Times this way.</p>
<p>Rather than the default Digg page as used in the PEJ study, I used the &#8220;24 hour news&#8221; view in order to capture a list that was a little more comparable in time scale to a daily newspaper, and contained the &#8220;news&#8221; label which may tell content voters that &#8220;newsworthiness&#8221; is being asked for. Results by top 5 story categories:</p>
<p><strong>Story Category &#8212; Digg</strong></p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="144" valign="top"><strong>Topic</strong></td>
<td width="72" valign="top"><strong>Story   % </strong></td>
</tr>
<tr>
<td width="144" valign="top">Politics and international</td>
<td width="72" valign="top">31</td>
</tr>
<tr>
<td width="144" valign="top">Technology and science</td>
<td width="72" valign="top">27</td>
</tr>
<tr>
<td width="144" valign="top">Lifestyle</td>
<td width="72" valign="top">13</td>
</tr>
<tr>
<td width="144" valign="top">Arts and entertainment</td>
<td width="72" valign="top">10</td>
</tr>
<tr>
<td width="144" valign="top">Miscellaneous</td>
<td width="72" valign="top">8</td>
</tr>
</tbody>
</table>
<p><strong>Story Category &#8212; NYT</strong></p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="144" valign="top"><strong>Topic</strong></td>
<td width="72" valign="top"><strong>Story   % </strong></td>
</tr>
<tr>
<td width="144" valign="top">Politics and international</td>
<td width="72" valign="top">51</td>
</tr>
<tr>
<td width="144" valign="top">Lifestyle</td>
<td width="72" valign="top">12</td>
</tr>
<tr>
<td width="144" valign="top">Business</td>
<td width="72" valign="top">11</td>
</tr>
<tr>
<td width="144" valign="top">Arts and entertainment</td>
<td width="72" valign="top">10</td>
</tr>
<tr>
<td width="144" valign="top">Sports</td>
<td width="72" valign="top">9</td>
</tr>
</tbody>
</table>
<p>In these tables, audience and editor agendas don&#8217;t seem that different. But consider this:</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="144" valign="top"><strong>Digg&#8217;s   overlap with&#8230;</strong></td>
<td width="72" valign="top"><strong>Story   % </strong></td>
</tr>
<tr>
<td width="144" valign="top">All mainstream media</td>
<td width="72" valign="top">29</td>
</tr>
<tr>
<td width="144" valign="top">The New York Times</td>
<td width="72" valign="top">3</td>
</tr>
</tbody>
</table>
<p>Although the selection of topics was similar, the actual stories almost never overlapped between Digg and the NYT. Only a few very prominent stories were covered in both agendas, such as Obama winning the Nobel Peace Prize. In total, stories in the Digg &#8220;news&#8221; category came from mainstream media reports only 29% of the time. This is high compared to the PEJ study, which averaged 5%. The difference may be attributable to my choice of the &#8220;news&#8221; category on Digg. The fact that this category exists, and the why and how of different user behavior when using this category, is an area ripe for future research.</p>
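<p>For readers who want to repeat this kind of comparison, the overlap figure can be computed along the following lines. This is a hedged sketch and not the procedure actually used for the study, which was done by hand from screen captures. The function name <code>overlap_percent</code>, the story identifiers, and the two tiny data sets are invented, and the sketch assumes the manual step of deciding which headlines cover the same story has already been done.</p>

```python
def overlap_percent(site_a_by_day, site_b_by_day):
    """Percentage of site A's stories that also appeared on site B the same day.

    Each argument maps a date string to a set of story identifiers.
    """
    total = sum(len(stories) for stories in site_a_by_day.values())
    shared = sum(
        len(stories.intersection(site_b_by_day.get(day, set())))
        for day, stories in site_a_by_day.items()
    )
    # Guard against an empty sample.
    return 100.0 * shared / total if total else 0.0


# Invented example: two days of hand-coded front pages.
digg = {
    "2009-10-09": {"obama-nobel", "gadget-review", "funny-video"},
    "2009-10-10": {"browser-release", "space-photo"},
}
nyt = {
    "2009-10-09": {"obama-nobel", "afghanistan-policy", "health-bill"},
    "2009-10-10": {"health-bill", "world-series"},
}

digg_nyt_overlap = overlap_percent(digg, nyt)
print(f"{digg_nyt_overlap:.0f}% of Digg stories also ran in the NYT")
```

<p>Matching by day is a design choice: a story that one site picks up a day later than the other would not count as overlap here, so a looser window could give a somewhat higher figure.</p>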
<p><strong>Conclusion</strong><br />
Lacking a clear understanding of how a traditional newsroom selects its stories, we can say very little about how audiences might theoretically choose differently. Even journalists cannot explain how story selection really works. However, both online news readership surveys and user-aggregated news sites show very different agendas than the mainstream media. This is most visible when one looks at the overlap between audience and editor-generated agendas &#8212; 5% in the PEJ study and 29% for the Digg &#8220;news&#8221; category.</p>
<p>Is this because the media never really represented public tastes for information, due to structural constraints or value differences between journalists and their audiences? Structural and organizational reasons seem a more promising explanation than personal judgments, given that Shoemaker&#8217;s surveys suggest that not even journalists agree with the agendas of their publications.</p>
<p>But we must also contend with the deeper problem of &#8220;what is news?&#8221; What can thinkably be on a news agenda? In traditional media effects research, only &#8220;problems facing the nation&#8221; can be on the agenda. On the web, the answer is most naturally &#8220;any web page.&#8221; Most of the web is nothing like news from a traditional journalism perspective, but this doesn&#8217;t stop audiences from voting it on to the agenda.</p>
<p>And this paper has only scratched the surface. We&#8217;ve discussed only the web, not real-time messaging services such as Twitter and Facebook, and this leads us to a key underlying assumption of all of the work discussed in this paper: the &#8220;news&#8221; is the same for every member of the audience. This is a very &#8220;broadcast&#8221; mentality, and the web is not a broadcast medium. In the future, news will be personalized and it will also be personal: the machinations of my social network may not be newsworthy to a journalist, but they&#8217;re certainly newsworthy to me. The entire concept of &#8220;news&#8221; is undergoing a transformation.</p>
<p>So what is news in the age of the audience-as-editor? In 2007 entrepreneur Adrian Holovaty founded a site called EveryBlock.com that aggregates local police reports, blog posts, and other web content with determinable location. Users from all over the US can see what is happening literally in their neighborhood. It is a valuable source of news, yet EveryBlock employs no &#8220;journalists&#8221;. The success of this product sparked a lively debate around the question &#8220;is data journalism?&#8221; Holovaty&#8217;s answer was [<a href="http://www.holovaty.com/writing/data-is-journalism/">8</a>],</p>
<blockquote><p>I no longer see the point in debating the definition of journalism. I&#8217;m interested in building products that improve people&#8217;s lives via information. Whether somebody calls that &#8220;journalism&#8221; is utterly uninteresting.</p></blockquote>
<p><strong>See also</strong></p>
<ul>
<li><a href="http://jonathanstray.com/rating-items-by-number-of-votes">Rating items by number of votes: ur doing it rong</a>. User-voting on items tends to favor the already popular. A statistical commentary on the problem, and a suggested solution.</li>
<li><a href="http://jonathanstray.com/papers/divergent%20online%20news%20preferences%20of%20journalists%20and%20readers.pdf">The divergent online news preferences of journalists and readers</a>. A similar quantitative content study, comparing headlines with user-generated &#8220;most popular&#8221; story lists.</li>
<li><a href="http://www.journalism.org/node/7493">The Latest News Headlines &#8212; Your Vote Counts</a>. The classic PEJ study I discussed in this paper.</li>
</ul>
<p><strong>References</strong></p>
<p>1. &#8220;The Latest News Headlines &#8212; Your Vote Counts.&#8221; Project for Excellence in Journalism, September 12, 2007. <a href="http://www.journalism.org/node/7493">http://www.journalism.org/node/7493</a></p>
<p>2. Michael Schudson, &#8220;The Sociology of News Production&#8221;, Media Culture Society, 1989; 11; 263.</p>
<p>3. Stuart Hall, &#8220;The Determination of News Photographs&#8221;, pp 176-190 in S. Cohen and J. Young (eds), The Manufacture of News: A Reader, Beverly Hills: Sage, 1973</p>
<p>4. Pamela Shoemaker, &#8220;News and Newsworthiness: A Commentary.&#8221; Communications, 31(1):105-111, 2006</p>
<p>5. David Tewksbury, &#8220;What do Americans Really Want to Know?&#8221; Journal of Communication, December 2003</p>
<p>6. Maxwell McCombs, &#8220;Building Consensus: The News Media&#8217;s Agenda Setting Roles.&#8221; Political Communication, 14:433-443, 1997.</p>
<p>7. Jonah Berger and Katy Milkman, &#8220;Social Transmission and Viral Culture.&#8221; Wharton School, University of Pennsylvania, 2009. <a href="http://marketing.wharton.upenn.edu/documents/research/Virality.pdf">http://marketing.wharton.upenn.edu/documents/research/Virality.pdf</a></p>
<p>8. <a href="http://www.holovaty.com/writing/data-is-journalism/">http://www.holovaty.com/writing/data-is-journalism/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://jonathanstray.com/what-is-news-when-the-audience-is-editor/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

