The editorial search engine

It’s impossible to build a computer system that helps people find or filter information without at some point making editorial judgements. That’s because search and collaborative filtering algorithms embody human judgement about what is important to know. I’ve been pointing this out for years, and it seems particularly relevant to the journalism profession today as it grapples with the digital medium. It’s this observation which is the bridge between the front page and the search results page, and it suggests a new generation of digital news products that are far more useful than just online translations of a newspaper.

It’s easy to understand where human judgement enters into information filtering algorithms, if you think about how such things are built. At some point a programmer writes some code for, say, a search engine, and tests it by looking at the output on a variety of different queries. Are the results good? In what way do they fall short of the social goals of the software? How should the code be changed? It’s not possible to write a search engine without a strong concept of what “good” results are, and that is an editorial judgement.

I bring this up now for two reasons. One is an ongoing, active debate over “news applications” — small programs designed with journalistic intent — and their role in journalism. Meanwhile, for several years Google’s public language has been slowly shifting from “our search results are objective” to “our search results represent our opinion.” The transition seems to have been completed a few weeks ago, when Matt Cutts spoke to Wired about Google’s new page ranking algorithm:

In some sense when people come to Google, that’s exactly what they’re asking for — our editorial judgment. They’re expressed via algorithms. When someone comes to Google, the only way to be neutral is either to randomize the links or to do it alphabetically.

There it is, from the mouth of the bot. “Our editorial judgment” is “expressed via algorithms.” Google is saying that they have and employ editorial judgement, and that they write algorithms to embody it. They use algorithms instead of hand-curated lists of links, which was Yahoo’s failed web navigation strategy of the late 1990s, because manual curation doesn’t scale to whole-web sizes and can’t be personalized. Yet hand selection of articles is what human editors do every day in assembling the front page. It is valuable, but can’t fulfill every need.

Informing people takes more than reporting
Like a web search engine, journalism is about getting people the accurate information they need or want. But professional journalism is built upon pre-digital institutions and economic models, and newsrooms are geared around content creation, not getting people information. The distinction is important, and journalism’s lack of attention to information filtering and organization seems like a big omission, an omission that explains why technology companies have become powerful players in news.

I don’t mean to suggest that going out and getting the story (aka “reporting”) isn’t important. Obviously, someone has to provide the original report that then ricochets through the web via social media, links, and endless reblogging. Further, there is evidence that very few people do original reporting. Last year I counted how many news outlets did their own reporting on one big story, and found that only 13 of 121 stories listed on Google News did not simply copy information found elsewhere. A contemporaneous Pew study of the news ecosystem of Baltimore found that most reporting was still done by print newspapers, with very little contributed by “new media,” though this study has been criticized for a number of potentially serious category problems. I’ve also repeatedly experienced the power that a single original report can have, as when I made a few phone calls to discover that Jurgen Habermas is not on Twitter, or worked with AP colleagues to get the first confirmation from network operators that Egypt had dropped off the internet. Working in a newsroom, obsessively watching the news propagate through the web, I see this every day: it’s amazing how few people actually pump original reports into the ecosystem.

But reporting isn’t everything. It’s not nearly enough. Reporting is just one part of ensuring that important public information is available, findable, and known. This is where journalism can learn something from search engines, because I suspect what we really want is a hybrid of human and algorithmic judgement.

As conceived in the pre-digital era, news is a non-personalized, non-interactive stream of updates about a small number of local or global stories. The first and most obvious departure from this model would be the ability to search within a news product for particular stories of interest. But the search function on most news websites is terrible, and mostly fails at the core task of helping people find the best stories about a topic of interest. If you doubt this, try going to your favorite news site and searching for that good story that you read there last month. Partially this is technical neglect. But at root this problem is about newsroom culture: the primary product is seen to be getting the news out, not helping people find what is there. (Also, professional journalism is really bad at linking between stories, and most news orgs don’t do fine-grained tracking of social sharing of their content. Links and shares are two of the primary signals that search engines use to determine which articles are the most relevant.)

Story-specific news applications
We are seeing signs of a new kind of hybrid journalism that is as much about software as it is about reporting. It’s still difficult to put names to what is happening, but terms like “news application” are emerging. There has been much recent discussion of the news app, including a session at the National Institute of Computer-Assisted Reporting conference in February, and landmark posts on the topic at Poynter and NiemanLab. Good examples of the genre include ProPublica’s dialysis facility locator, which combines investigative reporting with a search engine built on top of government data, and the Los Angeles Times’ real-time crime map, which plots LAPD data across multiple precincts and automatically detects statistically significant spikes. Both can be thought of as story-specific search engines, optimized for particular editorial purposes.
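To make the phrase "automatically detects statistically significant spikes" concrete, here is a minimal sketch of one common approach: compare the latest count against the mean and standard deviation of recent history. This is not the Times' actual method, which isn't described here; the function name, the sample counts, and the threshold of three standard deviations are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def is_spike(history, latest, threshold=3.0):
    """Flag `latest` if it sits more than `threshold` standard
    deviations above the mean of `history` (a simple z-score test).

    `history` is a list of past counts, e.g. weekly incidents in one
    precinct. The threshold is an editorial choice, not a given.
    """
    if len(history) < 2:
        return False  # not enough data to estimate variation
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest > mu  # perfectly flat history: any rise stands out
    return (latest - mu) / sigma > threshold

# Hypothetical weekly counts for one precinct.
weekly_counts = [10, 12, 11, 9, 10, 11, 12, 10]
print(is_spike(weekly_counts, 25))
print(is_spike(weekly_counts, 12))
```

Even in something this small, the choice of threshold and of how much history to compare against is an editorial decision about what counts as worth the reader's attention.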

Yet the news apps of today are just toes in the water. It is no disrespect to all of the talented people currently working in the field to say this, because we are at the beginning of something very big. One common thread in recent discussion of news apps has been a certain disappointment at the slow rate of adoption of the journalist-programmer paradigm throughout the industry. Indeed, with Matt Waite’s layoff from Politifact, despite a Pulitzer Prize for his work, some people are wondering if there’s any future at all in the form. My response is that we haven’t even begun to see the full potential of software combined with journalism. We are under-selling the news app because we are under-imagining it.

I want to apply search engine technology to tell stories. “Story” might not even be the right metaphor, because the experience I envision is interactive and non-linear, adapting to the user’s level of knowledge and interest, worth return visits and handy in varied circumstances. I don’t want a topic page, I want a topic app. Suppose I’m interested in (or have been directed via headline to) the subject of refugees and internal migration. A text story about refugees due to war and other catastrophes is an obvious introduction, especially if it includes maps and other multimedia. And that would typically be the end of the story by today’s conventions. But we can go deeper. The International Organization for Migration maintains detailed statistics on the topic. We could plot that data, make it searchable and linkable. Now we’re at about the level of a good news app today. Let’s go further by making it live, not a visualization of a data set but a visualization of a data feed, an automatically updating information resource that is by definition evergreen. And then let’s pull in all of the good stories concerning migration, whether or not our own newsroom wrote them. (As a consumer, the reporting supply chain is not my problem, and I’ve argued before that news organizations need to do much more content syndication and sharing.) Let’s build a search engine on top of every last scrap of refugee-related content we can find. We could start with classic keyword search techniques, augment them by link analysis weighted toward sources we trust, and ingest and analyze the social streams of whichever communities deal with the issue. Then we can tune the whole system using our editorial-judgment-expressed-as-algorithms to serve up the most accurate and relevant content not only today, but every day in the future. Licensed content we can show within our product, and all else we can simply link to, but the search engine needs to be a complete index.
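As a rough illustration of what "editorial judgment expressed as algorithms" could look like for such a topic search engine, here is a minimal sketch that combines the three signals mentioned above: keyword match, inbound links weighted by how much we trust the linking source, and social sharing. Everything in it is an assumption made for the example, including the `Doc` fields, the trust table, the source names, and the weights. A real system would use a proper search index and far better text analysis.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class Doc:
    url: str
    text: str
    inbound_sources: tuple = ()  # sites that link to this document
    shares: int = 0              # times shared in the communities we follow

# Hypothetical editorial trust table: this is where judgement lives.
SOURCE_TRUST = {"iom.int": 1.0, "example-wire.org": 0.8}
DEFAULT_TRUST = 0.2

# Hypothetical weights an editor would tune by inspecting results.
WEIGHTS = {"keyword": 10.0, "links": 1.0, "social": 0.5}

def keyword_score(query, text):
    """Fraction of the document's words that match a query term."""
    words = Counter(text.lower().split())
    total = sum(words.values()) or 1
    return sum(words[t] for t in query.lower().split()) / total

def link_score(doc):
    """Inbound links count for more when they come from trusted sources."""
    return sum(SOURCE_TRUST.get(s, DEFAULT_TRUST) for s in doc.inbound_sources)

def social_score(doc):
    """Log-damped share count, so one viral item doesn't swamp the rest."""
    return math.log1p(doc.shares)

def editorial_score(query, doc):
    kw = keyword_score(query, doc.text)
    if kw == 0:
        return 0.0  # popular but off-topic documents are excluded
    return (WEIGHTS["keyword"] * kw
            + WEIGHTS["links"] * link_score(doc)
            + WEIGHTS["social"] * social_score(doc))

def search(query, docs):
    """Return on-topic documents, best first."""
    scored = [(editorial_score(query, d), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored if score > 0]
```

The trust table and the weights are the editorial part: changing them changes what readers see first, which is exactly the kind of decision an editor makes when laying out a front page.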

Rather than (always, only) writing stories, we should be trying to solve the problem of comprehensively informing the user on a particular topic. Web search is great, and we certainly need top-level “index everything” systems, but I’m thinking of more narrowly focussed projects. Choose a topic and start with traditional reporting, content creation, in-house explainers and multimedia stories. Then integrate a story-specific search engine that gathers together absolutely everything else that can be gathered on that topic, and applies whatever niche filtering, social curation, visualization, interaction and communication techniques are most appropriate. We can shape the algorithms to suit the subject. To really pull this off, such editorially-driven search engines need to be both live in the sense of automatically incorporating new material from external feeds, and comprehensive in the sense of being an interface to as much information on the topic as possible. Comprehensiveness will keep users coming back to your product and not someone else’s, and the idea of covering 100% of a story is itself powerful.

Other people’s content is content too
The brutal economics of online publishing dictate that we meet the needs of our users with as little paid staff time as possible. That drives the production process toward algorithms and outsourced content. This might mean indexing and linking to other people’s work, syndication deals that let a news site run content created by other people, or a blog network that bright people like to contribute to. It’s very hard for the culture of professional journalism to accept this idea, the idea that they should leverage other people’s work as far as they possibly can for as cheap as they can possibly get it, because many journalists and publishers feel burned by aggregation. But aggregation is incredibly useful, while the feelings and job descriptions of newsroom personnel are irrelevant to the consumer. As Sun Microsystems founder Bill Joy put it, “no matter who you are, most of the smartest people work for someone else,” and the idea that a single newsroom can produce the world’s best content on every topic is a damaging myth. That’s the fundamental value proposition of aggregation — all of the best stuff in one place. The word “best” represents editorial judgement in the classic sense, still a key part of a news organization’s brand, and that judgement can be embodied in whatever algorithms and social software are designed to do the aggregation. I realize that there are economic issues around getting paid for producing content, but that’s the sort of thing that needs to be solved by better content marketplaces, not lawsuits and walled gardens.

None of this means that reporters shouldn’t produce regular stories on their beats, or that there aren’t plenty of topics which require lots of original reporting and original content. But asking who did the reporting or made the content misses the point. A really good news application/interactive story/editorial search engine should be able to teach us as much as we care to learn about the topic, regardless of the state of our previous knowledge, and no matter who originally created the most relevant material.

What I am suggesting comes down to this: maybe a digital news product isn’t a collection of stories, but a system for learning about the world. For that to happen, news applications are going to need to do a lot of algorithmically-enhanced organization of content originally created by other people. This idea is antithetical to current newsroom culture and the traditional structure of the journalism industry. But it also points the way to more useful digital news products: more integration of outside sources, better search and personalization, and story-specific news applications that embody whatever combination of original content, human curation, and editorial algorithms will best help the user to learn.

[Updated 27 March with more material on social signals in search, Bill Joy’s maxim, and other good bits.]
[Updated 1 April with section titles.]

What’s the point of social news?

According to Facebook, social news seems to be mostly about knowing what all my friends are reading. I’m not so sure. But I think there really is something to the idea of “social news” for journalism, and for journalism product design.

I take “social” to mean “interacting with other people.” That’s a fundamental technical possibility of digital media, as basic to the internet as moving pictures are to television. I’m not sure that anyone really knows yet what to do with that possibility, but happily there are already at least two very well-developed uses. Maybe social news isn’t about “friends” at all, but about filtering and news-gathering.

Twitter is really a filter
I get most of both my general and special interest news from Twitter. I rarely go to the home page of a news site, or use a news app. It’s not the tweets themselves that are informative, but the links within them to articles posted elsewhere. I follow a large set of people with varied interests, and some of them work for news organizations, but most do not. My Twitter feed is faster, more diverse, and available across more platforms (all of them) than any one news organization’s output.

This doesn’t mean that Twitter is a perfect news delivery system, but to me it’s proven better than just about anything else at getting me the news mix that I want, and keeping me interested in the world at large. (Admittedly, I follow people I’ve met in other countries, so yeah, travel is way better than Twitter for that.) I am not alone in this opinion. The structure of follower relationships among Twitter users suggests that it’s more of a news network than a social network.

The usefulness of Twitter for news has a lot to do with certain basic design choices. First, a tweet is really as short as you can get and still communicate a complete concept, so it’s basically an extended headline. Second, Twitter differs from Facebook in that relationships can be unidirectional: I don’t need anyone’s permission to follow them, and they may not know or care that I do. Following someone on Twitter also differs from following a blog via RSS because most tweets refer to someone else’s work through a link — Twitter is more about re-publishing than publishing. Retweets also include the name of the original tweeter, which enables discovery of interesting new curators.

Filtering is much more valuable than it used to be, in this era of information overload, and these properties make Twitter an excellent filtering system. There are several news products based almost entirely on displaying links tweeted by the people you follow, such as The Twitter Tim.es and Flipboard. The medium that Twitter invented — global public short messaging with links — has already been endlessly replicated and will be with us forever.
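The core of such a link-based product can be sketched in a few lines: collect the links posted by the people you follow, and put the ones shared by the most distinct people first. This is only an illustration of the idea of social filtering, not how any of the products named above actually work. The function name and the `(author, url)` input format are assumptions, and fetching posts from a real service is left out entirely.

```python
from collections import defaultdict

def rank_shared_links(posts, following):
    """Rank URLs by how many distinct followed accounts shared them.

    posts: iterable of (author, url) pairs
    following: set of account names the reader follows
    """
    sharers = defaultdict(set)
    for author, url in posts:
        if author in following:
            sharers[url].add(author)  # a set, so repeat posts count once
    # Most widely shared first; ties broken alphabetically by URL.
    return sorted(sharers, key=lambda url: (-len(sharers[url]), url))

# Hypothetical sample data.
posts = [
    ("ann", "http://example.org/story-1"),
    ("bob", "http://example.org/story-1"),
    ("ann", "http://example.org/story-2"),
    ("zed", "http://example.org/story-3"),  # not followed, so ignored
]
print(rank_shared_links(posts, {"ann", "bob"}))
```

Note that the reader's own choice of whom to follow does most of the filtering here; the code only counts.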

There is a sense in which news organizations have always seen filtering as a big part of their value. One of the duties of the professional editor is to decide what you need to see. But at least one thing has upset that model irretrievably: the internet is not a broadcast medium. While each person reads an identical copy of the Times and watches an identical CNN broadcast, there’s no reason my internet has to look the same as your internet. A small team of human editors can’t personalize the headlines for every reader, so that leaves algorithmic filtering, such as Google News’ personalization features, or social filtering, such as Twitter.

The point is, there’s probably something to learn from how Twitter uses social relationships to route information. As the Nieman Journalism Lab said: “social news isn’t about the people you know so much as the people with whom you share interests.” To put this in terms of the product I wish I had: when I use your news product, I want to be able to follow the recommended reading of other members of the audience, if they so allow. Also, can I follow a particular reporter? And does your product integrate with the other methods I already use for getting information, so I don’t have to choose?

Social networks are great for reporting
Audience-journalist collaboration, blah blah blah. If the idea that professionals are no longer the only players in news is new to you, see blogging and Wikipedia. But a news organization probably has to look at this from a different angle. For me, the core idea of social news-gathering is that the audience is, or could be, an extension of the news organization’s source network.

Hopefully, a newsroom knows about interesting developments before anyone else, and then verifies and publicizes them, but that’s getting near impossible when anyone can publish, and when virality can amplify primary sources without the involvement of a media organization. We don’t yet know very much about collective news-gathering, but there are promising directions. It seems like maybe there are two broad categories of breaking news: public events that anyone could have witnessed, and private events initially known only to privileged observers.

Social media is now routinely used to augment reporting of public events. There are entire units in news organizations dedicated to getting stories from the audience, often under the awkward rubric of “user-generated content.” But why sift for events online when you can give your audience the tools to give you the story directly? Right now if I see a plane land in a river, I tweet it. Wouldn’t a news organization prefer that I send my eye-witness photo to the UGC editor instead? To this end, several mobile news apps include the ability to submit pictures. CNN’s iReport app and website is probably the best developed of these. Ideally, I could send that breaking news tweet to the newsroom and to my friends at the same time, within the same application.

Fast reporting of private events has always depended on having the right sources. A well established source may call the reporter or send an email when something newsworthy happens. Someone with a much looser connection to the organization may not, and perhaps this is an opportunity for social news tools. When someone knows something — or can talk about something — you want them to contact the newsroom first. The potential of this weak-tie news sourcing approach hasn’t really been studied, to my knowledge, but I imagine that it would require, at minimum, a trusted brand, an easily-reachable editorial staff, and frictionless communication tools. If it’s easier just to tweet or blog the news, the source will.

There are several other good examples of social news-gathering, on the theme of asking your audience for help. Crowdsourcing is usually thought of as the recruitment of many unspecialized helpers, as the Guardian did with its MP expenses project. But the Guardian also reached out to its audience to find that one specialist attorney who could unravel the mystery of Tony Blair’s tax returns. Hopefully the specialists a newsroom needs to consult are already among the audience, and they will see the call for experts when a reporter sends one out. For that matter, a smart and engaged audience can correct you quickly when you are wrong. Nothing says “we care about accuracy” like a fact check box on every story.

But is it journalism?
Yes, absolutely. The job of journalism is to collect accurate information on an ongoing basis and ensure that the audience for each story learns about that story. Any way you can deliver that service is fair game. People depend on each other for the news all the time, so journalists better get in those conversations.