It’s impossible to build a computer system that helps people find or filter information without at some point making editorial judgements. That’s because search and collaborative filtering algorithms embody human judgement about what is important to know. I’ve been pointing this out for years, and it seems particularly relevant to the journalism profession today as it grapples with the digital medium. It’s this observation which is the bridge between the front page and the search results page, and it suggests a new generation of digital news products that are far more useful than just online translations of a newspaper.
It’s easy to understand where human judgement enters into information filtering algorithms, if you think about how such things are built. At some point a programmer writes some code for, say, a search engine, and tests it by looking at the output on a variety of different queries. Are the results good? In what way do they fall short of the social goals of the software? How should the code be changed? It’s not possible to write a search engine without a strong concept of what “good” results are, and that is an editorial judgement.
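One way to make that judgement concrete is to write down, for a few test queries, which results an editor considers good, and then measure how many of them land near the top. The sketch below computes precision at k, a standard information retrieval measure. The queries, document IDs, and function name are all invented for illustration; no real search engine is being described.

```python
def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the top-k results that an editor marked as good."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    hits = sum(1 for doc_id in top if doc_id in relevant_ids)
    return hits / len(top)


# Hypothetical editor judgements: query -> set of "good" document IDs.
editor_judgements = {
    "egypt internet outage": {"a1", "a4", "a7"},
    "dialysis clinic safety": {"b2", "b3"},
}

# Hypothetical output of the search engine under test, best result first.
engine_output = {
    "egypt internet outage": ["a1", "a9", "a4", "a2", "a7"],
    "dialysis clinic safety": ["b5", "b2", "b8", "b9", "b1"],
}

scores = {
    query: precision_at_k(engine_output[query], relevant)
    for query, relevant in editor_judgements.items()
}

for query, score in scores.items():
    print(f"{query}: {score:.2f}")
```

The arithmetic is trivial. What matters is the `editor_judgements` table, which somebody has to fill in by deciding what "good" means.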
I bring this up now for two reasons. One is an ongoing, active debate over “news applications” — small programs designed with journalistic intent — and their role in journalism. The other is that, for several years, Google’s public language has been slowly shifting from “our search results are objective” to “our search results represent our opinion.” The transition seems to have been completed a few weeks ago, when Matt Cutts spoke to Wired about Google’s new page ranking algorithm:
In some sense when people come to Google, that’s exactly what they’re asking for — our editorial judgment. They’re expressed via algorithms. When someone comes to Google, the only way to be neutral is either to randomize the links or to do it alphabetically.
There it is, from the mouth of the bot. “Our editorial judgment” is “expressed via algorithms.” Google is saying that they have and employ editorial judgement, and that they write algorithms to embody it. They use algorithms instead of hand-curated lists of links, which was Yahoo’s failed web navigation strategy of the late 1990s, because manual curation doesn’t scale to whole-web sizes and can’t be personalized. Yet hand selection of articles is what human editors do every day in assembling the front page. It is valuable, but can’t fulfill every need.
Informing people takes more than reporting
Like a web search engine, journalism is about getting people the accurate information they need or want. But professional journalism is built upon pre-digital institutions and economic models, and newsrooms are geared around content creation, not getting people information. The distinction is important, and journalism’s lack of attention to information filtering and organization seems like a big omission, an omission that explains why technology companies have become powerful players in news.
I don’t mean to suggest that going out and getting the story — aka “reporting” — isn’t important. Obviously, someone has to provide the original report that then ricochets through the web via social media, links, and endless reblogging. Further, there is evidence that very few people do original reporting. Last year I counted how many news outlets did their own reporting on one big story, and found that only 13 of 121 stories listed on Google News did not simply copy information found elsewhere. A contemporaneous Pew study of the news ecosystem of Baltimore found that most reporting was still done by print newspapers, with very little contributed by “new media,” though this study has been criticized for a number of potentially serious category problems. I’ve also repeatedly experienced the power that a single original report can have, as when I made a few phone calls to discover that Jurgen Habermas is not on Twitter, or worked with AP colleagues to get the first confirmation from network operators that Egypt had dropped off the internet. Working in a newsroom, obsessively watching the news propagate through the web, I see this every day: it’s amazing how few people actually pump original reports into the ecosystem.
But reporting isn’t everything. It’s not nearly enough. Reporting is just one part of ensuring that important public information is available, findable, and known. This is where journalism can learn something from search engines, because I suspect what we really want is a hybrid of human and algorithmic judgement.
As conceived in the pre-digital era, news is a non-personalized, non-interactive stream of updates about a small number of local or global stories. The first and most obvious departure from this model would be the ability to search within a news product for particular stories of interest. But the search function on most news websites is terrible, and mostly fails at the core task of helping people find the best stories about a topic of interest. If you doubt this, try going to your favorite news site and searching for that good story that you read there last month. Partially this is technical neglect. But at root this problem is about newsroom culture: the primary product is seen to be getting the news out, not helping people find what is there. (Also, professional journalism is really bad at linking between stories, and most news orgs don’t do fine-grained tracking of social sharing of their content, which are two of the primary signals that search engines use to determine which articles are the most relevant.)
Story-specific news applications
We are seeing signs of a new kind of hybrid journalism that is as much about software as it is about reporting. It’s still difficult to put names to what is happening, but terms like “news application” are emerging. There has been much recent discussion of the news app, including a session at the National Institute for Computer-Assisted Reporting conference in February, and landmark posts on the topic at Poynter and NiemanLab. Good examples of the genre include ProPublica’s dialysis facility locator, which combines investigative reporting with a search engine built on top of government data, and the Los Angeles Times’ real-time crime map, which plots LAPD data across multiple precincts and automatically detects statistically significant spikes. Both can be thought of as story-specific search engines, optimized for particular editorial purposes.
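To give a flavor of what “automatically detects statistically significant spikes” could mean, here is a minimal sketch that flags a weekly count when it sits well above the historical average. I don’t know how the Times’ map actually does it; the z-score test, the threshold, the function name, and the numbers are all assumptions made for illustration.

```python
from statistics import mean, stdev


def is_spike(history, current, threshold=2.0):
    """True if `current` is more than `threshold` sample standard
    deviations above the mean of the historical counts."""
    if len(history) < 2:
        return False  # not enough data to estimate variation
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current > mu  # flat history: any increase stands out
    return (current - mu) / sigma > threshold


# Invented weekly burglary counts for one hypothetical precinct.
weekly_burglaries = [12, 9, 11, 10, 13, 8, 11, 10]

print(is_spike(weekly_burglaries, 19))  # far above the usual range
print(is_spike(weekly_burglaries, 12))  # within the usual range
```

Even here the threshold is an editorial choice: it decides how unusual a week must be before readers are told about it.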
Yet the news apps of today are just toes in the water. It is no disrespect to all of the talented people currently working in the field to say this, because we are at the beginning of something very big. One common thread in recent discussion of news apps has been a certain disappointment at the slow rate of adoption of the journalist-programmer paradigm throughout the industry. Indeed, with Matt Waite’s layoff from Politifact, despite a Pulitzer Prize for his work, some people are wondering if there’s any future at all in the form. My response is that we haven’t even begun to see the full potential of software combined with journalism. We are under-selling the news app because we are under-imagining it.
I want to apply search engine technology to tell stories. “Story” might not even be the right metaphor, because the experience I envision is interactive and non-linear, adapting to the user’s level of knowledge and interest, worth return visits and handy in varied circumstances. I don’t want a topic page, I want a topic app. Suppose I’m interested in — or I have been directed via headline to — the subject of refugees and internal migration. A text story about refugees due to war and other catastrophes is an obvious introduction, especially if it includes maps and other multimedia. And that would typically be the end of the story by today’s conventions. But we can go deeper. The International Organization for Migration maintains detailed statistics on the topic. We could plot that data, make it searchable and linkable. Now we’re at about the level of a good news app today. Let’s go further by making it live, not a visualization of a data set but a visualization of a data feed, an automatically updating information resource that is by definition evergreen. And then let’s pull in all of the good stories concerning migration, whether or not our own newsroom wrote them. (As a consumer, the reporting supply chain is not my problem, and I’ve argued before that news organizations need to do much more content syndication and sharing.) Let’s build a search engine on top of every last scrap of refugee-related content we can find. We could start with classic keyword search techniques, augment them by link analysis weighted toward sources we trust, and ingest and analyze the social streams of whichever communities deal with the issue. Then we can tune the whole system using our editorial-judgment-expressed-as-algorithms to serve up the most accurate and relevant content not only today, but every day in the future. Licensed content we can show within our product, and all else we can simply link to, but the search engine needs to be a complete index.
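A rough sketch of how those three signals might be combined into one ranking follows. It is a toy, not a design: the trust table, the weights, the scoring formulas, and every name and URL in it are invented for illustration, and a real system would use a proper text index and link graph. The point is only that the numbers in `SOURCE_TRUST` and `WEIGHTS` are editorial decisions written down as data.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Doc:
    url: str
    text: str
    inbound_sources: list = field(default_factory=list)  # who links here
    shares: int = 0  # social shares observed so far


# Editorial judgement as data: how much we trust each linking source.
SOURCE_TRUST = {"iom.int": 1.0, "unhcr.org": 1.0, "example-blog.net": 0.3}
DEFAULT_TRUST = 0.1

# Editorial judgement as data: how much each signal matters.
WEIGHTS = {"keyword": 10.0, "links": 1.0, "social": 0.2}


def keyword_score(query, text):
    """Share of the document's words that match a query term."""
    terms = query.lower().split()
    words = text.lower().split()
    if not terms or not words:
        return 0.0
    return sum(words.count(term) for term in terms) / len(words)


def link_score(doc):
    """Inbound links, each weighted by our trust in the linking source."""
    return sum(SOURCE_TRUST.get(s, DEFAULT_TRUST) for s in doc.inbound_sources)


def social_score(doc):
    """Log-damped share count, so viral items do not swamp the rest."""
    return math.log1p(doc.shares)


def editorial_score(query, doc):
    return (WEIGHTS["keyword"] * keyword_score(query, doc.text)
            + WEIGHTS["links"] * link_score(doc)
            + WEIGHTS["social"] * social_score(doc))


def search(query, docs):
    """Documents matching the query, best first by editorial score."""
    matching = [d for d in docs if keyword_score(query, d.text) > 0]
    return sorted(matching, key=lambda d: editorial_score(query, d), reverse=True)


# Invented sample corpus.
docs = [
    Doc("https://example.org/gossip", "refugees celebrity gossip today",
        ["example-blog.net"], shares=500),
    Doc("https://example.org/report", "refugees flee conflict zone",
        ["iom.int", "unhcr.org"], shares=10),
    Doc("https://example.org/sports", "sports scores", [], shares=900),
]

for doc in search("refugees", docs):
    print(doc.url)
```

With these particular weights, the report linked from trusted sources outranks the heavily shared gossip item. Change the weights and the ranking changes, which is exactly the sense in which tuning is editing.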
Rather than (always, only) writing stories, we should be trying to solve the problem of comprehensively informing the user on a particular topic. Web search is great, and we certainly need top-level “index everything” systems, but I’m thinking of more narrowly focussed projects. Choose a topic and start with traditional reporting, content creation, in-house explainers and multimedia stories. Then integrate a story-specific search engine that gathers together absolutely everything else that can be gathered on that topic, and applies whatever niche filtering, social curation, visualization, interaction and communication techniques are most appropriate. We can shape the algorithms to suit the subject. To really pull this off, such editorially-driven search engines need to be both live in the sense of automatically incorporating new material from external feeds, and comprehensive in the sense of being an interface to as much information on the topic as possible. Comprehensiveness will keep users coming back to your product and not someone else’s, and the idea of covering 100% of a story is itself powerful.
Other people’s content is content too
The brutal economics of online publishing dictate that we meet the needs of our users with as little paid staff time as possible. That drives the production process toward algorithms and outsourced content. This might mean indexing and linking to other people’s work, syndication deals that let a news site run content created by other people, or a blog network that bright people like to contribute to. It’s very hard for the culture of professional journalism to accept this idea, the idea that they should leverage other people’s work as far as they possibly can for as cheap as they can possibly get it, because many journalists and publishers feel burned by aggregation. But aggregation is incredibly useful, while the feelings and job descriptions of newsroom personnel are irrelevant to the consumer. As Sun Microsystems founder Bill Joy put it, “no matter who you are, most of the smartest people work for someone else,” and the idea that a single newsroom can produce the world’s best content on every topic is a damaging myth. That’s the fundamental value proposition of aggregation — all of the best stuff in one place. The word “best” represents editorial judgement in the classic sense, still a key part of a news organization’s brand, and that judgement can be embodied in whatever algorithms and social software are designed to do the aggregation. I realize that there are economic issues around getting paid for producing content, but that’s the sort of thing that needs to be solved by better content marketplaces, not lawsuits and walled gardens.
None of this means that reporters shouldn’t produce regular stories on their beats, or that there aren’t plenty of topics which require lots of original reporting and original content. But asking who did the reporting or made the content misses the point. A really good news application/interactive story/editorial search engine should be able to teach us as much as we care to learn about the topic, regardless of the state of our previous knowledge, and no matter who originally created the most relevant material.
What I am suggesting comes down to this: maybe a digital news product isn’t a collection of stories, but a system for learning about the world. For that to happen, news applications are going to need to do a lot of algorithmically-enhanced organization of content originally created by other people. This idea is antithetical to current newsroom culture and the traditional structure of the journalism industry. But it also points the way to more useful digital news products: more integration of outside sources, better search and personalization, and story-specific news applications that embody whatever combination of original content, human curation, and editorial algorithms will best help the user to learn.
[Updated 27 March with more material on social signals in search, Bill Joy’s maxim, and other good bits.]
[Updated 1 April with section titles.]
51 thoughts on “The editorial search engine”
Great article – what do you think about sensor networks? Once stories are collected through technology agents, the role of a journalist would be more of providing an opinion of the facts?
Agree on all points except: “Rather than writing stories, we should be trying to solve the problem of comprehensively informing the user on a particular topic.” Humans take in information via stories – that’s why everyone talks so much about story “hooks.” So it makes sense that journalists don’t flock to collecting a whole bunch of content and data in one place without offering context to that content, via the most compelling story. The why leads readers into wanting to learn the what. If we don’t figure out what matters to our readers, we will have wasted a whole lot of time building a uselessly intricate, all-encompassing tool – that no one reads.
Capturing and expressing 100% of a topic seems to be the key editorial concept. Nice piece.
Widespread, enthusiastic agreement. I’d offer a reminder that “traditional” (i.e. Google pagerank) search algorithms are failing to meet their mission at this global scale, and search companies know it. They’re looking for better ways to do what you’re describing: comprehensively informing people on a topic. And I’ll agree with Daniel: it’s the completeness of the answer that will matter.
Moving news companies out of the selling-your-attention business and into the information business isn’t an easy pitch. But thanks for helping move it along.
It’s important to understand that some of the shift behind the rhetoric from “objective” to “opinion” is driven by the legal landscape. Very roughly, a claim of “opinion” currently gets a lot of protection under First Amendment law, while “objective” would seem to go more into a realm of business process which would be much more subject to government investigation and regulation. This has huge implications given Google’s problems with charges of unfairness and anti-trust.
This editorial search engine is a great concept. To be especially useful, it needs effective, easy navigation according to the user’s interests in each of the nuances of the big topic. Giving me everything I might want is good, but helping me find exactly what I want is a true game-changer.
Are you developing such a search engine? This is exactly the kind of service AP should be providing for the news business.
Jonathan, very interesting, very elaborate, very enlightening.
I do not think that Matt Cutts’ “editorial judgment” means the same thing that journalists mean by “editorial judgment”. I do not think the word “information” has the same meaning in Googlers’ minds as in journalists’.
Are you not confusing “information” with “organized and retrievable data”? Just by having all the relevant data stacked in the most “accurate” and “precise” way… do we have the information we need? Do you think this editorial search engine is useful to cover all human interests (how do politics and policy mix in, for example)?
It’s a great idea for many areas of knowledge and learning. But doesn’t pushing it to the extreme make it a technocratic ideal that dents freedom?
Lots of great replies here.
Kadijah: I don’t mean to denigrate the power of stories, or, more generally, narrative. We’ve been explaining the world to each other through stories for thousands of years, and I’m sure we’ll continue to do so. But there are new types of stories, for example visual and interactive stories, and a story is still sort of an atomic unit, only a piece of a much larger service — consider the relation between “story” and “newspaper.”
Daniel: capturing 100% of a topic is indeed key. What I’m arguing here is a) that’s impossible without clever automation and b) that automation includes “editorial judgement” in the classic sense.
Seth: you are right that some of Google’s language shift seems to be due to legal reasons, and the article I linked above discusses this in a more nuanced way. But I think the point stands: you can’t write a page ranking algorithm without at some point looking at the results and asking, “is this a good ranking?” and it’s at that point that human judgements come in.
Steve: “Giving me everything I might want is good, but helping me find exactly what I want is a true game-changer.” Extremely well put, thanks for this clarification. And you may be interested in some of the research work around the AP’s Knight News Challenge application.
Toni: I don’t believe in technocratic ideals. I believe in building computer systems that do what humans want. Are not both a human editor and the authors of Google’s algorithms attempting to select the material that is, in some way, “best”? And I see no contradiction with politics and policy — we are free to design the algorithms to treat politics and policy-related material in any way that we find satisfying. But I would be interested in hearing more about the difference you see between Google’s “information” and a journalist’s “information,” because I think there may be an important distinction here.
Thank you everyone for the lively response.
Come to think of it, you might enjoy a book chapter I wrote a few years ago:
“Google, Links, and Popularity versus Authority”
See in particular page 108, where I draw analogies to journalistic “algorithms”
You should have a look at http://www.newsblick.net/
From the FAQ page:
What is Newsblick?
Newsblick gathers messages containing links from the internet (more precisely Twitter) and presents them in real-time. The goal is to give you a quick overview of what is happening in the world from a wide variety of sources in an ordered, clearly arranged and searchable way. Furthermore it helps you discover news-websites and blogs.
How does it work?
Newsblick learns which sources its users prefer and updates the list of sources accordingly. By favourizing and blocking sources you can affect the sources of tomorrow. Thus, Newsblick reflects the overall preferences of its users.
Great read, certainly some things to think about. My only concern with so much aggregation is what it might mean, or might seem to mean, to news organizations in regards to what appropriate staffing for a newsroom or overseas bureau is. If papers (antiquated term, I realize. Maybe “organizations” is better?) focus on creating apps and rely on the content being produced by others, we lose some of the diversity of coverage that helps to keep journalism honest. And by diversity I don’t mean to say that twenty reporters will leave the same press conference with different takes on the same statements, but rather that twenty journalists keep us less dependent on one fallible human being, who could theoretically be more likely to err in reporting or outright collude with powerful interests. This would be the case even more so with smaller outfits already looking for reasons not to pay reporters, which means we’d rely on the big dogs primarily (the Times, the Journal). Judith Miller in the run-up to the Iraq war is one example of the dangers of this. Another can be found in the law professor Yochai Benkler’s recently released study of Wikileaks and its coverage in the media, where he wrote that “about two-thirds of news reports incorrectly reported that Wikileaks had simply dumped over 250,000 classified cables onto the net,” when, “In fact, Wikileaks made that large number of cables available only privately,” in “highly-censored form,” to the news organizations they chose to work with. I think Benkler related this to organizations re-reporting stories already published rather than their own independent reporting, but I’m having a hard time finding that statement right now.
So anyway, this is not to say that we don’t need better programming and better linking in journalism or that the “do what you do best, link to the rest” adage is untrue–I think that largely that is true–but what might ultimately result if this moves more quickly than the business model to sustain many reporters in the field? Journalism is an expensive enterprise and financially inefficient by the basic logic that we need lots of people on the ground. We’ve gotten away with it for a while when ad space was finite and people subscribed, but things are changing.
Thank you very much for the link to your chapter on Google’s search ranking algorithm. I have never seen a clearer non-technical explanation of the hard choices one has to make in building a page ranking algorithm, and their unintended social consequences. Clearly, everything I wrote was remedial for you!
Ya, knowledge does not end with publication, it starts with it.
What an excellent piece. I think you especially hit the nail on the head when you say that the “root this problem is about newsroom culture.”
Just to hone in on linking, it’s interesting just to look at this blog post and compare it structurally to both opinion and news pieces on the electronic versions of most newspapers. Your post, with its integrated hyperlinks, provides a layer of data curation by pointing readers to useful and relevant information available elsewhere on the web. A typical newspaper piece either has no inline links, or they are produced automatically by (a generally poor) algorithm – and usually link to resources on the same domain. And it is shockingly common to see electronically-delivered newspaper articles that talk directly about a web resource (website, YouTube video, Twitter account or tweet, etc.) without linking to it.
All of this is unfortunate, too, because technology offers the ability for newspapers to offer so much context to stories that is simply not possible with paper. Yet rather than embrace this, traditional newspaper publishers seem either to ignore these possibilities, or – bizarrely – be actively hostile toward them.
I get to be the one who disagrees with you. I worked at Bloomberg, where the terminal — the console that feeds clients the news — does a lot of this automatic updating of all the related charts, graphs and related-news links. It is a royal pain.
If you want to illustrate a story with a recent stock decline, for example, you want the readers 5 years from now to see that same two-week stock chart that you are looking at now. You don’t want them to see whatever two weeks happen to precede their reading.
Similarly, when you put “related news” links into a story, you are showing the stories that influenced your thinking on this very story. The automated systems that constantly update links of supposedly related news — a “feature” in most WordPress blogs, for example — just aren’t there yet. The biggest problem with those that I’ve seen today is that they don’t distinguish between stories that came out before and those that came out after the news article I’m reading. It is very confusing to read a story and see a related link that contradicts the story I’m reading, perhaps because it’s referring to news separated by 2 or 10 years from the story at hand.
Basically, there is a lot to be said for static information. Individual news stories don’t need to be littered up with all sorts of search boxes. Most browsers nowadays have a search box on the screen at all times. Why replace that? Why not just highlight the text you’re interested in, and “drill down” to that with your own search engine of choice? Why should news organizations get into the costly, time-consuming search business rather than doing what we do now, which is outsource it to Google and Friends?
This is a fascinating article, it’s a really insightful comment about the state of the media, at the brink of the era of the nano-casting, the ultra-focused personalization media model.
This new media model is really a paradigm shift in the way news and publication have been since the printing press! Even as Digg.com and other social sites appeared, they were still expressing the voice of the “majority” as in a “democratic” system.
Today, this model is shifting away from the democratic system to a more postmodern-like system focused on multiple identities and communities, as well as hyper-focused individualization, as opposed to the “mass-media” and “democracy” of the industrial “modern” age.
This new model of media and information distribution, based on topics and on community interests, reputation, trust and engagement, can greatly and massively augment the actual amount of good information that actually reaches its audience. It would be a really powerful information delivery tool, that would help to solve the information overload “fact” by focusing on the filters, at the individual, and at the community level.
By multiplying the screen “real estate” and adapting it to the user’s needs, interests and likes, (implicit and explicit), there is no need to focus on the most popular news since there is no one “front page” anymore, but potentially millions!!
This idea has been driving me for years, as completely exploding the current media distribution model would really allow us to use the real power hidden in the social network!
Thought-provoking piece. On this part:
>> “A really good news application/interactive story/editorial search engine should be able to teach us as much as we care to learn about the topic, regardless of the state of our previous knowledge, and no matter who originally created the most relevant material.”
If News Organization A built such an application around a topic (the economy, for instance), and it did everything you suggest, then how could (or should) News Organization B go about creating something of its own that is equally comprehensive yet different enough to draw its own audience.
What I am getting at is, will such comprehension and aggregation naturally winnow out competition and differentiation? Not saying that’s a bad thing, because I believe the best products deserve to dominate.
since when is it a news outlet’s job to provide all information on a given topic? personally i turn to news to get the topical. true, archives don’t work too well, but creating a better search within one site is quite a different thing from providing all info on a single topic. that’s simply not what consumers turn to you for, and i’m still not sure, after reading, why a news outlet would try to compete with a search engine like Google in that arena.
you seem really to want a better tool for yourself as a journalist (which of course would serve any news consumer too) — and on the editorial thing i’d certainly agree. in fact, when i’ve been looking into stories to cover, vague ideas that i haven’t seen reported much, i’ve typed ” [xxxx] controversial” into Google to see what comes up! could it be the start of an editorial search engine??
Hey there would you mind stating which blog platform you’re working with? I’m going to start my own blog in the near future but I’m having a difficult time choosing between BlogEngine/Wordpress/B2evolution and Drupal. The reason I ask is because your layout seems different than most blogs and I’m looking for something completely unique. P.S My apologies for being off-topic but I had to ask!
It’s a WordPress blog with a custom theme.
Great article !
Building a search engine on top of “topics” is exactly what we are trying to achieve. We’ve just set up a lab to demonstrate the possibilities where the subject is “data journalism” itself. The project is in a beta-phase, but the results are quite promising.
The system needs fine-tuning to fit the newsrooms of the future, any ideas are welcome.