Publications and Talks

I do a lot of work that isn’t on this blog.

Big Things

The Curious Journalist’s Guide to Data. A free book on using quantitative principles in journalism. March 2016.

The Overview Project. An open-source system for analyzing large document sets, built for investigative journalism.

Frontiers of Computational Journalism (Fall 2016). My required course for the dual master's degree in computer science and journalism at Columbia. The course evolves every year, but you can watch the lecture videos for the 2013 version.

Interactive Data Journalism in One Semester. A free course syllabus, with all materials, for taking students with no previous data experience to interactive visualization in 14 weeks.

All The Things

Practical Digital Security for Journalists (video). Everything I know about journosec in one hour. Kiplinger Fellowship, Ohio State University, April 2017.

Social Science Replication in the Age of Software, SSRC Parameters. If we want replicable research, we need to preserve the code, not just the data. March 2017.

The age of the Cyborg, Columbia Journalism Review. How journalists are using AI techniques to augment, not replace, their work. November 2016.

The polls didn’t fail, we just chose to ignore the math. Quartz, November 2016.

“What do journalists do with documents? Field notes for Natural Language Processing Researchers.” Talk video and paper. Presented at the Computation+Journalism Symposium, Stanford University, September 2016.

What can peacebuilders in the global North and South learn from each other about the ethics of peacetech? (video). With Diana Dajer. Build Peace conference, Zurich, September 2016.

Platforms and Algorithms in European Journalism (video). An address at the European Broadcasting Union’s annual conference. June 2016.

Practical Digital Security for Journalists (video). A one-lecture introduction to keeping sources and staff safe. Ohio State University, April 2016. An updated version is listed above.

Data Journalism Fundamentals MOOC. The first-ever data journalism MOOC specifically for Asian journalists. I did the “data analysis” week. University of Hong Kong, March 2016.

Solve Every Statistics Problem with One Weird Trick (video). A most unserious lightning talk about randomization in statistics. NICAR, March 2016.

How much influence does the media really have over elections? Digging into the data. Does the amount of coverage influence polling results? In the primaries, yes, but it’s complicated. Nieman Journalism Lab, January 2016.

A Brief Guide to Robot Reporting Tools, Nieman Reports, September 2015.

Knowledge Management in Investigative Journalism (video). Talks from a conference I organized in London in September 2015, including representatives from OCCRP, ICIJ, and other major data-driven investigative news organizations.

Chinese Data Journalism Manual (数据新闻手册), with Yolanda Ma. Based on two years of workshops we delivered to Chinese journalists. DJChina.org, September 2015.

Surgeon Scorecard. I did much of the statistical programming for this first-ever public analysis of per-surgeon complication rates. ProPublica, July 2015.

Seeing Media Polarization Through Data, slides for a workshop at Build Peace, April 2015.

Rapid Rise in Super-PACs Dominated by Single Donors. With Robert Faturechi. ProPublica, April 2015.

Take two steps back from journalism: What are the editorial products we’re not building? Nieman Journalism Lab, March 2015.

From Algorithms to Stories. Or, three lessons in building tools for computational journalism. 2nd Kavli Symposium on Science Journalism, February 2015.

Overview: The Design, Adoption, and Analysis of a Visual Document Mining Tool For Investigative Journalists. With Matthew Brehmer, Stephen Ingram, and Tamara Munzner. IEEE Transactions on Visualization and Computer Graphics, November 2014.

Unseen Toll: Wages of Millions Seized to Pay Past Debts. With Paul Kiel and Chris Arnold. I scraped and analyzed several million court records for this story. ProPublica, September 2014.

Security for Journalists, Part One: The Basics and Security for Journalists, Part Two: Threat Modeling. My attempt at an all-inclusive guide. Still mostly accurate, though appropriate tools and tactics have changed. OpenNews Source, August 2014.

Confusing Marriage and Violence Prevention. On interpreting a tricky correlation. The Atlantic, June 2014.

The Document Mining Pulitzers. Overview Project, May 2014.

You got the documents. Now what? A guide to reporting from large document dumps. OpenNews Source, April 2014.

Lord Byron plays Fetch with His Mousey Toy (video). Welcome to my cat. YouTube, January 2014.

Algorithms are not enough: Lessons applying computer science to journalism. My experience trying to apply advanced text analysis to investigative reporting. Tow Center for Digital Journalism, January 2014.

What do journalists do with documents? The different kinds of document-driven stories. Overview Project, January 2014.

FAQ: What You Need to Know About the NSA’s Surveillance Programs. ProPublica, August 2013.

Peace, Conflict, and Data (video). A talk at the IPSI Bologna Symposium on Conflict Resolution, July 2013.

Text Analysis in Transparency (video). A talk at Sunlight Labs, May 2013.

Objectivity and the decades-long shift from “just the facts” to “what does it mean?” Nieman Journalism Lab, May 2013.

How does a country get to open data? What Taiwan can teach us about the evolution of access. Nieman Journalism Lab, April 2013.

Computer Science and Journalism: Two Great Tastes that Taste Great Together. A public lecture at the University of Hong Kong, February 2013.

The Whole Dysfunctional National Conversation About Guns—on Twitter … in One Interactive Graph. The Atlantic. An interactive visualization of a Twitter network analysis showing the extreme polarization around this issue. February 2013.

Gun Violence in America: The 13 Key Questions (With 13 Concise Answers). The Atlantic, February 2013.

Who Should See What When? Three Principles for Personalized News. Nieman Journalism Lab, July 2012.

Metrics, metrics everywhere: How do we measure the impact of journalism? Nieman Journalism Lab. This turned out to be a surprisingly influential article, probably because it was early in the journalism metrics conversation. Perhaps my most cited work. August 2012.

Are we stuck in filter bubbles? Here are five potential paths out. Nieman Journalism Lab, July 2012.

There’s no such thing as an objective filter. Why designing algorithms that tell us the news is hard. Nieman Journalism Lab, June 2012.

Beyond the crime scene: We need new and better models for crime reporting. Nieman Journalism Lab, June 2012.

How do you tell when the news is biased? It depends on how you see yourself. Nieman Journalism Lab, June 2012.

What is it that journalists do? It can’t be reduced to just one thing. Nieman Journalism Lab, May 2012.

Using the Overview prototype for document mining (video). An early demonstration at NICAR. March 2012.

What did private security contractors do in Iraq? Associated Press, February 2012. An analysis of 4500 pages of recently declassified documents, using a prototype of the Overview platform.

A Full-text Visualization of the Iraq War Logs. Associated Press, December 2010. A really neat proof-of-concept image of NLP and visualization in document set mining for journalism.

Wikileaks: How the leak was leaked. Associated Press, September 2011.

Egypt shuts down Internet on eve of protests, Associated Press, May 2011.

Investigating thousands (or millions) of documents by clustering (video). Talk on proof-of-concept document mining work applied to Wikileaks material, NICAR, February 2011.

How The Guardian is Pioneering Data Journalism with Free Tools, Nieman Journalism Lab, August 2010.

Is this the future of journalism? Why Wikileaks matters. Foreign Policy, April 2010.

After Google’s Move, a Shift in Search Terms. With Lili Lee, The New York Times, March 2010.

The Google/China hacking case: How many news outlets do the original reporting on a big story? Nieman Journalism Lab. A quantitative analysis of who actually digs up original facts and who just rewrites other stories. February 2010.

Play Paywall!, the new web game sweeping the newspaper industry. Nieman Journalism Lab. An interactive paywall revenue calculator for news organizations. January 2010.

Web browser flaw could put e-commerce security at risk. CNET. Remember when MD5 hash collisions threatened to break HTTPS? December 2008.