The Curious Journalist’s Guide to Data. A free book on using quantitative principles in journalism. March 2016.
Workbench. A modular data processing platform for scraping, cleaning, and analyzing data in a transparent and reproducible way, without coding.
The Overview Project. An open source large document set analysis system built for investigative journalism.
Frontiers of Computational Journalism (Fall 2018). My course for the dual master's degree in computer science and journalism at Columbia. Requires some CS background. A much older version is available as lecture videos for the 2013 syllabus.
Algorithms in Journalism (Summer 2018). Syllabus and materials for my course in the LEDE Program at Columbia. Covers text analysis, machine learning, election prediction, simulations, and algorithmic accountability work in journalism. Prerequisite is basic Python/Pandas.
Interactive Data Journalism in One Semester. A free course syllabus, with all materials, for taking students with no previous data experience to interactive visualization in 14 weeks.
All The Things
Changing Journalism Workflows for Transparency (video) and slides. How do we get data journalists to publish their methods along with their stories, without asking them to do extra work? A talk at iAnnotate18, June 2018.
Researcher and journalist collaboration: What’s working and what isn’t. A panel at the Computation + Journalism Symposium where I discuss the “research to reporting gap.” Northwestern University, September 2017.
The Ethics of Persuasion. What are the norms that separate ethical from unethical persuasion? A talk at the DOD’s Joint Concept for Operating in the Information Environment workshop, August 2017.
Network Analysis in Journalism: Practices and Possibilities. What kinds of network analysis do journalists do, and what kinds could they do if they had better tech? KDD Data Science and Journalism workshop, August 2017. Talk video and slides.
Practical Digital Security for Journalists (video). Everything I know about journosec in one hour. Kiplinger Fellowship, Ohio State University, April 2017.
Social Science Replication in the Age of Software, SSRC Parameters. If we want replicable research, we need to preserve the code, not just the data. March 2017.
The age of the Cyborg, Columbia Journalism Review. How journalists are using AI techniques to augment, not replace, their work. November 2016.
The polls didn’t fail, we just chose to ignore the math. Quartz, November 2016
“What do journalists do with documents? Field notes for Natural Language Processing Researchers.” Talk video and paper. Presented at Computation+Journalism Symposium at Stanford University, September 2016
What can peacebuilders in the global North and South learn from each other about the ethics of peacetech? (video) with Diana Dajer. Build Peace conference, Zurich, September 2016
Platforms and Algorithms in European Journalism (video). An address at the European Broadcasting Union’s annual conference. June 2016.
Practical Digital Security for Journalists (video). A one-lecture introduction to keeping sources and staff safe. Ohio State University, April 2016. Updated above.
Data Journalism Fundamentals MOOC and accompanying video lectures. The first-ever data journalism MOOC specifically for Asian journalists. I did the “data analysis” week. University of Hong Kong, March 2016.
Solve Every Statistics Problem with One Weird Trick (video). A most unserious lightning talk about randomization in statistics. NICAR, March 2016
How much influence does the media really have over elections? Digging into the data. Does amount of coverage influence polling results? In the primaries, yes, but it’s complicated. Nieman Journalism Lab, January 2016.
A Brief Guide to Robot Reporting Tools, Nieman Reports, September 2015.
Knowledge Management in Investigative Journalism (video). Talks from a conference I organized in London with people from OCCRP, ICIJ, and other data-driven investigative news organizations. September 2015.
Chinese Data Journalism Manual (数据新闻手册), with Yolanda Ma. Based on two years of workshops we delivered to Chinese journalists. DJChina.org, September 2015.
Surgeon Scorecard. I did much of the statistical programming for this first-ever public analysis of per-surgeon complication rates. ProPublica, July 2015.
Seeing Media Polarization Through Data, slides for a workshop at Build Peace, April 2015.
Rapid Rise in Super-PACs Dominated by Single Donors. With Robert Faturechi. ProPublica, April 2015.
Take two steps back from journalism: What are the editorial products we’re not building? Nieman Journalism Lab, March 2015.
From Algorithms to Stories. Or, three lessons in building tools for computational journalism. 2nd Kavli Symposium on Science Journalism, February 2015
Overview: The Design, Adoption, and Analysis of a Visual Document Mining Tool For Investigative Journalists. With Matthew Brehmer, Stephen Ingram, Tamara Munzner. IEEE Transactions on Visualization and Computer Graphics, November 2014.
Unseen Toll: Wages of Millions Seized to Pay Past Debts. With Paul Kiel, Chris Arnold. I scraped and analyzed several million court records for this story. ProPublica, September 2014.
Security for Journalists, Part One: The Basics and Security for Journalists, Part Two: Threat Modeling. My attempt at an all-inclusive guide. Still mostly accurate, though appropriate tools and tactics have changed. OpenNews Source, August 2014.
Confusing Marriage and Violence Prevention. On interpreting a tricky correlation. The Atlantic, June 2014.
The Document Mining Pulitzers, Overview Project, May 2014.
You got the documents. Now what? A guide to reporting from large document dumps. OpenNews Source, April 2014.
Lord Byron plays Fetch with His Mousey Toy (video). Welcome to my cat. YouTube, January 2014.
Algorithms are not enough: Lessons applying computer science to journalism. My experience trying to apply advanced text analysis to investigative reporting. Tow Center for Digital Journalism, January 2014.
What do journalists do with documents? The different kinds of document-driven stories. Overview Project, January 2014.
FAQ: What You Need to Know About the NSA’s Surveillance Programs, ProPublica. August 2013.
Peace, Conflict, and Data. (video) A talk at the IPSI Bologna Symposium on Conflict Resolution, July 2013
Text Analysis in Transparency. (video) A talk at Sunlight Labs, May 2013.
Objectivity and the decades-long shift from “just the facts” to “what does it mean?” Nieman Journalism Lab, May 2013
How does a country get to open data? What Taiwan can teach us about the evolution of access. Nieman Journalism Lab, April 2013.
Computer Science and Journalism: Two Great Tastes that Taste Great Together. A public lecture at the University of Hong Kong, February 2013.
The Whole Dysfunctional National Conversation About Guns—on Twitter … in One Interactive Graph, The Atlantic. An interactive visualization of a Twitter network analysis showing the extreme polarization around this issue. February 2013.
Gun Violence in America: The 13 Key Questions (With 13 Concise Answers). The Atlantic, February 2013
Who Should See What When? Three Principles for Personalized News. Nieman Journalism Lab, July 2012.
Metrics, metrics everywhere: How do we measure the impact of journalism? Nieman Journalism Lab. This turned out to be a surprisingly influential article, probably because it was early in the journalism metrics conversation. Perhaps my most cited work. August 2012.
Are we stuck in filter bubbles? Here are five potential paths out. Nieman Journalism Lab, July 2012
There’s no such thing as an objective filter. Why designing algorithms that tell us the news is hard. Nieman Journalism Lab, June 2012.
Beyond the crime scene: We need new and better models for crime reporting. Nieman Journalism Lab, June 2012.
How do you tell when the news is biased? It depends on how you see yourself. Nieman Journalism Lab, June 2012
What is it that journalists do? It can’t be reduced to just one thing. Nieman Journalism Lab, May 2012
Using the Overview prototype for document mining (video). An early demonstration at NICAR. March 2012.
What did private security contractors do in Iraq? Associated Press, February 2012. An analysis of 4500 pages of recently declassified documents, using a prototype of the Overview platform.
A Full-text Visualization of the Iraq War Logs. Associated Press, December 2010. A really neat proof-of-concept image of NLP and visualization in document set mining for journalism.
Investigating thousands (or millions) of documents by clustering (video). Talk on proof-of-concept document mining work applied to Wikileaks material, NICAR, February 2011.
How The Guardian is Pioneering Data Journalism with Free Tools, Nieman Journalism Lab, August 2010.
Linking by the numbers: How news organizations are using links (or not). Nieman Journalism Lab, June 2010.
Making connections: How major news organizations talk about links. Nieman Journalism Lab, June 2010.
Why link out? Four journalistic purposes of the noble hyperlink. Nieman Journalism Lab, 2010.
Drawing out the audience: Inside BBC’s User-Generated Content Hub. Nieman Journalism Lab, May 2010.
Is this the future of Journalism? Why Wikileaks matters. Foreign Policy, April 2010.
After Google’s Move, a Shift in Search Terms. With Lili Lee, The New York Times, March 2010.
The Google/China hacking case: How many news outlets do the original reporting on a big story? Nieman Journalism Lab. A quantitative analysis of who actually digs up original facts and who just rewrites other stories. February 2010.
Play Paywall!, the new web game sweeping the newspaper industry. Nieman Journalism Lab. An interactive paywall revenue calculator for news organizations. January 2010.
Web browser flaw could put e-commerce security at risk. CNET. Remember when MD5 hash collisions threatened to break HTTPS? December 2008.