Publications and Talks

Highlights

Understanding Recommenders. A jointly authored blog all about how recommenders work and how to build recommenders that are better for people and society.

The Algorithmic Management of Polarization and Violence on Social Media. With Ravi Iyer and Helena Puig-Larrauri. August 2023

Better Conflict Bulletin. A weekly newsletter dedicated to the idea that the U.S. domestic conflict could be better. We don’t cover the news, we cover the most productive responses to the news.

Building Human Values into Recommender Systems: An Interdisciplinary Synthesis. A big collaborative paper that covers values, metrics, product design, and policy. July 2022.

Designing Recommender Systems to Depolarize. How to build social media and news aggregators for better conflict. First Monday, May 2022

Aligning AI Optimization to Community Well-being. How large platforms have been using (something like) well-being metrics and how this could generalize to other AI systems. International Journal of Community Well-being, Dec 2020

Making AI work for Investigative Journalism, a paper about why progress has been so slow and where the near-term opportunity lies. Digital Journalism, July 2019

Teaching Materials

Designing Algorithmic Media (UC Berkeley, Spring 2024). Covers what we know about building information processing systems to promote informed citizens, well-being, healthy conflict, transparency, democratic control, etc.

Risk Ratios. (NICAR 2022) A self-guided workbook that teaches risk ratios — crucial for thinking about vaccine effectiveness, corruption in politics, discrimination in hiring, and many other things.

Frontiers of Computational Journalism (Columbia, Fall 2018) including lecture videos. My course for the dual master's degree in computer science and journalism at Columbia.

Algorithms in Journalism (Columbia, Summer 2018) Syllabus and materials for my course in the LEDE Program at Columbia. Covers text analysis, machine learning, election prediction, simulations, and algorithmic accountability work in journalism. Prerequisite is basic Python/Pandas.

Interactive Data Journalism in One Semester. (CUNY, Spring 2016) A free course syllabus, with all materials, for taking students with no previous data experience to interactive visualization in 14 weeks.

The Curious Journalist’s Guide to Data. A book on using quantitative principles in journalism. No data background required. March 2016.

Software

Workbench. A modular data processing platform for scraping, cleaning, and analyzing data in a transparent and reproducible way, without coding.

Overview. An open source document mining system using natural language processing and data visualization.

ENIAC Chess. We wired the world’s first general purpose digital computer to play mediocre chess. “A deep hack” – Anonymous

Everything Else

AI as a Public Good: Ensuring Democratic Control of AI in the Information Space. I co-chaired the international working group that created this report at the Forum on Information and Democracy. February 2024.

Amicus brief from Center for Democracy and Technology re Gonzalez vs. Google. This was a U.S. Supreme Court case regarding recommender systems, terrorism, and the infamous Section 230. It’s complicated, so I joined a number of experts encouraging the Court not to set precedents that might break things. February 2023.

Democratic Control of Recommender Systems (video) and slides. A Metagovernance seminar talk on how the large platforms that filter and personalize content could be designed and operated according to democratic principles. November 2022.

What We Talk About When We Talk About Algorithms. I appeared on the LawfareBlog podcast to talk about recommender algorithms in social media, and how to think about concepts like “engagement” and “amplification.” July 2022.

Show me the algorithm: Transparency in recommendation systems. What would we ask platforms to reveal if we could ask for anything? Schwartz Reisman Institute, August 2021.

Can Social Media Help Heal Divided Nations? Podcast with Tech Policy Press on polarization and recommender systems. August 2021

The Practice of Recommender Alignment, a talk at the Schwartz Reisman Institute, U. Toronto, March 2021

Beyond Engagement: Aligning Algorithmic Recommendations With Prosocial Goals, the state of the art and a bibliography, a report on a workshop at Partnership on AI, January 2021

What are you optimizing for? Aligning Recommender Systems with Human Values with Steven Adler and Dylan Hadfield-Menell. How have we tried to get recommender systems to “do what we want” and what ideas can we bring in from AI alignment? See also the video. ICML Participatory ML workshop, July 2020.

Aligning AI to Human Values means Picking the Right Metrics. An argument for more measurement of human outcomes in AI, and some case studies where companies like Facebook did this. April 2020.

Can you make AI fairer than a judge? Play our courtroom algorithm game with Karen Hao. An interactive narrative explaining the paradoxes in quantitative fairness measurements. See also the making of. October 2019.

An introduction to algorithmic bias and quantitative fairness. One short talk (at the Investigative Reporters and Editors conference) and one long talk (at Code for America) on these issues. June 2019.

Profit-maximizing AI. A talk on what AI is going to do to markets, reviewing recent research on algorithmic collusion, personalized pricing, etc. June 2019.

Institutional Counter-disinformation in a Networked Democracy, a paper for the International Workshop on Misinformation, Computational Fact-Checking and Credible Web, San Francisco, May 2019

Changing Journalism Workflows for Transparency (video) and slides. How do we get data journalists to publish their methods along with their stories, without asking them to do extra work? A talk at iAnnotate18, June 2018.

Researcher and journalist collaboration: What’s working and what isn’t. A panel at the Computation + Journalism Symposium where I discuss the “research to reporting gap.” Northwestern University, September 2017.

The Ethics of Persuasion. What are the norms that separate ethical from unethical persuasion? A talk at the DOD’s Joint Concept for Operating in the Information Environment workshop, August 2017.

Network Analysis in Journalism: Practices and Possibilities. What kinds of network analysis do journalists do, and what kinds could they do if they had better tech? KDD Data Science and Journalism workshop, August 2017. Talk video and slides.

Practical Digital Security for Journalists (video). Everything I know about journosec in one hour. Kiplinger Fellowship, Ohio State University, April 2017.

Social Science Replication in the Age of Software, SSRC Parameters. If we want replicable research, we need to preserve the code, not just the data. March 2017.

The age of the Cyborg, Columbia Journalism Review. How journalists are using AI techniques to augment, not replace, their work. November 2016.

The polls didn’t fail, we just chose to ignore the math. Quartz, November 2016

“What do journalists do with documents? Field notes for Natural Language Processing Researchers.” Talk video and paper. Presented at Computation+Journalism Symposium at Stanford University, September 2016

What can peacebuilders in the global North and South learn from each other about the ethics of peacetech? (video) with Diana Dajer. Build Peace conference, Zurich, September 2016

Platforms and Algorithms in European Journalism (video). An address at the European Broadcasting Union’s annual conference. June 2016.

Practical Digital Security for Journalists (video). A one-lecture introduction to keeping sources and staff safe. Ohio State University, April 2016. Updated above.

Data Journalism Fundamentals MOOC and accompanying video lectures. The first ever data journalism MOOC specifically for Asian journalists. I did the “data analysis” week. University of Hong Kong, March 2016.

Solve Every Statistics Problem with One Weird Trick (video). A most unserious lightning talk about randomization in statistics. NICAR, March 2016

How much influence does the media really have over elections? Digging into the data. Does amount of coverage influence polling results? In the primaries, yes, but it’s complicated. Nieman Journalism Lab, January 2016.

A Brief Guide to Robot Reporting Tools, Nieman Reports, September 2015.

Knowledge Management in Investigative Journalism (video). Talks from a conference I organized in London with people from OCCRP, ICIJ, and other data-driven investigative news organizations. September 2015.

Chinese Data Journalism Manual (数据新闻手册), with Yolanda Ma. Based on two years of workshops we delivered to Chinese journalists. DJChina.org, September 2015.

Surgeon Scorecard. I did much of the statistical programming for this first ever public analysis of per-surgeon complication rates. ProPublica, July 2015.

Seeing Media Polarization Through Data, slides for a workshop at Build Peace, April 2015.

Rapid Rise in Super-PACs Dominated by Single Donors. With Robert Faturechi. ProPublica, April 2015.

Take two steps back from journalism: What are the editorial products we’re not building? Nieman Journalism Lab, March 2015.

From Algorithms to Stories. Or, three lessons in building tools for computational journalism. 2nd Kavli Symposium on Science Journalism, February 2015

Overview: The Design, Adoption, and Analysis of a Visual Document Mining Tool For Investigative Journalists. With Matthew Brehmer, Stephen Ingram, Tamara Munzner, IEEE Transactions on Visualization, November 2014.

Unseen Toll: Wages of Millions Seized to Pay Past Debts. With Paul Kiel, Chris Arnold. I scraped and analyzed several million court records for this story. ProPublica, September 2014.

Security for Journalists, Part One: The Basics and Security for Journalists, Part Two: Threat Modeling. My attempt at an all-inclusive guide. Still mostly accurate, though appropriate tools and tactics have changed. OpenNews Source, August 2014.

Confusing Marriage and Violence Prevention. On interpreting a tricky correlation. The Atlantic, June 2014.

The Document Mining Pulitzers. Overview Project, May 2014.

You got the documents. Now what? A guide to reporting from large document dumps. OpenNews Source, April 2014.

Lord Byron plays Fetch with His Mousey Toy (video). Welcome to my cat. YouTube, January 2014.

Algorithms are not enough: Lessons applying computer science to journalism. My experience trying to apply advanced text analysis to investigative reporting. Tow Center for Digital Journalism, January 2014.

What do journalists do with documents? The different kinds of document-driven stories. Overview Project, January 2014.

FAQ: What You Need to Know About the NSA’s Surveillance Programs, ProPublica. August 2013.

Peace, Conflict, and Data. (video) A talk at the IPSI Bologna Symposium on Conflict Resolution, July 2013

Text Analysis in Transparency. (video) A talk at Sunlight Labs, May 2013.

Objectivity and the decades-long shift from “just the facts” to “what does it mean?” Nieman Journalism Lab, May 2013

How does a country get to open data? What Taiwan can teach us about the evolution of access. Nieman Journalism Lab, April 2013.

Computer Science and Journalism: Two Great Tastes that Taste Great Together. A public lecture at the University of Hong Kong, February 2013.

The Whole Dysfunctional National Conversation About Guns—on Twitter … in One Interactive Graph, The Atlantic. An interactive visualization of a Twitter network analysis showing the extreme polarization around this issue. February 2013

Gun Violence in America: The 13 Key Questions (With 13 Concise Answers). The Atlantic, February 2013

Who Should See What When? Three Principles for Personalized News. Nieman Journalism Lab, July 2012.

Metrics, metrics everywhere: How do we measure the impact of journalism? Nieman Journalism Lab. This turned out to be a surprisingly influential article, probably because it was early in the journalism metrics conversation. Perhaps my most cited work. August 2012.

Are we stuck in filter bubbles? Here are five potential paths out. Nieman Journalism Lab, July 2012

There’s no such thing as an objective filter. Why designing algorithms that tell us the news is hard. Nieman Journalism Lab, June 2012.

Beyond the crime scene: We need new and better models for crime reporting. Nieman Journalism Lab, June 2012.

How do you tell when the news is biased? It depends on how you see yourself. Nieman Journalism Lab, June 2012

What is it that journalists do? It can’t be reduced to just one thing. Nieman Journalism Lab, May 2012

Using the Overview prototype for document mining (video). An early demonstration at NICAR. March 2012.

What did private security contractors do in Iraq? Associated Press, February 2012. An analysis of 4500 pages of recently declassified documents, using a prototype of the Overview platform.

A Full-text Visualization of the Iraq War Logs. Associated Press, December 2010. A really neat proof-of-concept image of NLP and visualization in document set mining for journalism.

Wikileaks: How the leak was leaked. Associated Press, September 2011.

Egypt shuts down Internet on eve of protests, Associated Press, May 2011.

Investigating thousands (or millions) of documents by clustering (video). Talk on proof-of-concept document mining work applied to Wikileaks material, NICAR, February 2011.

How The Guardian is Pioneering Data Journalism with Free Tools, Nieman Journalism Lab, August 2010.

Is this the future of Journalism? Why Wikileaks matters. Foreign Policy, April 2010.

After Google’s Move, a Shift in Search Terms. With Lili Lee, The New York Times, March 2010.

The Google/China hacking case: How many news outlets do the original reporting on a big story? Nieman Journalism Lab. A quantitative analysis of who actually digs up original facts and who just rewrites other stories. February 2010.

Play Paywall!, the new web game sweeping the newspaper industry. Nieman Journalism Lab. An interactive paywall revenue calculator for news organizations. January 2010.

Web browser flaw could put e-commerce security at risk. CNET. Remember when MD5 hash collisions threatened to break HTTPS? December 2008.