What types of defenses against disinformation are possible? And which of these would we actually want to use in a democracy, where approaches like censorship can impinge on important freedoms? To try to answer these questions, I looked at what three counter-disinformation organizations are actually doing today, and categorized their tactics.
The EU East StratCom Task Force is a contemporary government counter-propaganda agency. Facebook has made numerous changes to its operations to try to combat disinformation, and is a good example of what platforms can do. The Chinese information regime is a marvel of networked information control, and provokes questions about what a democracy should and should not do.
There are many kinds of questions about discrimination, fairness, or bias where data is relevant. Who gets stopped on the road by the police? Who gets admitted to college? Who gets approved for a loan, and who doesn’t? The data-driven analysis of fairness has become even more important as we start to deploy algorithmic decision-making across society.
I attempted to synthesize an introductory framework for thinking about what fairness means in a quantitative sense, and how these mathematical definitions connect to legal and moral principles and our real world institutions of criminal justice, employment, lending, and so on. I ended up with two talks.
This short talk (20 minutes), part of a panel at the Investigative Reporters & Editors conference, has no math. (Slides)
This longer talk (50 minutes), presented at Code for America SF, gets into a lot more depth, including the mathematical definitions of different types of fairness, and the whole tricky issue of whether or not algorithms should be “blinded” to attributes like race and gender. It also includes several case studies of real algorithmic systems, and discusses how we might design such systems to reduce bias. (Slides)
My favorite resources on these topics:
The Workbench workflow analyzing Massachusetts traffic ticket data.
Sandra Mayson, Bias In, Bias Out. One of my favorite overall discussions of algorithmic bias.
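To give a concrete flavor of the mathematical definitions the longer talk covers, here's a minimal sketch (my own illustration, not code from the talks) of one of the simplest statistical fairness criteria, demographic parity: the rate of positive decisions should be the same across groups.

```python
def demographic_parity_gap(decisions, groups):
    """Absolute difference in positive-decision rates between two groups.

    decisions: list of 0/1 outcomes (e.g. loan approved or not)
    groups: parallel list of group labels (assumes exactly two groups)

    A gap of 0 means both groups receive positive decisions at the
    same rate -- one common statistical definition of fairness.
    """
    rates = {}
    for g in set(groups):
        subset = [d for d, gg in zip(decisions, groups) if gg == g]
        rates[g] = sum(subset) / len(subset)
    a, b = rates.values()
    return abs(a - b)
```

For example, if group "a" is approved two-thirds of the time and group "b" one-third of the time, the gap is one third. As the talks discuss, satisfying this criterion can conflict with other definitions (like equal error rates), which is part of what makes the problem tricky.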
Update, Oct 2020: we’ve done a lot more since this post! If you want to try working on this problem, Weights and Biases is very kindly hosting a public benchmark.
I’ve just completed an experiment to extract information from TV station political advertising disclosure forms using deep learning. In the process I’ve produced a challenging journalism-relevant dataset for NLP/AI researchers. Original data from ProPublica’s Free The Files project.
The resulting model achieves 90% accuracy extracting total spending from the PDFs in the (held-out) test set, which shows that deep learning can generalize surprisingly well to previously unseen form types. I expect it could be made much more accurate through some feature engineering (see below).
You can find the code and documentation here. Full thanks to my collaborator Nicholas Bardy of Weights & Biases.
TV stations are required to disclose their sales of political advertising, but there is no requirement that these disclosures be machine readable. Every election, tens of thousands of PDFs are posted to the FCC Public File, available at https://publicfiles.fcc.gov/. All of these contain essentially the same information, but in hundreds of different formats, like these: