What Data Can’t Tell Us About Buying Politicians

Corruption in the classic sense is when a politician sells their influence. Quid pro quo, pay to play, or just an old-fashioned bribe: whatever you want to call it, this is the smoking gun that every political journalist is trying to find. Recently, data journalists have begun to look for influence peddling using statistical techniques. This is promising, but the data has to be just right, and it’s really hard to turn it into proof.

To illustrate the problems, let’s look at a failure.

On August 23, the Associated Press released a bombshell of a story implying that Clinton was selling access to the US government in exchange for donations to her foundation. I’m impressed by the AP’s initiative in using primary documents to look into a serious question of political ethics. But this is not a good story. It’s already been criticized in various ways. It’s the statistics I want to talk about here — which are, in a word, wrong. (And perhaps the AP now agrees: they changed the headline and deleted the tweet.) Here’s the lede:

At least 85 of 154 people from private interests who met or had phone conversations scheduled with Clinton while she led the State Department donated to her family charity or pledged commitments to its international programs, according to a review of State Department calendars

There’s no question this has the appearance of something fishy. In that sense alone, it’s probably newsworthy. But the deeper question is not about the appearance, but whether there were in fact behind the scenes deals greased by money, and I think that this statistic is not nearly as strong as it seems. It’s fine to report something that looks bad, but I think news organizations also need to clearly explain when the evidence is limited — or maybe not make an ambiguous statistic the third word in the story.

So here, in detail, are the limitations of this type of data and analysis.

Sometimes an algorithm really is (politically) unbiased

Facebook just announced that they will remove humans from the production process for their “trending” news section (the list of stories on the right of the page, not the main news feed). Previously, they’ve had to defend themselves from accusations of liberal bias in how these stories were selected. Removing humans is designed, in part, to address these concerns.

The reaction among technologically literate press and scholars (e.g. here, here, and here) has been skeptical. They point out that algorithms are not unbiased; they are created by humans and operate on human data.

I have to disagree with my colleagues here. I think this change does, or could, remove an important type of bias: a preference along the US liberal-conservative axis. Further, relying on algorithmic processes rather than human processes leads to a sort of procedural fairness. You know that every story is going to be considered for inclusion in the “trending” box in exactly the same way. (Actually, I don’t believe that Facebook’s trending topics were ever politically biased — the evidence was always thin — but this is as much about appearance and legitimacy as any actual wrongdoing.)

Of course algorithms are not at all “unbiased.” I’ve been one of many voices saying this for a long time. I’ve written about the impossibility of creating an objective news filtering algorithm. I teach the students in my computational journalism class how to create such algorithms, and we talk about this a lot. Algorithmic techniques can be biased in all sorts of ways: they can be discriminatory because of the data they use for reference, they can harm minorities due to fundamental statistical problems, and they can replicate the biased ways that humans use language.

And yet, removing humans really can remove an important potential source of bias. The key is recognizing what type of bias Facebook’s critics are concerned about.

Startups vs. Systems: Why Doing Good with Tech is Hard

It’s not easy to make social change with technology. There’s excitement around bringing “innovation” to social problems, which usually means bringing in ideas from the technology industry. But societies are more than software, and social enterprise doesn’t have the same economics as startups.

I knew all this going into my summer fellowship at Blue Ridge Labs, but my experience has given me a clearer idea of why. These are the themes that kept coming up for me after two months working with 16 other fellows on the problem of access to justice (A2J) for low-income New Yorkers.

You have to engage the incumbents

The culture of tech startups is not well adapted to taking on big systems. Startups have traditionally tried to enter the wide open spaces created by the new possibilities of technology, or use technical advantage to bypass incumbents. They generally try to avoid engaging with major institutions, yet institutional reform is a key part of the “structural change” that so many of us want.

Uber does an end run around the taxi system, but you can’t simply do an end run around the court system, the state Bar, or the local police.

Words and numbers in journalism: How to tell when your story needs data

Update: A more recent version of this material appears in my book, The Curious Journalist’s Guide To Data.

I’m not convinced that journalists are always aware when they should be thinking about numbers. Usually, by training and habit, they are thinking about words. But there are deep relationships between words and numbers in our everyday language, if you stop to think about them.

A quantity is an amount, something that can be compared, measured or counted — in short, a number. It’s an ancient idea, so ancient that it is deeply embedded in every human language. Words like “less” and “every” are obviously quantitative, but so are more complex concepts like “trend” and “significant.” Quantitative thinking starts with recognizing when someone is talking about quantities.

Consider this sentence from the article “Anti-Intellectualism is Killing America,” which appeared in Psychology Today:

In a country where a sitting congressman told a crowd that evolution and the Big Bang are “lies straight from the pit of hell,” where the chairman of a Senate environmental panel brought a snowball into the chamber as evidence that climate change is a hoax, where almost one in three citizens can’t name the vice president, it is beyond dispute that critical thinking has been abandoned as a cultural value.

This is pure cultural critique, and it can be interpreted many different ways. To start with, I don’t know of standard and precise meanings for “critical thinking” and “cultural value.” We could also read this paragraph as a rant, an exaggeration for effect, or an account of the author’s personal experience. Maybe it’s art. But journalism is traditionally understood as “non-fiction,” and there is an empirical and quantitative claim at the heart of this language.

The Editorial Product

(This post first appeared at Nieman Journalism Lab)

The traditional goal of news is to say what just happened. That’s sort of what “news” means. But there are many more types of nonfiction information services, and many possibilities that few have yet explored.

I want to take two steps back from journalism, to see where it fits in the broader information landscape and try to imagine new things. First is the shift from content to product. A news source is more than the stories it produces; it’s also the process of deciding what to cover, the delivery system, and the user experience. Second, we need to include algorithms. Every time programmers write code to handle information, they are making editorial choices.

Imagine all the wildly different services you could deliver with a building full of writers and developers. It’s a category I’ve started calling editorial products.

In this frame, journalism is just one part of a broader information ecosystem that includes everything from wire services to Wikipedia to search engines. All of these products serve needs for factual information, and they all use some combination of professionals, participants, and software to produce and deliver it to users: the reporter plus the crowd and the algorithm. Here are six editorial products that journalists and others already produce, and six more that they could.

Job: Help us learn how teaching works by visualizing millions of syllabi

Overview is an open-source document analysis and visualization system originally developed at the Associated Press for investigative journalists. It’s been used to report some of the biggest investigative stories of the last few years. We’re looking for a developer to extend the software to analyze millions of scraped syllabi for the Open Syllabus Project.

You will help us put 2 million scraped syllabi online, do natural language processing to extract citations from each syllabus, and build visualizations to do citation analysis. We want to see what people are actually teaching for each subject, and how this changes over time, and make this type of analysis widely available to researchers. We’re looking for someone to build out Overview to support this, growing our team from three to four people. This is an ideal job for a programmer with visualization, natural language processing, digital humanities or data journalism experience.

The project is Scala on the back and CoffeeScript on the front, but you’ll more often be writing plugins in JavaScript and doing data pre-processing in whatever works for you. We’re looking for a full stack engineer who can extend the back end infrastructure to process the syllabi, then build the UI to make all this data accessible to users. You’ll be working within a small team of professionals who will quickly get you up to speed on the core codebase and the plugin API you will use to create visualizations. Everything you write will be released under the AGPL open source license.

This is a six-month contract position to begin with. We hope to extend that, and we’d be especially excited to find someone who wants to grow into a larger role within our small team. We’re a distributed team based out of NYC, remote friendly, flexible hours.

Contact me here if interested.

How can I help?

What’s the best simple action you can take to address a particular social problem?

I wish there was somewhere that reviewed attempts to solve social problems, everything from activist campaigns to government programs. You’d go to this site, look up “homelessness” or “education” or “Asian tsunami” or “criminal justice reform” and get a recommendation for the most effective thing you could do right now, and if possible a button to do it or at least sign up to do it. The actions would be intentionally lightweight, like donating $10 or ten minutes of your time or pledging to vote a certain way. Think of a sort of Consumer Reports for social campaigns.

I’ve been calling this hypothetical civic information/action organization “How Can I Help?” because that’s the question it seeks to answer.

This is an ambitious idea, but there are working models to draw from.

What I learned at Build Peace, the first conference for technology and conflict resolution

The organizers of Build Peace tell me it was the first conference specifically on peace and technology, and they should know. I don’t know the peace building field very well, but I could see that some of its leading lights were in attendance. I learned quite a bit, and I am very glad I went.

I have to start by saying I don’t think “technology for peace” is a sure win. My understanding is that peace building is incredibly difficult work, and rarely truly successful, and I don’t see why technology necessarily changes that. Yet I am also a technologist and I presented some of my own data-driven peace work at the conference. Clearly I believe it might be good for something.

There is a great need for conversations between capable conflict resolution workers and thoughtful technologists — hence this conference. Here are some of the things I think I learned.

Questions about the NYPD I cannot answer

Recently, the NYPD started a Twitter hashtag campaign, and it backfired.

Several of my friends — actual, real life good friends — shared this story on Facebook in a, let’s say, somewhat triumphant mood. And I wasn’t sure what to think. This is what I wrote.

I’m having trouble understanding what all this signifies. Here’s what I come up with that I am sure about:

  • my friends do not like cops
  • clearly there are other people who do not like cops
  • people who do not like cops are either more common on Twitter or more vocal than those who like them
  • the NYPD sure have beaten up a lot of people

But, these are the questions I remain unable to answer:

  • I think we probably want a police force that engages with people on social media. How should they have engaged?
  • Were any of these beatings “proportionate?” This is horrible language, I know, but give it a pass for a moment.
  • Is any beating ever proportionate? How could we even know the answer to this in principle, let alone in specific cases?
  • What is the overall record of the NYPD? Is this a question that even has meaning given the multidimensional nature of the problem? Can the answer be anything other than “terrible” if there are incidents like these?
  • What would I do if I was king of the NYPD?
  • Will my friends perceive this post as “defending the cops”? Will there be social sanctions of some sort for expressing these ideas? Is my echo chamber just as pernicious as the echo chambers of those that belong to my perceived “other”?

– Yours in sadness and inquiry.

The post has not received any “likes.”