Corruption in the classic sense is when a politician sells their influence. Quid pro quo, pay to play, or just an old-fashioned bribe — whatever you want to call it, this is the smoking gun that every political journalist is trying to find. Recently, data journalists have begun to look for influence peddling using statistical techniques. This is promising, but the data has to be just right, and it’s really hard to turn it into proof.
To illustrate the problems, let’s look at a failure.
On August 23, the Associated Press released a bombshell of a story implying that Clinton was selling access to the US government in exchange for donations to her foundation. I’m impressed by the AP’s initiative in using primary documents to look into a serious question of political ethics. But this is not a good story. It’s already been criticized in various ways. It’s the statistics I want to talk about here — which are, in a word, wrong. (And perhaps the AP now agrees: they changed the headline and deleted the tweet.) Here’s the lede:
At least 85 of 154 people from private interests who met or had phone conversations scheduled with Clinton while she led the State Department donated to her family charity or pledged commitments to its international programs, according to a review of State Department calendars
There’s no question this has the appearance of something fishy. In that sense alone, it’s probably newsworthy. But the deeper question is not about the appearance, but whether there were in fact behind-the-scenes deals greased by money, and I think that this statistic is not nearly as strong as it seems. It’s fine to report something that looks bad, but I think news organizations also need to clearly explain when the evidence is limited — or maybe not make an ambiguous statistic the third word in the story.
So here, in detail, are the limitations of this type of data and analysis. The first problem is that these 154 are a limited subset of the more than 1700 people she met with. It only counts private citizens, not government representatives, and this material only covers “about half of her four-year tenure.” So this isn’t really a good sample.
But even if the AP had access to Clinton’s complete calendar, counting the number of Clinton Foundation donors still wouldn’t tell us much. There would still be no way to know if donors had any advantage over non-donors. If “pay to play” means anything, it must surely mean that you get something for paying that you wouldn’t otherwise get. In this case, that “something” is a meeting with the Secretary of State.
The simplest way to approach the question of advantage is to use a risk ratio, which is normally used to compare things like the risk of dying of cancer if you are and aren’t a smoker, or getting shot by police if you’re black vs. white. Here, we’ll compare the probability that you’ll get a meeting if you are a donor to the probability that you’ll get a meeting if you aren’t a donor. The formula looks like this:

risk ratio = P(meeting | donated) / P(meeting | didn’t donate)

This summarizes the advantage of paying in terms of increasing your chances of getting a meeting. If 100 people paid and 50 got a meeting, but 1000 people didn’t pay and 500 of those still got a meeting, then both probabilities are 50%, the ratio is 1, and paying doesn’t help get you a meeting.
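The arithmetic is simple enough to sketch in a few lines of code (the function name and the numbers are mine, for illustration — they are not from the AP’s data):

```python
def risk_ratio(exposed_yes, exposed_total, unexposed_yes, unexposed_total):
    """Ratio of P(outcome | exposed) to P(outcome | unexposed)."""
    return (exposed_yes / exposed_total) / (unexposed_yes / unexposed_total)

# 100 people paid and 50 got a meeting; 1000 didn't pay and 500 still got one.
print(risk_ratio(50, 100, 500, 1000))  # 1.0 -- paying confers no advantage
```

A ratio above 1 means the “exposed” group (here, donors) got the outcome more often; exactly 1 means no advantage at all.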
The problem with the AP’s story is that there was no way for them to compute a risk ratio from meeting records. Clinton met with 85 people who donated to her foundation, and 154 − 85 = 69 who did not. This gives us:

risk ratio = (85 / ?) / (69 / ?)
We’re still missing two numbers! We can’t compute the advantage of paying because we don’t know how many people wanted a meeting, whether they paid, and whether or not they got a meeting. In other words, we need to know who got turned down for a meeting. The calendars and schedules that reporters can get don’t have that information and never will.
Can we conclude anything at all from the AP’s data? Not much. We can say only a few fairly obvious things. If many more than 85 people donated, then the numerator gets small and there appears to be less advantage. On the other hand, if many more than 69 people wanted a meeting but didn’t donate, the denominator gets small and it looks worse for Clinton.
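To make that concrete, here is a hypothetical sensitivity check. The counts of people who *sought* meetings are pure guesses — which is exactly the problem:

```python
def risk_ratio(donors_met, donors_sought, others_met, others_sought):
    return (donors_met / donors_sought) / (others_met / others_sought)

# 85 donors and 69 non-donors got meetings (the AP's counts).
# How many sought meetings is unknown, so try a few guesses.
for donors_sought, others_sought in [(100, 100), (1000, 100), (100, 1000)]:
    rr = risk_ratio(85, donors_sought, 69, others_sought)
    print(donors_sought, others_sought, round(rr, 2))
# Depending on the guesses, the ratio runs from about 0.12 (donors were
# at a disadvantage) to about 12.3 (donors had a huge advantage).
```

The same 85 and 69 support wildly different conclusions, depending entirely on two numbers nobody has.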
We might be able to get some idea of who got turned down by looking at the Clinton Foundation contributors list. That page lists 4277 donors who gave at least $10,000. (Far more gave less, but you have to figure that a meeting costs at least some minimum amount.) Reading through the list of donors, almost all of them are private citizens, not governments. If we imagine that any substantial number of those 4277 donors hoped for a meeting with Clinton, the 85 private donors who did meet with her are at most 2% of those who tried to get a meeting. The numerator in the relative risk formula is small. The denominator might be even smaller if many thousands of people tried to get a meeting using exactly the same channels as the donors but ¯\_(ツ)_/¯ we’ll never know.
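The back-of-the-envelope version of that bound, using the 4277 figure from the contributors page:

```python
big_donors = 4277     # donors who gave at least $10,000
donors_who_met = 85   # donors who met or spoke with Clinton
print(f"{donors_who_met / big_donors:.1%}")  # -> 2.0%
```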
In other words, there is no way of finding evidence of “pay to play” by looking only at who got to “play,” without also looking at who got turned down.
The inability to calculate a risk ratio is a problem with many types of data that journalists use, but not others. Imagine looking for oil industry influence in a politician’s voting records. If you have good campaign finance data you know how much the oil companies donated to each politician. You also know how each politician voted on bills that affect the industry, so you know when oil money both did and didn’t get results. Meeting records are not like this, because they don’t record the names of the people who wanted to meet with a politician but didn’t.
Then there’s the problem of proving cause. Even when you can compute a relative risk, and the data suggests that more donors than non-donors got a meeting, corruption only happened if the payment caused the meetings. There are all sorts of possible confounding variables that will cause the risk ratio to overestimate the causal effect, that is, overestimate what money buys you. What sort of factors would cause someone both to meet with Clinton and donate to the Clinton Foundation, which does mostly global health work? All sorts of high-level folks might have business on both fronts. For example, there are plenty of people working in global health at the international level, coordinating with governments and so on.
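A toy simulation shows how such a confounder inflates the risk ratio even when money buys nothing. All the probabilities here are invented; they just encode “working in global health makes you more likely both to donate and to get a meeting, and donating itself does nothing”:

```python
import random

random.seed(0)

n = 100_000
counts = {"donor": [0, 0], "other": [0, 0]}  # [met, total]
for _ in range(n):
    in_global_health = random.random() < 0.2
    p = 0.5 if in_global_health else 0.05
    donated = random.random() < p   # driven by the confounder
    met = random.random() < p       # driven ONLY by the confounder
    group = "donor" if donated else "other"
    counts[group][0] += met
    counts[group][1] += 1

rr = (counts["donor"][0] / counts["donor"][1]) / (counts["other"][0] / counts["other"][1])
print(round(rr, 1))  # well above 1, despite zero causal effect of donating
```

Donors “got meetings” several times as often as non-donors in this world, yet by construction the donations caused nothing.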
Of course, people working together without the influence of money between them can still be doing terrible things! That is a different type of crime though. It’s not the pervasive money-as-influence-in-politics story that data journalists might hope to find statistically, and that’s the kind of story the AP was after.
Unfortunately, most people don’t think about the influence of money in this way. They only see evidence of an association between money and outcomes, without thinking about 1) those who wanted something and never got it, and 2) factors that would align two people without one paying the other, like shared goals. It’s all guilt by association.
In short, political science is hard and we can’t conclude very much from looking at meetings and donors! Yet I suspect it will still be quite difficult for many people to accept that the AP story is largely irrelevant to the question of whether Clinton was selling access. It is the association that seems suspicious to us, not the relative advantage. Suppose we know that half of the people who got promoted brought a bottle of wine to the boss’s garden party. That means nothing if half of the company brought a bottle of wine to the boss’s garden party. But suppose instead that half of the people who got promoted slept with the boss. Now that seems like an open and shut case of “pay to play,” no? Not if the boss also slept with half of the rest of the employees. While that would be wildly inappropriate, it’s not trading favors.
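The garden-party case in numbers — a made-up company chosen so the shares match the story: 200 employees, 100 of whom brought wine, and 20 promotions split evenly between wine-bringers and everyone else:

```python
promoted_with_wine, brought_wine = 10, 100
promoted_without, no_wine = 10, 100
rr = (promoted_with_wine / brought_wine) / (promoted_without / no_wine)
print(rr)  # 1.0 -- half the promoted brought wine, but wine bought nothing
```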
It seems that our perception of the association between acts and outcomes depends far more on our judgment of the act than whether or not it actually gives you an advantage. Yet “advantage” is the whole idea of quid pro quo.
Which is not to say that Clinton wasn’t influenced by donations to her foundation. Who can say that it was never a factor? In fact she wouldn’t even need to give actual advantage to donors. Just the appearance, promise, or hope of advantage might be enough to shake people down, and that could be called corruption too. All I’m saying here is that we’re not going to be able to see statistical evidence of pay-to-play in meeting records.
We can, however, look in the data for specific leads about specific fishy transactions. To the AP’s credit much of the long story was exactly that, though having a meeting about helping a Nobel Peace Prize winner keep his job at the head of a non-profit microfinance bank may not feel like much of a smoking gun.
The AP, being the AP, was extremely careful not to make factually incorrect statements. It’s merely the totality of the piece that implies malfeasance. Or not. Let the readers make up their own minds, as editors love to say. I find this a monumental cop out, because the process of inferring corruption from the data is subtle! Readers will not be equipped to do that, so if we are using data as evidence we have to interpret it for them. The story could have, and in my opinion should have, explained the limitations of the data much more carefully. The statistics are at best ambiguous, and at worst suggest that donors got no special treatment (if you compare to the total number of donors, as above.) The numbers should never have been in the lede, much less the headline.
But then, would there have been a story? Would, or should, the AP have run a story saying “here are some of the people Clinton met with who are also donors”? That’s not nearly as interesting a story — and that is its own kind of media bias. The tendency is towards stronger results, even sensational results. Or no story at all, if not enough scandal can be found, which is straight up publication bias.
The broader point for data journalists is that it is extremely difficult to prove corruption, in the sense of quid pro quo, just by counting who got what. To start with, we also need data on who wanted something but didn’t get it, which is often not recorded. Then we need an argument that there are no important confounders, nothing that is making two people work together without one paying the other (of course they could still be co-conspirators doing something terrible, but that would be a different type of crime.) The AP counted only those who got meetings and didn’t even touch on non-corrupt reasons for the correlation, so the numbers in the story — the headline numbers — mean essentially nothing, despite the unsavory association.