Words and numbers in journalism: How to tell when your story needs data

I’m not convinced that journalists are always aware when they should be thinking about numbers. Usually, by training and habit, they are thinking about words. But there are deep relationships between words and numbers in our everyday language, if you stop to think about them.

A quantity is an amount, something that can be compared, measured or counted — in short, a number. It’s an ancient idea, so ancient that it is deeply embedded in every human language. Words like “less” and “every” are obviously quantitative, but so are more complex concepts like “trend” and “significant.” Quantitative thinking starts with recognizing when someone is talking about quantities.

Consider this sentence from the article “Anti-Intellectualism Is Killing America,” which appeared in Psychology Today:

In a country where a sitting congressman told a crowd that evolution and the Big Bang are “lies straight from the pit of hell,” where the chairman of a Senate environmental panel brought a snowball into the chamber as evidence that climate change is a hoax, where almost one in three citizens can’t name the vice president, it is beyond dispute that critical thinking has been abandoned as a cultural value.

This is pure cultural critique, and it can be interpreted many different ways. To start with, I don’t know of standard and precise meanings for “critical thinking” and “cultural value.” We could also read this paragraph as a rant, an exaggeration for effect, or an account of the author’s personal experience. Maybe it’s art. But journalism is traditionally understood as “non-fiction,” and there is an empirical and quantitative claim at the heart of this language.

“Critical thinking has been abandoned as a cultural value” is an empirical statement because it speaks about something that is happening in the world with observable consequences. It is, in principle, a statement that can be tested against history. This gives us a basis for saying whether it’s true or false.

It’s quantitative because the word “abandoned” speaks about comparing amounts at two different times: something that we never had cannot be abandoned. At each point in time we need to decide whether or not “critical thinking” is a “cultural value.” This is in principle a yes or no question. A more realistic answer might involve shades of gray based on the number of people and institutions who are embodying the value of critical thinking, or perhaps how many acts of critical thinking are occurring. Of course “critical thinking” is not an easy thing to pin down, but if we choose any definition at all we are literally deciding which things “count” as critical thinking.

One way or another, testing this claim demands that we count something at two different points in time, and look for a big drop in the number. Compare this with the evidence provided:

  • a sitting congressman told a crowd that evolution and the Big Bang are “lies straight from the pit of hell”
  • the chairman of a Senate environmental panel brought a snowball into the chamber as evidence that climate change is a hoax
  • almost one in three citizens can’t name the vice president

The first two pieces of evidence seem to me more anti-science than anti-critical thinking, but let’s suppose our definitions allow it. The real problem is that these are anecdotes – which is just a judgmental word for “examples.” Anecdotes make poor evidence when it’s just as easy to come up with examples on the other side. Yeah, someone brought a snowball into Congress to argue against climate change, but also the EPA decided to start regulating carbon dioxide as a pollutant. The issue is one of generalization: we can’t draw conclusions about the state of an entire culture from just a few specific examples. Generalization is tricky at the best of times, but it’s much easier when you can count or measure the entirety of something. Instead we have only scattered facts, and no information about whether these cases are representative of the whole.

Or, in the words of historian G. Kitson Clark’s famous advice about generalization:

Do not guess; try to count. And if you cannot count, admit that you are guessing.

The fact that “almost one in three citizens can’t name the vice president” is closer to the sort of evidence we need. Let’s leave aside, for a moment, whether being able to name the vice president is really a good indication that “critical thinking” is a “cultural value.” This statement is still stronger than the first two examples because it generalizes in a way that individual examples cannot: it makes a claim about all U.S. citizens. It doesn’t matter how many people I can name who know who the vice president is, because we know (by counting) that there are roughly 100 million who cannot. But this still only addresses one point in time. Were things better before? Was there any point in history when more than two thirds of the population could name the vice president? We don’t know.
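As a rough check on that 100 million figure, here is the arithmetic as a small Python sketch. The population number is my assumption (a round 320 million for the U.S., not a figure from the article), and the function name is made up for illustration.

```python
def people_from_share(population, share):
    """Convert a share of a population into an approximate head count."""
    return round(population * share)

# Assumed round figure for the U.S. population; not taken from the article.
US_POPULATION = 320_000_000

# "One in three" expressed as a fraction of the assumed population.
cannot_name_vp = people_from_share(US_POPULATION, 1 / 3)
print(f"about {cannot_name_vp / 1e6:.0f} million people")
```

The exact result depends on the population figure you assume and on whether “almost one in three” means 30% or 33%, but any reasonable choice lands in the neighborhood of 100 million.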

In short, the evidence in this paragraph is fundamentally not the right type. The word “abandoned” has embedded quantitative concepts that are not being properly handled. We need something tested or measured or counted across the entire culture at two different points in time, and we don’t have that.
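To make “counted at two different points in time” concrete, here is a minimal sketch of the comparison the claim would need. The survey numbers are invented purely for illustration (I have no such data), and the two-proportion z-test is just one conventional way to ask whether a drop is bigger than sampling noise would explain.

```python
import math

def two_proportion_z(hits_then, n_then, hits_now, n_now):
    """z statistic for the change in a proportion between two surveys."""
    p_then = hits_then / n_then
    p_now = hits_now / n_now
    # Pooled proportion under the assumption that nothing changed.
    pooled = (hits_then + hits_now) / (n_then + n_now)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_then + 1 / n_now))
    if std_err == 0:
        return 0.0  # every answer identical in both surveys: no change to test
    return (p_now - p_then) / std_err

# Hypothetical surveys: how many respondents could name the vice president.
z = two_proportion_z(hits_then=740, n_then=1000, hits_now=680, n_now=1000)
drop = 740 / 1000 - 680 / 1000
print(f"drop of {drop:.0%}, z = {z:.2f}")
```

A z value beyond about 2 in either direction suggests the change is unlikely to be sampling noise alone. It says nothing about whether naming the vice president is a good measure of critical thinking, which is a separate question.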

Very many words have quantitative aspects. Words like “all,” “every,” “none,” and “some” are so explicitly quantitative that they’re called “quantifiers” in mathematics. Comparisons like “more” and “fewer” are explicitly about counting, but much richer words like “better” and “worse” also require counting or measuring at least two things. There are words that compare different points in time, like “trend,” “progress,” and “abandoned.” There are words that imply magnitudes, such as “few,” “gargantuan,” and “scant.” A series of Greek philosophers, long before Christ, showed that the logic of “if,” “then,” “and,” “or,” and “not” could be captured symbolically. To be sure, all of these words have meanings and resonances far beyond the mathematical. But they lose their central meaning if the quantitative core is ignored.
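The quantitative core of these words is easy to see in a programming language. As a small illustration, Python’s built-in all() and any() play the role of “every” and “some,” and “none” is just “not any.” The list of answers below is made up.

```python
# Made-up data: whether each of five people could name the vice president.
could_name_vp = [True, True, False, True, False]

everyone = all(could_name_vp)       # "every person can name the VP"
someone = any(could_name_vp)        # "some people can name the VP"
no_one = not any(could_name_vp)     # "no one can name the VP"

# Counting turns the vague "some" into a definite share.
share = sum(could_name_vp) / len(could_name_vp)

print(everyone, someone, no_one, share)
```

Note that “some” is true here but tells you very little; the share is what lets you compare one group, or one year, with another.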

The relation between words and numbers is of fundamental importance in journalism. It tells you when you need to get quantitative. It’s essential for planning data journalism work and for communicating the results. It’s the heart of the data journalist’s job, really. The first step is to become aware of when quantitative concepts are being used in everyday language.


Peace, Conflict, and Data

A talk I gave at the IPSI Bologna Symposium on conflict resolution. Slides here.

We might be able to do better at conflict resolution — making peace in violent conflicts — with the help of good data analysis. There have long been data sets about war and violent conflict at the state level, but we now have much more.

There are now extraordinarily detailed, open-source event data streams that can be used for violence prediction. Conflict “microdata” from social media and communications records can be used to visualize the divisions in society. I also suggest a long term program of conflict data collection to learn, over many cases, what works in conflict resolution and what doesn’t.

We’re really just at the beginning of all of this. There are huge issues around data collection, interpretation, privacy, security, and politics. But the potential is too great to ignore.

Update: two excellent resources have come to my attention in the days since I gave this talk (which is, of course, part of why I give talks).

First, see the International Peace Institute’s paper on Big Data for Conflict Prevention. This paper was co-authored by Patrick Meier, who has been deeply involved in the crisis mapping work I mentioned in my talk.

But even more awesome, Erica Chenoweth has done exactly the sort of data-driven case-control study I was contemplating in my talk, and shown that non-violent political resistance succeeds twice as often as armed resistance. Her data set, the Nonviolent and Violent Campaigns and Outcomes (NAVCO) Data Project, also shows that non-violence is much more likely to lead to good democracies five years later, and that a movement that can recruit 10% of the population is almost guaranteed to succeed.

I highly recommend her talk.