Words and numbers in journalism: How to tell when your story needs data

Update: A more recent version of this material appears in my book, The Curious Journalist’s Guide To Data.

I’m not convinced that journalists are always aware when they should be thinking about numbers. Usually, by training and habit, they are thinking about words. But there are deep relationships between words and numbers in our everyday language, if you stop to think about them.

A quantity is an amount, something that can be compared, measured or counted — in short, a number. It’s an ancient idea, so ancient that it is deeply embedded in every human language. Words like “less” and “every” are obviously quantitative, but so are more complex concepts like “trend” and “significant.” Quantitative thinking starts with recognizing when someone is talking about quantities.

Consider this sentence from the article Anti-Intellectualism is Killing America, which appeared in Psychology Today:

In a country where a sitting congressman told a crowd that evolution and the Big Bang are “lies straight from the pit of hell,” where the chairman of a Senate environmental panel brought a snowball into the chamber as evidence that climate change is a hoax, where almost one in three citizens can’t name the vice president, it is beyond dispute that critical thinking has been abandoned as a cultural value.

This is pure cultural critique, and it can be interpreted many different ways. To start with, I don’t know of standard and precise meanings for “critical thinking” and “cultural value.” We could also read this paragraph as a rant, an exaggeration for effect, or an account of the author’s personal experience. Maybe it’s art. But journalism is traditionally understood as “non-fiction,” and there is an empirical and quantitative claim at the heart of this language.

“Critical thinking has been abandoned as a cultural value” is an empirical statement because it speaks about something that is happening in the world with observable consequences. It is, in principle, a statement that can be tested against history. This gives us a basis for saying whether it’s true or false.

It’s quantitative because the word “abandoned” speaks about comparing amounts at two different times: something that we never had cannot be abandoned. At each point in time we need to decide whether or not “critical thinking” is a “cultural value.” This is in principle a yes or no question. A more realistic answer might involve shades of gray based on the number of people and institutions who are embodying the value of critical thinking, or perhaps how many acts of critical thinking are occurring. Of course “critical thinking” is not an easy thing to pin down, but if we choose any definition at all we are literally deciding which things “count” as critical thinking.

One way or another, testing this claim demands that we count something at two different points in time, and look for a big drop in the number. Compare this with the evidence provided:

  • a sitting congressman told a crowd that evolution and the Big Bang are “lies straight from the pit of hell”
  • the chairman of a Senate environmental panel brought a snowball into the chamber as evidence that climate change is a hoax
  • almost one in three citizens can’t name the vice president

The first two pieces of evidence seem to me more anti-science than anti-critical thinking, but let’s suppose our definitions allow it. The real problem is that these are anecdotes – which is just a judgmental word for “examples.” Anecdotes make poor evidence when it’s just as easy to come up with examples on the other side. Yeah, someone brought a snowball into Congress to argue against climate change, but also the EPA decided to start regulating carbon dioxide as a pollutant. The issue is one of generalization: we can’t draw conclusions about the state of an entire culture from just a few specific examples. Generalization is tricky at the best of times, but it’s much easier when you can count or measure the entirety of something. Instead we have only scattered facts, and no information about whether these cases are representative of the whole.

Or, as historian G. Kitson Clark’s famous advice about generalization goes:

Do not guess; try to count. And if you cannot count, admit that you are guessing.

The fact that “almost one in three citizens can’t name the vice president” is closer to the sort of evidence we need. Let’s leave aside, for a moment, whether being able to name the vice president is really a good indication that “critical thinking” is a “cultural value.” This statement is still stronger than the first two examples because it generalizes in a way that individual examples cannot: it makes a claim about all U.S. citizens. It doesn’t matter how many people I can name who know who the vice president is, because we know (by counting) that there are roughly 100 million who cannot. But this still only addresses one point in time. Were things better before? Was there any point in history where more than two thirds of the population could name the vice president? We don’t know.

In short, the evidence in this paragraph is fundamentally not the right type. The word “abandoned” has embedded quantitative concepts that are not being properly handled. We need something tested or measured or counted across the entire culture at two different points in time, and we don’t have that.
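To make the shape of the missing evidence concrete, here is a minimal sketch in Python of the comparison that “abandoned” calls for: the same thing counted at two points in time, then compared. Everything in it is an assumption for illustration. The `SurveyResult` class, the `change_in_share` helper and, above all, the numbers are hypothetical and do not come from any real poll.

```python
from dataclasses import dataclass


@dataclass
class SurveyResult:
    """One count of the same question at one point in time (hypothetical)."""
    label: str
    respondents: int      # how many people were asked
    could_name_vp: int    # how many could name the vice president

    @property
    def share(self) -> float:
        # Fraction of respondents who could name the vice president
        return self.could_name_vp / self.respondents


def change_in_share(earlier: SurveyResult, later: SurveyResult) -> float:
    """Percentage-point change in the share between the two counts."""
    return (later.share - earlier.share) * 100


# HYPOTHETICAL numbers, invented purely to show the comparison.
earlier = SurveyResult(label="then", respondents=1000, could_name_vp=740)
later = SurveyResult(label="now", respondents=1000, could_name_vp=670)

drop = change_in_share(earlier, later)
print(f"{earlier.label}: {earlier.share:.0%}, {later.label}: {later.share:.0%}")
print(f"change: {drop:.1f} percentage points")
```

The point is not the arithmetic, which is trivial, but that the claim cannot even be stated in this form without two counts. The paragraph under discussion supplies only one.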

Very many words have quantitative aspects. Words like “all,” “every,” “none,” and “some” are so explicitly quantitative that they’re called “quantifiers” in mathematics. Comparisons like “more” and “fewer” are explicitly about counting, but much richer words like “better” and “worse” also require counting or measuring at least two things. There are words that compare different points in time, like “trend,” “progress,” and “abandoned.” There are words that imply magnitudes, such as “few,” “gargantuan,” and “scant.” A series of Greek philosophers, long before Christ, showed that the logic of “if,” “then,” “and,” “or,” and “not” could be captured symbolically. To be sure, all of these words have meanings and resonances far beyond the mathematical. But they lose their central meaning if the quantitative core is ignored.
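As a small illustration of how directly some of these words map onto counting, here is a sketch using Python’s built-in `all`, `any` and `sum`. The `answers` list is hypothetical data, invented for the example: one entry per person asked, `True` if they could name the vice president.

```python
# HYPOTHETICAL data: one entry per person asked.
answers = [True, True, False, True, False]

every_person_could = all(answers)       # "all" / "every"
some_person_could = any(answers)        # "some"
no_person_could = not any(answers)      # "none"

# "more" and "fewer" require counting two things and comparing them.
could = sum(answers)                    # True counts as 1, False as 0
could_not = len(answers) - could
more_could_than_could_not = could > could_not

print(every_person_could, some_person_could, no_person_could)
print(could, could_not, more_could_than_could_not)
```

None of this captures what the words mean in ordinary speech, but it shows the quantitative core: each one is a statement you can check by going through the cases.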

The relation between words and numbers is of fundamental importance in journalism. It tells you when you need to get quantitative. It’s essential for planning data journalism work and for communicating the results. It’s the heart of the data journalist’s job, really. The first step is to become aware of when quantitative concepts are being used in everyday language.
