A full-text visualization of the Iraq War Logs

Update (Apr 2012): the exploratory work described in this post has since blossomed into the Overview Project, an open-source large document set visualization tool for investigative journalists and other curious people, and we’ve now completed several stories with this technique. If you’d like to apply this type of visualization to your own documents, give Overview a try!

Last month, my colleague Julian Burgess and I took a shot at peering into the Iraq War Logs by visualizing them in bulk, as opposed to using keyword searches in an attempt to figure out which of the 391,832 SIGACT reports we should be reading. Other people have created visualizations of this unique document set, such as plots of the incident locations on a map of Iraq, and graphs of monthly casualties. We wanted to go a step further, by designing a visualization based on the richest part of each report: the free text summary, where a real human describes what happened, in jargon-inflected English.

Also, we wanted to investigate more general visualization techniques. At the Associated Press we get huge document dumps on a weekly or sometimes daily basis. It’s not unusual to get 10,000 pages from a FOIA request — emails, court records, meeting minutes, and many other types of documents, most of which don’t have latitude and longitude that can be plotted on a map. And all of us are increasingly flooded by large document sets released under government transparency initiatives. Such huge files are far too large to read, so they’re only as useful as our tools to access them. But how do you visualize a random bunch of documents?

We’ve found at least one technique that yields interesting results: a graph visualization where each document is a node, and edges between them are weighted using cosine similarity on TF-IDF vectors. I’ll explain exactly what that is and how to interpret it in a moment. But first, the journalism. We learned some things about the Iraq war. That’s one sense in which our experiment was a success; the other valuable lesson is that there is a boatload of research-grade visual analytics techniques just waiting to be applied to journalism.
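For readers who want the idea in concrete form, here is a minimal Python sketch of cosine similarity on TF-IDF vectors. It is an illustration under assumptions, not the code behind the picture: the whitespace tokenizer, the particular weighting (term frequency times the log of inverse document frequency), the function names, and the three toy report strings are all made up for this example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Frequent-in-this-document terms score high, common-everywhere terms score low.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy stand-ins for report summaries (invented, not real SIGACT text).
docs = [
    "ied explosion on route near checkpoint",
    "ied found and cleared on route",
    "detainee transferred to police station",
]
vectors = tfidf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))  # shares "ied", "on", "route"
print(cosine_similarity(vectors[0], vectors[2]))  # no shared words
```

In a graph like the one described, each such similarity score would become the weight of the edge between two documents, and pairs that score too low would simply get no edge.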


Interpreting the Iraq War, December 2006
This is a picture of the 11,616 SIGACT (“significant action”) reports from December 2006, the bloodiest month of the war. Each report is a dot. Each dot is labelled by the three most “characteristic” words in that report. Documents that are “similar” have edges drawn between them. The location of the dot is abstract, and has nothing to do with geography. Instead, dots with edges between them are pulled closer together. This produces a series of clusters, which are labelled by the words that are most “characteristic” of the reports in that cluster. I’ll explain precisely what “similar” and “characteristic” mean later, but that’s the intuition.
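The "pulled closer together" behaviour can be sketched with a very small force-directed layout. This excerpt does not say which layout algorithm or tool produced the picture, so treat the following Python as a generic illustration only: the function names, the force constants, and the five-node toy graph are assumptions made for the example.

```python
import math
import random

def force_layout(n_nodes, edges, iterations=200, attract=0.05, repel=0.01,
                 max_step=0.1, seed=0):
    """Place nodes in 2D so that linked nodes end up near each other.

    edges is a list of (i, j, weight) tuples, where weight stands in for
    the similarity between documents i and j.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n_nodes)]
    for _ in range(iterations):
        move = [[0.0, 0.0] for _ in range(n_nodes)]
        # Every pair of nodes pushes apart, more strongly when close.
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                push = repel / (dx * dx + dy * dy + 1e-9)
                move[i][0] += dx * push
                move[i][1] += dy * push
                move[j][0] -= dx * push
                move[j][1] -= dy * push
        # Each edge pulls its two endpoints together, scaled by its weight.
        for i, j, weight in edges:
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            move[i][0] += dx * attract * weight
            move[i][1] += dy * attract * weight
            move[j][0] -= dx * attract * weight
            move[j][1] -= dy * attract * weight
        # Cap the step length so very close pairs cannot fling each other away.
        for i in range(n_nodes):
            length = math.hypot(move[i][0], move[i][1])
            scale = min(1.0, max_step / length) if length > 0 else 0.0
            pos[i][0] += move[i][0] * scale
            pos[i][1] += move[i][1] * scale
    return pos

def distance(a, b):
    """Straight-line distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Toy graph: documents 0, 1, 2 are mutually similar; so are 3 and 4.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (3, 4, 1.0)]
positions = force_layout(5, edges)
print(distance(positions[0], positions[1]))  # within a cluster
print(distance(positions[0], positions[3]))  # across clusters
```

Nodes joined by edges settle a short distance apart while unconnected groups drift away from each other, which is why the final positions say something about similarity and nothing about geography.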


Social Network of US Counterinsurgency Policy Authors


Who is writing the major policies of the wars in Iraq and Afghanistan, and what is the Obama administration likely to do? There have been many analyses and news reports of individual policies and events, but it’s hard to wade into this flood of information, and besides, how would I know who to listen to? In an effort to get some perspective on at least one major aspect of American military strategy, I decided to plot out all the authors of (public) counterinsurgency policy over the last decade, and the relationships between them, as evidenced by co-authorship of articles and papers.


Scott McClellan, President Bush, and the Permanent Campaign

Scott McClellan was the White House press secretary from July 2003 to April 2006, and the deputy press secretary before that. I saw him speak at a meeting of the Commonwealth Club this Tuesday, June 24. He talked about his relationship to President Bush, the administration’s “mistakes”, and why these mistakes were made. For example, he now feels that “the war in Iraq was not absolutely necessary.” It is fascinating to watch someone formerly so close to the president recant so publicly and dramatically, especially someone who appeared on CNN time and time again to justify the president’s decisions. The personal dynamics of what happened between the president and his press secretary are at least as interesting as the actual events, and perhaps give us a little bit of insight into the psychology of politics in America.
