This is a recording of my talk at the NICAR (National Institute for Computer-Assisted Reporting) conference last week, where I discuss some of our recent work at the AP with the Iraq and Afghanistan war logs.
References cited in the talk:
- “A full-text visualization of the Iraq war logs”, a detailed writeup of the technique used to generate the first set of maps presented in the talk.
- The Glimmer high-performance, parallel multidimensional scaling algorithm, which is the software I presented in the live demo portion of the talk. It will be the basis of our clustering work going forward. (We are also working on other large-scale visualizations that may be more appropriate for, e.g., email dumps.)
- “Quantitative Discovery from Qualitative Information: A General-Purpose Document Clustering Methodology.” Justin Grimmer, Gary King, 2009. A paper that everyone working in document clustering needs to read. It makes clear that there is no “best” clustering, just different algorithms that correspond to different preconceived frames on the story, and it gives a method to compare clusterings (though I don’t think that method will scale well to millions of documents).
- Wikipedia pages for the bag-of-words model, tf-idf, and cosine similarity, the basic text-processing techniques we’re using.
- Gephi, a free graph visualization system, which we used for the one-month Iraq map. It will work up to a few tens of thousands of nodes.
- Knight News Challenge application for “Overview,” the open-source system we’d like to build for doing this and other kinds of visual explorations of large document sets. If you like our work, why not leave a comment on our proposal?
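For readers new to the bag-of-words, tf-idf, and cosine similarity techniques listed above, here is a minimal Python sketch of how they fit together. It is not the AP's actual code: the tokenizer, the choice of raw-count tf with log(N/df) idf (one common variant among several), and the function names `tokenize`, `tfidf_vectors`, and `cosine_similarity` are illustrative assumptions, and the sample documents are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict of term -> weight) per document.

    tf is the raw count of a term in the document; idf is log(N / df), where
    N is the number of documents and df the number containing the term.
    """
    counts = [Counter(tokenize(d)) for d in docs]  # bag of words per document
    n_docs = len(docs)
    df = Counter(term for c in counts for term in c)  # each term once per doc
    idf = {term: math.log(n_docs / df[term]) for term in df}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up, log-style snippets for illustration only.
    docs = [
        "Patrol found weapons cache near checkpoint",
        "Weapons cache cleared at checkpoint",
        "Council meeting about water supply",
    ]
    vecs = tfidf_vectors(docs)
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            print(i, j, round(cosine_similarity(vecs[i], vecs[j]), 3))
```

Documents that share rare words score closer to 1, documents with no words in common score 0, and a word that appears in every document gets an idf of 0 and so contributes nothing. A real pipeline over a large document set would likely also drop stop words and use an optimized sparse-matrix library rather than Python dicts.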
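Glimmer, listed above, is a multilevel multidimensional scaling (MDS) algorithm designed to run in parallel on the GPU. To illustrate what MDS does in general, turning a matrix of pairwise document distances (for example, 1 minus the cosine similarity) into coordinates that can be plotted, here is a sketch of a different and much simpler variant, classical (Torgerson) MDS, in plain Python. It is not Glimmer's algorithm and will not scale to large document sets; the function `classical_mds` and its parameters (`k`, `iters`, `seed`) are hypothetical.

```python
import math
import random


def _dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def _matvec(m, v):
    """Multiply a dense square matrix (list of rows) by a vector."""
    return [_dot(row, v) for row in m]


def classical_mds(dist, k=2, iters=1000, seed=0):
    """Embed n points in k dimensions given an n x n distance matrix.

    Classical MDS: double-center the squared distances to get a Gram matrix
    B, find its top-k eigenpairs by power iteration, and scale each unit
    eigenvector by the square root of its eigenvalue.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J with J = I - (1/n) * ones.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    # The largest absolute row sum bounds every eigenvalue of B (Gershgorin),
    # so B + shift * I has only non-negative eigenvalues and power iteration
    # on it converges toward the largest eigenvalue of B, not the largest |.|.
    shift = max(sum(abs(x) for x in row) for row in b)
    m = [[b[i][j] + (shift if i == j else 0.0) for j in range(n)]
         for i in range(n)]

    rng = random.Random(seed)
    found = []  # unit eigenvectors found so far
    coords = [[0.0] * k for _ in range(n)]
    for dim in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]
        for step in range(iters + 1):
            # Project out earlier eigenvectors, then normalize.
            for u in found:
                p = _dot(v, u)
                v = [vi - p * ui for vi, ui in zip(v, u)]
            norm = math.sqrt(_dot(v, v))
            if norm == 0.0:
                break
            v = [vi / norm for vi in v]
            if step < iters:
                v = _matvec(m, v)
        # Rayleigh quotient v^T B v is the eigenvalue for a unit eigenvector.
        eig = _dot(v, _matvec(b, v))
        scale = math.sqrt(max(eig, 0.0))  # clamp negatives from non-Euclidean input
        for i in range(n):
            coords[i][dim] = scale * v[i]
        found.append(v)
    return coords


if __name__ == "__main__":
    # Four points on a line; MDS should recover their spacing (up to sign).
    xs = [0.0, 1.0, 3.0, 6.0]
    dist = [[abs(x - y) for y in xs] for x in xs]
    for row in classical_mds(dist, k=1):
        print(row)
```

Because MDS sees only distances, the layout is determined only up to rotation and reflection. For Euclidean input with `k` equal to the data's dimension it reproduces the original distances up to rounding; cosine distances between documents are generally not Euclidean, so negative eigenvalues are clamped to zero and a 2-D layout is only an approximation.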
Concentrate legal discovery databases.
Make it possible for anyone to do the same thing a lawyer can.
Lawyers were able to stop this fifteen years ago, as I recall. It’s time to take that information and make it available to everyone. It’s not secret, but it is guarded.
Jonathan,
Ward Cunningham, best known for writing the first wiki software, is working on a PEG (parsing expression grammar) parser wrapped in C++. He is opening up the code, if he has not done so already. He spoke to many of us about it at the Open Source Bridge 2011 conference (#OSBridge11 or #OSB11). He is testing it at large scale on Wikipedia and will be speaking with archive.org about potential applications. The testing process is fast and reliable. I was thinking this might be useful, even if just conceptually, for your goal of investigating documents for journalism.
Best,
Teresa Boze
Concept | Connections, NW
Really, really awesome, thank you!
Andreas from Germany