I just got a pile of money to build a piece of state-of-the-art open-source visualization software, to allow journalists and curious people everywhere to make sense of enormous document dumps, leaked or otherwise.
Huzzah!
Now I am looking for a pair of professional developers to make it a reality. It won’t be hard for the calibre of person I’m trying to find to get some job, but I’m going to try to convince you that this is the best job.
The project is called Overview. You can read about it at overview.ap.org. It’s going to be a system for the exploration of large to very large collections of unstructured text documents. We’re building it in New York in the main newsroom of The Associated Press, the original all-formats global news network. The AP has to deal with document dumps constantly. We download them from government sites. We file over 1,000 freedom of information requests each year. We look at every single leak from Wikileaks, Anonymous, Lulzsec. We’re drowning in this stuff. We need better tools. So does everyone else.
So we’re going to make the killer app for document set analysis. Overview will start with a visual programming language for computational linguistics algorithms. Like Max/MSP for text. The output of that will be connected to some large-scale visualization. All of this will be backed by a distributed file store and computed through map-reduce. Our target document set size is 10 million. The goal is to design a sort of visualization sketching system for large unstructured text document sets. Kinda like Processing, maybe, but data-flow instead of procedural.
We’ve already got a prototype working, which we pointed at the Wikileaks Iraq and Afghanistan data sets and learned some interesting things. Now we have to engineer an industrial-strength open-source product. It’s a challenging project, because it requires production implementation of state-of-the-art, research-level algorithms for distributed computing, statistical natural language processing, and high-throughput visualization. And, oh yeah, a web interface. So people can use it anywhere, to understand their world.
Because that’s what this is about: a step in the direction of applied transparency. Journalists badly need this tool. But everyone else needs it too. Transparency is not an end in itself; it’s what you can do with the data that counts. And right now, we suck at making sense of piles of documents. Have you ever looked at what comes back from a FOIA request? It’s not pretty. Governments have to give you the documents, but they don’t have to organize them. What you typically get is a 10,000-page PDF. Emails mixed in with meeting minutes and financial statements and god knows what else. It’s like being let into a decrepit warehouse with paper stacked floor to ceiling. No boxes. No files. Good luck, kiddo.
Intelligence agencies have the necessary technology, but you can’t have it. The legal profession has some pretty good “e-discovery” software, but it’s wildly expensive. Law enforcement won’t share either. There are a few cheapish commercial products but they all choke above 10,000 documents because they’re not written with scalable, distributed algorithms. (Ask me how I know.) There simply isn’t an open, extensible tool for making sense of huge quantities of unstructured text. Not searching it, but finding the patterns you didn’t know you were looking for. The big picture. The Overview.
So we’re making one. Here are the buzzwords we are looking for in potential hires:
- We’re writing this in Java or maybe Scala. Plus JavaScript/WebGL on the client side.
- Be a genuine computer scientist, or at least be able to act like one. Know the technologies above, and know your math.
- But it’s not just research. We have to ship production software. So be someone who has done that, on a big project.
- This stuff is complicated! The UX has to make it simple for the user. Design, design, design!
- We’re open-source. I know you’re cool with that, but are you good at leading a distributed development community?
And that’s pretty much it. We’re hiring immediately. We need two. It’s a two-year contract to start. We’ve got a pair of desks in the newsroom in New York, with really nice views of the Hudson River. Yeah, you could write high-frequency trading software for a hedge fund. Or you could spend your time analyzing consumer data and trying to get people to click on ads. You could code any of a thousand other sophisticated projects. But I bet you’d rather work on Overview, because what we’re making has never been done before. And it will make the world a better place.
For more information, see:
- Writeups in Nieman Journalism Lab, O’Reilly Radar, Journalism.co.uk
- Video of a talk and live demo of the prototype.
- The official job posting.
Thanks for your time. Please contact jstray@ap.org if you’d like to work on this.
Very interesting and exciting project!
We can create a really great iPad App for the output of your end product.
Keep us in mind. We develop iPad/iPhone apps. Have 6 in the app store, two additional B2B apps and a game on the way.
Folks in the field would love to have an iPad with this stuff easily accessible, not just a Web interface but a really cool touch UI interface.
Let us know if you are interested in an initial discussion.
Larry Brambrut
770-778-9762
Jonathan: A link in your article, the one labeled “overview.ap.org”, is broken.
Larry: Implying web pages cannot do touch UI?
Want. Now. Can you donate scanners too? :-0