Twitter is Not Reality, Even in Guatemala

Guatemalans took to the streets in protest over the alleged murder of a prominent attorney by the country’s president, and an unrelated man was arrested for tweeting about it. The protests were reportedly organized on Facebook and other social networking sites, and streamed live to the world by laptop. Xeni Jardin of Boing Boing has been reporting from Guatemala directly for the past two weeks, and in an essay two days ago she called this the “Twitter Revolution”. I love the story of new technology enabling mass social dissent and change, but I’m not at all sure it’s true. Sorely missing from Xeni’s narrative is the role of other communication networks — like good old-fashioned word-of-mouth — and the demographics of internet access in a poor country.

The background: Attorney Rodrigo Rosenberg was shot while riding his bicycle on May 10th, just a few days after recording a video message which begins,

If you are watching this message, it is because I was assassinated by President Álvaro Colom.

The video implicates not only the president but also the major state-owned bank, and indeed much of the current government, and there were mass protests in the capital city. Xeni has been covering the story from Guatemala since the 20th, and I can only commend her for actually being there. However, her coverage has focused on the role of the internet in these protests.

Google is not reality and Twitter is not reality in exactly the same way that television is not reality. Part of the reason that Middle-Eastern peasants have such a warped view of America is that they too watch Desperate Housewives (via satellite or bootleg VCD), but never get the chance to actually meet some Americans. To them, all American women are blonde and slutty. There’s no reason to believe that we’re not getting a similarly warped view of other cultures when we watch their internet.

Continue reading Twitter is Not Reality, Even in Guatemala

Escaping the News Hall of Mirrors

We live in a cacophony of news, but most of it is just echoes. Generating news is expensive; collecting it is not. This is the central insight of the news aggregator business model, be it a local paper that runs AP Wire and Reuters stories between ads, or web sites like Topix, Newser, and Memeorandum, or for that matter Google News. None of these sites actually pay reporters to research and write stories, and professional journalism is in financial crisis. Meanwhile there are more bloggers, but even more re-blogging. Is there more or less original information entering the web this year than last year? No one knows.

A computer could answer this question. A computer could trace the first, original source of any particular article or statement. The effect would be like donning special glasses in the hall of mirrors that is current news coverage, being able to spot the true sources without distraction from reflections. The required technology is nearly here.
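To make that concrete, here is a minimal sketch of the tracing step, assuming every article arrives with a timestamp. It uses word-shingle overlap (Jaccard similarity), a standard near-duplicate detection technique; the function names and the 0.5 threshold are illustrative choices, not anyone’s production pipeline.

```python
def shingles(text, k=5):
    """Break text into overlapping k-word 'shingles' for near-duplicate detection."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 0))}

def similarity(a, b):
    """Jaccard similarity between the shingle sets of two texts."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if (sa or sb) else 0.0

def earliest_source(articles, threshold=0.5):
    """Find the original among near-duplicate articles.

    `articles` is a list of (timestamp, url, text) tuples. Sort by time,
    take the earliest as the candidate original, and treat every later
    article that is sufficiently similar as an echo rather than reporting.
    """
    articles = sorted(articles)
    first = articles[0]
    echoes = [a for a in articles[1:] if similarity(first[2], a[2]) >= threshold]
    return first, echoes
```

A real system would need to handle paraphrase and quotes embedded in longer stories, but the core idea (cluster near-duplicates, then trust the earliest timestamp) is exactly this simple.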

This is more than geekery if you’re in a position of needing to know the truth of something. Last week I was researching a man named Michael D. Steele, after reading a newly leaked document containing his name. Steele gained fame as one of the stranded commanders in Black Hawk Down, but several of his soldiers later killed three unarmed Iraqi men. I rapidly discovered many news stories (1, 2, 3, 4, 5, 6, 7, etc.) claiming that Steele had ordered his men to “kill all military-age males.” This is a serious accusation, and widely reprinted — but no number of news articles, blog posts, and reblogs can make a false statement more true. I needed to know who first reported this statement, and its original source.

Continue reading Escaping the News Hall of Mirrors

How Many World Wide Webs Are There?

[Image: map of the blogosphere by Matthew Hurst]

How much overlap is there between the web in different languages, and what sites act as gateways for information between them? Many people have constructed partial maps of the web (such as the blogosphere map by Matthew Hurst, above) but as far as I know, the entire web has never been systematically mapped in terms of language.
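The mechanics would not be the hard part. Here is a rough sketch of the labeling and gateway-finding step, assuming you already have a crawl in hand; it leans on the third-party langdetect package purely as a placeholder (any language identifier would do), and the data shapes are invented for the example.

```python
from collections import Counter
from langdetect import detect  # third-party language identifier; any detector would do

def language_map(pages, links):
    """Label each crawled page with a language and find cross-language gateways.

    `pages` maps url -> extracted text; `links` is a list of (src, dst) urls.
    Returns per-page languages, a Counter of (src_lang, dst_lang) crossings,
    and the ten pages with the most out-links into other languages.
    """
    lang = {url: detect(text) for url, text in pages.items()}
    crossings = Counter()
    gateways = Counter()
    for src, dst in links:
        if src in lang and dst in lang and lang[src] != lang[dst]:
            crossings[(lang[src], lang[dst])] += 1
            gateways[src] += 1
    return lang, crossings, gateways.most_common(10)
```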

Of course, what I actually want to know is, how connected are the different cultures of the world, really? We live in an age where the world seems small, and in a strictly technological sense it is. I have at my command this very instant not one but several enormous international communications networks; I could email, IM, text message, or call someone in any country in the world. And yet I very rarely do.

Similarly, it’s easy to feel like we’re surrounded by all the international information we could possibly want, including direct access to foreign news services, but I can only read articles and watch reports in English. As a result, information is firewalled between cultures; there are questions that could very easily be answered by any one of tens or hundreds of millions of native speakers, yet are very difficult for me to answer personally. For example, what is the journalistic slant of al-Jazeera, the original one in Arabic, not the English version, which is produced by a completely different staff? Or, suppose I wanted to know what the average citizen of Indonesia thinks of the sweatshops there, or what is on the front page of the Shanghai Times today, and does such a newspaper even exist? What is written on the 70% of web pages that are not in English?

Continue reading How Many World Wide Webs Are There?

Intelligent News Agents, With Real News

You cannot read all of the news, every day. There is simply too much information for even a dedicated and specialized observer to consume it all, so someone or something has to make choices. Traditionally, we rely on some other person to tell us what to see: the editor of a newspaper decides what goes on the front page, the reviewer tells us what movies are worth it. Recently, we have been able to distribute this mediation process across wider communities: sites like Digg, StumbleUpon, or Slashdot all represent the collective opinions of thousands of people.

The next step is intelligent news agents. Google (search, news, reader, etc.) can already be configured to deliver to us only that information we think we might want to see. It’s not hard to imagine much more sophisticated agents that would scour the internet for items of interest.

In today’s context, it’s easy to see how such agents could actually be implemented. Sophisticated customer preference engines are already capable of telling us what products we might like to consume — the best example is Amazon’s recommendation engine. It’s not a big leap to imagine using the same sort of algorithms to model the kinds of blog articles, web pages, YouTube videos, etc. that we might enjoy consuming, and then deliver these things to us.
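As a toy version of that leap, here is item-to-item collaborative filtering in miniature, the general idea behind engines like Amazon’s. To be clear, this is a sketch of the published technique, not Amazon’s actual system, and the sample data is invented:

```python
import numpy as np

def item_similarity(ratings):
    """Cosine similarity between items, from a users x items ratings matrix."""
    norms = np.linalg.norm(ratings, axis=0, keepdims=True)
    norms[norms == 0] = 1.0                     # avoid division by zero
    normalized = ratings / norms
    return normalized.T @ normalized            # items x items similarity

def recommend(ratings, user, n=5):
    """Score unseen items by their similarity to what the user already consumed."""
    sim = item_similarity(ratings)
    scores = sim @ ratings[user]                # weight items by the user's history
    scores[ratings[user] > 0] = -np.inf         # don't re-recommend seen items
    return np.argsort(scores)[::-1][:n]

# Rows are users, columns are articles/videos; 1 = consumed (invented data).
ratings = np.array([[1, 1, 0, 0],
                    [0, 1, 1, 0],
                    [1, 0, 0, 1]], dtype=float)
print(recommend(ratings, user=0, n=2))
```

Swap “products” for articles and videos and the same machinery delivers a personalized news feed.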

There is a serious problem with this. You’re going to get exactly what you ask for, and only that.

True, we all do this already. We read books and consume media which more or less confirm our existing opinions. This effect is visible as clustering in what we consume, as in this example of Amazon sales data for political books in 2008.

Social network graph of Amazon sales of political books, 2008

This image is from a beautiful analysis by orgnet.com. Basically, people buy either the red books or the blue books, but usually not both. The same sorts of patterns hold for movies, blogs, newspapers, ideologies, religions, and human beliefs of all kinds. This is a problem; but at least you can usually see the other color of books when you walk into Borders. If we end up relying on trainable agents for all of our information, we risk completely blacking out anything that disagrees with what we already believe.
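The red/blue split in that graph is, incidentally, just what automatic community detection finds in co-purchase data. A minimal sketch on made-up book data, using networkx’s greedy modularity algorithm (one of several that would work):

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Each edge means "customers who bought X also bought Y" (invented data).
G = nx.Graph()
G.add_edges_from([
    ("Red Book A", "Red Book B"), ("Red Book B", "Red Book C"),
    ("Red Book A", "Red Book C"),
    ("Blue Book X", "Blue Book Y"), ("Blue Book Y", "Blue Book Z"),
    ("Blue Book X", "Blue Book Z"),
    ("Red Book C", "Blue Book X"),  # the rare crossover purchase
])

# Modularity maximization splits the graph into densely connected clusters;
# on real co-purchase data the red and blue camps fall out on their own.
for community in greedy_modularity_communities(G):
    print(sorted(community))
```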

I propose a simple solution. Automatic network analyses like the one above — of books, or articles, or web pages — could easily pinpoint the information sources that would expose me to the maximum novelty in the minimum time. If my goal is to gain a deep understanding of the entire scope of human discourse, rather than just the parts of it I already agree with, then it would be very simple to program my agent to bring to me exactly those things that would most rapidly give me insight into those regions of information space which are most vital and least known to me. I imagine some metric like “highest degree node most distant from the nodes I’ve already visited” would work handily.
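That metric is straightforward to express over a graph of information sources. Here is a sketch; I have assumed the priority order is distance-from-what-I’ve-read first, then degree, and the example graph is hypothetical:

```python
from collections import deque

def most_novel(graph, visited):
    """Pick the unvisited node maximizing (distance from visited set, degree).

    `graph` maps each node to a set of neighbors. Distance is BFS hops from
    the nearest already-visited node, so well-connected hubs far from
    everything I've read score highest: the "maximum novelty" pick.
    """
    # Multi-source BFS outward from everything already visited.
    dist = {node: 0 for node in visited}
    queue = deque(visited)
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    candidates = [n for n in graph if n not in visited]
    return max(candidates, key=lambda n: (dist.get(n, float("inf")), len(graph[n])))

# Hypothetical source graph: I have only read the NYT so far.
graph = {
    "nyt": {"wapo", "hn"}, "wapo": {"nyt"},
    "hn": {"nyt", "lwn"}, "lwn": {"hn"},
}
print(most_novel(graph, visited={"nyt"}))   # -> "lwn", the farthest source
```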

You can infer a lot about someone from the information they currently consume. If my agent noticed that I was a liberal, it could help me understand the conservative world-view, and vice-versa. If my agent detected that I was ignorant of certain crucial aspects of Chinese culture and politics, it could recommend a primer article. Or it might deduce that I needed to understand just slightly more physics to participate meaningfully in the climate change debate, or decide (based on my movie viewing habits) that it was high time I reviewed the influential films of Orson Welles. Of course, I might in turn decide that I actually, truly, don’t care about film at all; but the very act of excluding specific subjects or categories of thought would force us, consciously, to admit to the boundaries of our mental worlds.

We could program our information gathering systems to challenge us, concisely and effectively, if we so choose. Intelligent agents could be mere sycophants, or they could be teachers.

What Foxmarks Knows about Everyone

I recently installed Foxmarks, a Firefox extension that automatically synchronizes your web bookmarks across all the computers you might use. Refreshingly, the developers got it right: the plug-in is idiot-simple and works flawlessly.

This is accomplished through a central server, which means a lot of bandwidth, hardware, and reliability costs. In short, it’s not a completely cheap service to provide. As there is no advertising either in the plug-in or on the site (yet?), I began to wonder how they planned to pay for all this. I found my answer on their About Us page:

We are hard at work analyzing over 300 million bookmarks managed by our systems to help users discover sites that are useful to them. By combining algorithmic search with community knowledge-sharing and the wisdom of crowds, our goal is to connect users with relevant content.

Of course.

There is a lesson here: knowledge of something about someone is fundamentally different from knowledge of something about everyone. As with Google, Amazon, or really any very large database of information over millions of users, there are extremely valuable patterns that only occur between people. The idea is as old as filing, but the web takes this to a whole new level, especially if you can convince huge numbers of people to voluntarily give up their information.
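A tiny illustration of a pattern that exists only between people: co-bookmarking counts across a user base, which is presumably the raw material for the discovery feature Foxmarks describes. The data and helper names here are invented for the example:

```python
from collections import Counter
from itertools import combinations

# Bookmark sets per user: the kind of data a sync service sees in aggregate.
bookmarks = {
    "alice": {"boingboing.net", "orgnet.com", "memeorandum.com"},
    "bob":   {"boingboing.net", "memeorandum.com"},
    "carol": {"orgnet.com", "memeorandum.com"},
}

# Count how often each pair of sites is bookmarked by the same person.
# No single user's list reveals this; it exists only across the whole base.
pair_counts = Counter()
for urls in bookmarks.values():
    for pair in combinations(sorted(urls), 2):
        pair_counts[pair] += 1

def related(site, n=3):
    """Sites that co-occur most often with `site`: 'users like you also keep...'"""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a == site:
            scores[b] += count
        elif b == site:
            scores[a] += count
    return scores.most_common(n)

print(related("memeorandum.com"))
```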

So far, I haven’t said anything new. What I am suggesting is a shift in thinking. Rather than being concerned primarily about our individual privacy rights when we fill out a form full of personal details, perhaps we should be pondering what powers we are handing over by letting a private entity see these large-scale inter-individual patterns — patterns that they can choose to hide from everyone else’s view, naturally.

I am beginning to wonder very seriously about the growing disparity between public and private data-mining capability. Is this an acceptable concentration of power? What effects does this have on a society?