Jun 08 2010

Short doesn’t mean shallow

Writing style needs to change to take advantage of the hyperlink. That’s the message I want to inject into the discussion about whether deep, long-form writing can survive online — especially long-form journalism. Many people assert that articles need to be shorter online than they are in print, and Nic Carr even famously argues that the internet is making us stupid by destroying our attention span.

But I don’t think the web is shallow at all. I think it’s the deepest medium ever invented, with incredible potential for telling complex, irreducible stories — or at least it can be, if you don’t treat it like print.

Here’s how a long story works in print:

The yellow bits are the unique information that can only be found on that page. The other bits of writing are there to provide the context needed to make sense of the whole, or to get the reader up to speed if they don’t know the backstory. In print, this is absolutely necessary, because a print story is a self-contained object. There is nowhere else the reader can go to look up an unfamiliar term, to check out a sub-plot, or to investigate the history. That is why we have context paragraphs. That is why we have little definitions of terms that might be unfamiliar to some, and explanations of who a source is and why we should believe them.

Online is different. We can move the context, verification, and background out of the main story, paring the piece down to a thing of streamlined beauty — but all the depth is still there, via links, for anyone who wants it.

When I try to understand how the internet changes communication, one of the points I keep coming back to is the personal nature of online media: it’s now possible to present a different experience to every single reader (user? viewer?). Choosing whether or not to follow a link is a simple way for a reader to tailor the presentation to what they already know, and to indulge their own curiosity. And links let you skip the boring bits.

We’re used to thinking of “an article” as a self-contained unit of story. It’s not. The component parts of a story might even be written at different times, published on different sites, or created by different authors.

And now our diagram looks like the web actually looks: lots of pages from different people about different aspects of a very complex world. That is the medium we write in, not some simulation of a stack of paper, no matter what your word processor shows you. Of course, the web also allows fully interactive stories, but it’s often forgotten that hypertext itself is an interactive medium, or at least it can be when we put the right links in the right places. People have been experimenting with non-linear stories for decades, but given that a generation or two has now grown up with hypertext, it’s probably time to let our storytelling style grow up too.

Online, short doesn’t necessarily mean shallow. We’re just measuring “depth” wrong when we only look at a single article. That’s not how people actually consume the web, and we shouldn’t force them to.

18 responses so far

May 30 2010

Internet as information democracy, or new media news monopolies?

There was a dream that the internet would mean the end of the media gatekeeper; that anyone could get their message out without having to get the attention and approval of the media powers that be. This turns out to be not quite the case.

I took data from the Project for Excellence in Journalism’s State of the News Media 2010 report to create this chart showing the market share of the top 20 news web sites. In theory, the internet busts media monopolies by allowing anyone to publish for free. And there’s no doubt it’s been disruptive. But according to data from Nielsen, the top 7% of 4600 news and information sites (about 320 sites) get 80% of traffic from American viewers. We see a big concentration of power, as the rapid falloff in the chart above shows, and much of it still belongs to “old media.”

Organizations such as CNN, Fox, the New York Times and USA Today rank in the top 20. But so do new media giants AOL, Google News, The Huffington Post and Yahoo.com, which is the biggest news site of all.

(It’s also interesting to note that many of the top 20 new media news sites produce little or none of their own news; in the extreme case Google News produces no stories at all of its own. While some see aggregation as parasitic, I think it’s obvious that it delivers a tremendously valuable service to readers.)

For better or worse, the ability to publish anything nearly for free hasn’t meant the end of big media monopolies. It’s simply shifted the landscape and the power balance.

The limiting factor to getting your message out is no longer having access to an expensive printing press or a TV station. It’s attention: how many minutes of time can you get from how many people? In this game, brand still matters hugely. There are only so many URLs a person can remember, only so many sites they can check in a day.

You have an audience, or you don’t. Mindshare is now the barrier to entry in the media world. Perhaps it always was, though I daresay it was easier to get viewers to check out your new television network when there were only 13 channels. Online, the number of channels is infinite for all intents and purposes; a single person will never exhaust them all.

Which is not to say that the internet has changed nothing. We have seen over and over that bottom-up effects can propel something to mass attention, with no big company behind them. This is often called “going viral,” but that’s not quite a broad enough description of the effect. In many cases, what happens is that something becomes just popular enough to get picked up by mainstream media, who then propel it into the spotlight.

And what this PEJ top 20 list doesn’t take into account is that people now get online news from lots and lots of sources other than news websites.

Facebook is now the most widely used news reading program. It’s also now the #1 site on the internet. Should it top this chart of news sources? Meanwhile, Twitter has become a primary news source for very many people. And then there are mobile news apps, some of which belong to old media news organizations and some of which don’t. The richness of news distribution systems today is well captured in another PEJ report on the “participatory news consumer.”

So has the internet made it easier to get non-mainstream messages out? I think the answer can only be yes. But don’t expect that anyone will be reading your alternative narratives just because you’ve put them online. Your best bet to be heard still lies with a small number of very large companies. And although the internet per se is relatively uncensored in many countries, commercial gatekeepers like Apple and Facebook own important dedicated channels, and both of them engage in censorship (1, 2).

4 responses so far

Feb 08 2010

In Xinjiang, the Internet is Guilty Until Proven Innocent


We are witnessing the birth of a new kind of internet censorship in the Xinjiang province of Western China: the kind where a web site must be specifically allowed, instead of specifically disallowed.

China’s largest province was disconnected from the world completely, including a shutdown of phones and SMS, after hundreds of people were killed in separatist protests by the Uyghur minority people in July. Today, the Far West Blog reports that 27 more web sites have been allowed through the previously complete internet block. Wow. A whole 27. That brings the total number of extra-provincial sites accessible to Xinjiang residents to 31, and all of them are inside China.

The Chinese government maintains that the US-based “World Uyghur Congress” instigated the riots from overseas using the internet and SMS. No communications, no riots, the logic goes. And perhaps this is true, if myopic (fascinating debate on this here).

But there is something very wrong about opening up sites one by one like this, despite the fact that state-run Xinhua news agency is playing it up as communications being “restored”. The current Xinjiang policy represents a new and extremely troubling flavor of censorship: rather than some sites being blocked, some sites are allowed. This is a whitelist, as opposed to the usual blacklist; the default is now “no”. Bearing in mind that personal satellite dishes are illegal in China, this means the government has complete control over the information that people are exposed to. This is just like the pre-internet era in any number of times and places, really, but that doesn’t make it any better.
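To make the difference in defaults concrete, here is a minimal Python sketch of the two filtering policies. The host names and both lists are made up for illustration (they are not the actual Xinjiang list), and real filtering happens at the network level rather than in a function like this.

```python
from urllib.parse import urlparse

# Hypothetical lists, for illustration only.
BLOCKED_HOSTS = {"banned.example.org"}
ALLOWED_HOSTS = {"news.example.cn", "shop.example.cn"}


def host_of(url):
    """Return the host part of a URL, or an empty string if there is none."""
    return urlparse(url).hostname or ""


def allowed_by_blacklist(url):
    """Default is yes: a site passes unless it is specifically disallowed."""
    return host_of(url) not in BLOCKED_HOSTS


def allowed_by_whitelist(url):
    """Default is no: a site passes only if it is specifically allowed."""
    return host_of(url) in ALLOWED_HOSTS


# A site nobody has ever reviewed gets opposite answers under the two policies.
new_site = "http://brand-new-forum.example.com/"
print(allowed_by_blacklist(new_site), allowed_by_whitelist(new_site))
```

The unknown site is the interesting case: under a blacklist a censor has to find it and act, while under a whitelist it is blocked before anyone has even heard of it.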

At least text messaging, including international text messaging, was restored two weeks ago.

According to Far West Blog, here is what you now get from the outside world if you live in Xinjiang:

  • 7 News Sites (including China Daily and CCTV)
  • 4 Travel Sites (including Ctrip and Air China)
  • 3 Business & Finance Sites
  • 3 Telecom Sites (all three major Chinese carriers)
  • 2 Shopping Sites (including Taobao, China’s version of eBay)
  • 2 Computer Service Sites (so you can update your anti-virus)
  • 2 Gaming Sites (more flash games…yippee)
  • 2 Education Sites (study materials for students and help for teachers)
  • 1 Fashion Site

Yes, this also means no IM, no Skype, no email, no nothing outside of the province. “I have had to sit here and endure a frustrating feeling that we are now living in the stone ages,” says Far West Blog writer Josh.

These 31 sites seem ridiculously limited, and these limits (no email!) would severely hamper business in the affluent Eastern provinces. Xinjiang has only 20 million people, so perhaps China can more or less do without it for a while. But what if the national firewall let through only, say, the top 10,000 or 100,000 currently uncensored international sites? How much easier it would be to prevent some pesky overseas message board from cropping up to corrupt Chinese minds! Why, your world-censoring work would practically be done for you, and almost no one would be the wiser.

Let’s hope that this isn’t a precedent.

UPDATE: There are rumours, based on government statements in December, that a national whitelist is planned. Nothing definitive yet.

One response so far

Oct 04 2009

Advertisers Smoking Crack, and the Future of Journalism According to Leo Laporte

Leo Laporte of This Week in Tech gave a truly marvelous talk on Friday about how his online journalism model works. The first half of the talk is all about how TWIT moved from TV to podcasting and became profitable, and includes such gems as:

Advertisers have been smoking the Google and Facebook crack. And they no longer want that shakeweed that the [TV] networks are offering.

The second half is in many ways even better, when Leo takes questions from the audience and discusses topics such as the future of printing news on dead trees:

Maybe there will always be [paper] news, but it will be brought to you by your butler who has ironed it out carefully for you. It will be the realm of the rich person.

and the “holy calling” of being a journalist:

You reporters are really the monks of the information world. You labour in obscurity. You have to be driven by passion because you’re paid nothing. And you sleep on rocks.

He goes on to discuss the necessity of bidirectional communication, Twitter as the “emerging nervous system” of the net, etc. — all the standard new media stuff, but put very succinctly by someone who has deep experience in both old and new media. Very information-dense and enlightening!

No responses yet

Sep 27 2009

Why We Need Open Search, and How to Make Money Doing It

Anything that’s hard to put into words is hard to put into Google. What are the right keywords if I want to learn about 18th century British aristocratic slang? What if I have a picture of someone and I want to know who it is? How do I tell Google to count the number of web pages that are written in Chinese?

We’ve all lived with Google for so long that most of us can’t even conceive of other methods of information retrieval. But as computer scientists and librarians will tell you, boolean keyword search is not the end-all. There are other classic search techniques, such as latent semantic analysis, which tries to return results that are “conceptually similar” to the user’s query even if the relevant documents don’t contain any of the search terms. I also believe that full-scale maps of the online world are important. I would like to know which web sites act as bridges between languages, and I want tools to track the source of statements made online. These sorts of applications might be a huge advance over keyword search, but large-scale search experiments are, at the moment, prohibitively expensive.


The problem is that the web is really big, and only a few companies have invested in the hardware and software required to index all of it. A full crawl of the web is expensive and valuable, and all of the companies who have one (Google, Yahoo, Bing, Ask, SEOmoz) have so far chosen to keep their databases private. Essentially, there is a natural monopoly here. We would like a thousand garage-scale search ventures to bloom in the best Silicon Valley tradition, but it’s just too expensive to get into the business.

DotBot is the only open web index project I am aware of. They are crawling the entire web and making the results available for download via BitTorrent, because

We believe the internet should be open to everyone. Currently, only a select few corporations have access to an index of the world wide web. Our intention is to change that.

Bravo! However, a web crawl is a truly enormous file. The first part of the DotBot index, with just 600,000 pages, clocks in at 3.2 gigabytes. Extrapolating to the more than 44 billion pages so far crawled, I estimate that they currently have 234 terabytes of data. At today’s storage technology prices of about $100 per terabyte, it would cost $24,000 just to store the file. Real-world use also requires backups, redundancy, and maintenance, all of which push data center costs to something closer to $1000 per terabyte. And this says nothing of trying to download a web crawl over the network — it turns out that sending hard drives in the mail is still the fastest and cheapest way to move big data.
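For anyone who wants to check the extrapolation, here is the arithmetic as a short Python sketch. It assumes decimal units (1 terabyte = 1000 gigabytes) and exactly 44 billion pages; since the crawl is "more than" 44 billion pages, the $24,000 figure above is best read as a rounded-up estimate.

```python
# Figures taken from the paragraph above.
SAMPLE_PAGES = 600_000
SAMPLE_SIZE_GB = 3.2
CRAWLED_PAGES = 44_000_000_000

# Assumption: decimal units, 1 TB = 1000 GB.
GB_PER_TB = 1000

RAW_DISK_USD_PER_TB = 100
DATA_CENTER_USD_PER_TB = 1000

# Scale the sample size up to the full crawl.
gb_per_page = SAMPLE_SIZE_GB / SAMPLE_PAGES
total_tb = gb_per_page * CRAWLED_PAGES / GB_PER_TB

raw_disk_cost = total_tb * RAW_DISK_USD_PER_TB
data_center_cost = total_tb * DATA_CENTER_USD_PER_TB

print(f"average page size: {gb_per_page * 1_000_000:.2f} KB")
print(f"full crawl: {total_tb:.0f} TB")
print(f"raw disk: ${raw_disk_cost:,.0f}")
print(f"data center: ${data_center_cost:,.0f}")
```

At roughly 5 KB per page this works out to a little under 235 terabytes, so raw disks cost somewhere in the low $20,000s and a properly run data center closer to a quarter of a million dollars.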

Full web indices are just too big to play with casually; there will always be a very small number of them.

I think the solution to this is to turn web indices and other large quasi-public datasets into infrastructure: a few large companies collect the data and run the servers, other companies buy fine-grained access at market rates. We’ve had this model for years in the telecommunications industry, where big companies own the lines and lease access to anyone who is willing to pay.

The key to the whole proposition is a precise definition of access. Google’s keyword “access” is very narrow. Something like SQL queries would expand the space of expressible questions, but you still couldn’t run image comparison algorithms or do the computational linguistics processing necessary for true semantic search. The right way to extract the full potential of a database is to run arbitrary programs on it, and that means the data has to be local.

The only model for open search that works both technologically and financially is to store the web index on a cloud, let your users run their own software against it, and sell the compute cycles.

It is my hope that this is what DotBot is up to. The pieces are all in place already: Amazon and others sell cheap cloud-computing services, and the basic computer science of large-scale parallel data processing is now well understood. To be precise, I want an open search company that sells map-reduce access to their index. Map-reduce is a standard framework for breaking down large computational tasks into small pieces that can be distributed across hundreds or thousands of processors, and Google already uses it internally for all their own applications — but they don’t currently let anyone else run it on their data.
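As a rough illustration of what such a job could look like, here is a toy map-reduce in plain Python that answers the earlier question about counting pages written in Chinese. Everything here is a stand-in: `PAGES` is a hypothetical four-record "index", the language check is a crude heuristic, and a real framework would spread the map and reduce steps across many machines instead of running them in one process.

```python
from collections import defaultdict

# Hypothetical stand-in for a web index: (url, page_text) records.
PAGES = [
    ("http://example.com/a", "Hello world"),
    ("http://example.cn/b", "你好，世界"),
    ("http://example.cn/c", "新闻 news"),
    ("http://example.org/d", "Bonjour le monde"),
]


def looks_chinese(text):
    """Crude heuristic (an assumption): any CJK Unified Ideograph counts.

    This would also match Japanese text that uses kanji.
    """
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def map_page(record):
    """Map step: look at one page in isolation and emit (key, 1)."""
    _url, text = record
    key = "chinese" if looks_chinese(text) else "other"
    return [(key, 1)]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: combine all the values emitted for one key."""
    return key, sum(values)


def run_job(pages):
    """Run map, shuffle, and reduce in sequence on a single machine."""
    emitted = [pair for record in pages for pair in map_page(record)]
    return dict(reduce_counts(k, v) for k, v in shuffle(emitted).items())


print(run_job(PAGES))
```

Because `map_page` never needs to see more than one page at a time, the same two functions could in principle be handed to a cluster and run against billions of pages, which is exactly the kind of access I would like to be able to buy.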

I really think there’s money to be made in providing open search infrastructure, because I really think there’s money to be made in better search. In fact I see an entire category of applications that hasn’t yet been explored outside of a few very well-funded labs (Google, Bellcore, the NSA): “information engineering,” the question of what you can do with all of the world’s data available for processing at high speed. Got an idea for better search? Want to ask new questions of the entire internet? Working on an investigative journalism story that requires specialized data-mining? Code the algorithm in map-reduce, and buy the compute time in tenth-of-a-second chunks on the web index cloud. Suddenly, experimentation is cheap — and anyone who can figure out something valuable to do with a web index can build a business out of it without massive prior investment.

The business landscape will change if web indices do become infrastructure. Most significantly, Google will lose its search monopoly. Competition will probably force them to open up access to their web indices, and this is good. As Google knows, the world’s data is exceedingly valuable, too valuable to leave in the hands of a few large companies. There is an issue of public interest here. Fortunately, there is money to be made in selling open access. Just as energy drives change in physical systems, money drives changes in economic systems. I don’t know who is going to do it or when, but open search infrastructure is probably inevitable. If Google has any sense, they’ll enter the search infrastructure market long before they’re forced to (say, before Yahoo and Bing do it first).

Let me know when it happens. There are some things I want to do with the internet.

3 responses so far

Sep 22 2009

Requiem For The Front Page

Oh Front Page, your days are clearly numbered. For generations all eyes were upon you; you set the public agenda, and advertisers loved you best. In the tumult of the world, your voice carried above all others, and we needed you. You told us when the war ended, and when The Beatles came to town.

But you are in your autumn now.


We know that your children killed you, though they did not mean it. In the age of the scribe, it seemed that anyone could own a printing press. But now, Front Page, we talk online about the monopoly you once claimed. Some will pine for newsprint, but paper is just too expensive, too heavy and static.

But this is not about paper. This is about the way you lived your life, your insistence on a space that you and you alone controlled. You tried to move online, Front Page, but your model would not yield and your children ate your lunch. Google News chooses from the best, while Digg lets us choose for ourselves. There will always be reporters, those who assemble the narratives, but there may not always be editors. Your stubborn insistence on one for all made us question your purpose.

We loved you and you ignored us! Advertisers deserted you first; they were very quick to understand that reader information could be leveraged into relevance. Google itself was built on this model. Meanwhile Amazon and iTunes grasped that efficiencies of delivery had moved the money to the infinite niche. But you admitted none of this, Front Page, and also you did not see that people live in networks, that our friends know what is important to us.

Why would you not give us what we wanted? No one questions your integrity, the standards of journalism you uphold. No one questions that we, the public, need to be told at least as much as we need to be listened to. But suddenly we could talk back, and you weren’t listening. You insisted that we go to you instead of just coming to us. Why did you not use our input to customize the agenda? You could have spawned Facebook applications and iPhone applications and even innovative social RSS readers that determined our interests and automatically delivered ten million personalized headlines! (And their ads.)

You had everything you needed, and this was your unforgivable sin. A hundred years ago you built the Associated Press to feed you, the prototype of distributed journalism. This could have been the beginning, if you had embraced more than the cream of international stories, if you had realized how cheap local reporting could be. Those long tail stories could be vastly cheaper, Front Page, if you embraced more sources, if you fought for transparency instead of access, if you taught citizens to be journalists instead of insisting that they can’t. You could have set the standards and franchised the platform. But instead of finding innovative ways to gather the news and innovative ways to deliver it to us, even now you fight hard to be seen less!

Instead of owning the aggregators and bringing to them the wisdom of an old hand, you scoffed at Digg, at Google, at Memeorandum. Why are there still so many news sites without a panel of “Share This” links beneath each story? Why are we not allowed to speak to the New York Times with user ratings buttons? Your mannerisms are quaint as hoop-skirts, Front Page.

We know also that your less reputable cousin is only slightly younger, and the world will never listen to Television as their parents did. The internet will devour Broadcast too; in only a few more years bandwidth will be cheap enough for anyone to run their own station. We know that upcoming content analysis algorithms will soon make video search a reality, and we know that the RSS future will soon disaggregate Television News just as it only recently disaggregated you.

Front Page, your children are brash, but they are filled with the energy of youth. They have inherited a world you never foresaw, and they are hopeful in a way you are not. It is their world now. You must guide them, but you must let them have it.

Much as we loved you, your time has passed.

One response so far

Sep 12 2009

Hong Kong is Not Quite China


The “Pillar of Shame” is faces: faces in agony, anonymous faces, dead faces. It stands in the plaza of the Student Union of the University of Hong Kong as a monument to the 1989 Tiananmen Square Massacre in which hundreds of pro-democracy demonstrators were killed by the Chinese government. It was installed by students on the tenth anniversary of this sorry event, and the police did not stop them.

This would never fly on the mainland. In China, web searches, blog posts, foreign news broadcasts, and even instant messages about the Tiananmen Square massacre are very closely censored. To erect a monument to something that officially did not happen is unthinkable, not to mention severely punishable.

But Hong Kong is different.

The island was a British colony by treaty with the Chinese for 100 years. The 1997 handover to the Chinese government was peaceful, and Hong Kong became the “Hong Kong Special Administrative Region.” This means that it is under different law. In effect, the Hong Kong Basic Law — drafted jointly by the British and Chinese in the late 1980s — is a completely different constitution for the region. Hong Kong and China even require different entry visas and have different immigration procedures. In particular, mainland Chinese residents are not allowed to live here permanently.

And they might want to! Hong Kong residents enjoy freedom of speech, freedom of the press, and guarantees against unwarranted search or detention. The chief executive is not democratically elected, but the legislative council is. The internet is not censored and the economy is officially capitalist.

This arrangement provides a strange vantage point for China observers; it’s China, but also not-China. It’s free, and you can do things here you could never get away with on the mainland. For example, Rebecca MacKinnon of the University of Hong Kong has published some wonderful research on the Chinese internet censorship regime.

The Basic Law is guaranteed by the PRC for 50 years, until 2047. What happens then is anyone’s guess.

One response so far

Jan 11 2009

What Internet Censorship Looks Like, Part 2

The Turkish government censors internet access from within the country, as I discovered yesterday when I tried to access YouTube from the Turkish town of Selçuk. This screenshot shows what I got instead:

web-censorship-in-turkey

The English text on this page reads: “Access to this web site is banned by ‘TELEKOMÜNİKASYON İLETİŞİM BAŞKANLIĞI’ according to the order of: Ankara 1. Sulh Ceza Mahkemesi, 05.05.2008 of 2008/402”

Just to complete the irony, I was looking for a video of the Oscar Grant shooting when I first discovered this “blocked site” page.


2 responses so far

Dec 22 2008

What do you Edit on Wikipedia?

When you edit Wikipedia, what do you write about? Did you sit in the front row or the back row as a child? Did you grow up on science fiction, were you an activist in college? Did no one understand you, or have you always been perfectly normal? Tell me, because I want to know who’s in this conversation.


One response so far

Dec 16 2008

Knowing is Not Enough

Wikipedia will save the world. Information is tolerance. When the internet succeeds and all humanity finally has egalitarian access to all information everywhere, a new era of enlightenment will dawn.

Oh really?


No responses yet
