What Can We Learn From the Network Structure of Wikipedia Authors?

The edit network for “telephone tapping” shows a bipartite structure, indicating that the topic is controversial (image from Brandes et al.)

An interesting new paper defines the “edit network” of a Wikipedia article by drawing edges to indicate that one person has deleted or restored text written by another. While it’s always fun to look at pictures, the surprise here is that we can verify that the resulting graph structure really does tell us something useful about the article. In this study, articles with a more “bipolar” edit network (meaning that the authors split into basically two camps who routinely undid each other’s edits) were also much more likely to appear on a manually maintained list of controversial pages.
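To make the definition concrete, here is a minimal sketch of how such an edit network might be tallied. The record format (actor, original author, action) and the name `build_edit_network` are my own assumptions for illustration; the paper's actual text-mining pipeline is considerably more involved.

```python
from collections import Counter, defaultdict

def build_edit_network(events):
    """Tally interactions between pairs of authors.

    `events` is a hypothetical list of (actor, original_author, action)
    tuples, where action is "delete" or "restore". The result maps an
    unordered author pair to a Counter of the actions between them.
    """
    network = defaultdict(Counter)
    for actor, author, action in events:
        if actor == author:
            continue  # editing your own text is not an interaction
        if action in ("delete", "restore"):
            network[frozenset((actor, author))][action] += 1
    return dict(network)

# Made-up records, purely to show the shape of the data.
events = [
    ("alice", "bob", "delete"),
    ("bob", "alice", "delete"),
    ("carol", "alice", "restore"),
    ("alice", "alice", "delete"),
]
network = build_edit_network(events)
```

Keeping deletions and restorations in separate counts is a design choice of this sketch: undoing someone's text and defending it presumably mean different things, so it seems safer not to merge them into a single edge weight.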

Although there has been previous work on network mapping of Wikipedia in particular (and of course volumes of work on social networks in general), I find this paper interesting because it tries very carefully to understand whether the pictures mean anything. As in all science, what you find depends on where you look, and the practitioner of network analysis has an absurd amount of freedom to define what a “node” is, what an “edge” is, and how the resulting graph is visually laid out. The layout matters because the point of a map is a visual representation: graphical properties such as distance, size, and color need to have the right sort of metaphorical relationship to the more abstract properties we are trying to understand.

For example, one might try to identify people who hold similar opinions by analyzing who “interacts” with whom and looking for clusters. In this case the nodes are people, and the edges are “interactions.” But what is an “interaction?” A hallway conversation? A co-authored paper? An appearance on the same talk show? If we draw an edge between two people if they’ve ever stood in the same room, we won’t necessarily get a good map of who “agrees” with whom. It is therefore very important to make sure that your definition of an “edge” properly embodies the question you are trying to ask — in this case a question about “similar opinions.” (The most cogent critique of my COIN Policy Author Graph made exactly this point.)

This is a real and serious methodological problem in network analysis, and it’s made worse by our preconceptions. Suppose we suspect that Republicans and Democrats read different books. We might look for a definition of “edge” that we can apply to sales or reading data, and choose the one that gives the cleanest separation of people into two distinct groups. This makes pretty pictures, but it’s not clear that we can learn anything from such an exercise: all we’ve done is throw out all the evidence that didn’t support the notion we had already decided upon.

Back to Wikipedia: the paper is titled “Network Analysis of Collaboration Structure in Wikipedia,” and was written by Ulrik Brandes, Patrick Kenis, Jürgen Lerner and Denise van Raaij, who wanted to know if the structure of the edit network for a particular article could tell us something about where that topic fits into a broader discourse. In particular, they decided to see if they could produce a numerical measure of an article’s “controversiality” by measuring how close the graph is to being bipartite, that is, whether the authors tended to split into two camps, each of which routinely deleted words written by the other. It is important to note that both the text processing that mines the data and the selection criteria that define an edge are very complex, which means that other criteria which may not give “good results” are effectively excluded from this study. For this reason, it’s very important to have something to compare their graph metrics against, and they do: the manually maintained list of controversial pages.
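For a feel of what “how close the graph is to being bipartite” could mean numerically, here is a rough sketch. It is not the index defined in the paper; it is a simple stand-in that asks what fraction of the “undo” weight crosses the best possible split of authors into two camps. The function name `bipolarity` and the edge format are assumptions of mine, and the brute-force search only works for a handful of authors.

```python
from itertools import product

def bipolarity(edges):
    """Fraction of edge weight that crosses the best two-camp split.

    `edges` maps (author_a, author_b) pairs to a non-negative weight,
    e.g. how often the two undid each other's text. Tries every split,
    so it is only practical for small graphs.
    """
    nodes = sorted({n for pair in edges for n in pair})
    total = sum(edges.values())
    if total == 0:
        return 0.0
    best = 0
    # Pin the first author to camp 0 to skip mirror-image splits.
    for rest in product((0, 1), repeat=len(nodes) - 1):
        camp = dict(zip(nodes, (0,) + rest))
        crossing = sum(w for (a, b), w in edges.items() if camp[a] != camp[b])
        best = max(best, crossing)
    return best / total

# Two clean camps, {a, b} versus {x, y}: every edge crosses the split.
two_camps = {("a", "x"): 3, ("a", "y"): 1, ("b", "x"): 2}
# Three authors who all undo one another: no split separates every pair.
triangle = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1}
```

Under this stand-in measure the two-camp graph scores 1.0 and the triangle scores 2/3, which matches the intuition that a free-for-all is less “bipolar” than a standoff between two sides.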

To test whether high values of the bipolarity indicator point to controversy in authors’ opinions, we computed the bipolarity of articles linked from the page Wikipedia:List of controversial issues, with our hypothesis being that bipolarity is high on those controversial articles and lower on non-controversial ones. … To compare controversial articles with non-controversial ones that did receive enough attention, we have chosen so-called featured articles which are listed on the page Wikipedia:Featured articles.

…

The bipolarity index of controversial articles is [statistically] significantly higher than the bipolarity of featured articles. Thus, the controversy of topics is indeed reflected in the edit behavior on the associated Wikipedia article.

All that, and they have many cool pictures too; as the authors discuss, there’s probably a wealth of data in Wikipedia edit networks — just looking at the maps, they instantly appear so meaningful. But we are all prone to apophenia, so it’s nice to see that, little by little, we are figuring out how to test our theories about what we can actually learn from network analyses.

Jonathan Stray

