The edit network for “telephone tapping” shows a bipartite structure, indicating that the topic is controversial (image from Brandes et al.)
An interesting new paper defines the “edit network” of a Wikipedia article by drawing edges to indicate that one person has deleted or restored text written by another. While it’s always fun to look at pictures, the surprise here is that we can verify that the resulting graph structure really does tell us something useful about the article. In this study, articles with a more “bipolar” edit network (meaning that the authors split into basically two camps who routinely undid each other’s edits) were also much more likely to appear on a manually-maintained list of controversial pages.
Although there has been previous work on network mapping of Wikipedia in particular (and of course volumes of work on social networks in general), I find this paper interesting because it tries very carefully to understand whether the pictures mean anything. As in all science, what you find depends on where you look, and the practitioner of network analysis has an absurd amount of freedom to define what a “node” is, what an “edge” is, and how the resulting graph is visually laid out. Since the point of a map is a visual representation, it’s very important that graphical properties such as distance, size, and color have the right sort of metaphorical relationships to the more abstract properties we are trying to understand.
For example, one might try to identify people who hold similar opinions by analyzing who “interacts” with whom and looking for clusters. In this case the nodes are people, and the edges are “interactions.” But what is an “interaction?” A hallway conversation? A co-authored paper? An appearance on the same talk show? If we draw an edge between two people whenever they’ve stood in the same room, we won’t necessarily get a good map of who “agrees” with whom. It is therefore very important to make sure that your definition of an “edge” properly embodies the question you are trying to ask, in this case a question about “similar opinions.” (The most cogent critique of my COIN Policy Author Graph made exactly this point.)
This is a real and serious methodological problem in network analysis, and it’s made worse by our preconceptions. Suppose we suspect that Republicans and Democrats read different books. We might look for a definition of “edge” that we can apply to sales or reading data, and choose the one that gives the cleanest separation of people into two distinct groups. This makes pretty pictures, but it’s not clear that we can learn anything from such an exercise: all we’ve done is throw out all the evidence that didn’t support the notion we had already decided upon.
Back to Wikipedia: the paper is titled “Network Analysis of Collaboration Structure in Wikipedia,” and was written by Ulrik Brandes, Patrick Kenis, Jürgen Lerner and Denise van Raaij, who wanted to know if the structure of the edit network for a particular article could tell us something about where that topic fits into a broader discourse. In particular, they decided to see if they could produce a numerical measure of an article’s “controversiality” by measuring how close the graph is to being bipartite, that is, whether the authors tended to split into two camps, each of which routinely deleted words written by the other. It is important to note that both the text processing that mines the data and the selection criteria which define an edge are very complex, meaning that other criteria which may not give “good results” are effectively excluded from this study. For this reason, it’s very important to have something to compare their graph metrics against, and they do: the manually-maintained list of controversial pages.
To test whether high values of the bipolarity indicator point to controversy in authors’ opinions, we computed the bipolarity of articles linked from the page Wikipedia:List of controversial issues, with our hypothesis being that bipolarity is high on those controversial articles and lower on non-controversial ones. … To compare controversial articles with non-controversial ones that did receive enough attention, we have chosen so-called featured articles which are listed on the page Wikipedia:Featured articles.
The bipolarity index of controversial articles is [statistically] significantly higher than the bipolarity of featured articles. Thus, the controversy of topics is indeed reflected in the edit behavior on the associated Wikipedia article.
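To make “how close the graph is to being bipartite” a little more concrete, here is a minimal Python sketch. It is not the authors’ method: their text processing and edge criteria are far more involved, and this is not their bipolarity index. As a stand-in, I score a graph by the largest fraction of edge weight that runs between two camps under the best possible split (a brute-force max-cut). The input format (`editor`, `original_author`, `words`) and the names `build_edit_network` and `best_split_score` are my own assumptions for illustration, and the sample data is made up.

```python
from collections import defaultdict
from itertools import product


def build_edit_network(reverts):
    """Build an undirected, weighted edit network.

    `reverts` is an iterable of (editor, original_author, words) tuples,
    a hypothetical format meaning "editor deleted `words` words written
    by original_author". Self-edits are ignored.
    """
    weights = defaultdict(float)
    for editor, author, words in reverts:
        if editor == author:
            continue
        weights[frozenset((editor, author))] += words
    return dict(weights)


def best_split_score(weights):
    """Fraction of edge weight crossing the best two-camp split.

    Returns 1.0 when the graph is exactly bipartite. Brute force over
    all splits, so only usable for a handful of authors.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    nodes = sorted({name for edge in weights for name in edge})
    first, rest = nodes[0], nodes[1:]
    best = 0.0
    # Pin the first author to camp 0; try every assignment of the rest.
    for sides in product((0, 1), repeat=len(rest)):
        camp = {first: 0}
        camp.update(zip(rest, sides))
        cut = sum(
            w for edge, w in weights.items()
            if len({camp[name] for name in edge}) == 2
        )
        best = max(best, cut)
    return best / total


# Made-up data: two camps (ann, cat) vs. (bob, dan) undoing each other.
two_camps = [
    ("ann", "bob", 10), ("bob", "ann", 5),
    ("ann", "dan", 3), ("cat", "bob", 4), ("cat", "dan", 2),
]
# Made-up data: three authors who all undo one another equally.
free_for_all = [("ann", "bob", 1), ("bob", "cat", 1), ("cat", "ann", 1)]

print(best_split_score(build_edit_network(two_camps)))
print(best_split_score(build_edit_network(free_for_all)))
```

The two-camp example scores 1.0 because every deletion crosses the divide, while the three-way squabble can do no better than two thirds. Note that this particular score never drops below 0.5 for any graph with edges, so it would need rescaling before being compared across articles, which is one more reminder of how much freedom the analyst has in defining the metric.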