In Common Knowledge: An Ethnography of Wikipedia, Dariusz Jemielniak (User:Pundit on the English and Polish Wikipedias, and a steward) discusses Wikipedia from the standpoint of an experienced editor and administrator who is also a university professor specializing in management and organizations. In Virtual Unreality: Just Because the Internet Told You, How Do You Know It's True?, journalism professor and author Charles Seife presents a more broadly themed work reminding us to question the reliability of information found throughout the Internet; he cites Wikipedia as a prime example of a website that contains enough misinformation to warrant caution before relying on it.
Jemielniak's book is an academic discussion of Wikipedia; he does not aim to present either a "how-to" guide for editors and readers or a complete history of the project. He states that his "book is a result of long-term, reflexive participative ethnographic research" performed as a "native anthropologist." (p. 193) (The word "ethnographic" in this context refers not to ethnicity in the quasi-racial sense, but to the study of a subgroup of the population—here, the subgroup that actively edits Wikipedia.) By this, Jemielniak means that he has spent several years as a Wikipedian, has introspected about his experiences throughout that time through the lens of his academic background, and has now written up his findings and conclusions. I don't think he means that he became active in Wikipedia for the purpose of doing research about it, although it seems quite possible that he started thinking about combining his editing hobby and his professional interests fairly early in his wiki-career.
I cannot pretend to evaluate Common Knowledge as a work of anthropology or of organizational management science. As a general reader and a Wikipedian, I found the book interesting as a compilation of incidents in Wikipedia's history, some of which I was already familiar with and some of which were new to me, and as a reminder of some issues the project faces as it moves forward. Non-academic readers may find the book lacking in a unifying theme, beyond the point that Wikipedia plays an important role in the world today that warrants academic study of its culture and communities. Jemielniak recently stated (on a Wikipediocracy thread) that "I wrote this book for academic research purposes, I absolutely have no hope of high sales (and honestly, I'll be surprised if it goes beyond 500 copies)." The book has been praised by Jimmy Wales, Clay Shirky, Jonathan Zittrain, and Zygmunt Bauman and it deserves to sell well over 500 copies, but it won't be making the wiki-best-seller list either.
The eight chapters of Common Knowledge discuss basic rules governing Wikipedia, different roles contributors take on within the project, dispute resolution processes, and the nature of project leadership. The topics are illustrated with examples of disputes or controversies drawn primarily from English Wikipedia history (though controversies about actions by Jimbo Wales on Wikimedia Commons and Wikiversity are also mentioned). The incidents Jemielniak discusses are presented in detail and accurately, but some of them are ten years old and don't necessarily reflect the project's practices or realities today. For example, Jemielniak reviews the bitter and protracted disagreement on En-WP regarding when the historical German-language name "Danzig" should be used for the city now located in Poland and known as Gdańsk. Perhaps aided by his own geographical and historical background, he does an excellent job of presenting the history of the dispute, surveying the arguments for the different points of view, and explaining why the dispute-resolution process ultimately reached the result it did. He does not, however, discuss whether the Wikipedia of 2014 would address the same issue, if it were arising anew, in the same fashion that the much younger Wikipedia of 2003-2004 did.
Jemielniak also doesn't spend much time discussing how lessons learned from Wikipedia dispute-resolution experiences can be used to minimize future disputes or to improve future decision-making. I find this unfortunate, but I can't call it a fault of the book, both because ethnography is descriptive rather than prescriptive, and more importantly because the failure to take stock of dispute-resolution successes and failures has struck me for years as a project-wide myopia. In the 13½ years of English Wikipedia there have been, in round numbers, a billion edit-wars, yet no one knows whether most edit-wars get resolved by civil discussion reaching a consensus on the optimal wording, or by one side's giving up and wandering away (or sometimes by everyone's ultimately losing interest and wandering away). Similarly, the English Wikipedia Arbitration Committee has decided several hundred cases since 2004, and community discussions on noticeboards have resolved thousands more content and conduct disputes, yet no one ever seems to have gone back and conducted any systematic review of which approaches to dispute-resolution worked better than others. That's a different book that ought to be written, although it too risks selling fewer than 500 copies.
Speaking of ArbCom (which I'm prone to do since I've served on ours since 2008), Jemielniak mentions the Arbitration Committees of both the Polish Wikipedia and the English Wikipedia. He opens the book with an account of a Polish Wikipedia arbitration case that resulted in his being blocked from Po-WP for one day. He claims that in retrospect he accepts the ruling against him, but his account of the dispute makes that ruling sound terribly unfair—a cynical gesture of evenhandedness, but meted out to editors who didn't deserve to be treated evenhandedly. (But of course those of us who can't read Polish will never hear the other side of the story.)
The book's mentions of En-WP ArbCom are sound, but dated. He discusses the historical origin of the Committee as an extension of the original authority of Jimmy Wales, and cites a handful of Committee decisions, the most recent of which is an unusual case-motion from 2009. He does not spend much time on the current role of the Committee. That's actually a very defensible omission, because at least on English Wikipedia (I can't speak for other projects), while ArbCom has other responsibilities (some of which most of us don't particularly want), the importance of the Arbitration Committee as an arbitration committee has radically declined in the past few years. (I've discussed this decline here.) So Jemielniak's not spending nearly as much space discussing arbitration as one might expect in a book about Wikipedia hierarchies, leadership, and dispute resolution turns out to be a reasonable decision, but one that is not explained.
Although the academic style of Common Knowledge (and the price of the book) will deter some readers, Wikipedians who want a taste of Jemielniak's thinking about the project can find it in a recent article he contributed to Slate, "The Unbearable Bureaucracy of Wikipedia". In this article, aimed at a general rather than an academic audience, Jemielniak posits that Wikipedia's "increasingly legalistic atmosphere is making it impossible to attract and keep the new editors the site needs." It's a thoughtful article that identifies a significant issue, and its more direct approach, accompanied by concrete suggestions, makes it more accessible than Common Knowledge for non-specialist readers. All of us who want Wikipedia to thrive, which requires that the project welcome newcomers and facilitate their becoming regular editors, can hope for more such wisdom from this Pundit.
By contrast to Jemielniak's academic treatment specific to Wikipedia, Charles Seife—the author of Zero, Alpha and Omega, and Proofiness—has written a more broadly themed book about the unreliability of information found throughout the Internet. "Just because the Internet told you," the subtitle asks, "how do you know it's true?" Now at one level, the fact that the Internet contains a fair amount of misinformation is not breaking news; "Someone is wrong on the internet" became a meme and then a cliché for a reason. Lots of us think we're sophisticated enough to avoid falling into the kinds of traps that Seife warns us about—but the warnings in Seife's book are important and timely nevertheless.
Wikipedia is just one of the many online sources of bad information that Seife discusses, but for obvious reasons it's the one I'll focus on here. Seife catalogs a dozen instances in which deliberate misinformation was introduced into Wikipedia. Such misinformation is inserted into Wikipedia, perhaps every day, by a miscellaneous array of pranksters, hoaxers, vandals, defamers, and in a few instances by Wikipedia critics conducting so-called "breaching experiments" to see how long a falsehood placed in Wikipedia stays in Wikipedia. (Such experiments are not permitted; see also Wikipedia:Do not create hoaxes.) Some of Seife's examples will be well-known to "Signpost" readers, such as the Colbert-inspired tripling of elephants and the Bicholim Conflict; others were new to me, such as AC Omonia Nicosia and the Edward Owens hoax.
Experienced Wikipedians are well-aware of this problem, as are our critics. English Wikipedia, in what can equally be considered admirable self-criticism or self-absorbed navel-gazing, contains discussions of hoaxes on Wikipedia; we also have a lengthy List of hoaxes on Wikipedia; and another compilation recently appeared on a critic site here. (Wikipediocracy link)
Misinformation in the media has always been with us (Tom Burnham's books were favorites of mine growing up, and I'm mildly dismayed that Burnham's name comes up a redlink), but it certainly is possible to spread false information more rapidly online than it was in the analog era. Of course, it is possible to spread correct information more rapidly as well. A particular problem is misinformation posted on Wikipedia—and elsewhere all over the Internet—with the purpose of doing harm to someone. (A prime example of this sort of thing is the Qworty fiasco that unfolded last year.) Any falsehoods in article content damage the credibility and usefulness of the encyclopedia we are collaboratively writing, but intentional falsehoods posted by a subject's personal or political or ideological enemies with the malicious intent to defame or damage a living person do so tenfold. I am confident that well over 99% of Wikipedia pages are free of intentional falsehoods—yet no one can deny that Wikipedia articles must still contain far too many lies, damn lies, and sadistics.
Neither Seife nor Jemielniak says much about the biographies of living persons policy and its enforcement, although many Wikipedians, including myself, have long thought fair treatment of our article subjects to be the central ethical issue affecting the project. I know that when I've been defamed online I didn't enjoy it, and that Wikipedia BLP subjects feel the same way when their number-one Google hit has been edited in nasty ways by their personal or political or ideological enemies. (The good news is that when I or others spot defamation on Wikipedia we are often able to do something about it; I've often wished that I had an "edit" and a "delete" button that I could use on the rest of the Internet.)
Seife's discussion of misinformation on Wikipedia focuses on intentionally false information, but a greater number of inaccuracies are introduced by editors who make honest mistakes than by hoaxers and vandals. Sometimes mistakes are made by good editors who inadvertently type the wrong word or misread a source. Other times, we encounter a good-faith editor who wants to help build Wikipedia but, at least in a given topic-area, simply doesn't know what he or she is talking about. Wikipedia has no systematic quality control beyond surmounting the bar for deletion, at least until one seeks to bring an article to the mainpage or have it rated (at which point various sorts of flyspecking take place; some of it can be overdone, but that's another discussion). On English Wikipedia today, there are dedicated noticeboards to address conflict-of-interest issues, evaluate the reliability of sources, solve copyright problems (some quite abstruse), keep fringe theories in check, and put a stop to edit-warring. I've never seen anyone wonder why there's no dedicated noticeboard where one goes for help in figuring out whether questionable information in an article is accurate or not.
Despite the falsehoods he identifies, all of which have now been removed, Seife acknowledges that "by some measures one can argue that Wikipedia is roughly as accurate as its paper-and-ink competitors." (p. 29) He cites the well-known 2005 Nature article comparing the accuracy of Wikipedia's scientific content to that of a canonical, traditional reference source, the Encyclopedia Britannica. One continues to read of comparisons of Wikipedia with traditional library reference books (see Reliability of Wikipedia). The Wikipedia community should certainly aspire for our encyclopedia to land on the favorable side of such comparisons. I think that on balance it does.
But "Wikipedia vs. Britannica" is no longer the right question, or at least not the only right question. At least equally relevant today is how Wikipedia's completeness and fairness and accuracy compare, not only to traditional media sources, but to the other information available on the Internet. Wikipedia has evolved as part of, not independent of, the Internet as a whole. And it is the Internet as a whole, not just Wikipedia, that has changed the population's information-searching habits, so that today when one needs or wants to look something up, one does so on the computer or a handheld device rather than in a book or a (hard-copy) journal or newspaper. In the unlikely event that Wikipedia (and all of its mirrors and derivatives) were to disappear tomorrow (and not be replaced by a similar site), our readers from schoolchildren to senior citizens would not revert to the habits of 25 years ago and start trooping to the library or even the reference shelves in their living rooms when they wanted to check a fact. (I am not saying this is a good thing or a bad thing, though it has elements of both; it is simply a truth.)
Instead, people in the wikiless world would still perform the same Google searches that today bring up their subject's Wikipedia article as a top-ranking hit. They would find the same results, minus Wikipedia, and they would look at the other top-ranking hits on their subject instead. Would those pages, on average, provide better-written, better-sourced, more accurate, and more fair coverage of their subject than the corresponding Wikipedia pages? And to the extent the answer is yes, how do we link the best of that content to become accessible from Wikipedia? A future Wikipedia scholar may wish to focus more on these questions (and produce another 495-copy-selling book).
Seife rather kindly refrains from discussing in the book, as an example of a questionable Wikipedia page, his own BLP. Predictably, that page is the first Google hit on Seife's name (his own webpage at NYU is second). Unfortunately, the article bears a prominent, disfiguring banner at the top of the page, proclaiming that:
Now, no well-informed reader of Wikipedia would take this pronouncement alleging that Charles Seife is an ill-written article as a reflection against Charles Seife. (If anything, the obvious circular reasoning suggests sloppiness in the crafting of the tag.) After all, the reader would know that Charles Seife wouldn't have written the article and, as a matter of our conflict-of-interest guidelines, is discouraged from editing the article at all, much less improving its overall editorial quality. Nonetheless, it isn't exactly encouraging that in the 13 months since an anonymous IP editor added that tag, no one has improved the article enough to resolve the quality concern and remove the tag. If I were notable enough to warrant a Wikipedia BLP and this were the state of it for over a year, I think I'd have the right to be ticked off. (Cynical aside to editors interested in Wikipedia's public relations: improve the BLPs of journalists likely to cover us.)
Meanwhile, in a recent radio interview, which is well worth listening to, Seife claims that Wikipedia gets four or five facts of his life wrong (not controversial claims, he says, just basic facts, though he doesn't name them), which, knowing about the COI guideline, he didn't fix. (Aside to Charles Seife: let me know about the non-controversial fixes needed and I'll make them myself. You won't need to go to The New Yorker à la Philip Roth.)
The bottom line on these two books: Wikipedians should read (and think carefully about) Jemielniak's Slate article, but only the hardier ones among us will gain the full benefit of his book, although all of us should thank him for writing it. More Wikipedians will enjoy Seife's book, though only a sliver of it is about Wikipedia, and perhaps everyone should listen to his radio interview, although for many of us both the book and interview will reinforce, rather than challenge, our existing views about the reliability of the information that surrounds us.
A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
Kim Osman has performed a fascinating study[1] on the three 2013 failed proposals to ban paid advocacy editing in the English language Wikipedia. Using a Constructivist Grounded Theory approach, Osman analyzed 573 posts from the three main votes on paid editing conducted in the community in November, 2013. She found that editors who opposed the ban felt that existing policies of neutrality and notability in WP already covered issues raised by paid advocacy editing, and that a fair and accurate encyclopedia article could be achieved by addressing the quality of the edits, not the people contributing the content. She also found that a significant challenge to any future policy is that the community 'is still not clear about what constitutes paid editing'.
Osman uses these results to argue that there has been a transition in the values of the English language Wikipedia editorial community from seeing commercial involvement as direct opposition to Wikipedia's core values (something repeated at the institutional level by the Wikimedia Foundation and Jimmy Wales who see a bright line between paid and unpaid editing) to an acceptance of paid professions and a resignation to their presence.
Osman argues that the romantic view of Wikipedia as a system somehow apart from the commercial market that characterized earlier depictions (such as those by Yochai Benkler) has been diluted in recent years and that sustainability in the current environment is linked to a platform's ability to integrate content across multiple places and spaces on the web. Osman also argues that these shifts reflect wider changes in assumptions about commerciality in digital media and that the boundaries between commercial and non-profit in the context of peer production are sometimes fuzzy, overlapping and not clearly defined.
Osman's close analysis of 573 posts is a valuable contribution to the ongoing policy debate about the role of paid editing in Wikipedia and will hopefully be used to inform future debates.
Building multilingual dictionaries to and from every language is, combinatorially, a lot of work. If one uses triangulation–if A means B, and B means C, then A means C (see figure)–then a lot of the work can be done by machine. A large closed-source effort did this in 2009[supp 1], but a new paper by Ács[2] responds that "while our methods are inferior in data size, the dictionaries are available on our website"[supp 2]. The approach used the translation tables from 53 Wiktionaries to infer 19 million translations, in addition to the 4 million already present in Wiktionary. The researchers steered clear of several classical problems such as polysemy (one word having multiple meanings) by using a machine learning classifier. The features used in the classifier were based on the graph-theoretic attributes of each possible word pair. For instance, if two or more languages can serve as an intermediate "pivot" language for a translation, that turned out to be a good indicator of a valid match. To test the precision of these translations, manual spot checks were carried out; they found a precision of 47.9% for newly found word pairs versus 88.4% for random translations coming out of Wiktionary. As for recall, which tested the coverage of a collection of 3,500 common words, 83.7% of words were accounted for by automatic triangulation in the top 40 languages. That means that right now, if we were to try to make a 40-language pocket phrasebook to travel around most of the world just using Wiktionary, about 85% of the time there would be a translation, and it would be between 50% and 85% correct.
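The triangulation idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's method: the function name `triangulate`, the toy word list, and the "at least two pivot languages" filter are assumptions made here, and the filter is a much cruder stand-in for the machine learning classifier the researchers actually trained on graph features.

```python
from collections import defaultdict
from itertools import combinations


def triangulate(translations):
    """Infer candidate translation pairs that share a pivot word.

    `translations` is an iterable of ((lang, word), (lang, word)) pairs,
    treated as undirected links. The result maps each inferred pair
    (a frozenset of two (lang, word) nodes) to the set of pivot
    languages that support it.
    """
    neighbours = defaultdict(set)
    for a, b in translations:
        neighbours[a].add(b)
        neighbours[b].add(a)

    inferred = defaultdict(set)
    for pivot, linked in neighbours.items():
        for a, c in combinations(sorted(linked), 2):
            if a[0] == c[0]:
                continue  # two words in the same language are not a translation pair
            if c in neighbours.get(a, ()):
                continue  # the pair is already in the dictionary
            inferred[frozenset((a, c))].add(pivot[0])
    return dict(inferred)


# Toy input (hypothetical, not taken from Wiktionary dumps).
known = [
    (("en", "dog"), ("de", "Hund")),
    (("de", "Hund"), ("hu", "kutya")),
    (("en", "dog"), ("fr", "chien")),
    (("fr", "chien"), ("hu", "kutya")),
    (("en", "bank"), ("de", "Bank")),
    (("de", "Bank"), ("hu", "pad")),  # German "Bank" can also mean "bench": a polysemy trap
]

candidates = triangulate(known)

# Simplified filter: keep only pairs supported by two or more pivot languages.
strong = {pair for pair, pivots in candidates.items() if len(pivots) >= 2}
```

In this toy input, the pair "dog"/"kutya" is reachable through both German and French and survives the filter, while "bank"/"pad" rests on a single ambiguous German pivot and is dropped.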
This performance would likely need to increase before any results could be operationalized and contributed back into Wiktionary. However, given the fact that the code used to parse and compare 43 different Wiktionaries was also released on GitHub[supp 3], that goal is a possibility. It's yet another testament to the open ecosystem to see a Wikimedia project along with Open Researcher efforts make a resource to rival a closed standard. While Ács' research isn't the holy grail of translation between arbitrary languages, it cleverly mixes established theory and open data, and then contributes it back to the community.
A new study[3] by Tran and Christen is the latest example of academic research on vandalism detection, which has been developed over the years[supp 4] in the context of the PAN workshop[supp 5], where researchers develop both corpus data and tools to uncover plagiarism, authorship, and the misuse of social media/software. This work should be of interest to both researchers and Wikipedians because of (a) the need to detect vandalism and (b) the interesting question of whether such vandalism-fighting data and tools are transferable or portable from one language version to another. The vandalism-fighting corpus and tools have both practical and theoretical implications for understanding cross-lingual transfer in knowledge and bots.
In 2010 and 2011, PAN included Wikipedia vandalism detection competitions as workshops. This started with Martin Potthast's work on building the free-of-charge PAN Wikipedia vandalism corpus for research, PAN-WVC-10, which compiled 32,452 edits on 28,468 Wikipedia articles, among which 2,391 vandalism instances were identified by human coders recruited from Amazon's Mechanical Turk[supp 6]. In 2011, a larger crowdsourced corpus of 30,000+ Wikipedia edits was released in three languages: English, German, and Spanish[supp 7], with 65 features to capture vandalism.
Based on even larger datasets of over 500 million revisions across five languages (en:English, de:German, es:Spanish, fr:French, and ru:Russian), Tran & Christen's latest work adds to the efforts by applying several supervised machine learning algorithms from the Scikit-learn toolkit[supp 8], including Decision Tree (DT), Random Forest (RF), Gradient Tree Boosting (GTB), Stochastic Gradient Descent (SGD) and Nearest Neighbour (NN).
What Tran & Christen confirm from their findings is that "distinguishing the vandalism identified by bots and users show statistically significant differences in recognizing vandalism identified by users across languages, but there are no differences in recognizing the vandalism identified by bots" (p. 13). This demonstrates that human beings can recognize a much wider spectrum of vandalism than bots, but also that bots can be trained to capture more and more non-obvious cases of vandalism.
Tran & Christen further try to make the case for the benefits of cross-language learning of vandalism. They argue that the detection models are generalizable, based on the positive results of transferring the machine-learned capacity from English to other, smaller Wikipedia languages. While they are optimistic, they acknowledge that such generalization has at best been proven among some of the languages they studied (all of which use the Roman alphabet except for Russian), and they note the poor performance of the Russian language model. Thus, Tran & Christen rightly point out the need for research on non-English and especially non-European language versions. They also recognize that many word-based features are no longer useful for some languages such as Mandarin Chinese, because of tokenization and other language-specific issues.
Tran & Christen call for next research projects to include languages such as Arabic and Mandarin Chinese to complete the United Nations working set of languages. It will be interesting to see how such research projects can be executed and how the greater Wikipedia research and editor community can help and/or use such research efforts.
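The cross-language transfer discussed above (train a detector on one language edition, then apply it to another) can be illustrated with a toy example. The study itself used scikit-learn classifiers on real revision data; the sketch below is only a dependency-free stand-in that hand-rolls a nearest-neighbour rule, and both the two features and every feature value are invented for illustration.

```python
import math


def nearest_neighbour(train, point):
    """Return the label of the training example closest to `point`."""
    _, label = min(train, key=lambda example: math.dist(example[0], point))
    return label


# Hypothetical language-independent features per edit:
# (fraction of upper-case characters added, fraction of article text removed)
english_edits = [
    ((0.05, 0.00), "regular"),
    ((0.10, 0.02), "regular"),
    ((0.90, 0.01), "vandalism"),  # shouting in capitals
    ((0.08, 0.95), "vandalism"),  # page blanking
]

# Invented "German" edits, never seen during training.
german_edits = [
    ((0.12, 0.01), "regular"),
    ((0.85, 0.03), "vandalism"),
    ((0.05, 0.90), "vandalism"),
]

# Train on English, evaluate on German.
predictions = [nearest_neighbour(english_edits, features) for features, _ in german_edits]
accuracy = sum(
    predicted == actual for predicted, (_, actual) in zip(predictions, german_edits)
) / len(german_edits)
```

The transfer works here only because the chosen features do not depend on vocabulary. Features built from word lists would not carry over in the same way, which is the limitation the authors raise for languages such as Mandarin Chinese.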
A conference paper titled "Reader Preferences and Behavior on Wikipedia"[4] deals with the under-studied population of Wikipedia readers. The paper provides a useful literature review of the few studies about the reading preferences of that group. The researchers used publicly available page view data and, more interestingly, were able to obtain browsing data (such as time spent by a reader on a given page). Since such data is unfortunately not collected by Wikipedia, the researchers obtained it through volunteers using a Yahoo! toolbar. The authors used Wikipedia:Assessment classes to gauge articles' quality.
The paper offers valuable findings, including important insights for the Wikipedia community, namely that "the most read articles do not necessarily correspond to those frequently edited, suggesting some degree of non-alignment between user reading preferences and author editing preference". This finding should not come as much of a surprise, considering for example the high percentage of quality military history articles produced by WikiProject Military History, one of the most active wikiprojects in existence if not the most active, and how little importance this topic has for the general population. Statistics on topic popularity and the quality of the corresponding articles can be seen in Table 1, page 3 of the article. Figure 1 on page 4 is also of interest, presenting a matrix of articles grouped by popularity and length. For example, the authors identify the area of "technology" as the 4th most popular, but the quality of its articles lags behind many other fields, placing it around 9th place. It would be a worthwhile exercise for the Wikipedia community to identify popular articles that are in need of more attention (through revitalizing tools like Wikipedia:Popular pages, perhaps using code that makes WikiProject popular pages listing work?) and direct more attention towards what our readers want to read about (rather than what we want to write about). Finally, the authors also identify different reading patterns, and suggest how those can be used to analyze articles' popularity in more detail.
Overall, this article seems like a very valuable piece of research for the Wikipedia community and the WMF, and it underscores why we should reconsider collecting more data on our readers' behavior. In order to serve our readers as well as we can, more information on their browsing habits on Wikipedia could help to produce more valuable research like this project.
An article[5] in "Business Horizons", written in very friendly prose (not a common finding among academic works), looks at Wikipedia (as well as some other forms of collaborative, Web 2.0 media) from the business perspective of public relations and marketing studies. Of particular interest to the Wikipedia community is the authors' goal of presenting "the three bases of getting your entry into Wikipedia, as well as a set of guidelines that help manage the potential Wikipedia crisis that might happen one day." The authors correctly recognize that Wikipedia has policies that must be adhered to by any contributors, though a weakness of the paper is that while it discusses Wikipedia concepts such as neutrality, notability, verifiability, and conflict of interest, it does not link to them. The paper provides practical advice on how to get one's business entry on Wikipedia, or how to improve it. While the paper does not suggest anything outright unethical, it is frank to the point of raising some eyebrows. Nobody can disagree with advice such as "as a rule of thumb, try to remain as objective and neutral as possible" and "when in doubt, check with others on the talk page to determine whether proposed changes are appropriate". But given the lack of consensus among Wikipedia's community on how to deal with for-profit and PR editors, other advice might be seen as more controversial, falling into the gray area of gaming the system: "maximize mentions in other Wikipedia entries" (i.e. gaming WP:RED), "be associated with serious contributors...leverage the reputation of an employee who is already a highly active contributor... [befriend Wikipedians in real life]", "When correcting negative information is not possible, try counterbalancing it by adding more positive elements about your firm, as long as the facts are interesting and verifiable", and "...you might edit the negative section by replacing numerals (99) with words (ninety-nine), since this is also less likely to be read. Add pictures to draw focus away from the negative content". The "Third, get help from friends and family" section in particular seems to fall foul of meatpuppetry.
In the end, this is an article worth reading in detail by all interested in the PR/COI topics, though for better or worse, the fact that it is closed access will likely reduce its impact significantly. On an ending note, one of the article's two co-authors has a page on Wikipedia at Andreas Kaplan, which was restored by a newbie editor in 2012, two years after its deletion, and has since been maintained by throw-away SPAs; this reviewer cannot help but notice that it still seems to fail Wikipedia:Notability (academics)...
In 2012, the authors of this paper[6] gave out over a hundred barnstars to the top 1% most active Wikipedians, and concluded that such awards improve editors' productivity. This time they repeated the experiment while broadening their sample to the top 10% most active editors. After excluding administrators and recently inactive editors, they handed out 300 barnstars "with a generic positive text that expressed community appreciation for their contributions", divided between the 91st–95th, 96th–99th, and 100th percentiles of the most active editors (this corresponds to an average of 282, 62 and 22 edits per month) and then tracked the activity of those editors, as well as of the corresponding control sample which did not receive any award. The experiment was designed to test the hypothesis that less active contributors will be responsive to rewards, similarly to the most highly active contributors from the prior research.
The authors found, however, that rewarding less productive editors did not stimulate higher subsequent productivity. They note that while the top 1% group responded to an award with an increase in productivity (measured at a rather high 60% increase), less productive subjects did not change their behavior significantly. The researchers also noted that while some of the top 1% editors received an additional award from other Wikipedians, not a single subject from the less active group was a recipient of another award.
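To make the treatment-versus-control comparison concrete, here is a minimal sketch of how a relative productivity effect could be computed. It is not the paper's estimation procedure, which is not described here. The function `relative_effect` is a name introduced for illustration, and all edit counts are invented, chosen only so that the output mirrors the pattern reported above (a roughly 60% increase for the most active group, no change for a less active one).

```python
from statistics import mean


def relative_effect(treated, control):
    """Relative difference in mean activity between awarded and control editors."""
    return mean(treated) / mean(control) - 1.0


# Invented monthly edit counts after the award date (illustration only).
top_treated = [480, 400, 320]  # most active editors who received a barnstar
top_control = [300, 250, 200]  # comparable editors who did not

low_treated = [20, 24, 22]     # less active editors who received a barnstar
low_control = [21, 23, 22]     # comparable editors who did not

top_effect = relative_effect(top_treated, top_control)  # about 0.6, i.e. +60%
low_effect = relative_effect(low_treated, low_control)  # 0.0, i.e. no change
```

A real analysis would also need to check whether such a difference is larger than what chance alone would produce, for example with a significance test over many more editors than this toy sample.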
The researchers conclude that "this supports the notion that peer production’s incentive structure is broadly meritocratic; we did not observe contributors receiving praise or recognition without having first demonstrated significant and substantial effort." While this will come as little surprise to the Wikipedia community, their other observation - that outside the top 1% of editors, awards such as barnstars have little meaningful impact - is more interesting.
Further, the authors found that while rewarding the most active editors tends to increase their retention ratio, it may counter-intuitively decrease the retention ratio of the less active editors. The authors propose the following explanation: "Premature recognition of their work may convey a different meaning to these contributors; instead of signaling recognition and status in the eyes of the community, these individuals may perceive being rewarded as a signal that their contributions are sufficient, for the time being, or come to expect being rewarded for their contributions." They suggest that this could be better understood through future research. For the community in general, it raises an interesting question: how should we recognize less active editors, to make sure that thanking them will not be taken as "you did enough, now you can leave"?
A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.
Another hoax on the English Wikipedia was uncovered this week, not through any thorough investigation, but because one of its authors disclosed an anonymous change made when the editors were in their sophomore year of college. The deliberate misinformation had been in the article for over five years, and although plenty of individuals noticed it, not one questioned its authenticity. This leads to one obvious question: how many more are there?
Amelia Bedelia is a fictional character used by children's book author Peggy Parish and her nephew Herman Parish, who stepped in to continue the series after the former's death in 1988. Bedelia is over 50 years old and is literal-minded to the extreme. According to publisher HarperCollins, "When she makes a sponge cake, she puts in real sponges. When she weeds the garden, she replants the weeds. And when she pitches a tent, she throws it into the woods!" The New York Times Book Review noted that "No child can resist Amelia and her literal trips through the minefield of the English language—and no adult can fail to notice that she's usually right when she's wrong." Writer Cynthia Samuels continued:
“Much as another beloved children's hero, Curious George (always in scrapes because of his curiosity), often saves the day because of his special monkey talents - so Amelia redeems herself through her special talent. She can cook. Just when things are at their darkest, or dustiest, or weediest, Amelia pulls out a little this and a pinch of that and comes up with the best meal in town. Like George, Amelia is forgiven because of her special gifts. Certainly a child could wish for no less.”
However, Peggy Parish would likely be the first to tell her readers that her main character was not based on a maid in Cameroon.
Nor did she spend some "formative years" there.
Yet this is precisely what the Wikipedia article on Amelia Bedelia had said since January 2009: "Amelia Bedelia's character is based on a maid in Cameroon, where the author spent some time during her formative years. Her vast collection of hats, notorious for their extensive plumage, inspired Parish to write an assortment of tales based on her experiences in North Africa."
The hoax was only revealed when EJ Dickson, a journalist and one of the two original hoax editors, noticed a series of tweets including one from Jay Caspian Kang, an editor for the New Yorker, that highlighted the text Dickson wrote five years earlier. In her words, "It was total bullshit ... It was the kind of ridiculous, vaguely humorous prank stoned college students pull, without any expectation that anyone would ever take it seriously." Her co-editor Evan continued, "I feel like we sort of did it with the intention of seeing how fast it would take to get it taken down [by Wikipedia editors]".
The hoax text was removed after Dickson publicized it in the Daily Dot.
Hoaxes have a lengthy history on Wikipedia. The longest-lasting hoax was a two-sentence, obscure biography of Gaius Flavius Antoninus, who was supposedly a Roman politician who helped assassinate Julius Caesar in 44 BCE.
At least 23 known hoaxes have lasted for five to six years, including an article on an equally obscure alleged war between Portugal and the Maratha Empire of modern-day India. Wikipedia editor A-b-a-a-a-a-a-a-b-a, who is now indefinitely blocked, wrote that this "Bicholim conflict" took place in 1640–41 and the resulting peace treaty played a major role in Portugal's keeping control over Goa until the 1960s. At the time it was exposed as a hoax, the meticulously created article had held good article status for five years. It was over 4300 words long, and had about 150 citations.
Numerous hoaxes have existed for shorter amounts of time. Among the most colorful was another painstakingly detailed entry on the Upper Peninsula War. Boasting 23 references in its bibliography, this fake article chronicled a struggle between the United States, Canada, and nascent separatists in Michigan spawned from a disputed territorial line in the Upper Peninsula. It ended with the massacre of numerous Canadian troops (along with 80–120 civilians suspected of being Canadian co-conspirators), and the arrest and execution of Michigan's governor.
This fantastical story turned out to be a success story for Wikipedia: the hoax, despite the effort that had been put into it, was caught, nominated for deletion within a week of creation, and disposed of.
With this latest hoax revelation, how many more are out there? An op-ed published in the Signpost last year argued that studies show Wikipedia is very accurate and false information is near the level of statistical irrelevance. When hoaxes do occur, they "have reached great prominence, true, but they are small in number, and they can be caught." According to the author, "Wikipedia is generally fairly effective (if not perfect) at keeping its information clean and rid of errors."
Yet just by itself, the Bedelia hoax caused a number of others to be revealed in comment threads discussing the case, including false ghost stories and a new origin story for the corporate name Verizon. Dickson's article also referenced a prior hoax regarding the alleged inventor of S'mores; one of those claimed inventors even had their own biography article which was deleted last year, but not before being cited in a number of books. How many more remain hidden in plain sight?
Though it is no defense, falling for false information is not a new problem. John McIntyre, a copyeditor for The Baltimore Sun and a noted critic of Wikipedia, also wrote about this latest hoax, and noted that those who were duped showed a "hardly novel" combination of laziness and gullibility, as demonstrated long ago by H.L. Mencken's 1917 Bathtub hoax.
Still, as EJ Dickson's article concluded, "I learned from my inadvertent Wikipedia hoax ... not that Wikipedia itself isn't reliable, but that ... many people believe it is." Bedelia's alleged Cameroonian origins have been repeated by scholars, bloggers, academics, and apparently even the current author of the series himself, who reportedly told a journalist in 2009 that the character was based on "a French colonial maid in Cameroon." The fact that these hoaxes go uncaught for such a long time does not mean they cannot be caught; a discerning editor looking for questionable claims and missing citations may spot them.
But the average reader using Wikipedia will likely not.
The Wikimedia Education Program currently spans 60 programs around the world. Students and instructors participate at almost every level of education. Subjects covered include law, medicine, arts, literature, information science, biology, history, psychology, and many others. This Signpost series presents a snapshot of the Wikimedia Global Education Program as it exists in 2014. We interviewed participants and facilitators from the United States and Canada, Serbia, Israel, the Arab World, and Mexico, in addition to the Wikimedia Foundation.
Based on emails with Samir El-Sharbaty, a member of the Egypt Wikimedians user group, which was approved in July 2014 by the Affiliations Committee
Congratulations on getting user group recognition from the Affiliations Committee. Does the user group plan to involve itself with the Wikipedia Education Program in Egypt, and if yes, how?
How would you describe the current WEP program in Egypt?
How many high schools and universities participate in WEP in Egypt, and how many instructors and students participate?
Which languages of Wikipedia do students read and edit?
How much student activity is translation and how much is new prose?
Besides Wikipedia, do students or instructors contribute to other Wikimedia projects like Wiktionary, Wikisource, or Commons?
How many Egyptian Wikimedia volunteers assist students and instructors?
Is there anything else that Signpost readers should know about the Education Program in Egypt?
Based on emails and a Skype interview with Tighe Flanagan, WMF Arab World Wikipedia Education Program Manager
Can you describe how the Education Program started in the Arab World?
How many instructors and students currently participate in the program?
Which countries currently participate?
What grade levels are the students who participate?
As you probably know, Wikipedia editors are predominantly male in most languages. Approximately what percentage of the students who participate in the Arab World education program are female?
How are instructors and students trained to use Wikipedia?
Do students and instructors usually use VisualEditor?
What kinds of assignments do students receive when using Wikipedia in the classroom? For example, are they translating, editing existing articles, or creating new articles? Which languages do they use?
Has the program received any endorsements from governments of countries that are participating?
How do you expect the program to develop in the next few years?
Is there anything else that you would like Signpost readers to know about the program?
Languages
Outreach
Biggest challenges
Cultural norms: celebrations and physical artifacts
Going forward
We indeed moved far away from football this week, and further into much more serious issues of war and death. The Israeli–Palestinian conflict continues to dominate the news, and the top 10, with Gaza Strip (#4), Israel (#9), and Hamas (#10). The top 25 also includes Palestine (#15) and the Israeli–Palestinian conflict itself (#17). Death also lies behind the popularity of James Garner (#1), the American actor who died on 19 July, Malaysia Airlines Flight 17 (#3), and Deaths in 2014 (#8).
We have Reddit to thank for some less serious topics of interest, including a funny story about songwriter Tom Lehrer (#5), as well as how land mine (#7) areas in the Falkland Islands have become penguin sanctuaries. Actress Rose Leslie (#21) made the top 25 simply because Reddit noticed she grew up in a castle. It's worth noting that earlier this week The New York Times was asking "Can Reddit Grow Up?", about that site's efforts to develop a mature business model. Considering that Reddit and Google Doodles are without peer in their ability to direct traffic, at least to Wikipedia, it stands to reason that someone will figure out how to leverage that site's massive audience.
For the full top 25 list, see WP:TOP25. See this section for an explanation for any exclusions.
For the week of 20 to 26 July 2014, the ten most popular articles on Wikipedia, as determined from the report of the 5,000 most viewed pages, were:
Rank | Article | Views | Notes |
---|---|---|---|
1 | James Garner | 1,160,042 | This American actor died on July 19 at age 86 of a heart attack. Garner starred in several popular television series over more than five decades, including Maverick and The Rockford Files. He also starred in more than 50 films. |
2 | Fifty Shades of Grey | 579,935 | This 2011 erotic romance novel by E. L. James is one of the biggest best sellers of the past decade. It is being adapted into a movie directed by Sam Taylor-Wood so that even more people can experience it. On July 24, the trailer for the film was released, which is no doubt why this article was so popular this week. |
3 | Malaysia Airlines Flight 17 | 576,750 | The tragic shooting down of this passenger aircraft over Eastern Ukraine drops one spot this week. Although it seems likely that Russian-backed insurgents, who recently downed some Ukrainian planes in the same area, mistook the Boeing 777 for a Ukrainian military plane, a full investigation of the crash needs to be completed. That continues to be hampered by the lack of government authority and ongoing fighting in the region, leading to news reports about the efforts made to simply transport bodies out of the area, as well as disturbing claims of scavenging of passenger belongings by local residents. |
4 | Gaza Strip | 508,624 | The latest round of fighting between Israel and Hamas, part of a very long and complicated history of conflict, keeps this article on the list for the second straight week. The military operation is dubbed "Operation Protective Edge", though our article on the conflict is now filed under 2014 Israel–Gaza conflict. |
5 | Tom Lehrer | 507,403 | This American singer-songwriter, satirist, and mathematician was the subject of a very popular Reddit thread this week. As Reddit noticed, when Lehrer was asked at age 84 by hip-hop artist 2 Chainz for permission to sample a song he wrote 60 years ago, Lehrer responded: "As sole copyright owner of 'The Old Dope Peddler', I grant you motherfuckers permission to do this. Please give my regards to Mr. Chainz, or may I call him 2?" |
6 | 2014 Commonwealth Games | 487,610 | The 2014 edition of the Commonwealth Games kicked off on 23 July in Glasgow, Scotland, and will run through 3 August. Almost 5,000 athletes from 71 different nations and territories will be competing in 18 sports, including lawn bowls. |
7 | Land mine | 438,852 | Reddit also caused a huge spike in the popularity of this article on 25 July, when a "Today I Learned" thread noted that areas around landmines laid near the sea during the Falklands War (1982) have become favorite penguin sanctuaries, as penguins do not weigh enough to detonate the mines, and can breed free of human interference. The sanctuaries have proven so popular and lucrative for ecotourism that removal efforts have been opposed. |
8 | Deaths in 2014 | 408,553 | The list of deaths in the current year is always a popular article. In addition to James Garner (#1), deaths this week included (and this is a random sample, truly): Indian actor Kadhal Dhandapani (July 20), English female aviator and World War II military pilot Lettice Curtis (July 21), American football player Robert Newhouse (July 22), American swimmer and 1932 Olympics gold medal winner Helen Johns (July 23), South Korean violinist Ik-Hwan Bae (July 24), American author Bel Kaufman (July 25), and Ukrainian mayor Oleh Babayev (July 26). |
9 | Israel | 396,605 | Up from #14 last week. As with #4, the latest round of fighting between Israel and Hamas is no doubt the cause of the popularity of this article this week. |
10 | Hamas | 396,081 | Up from #17 last week, giving the recent conflict three of the top ten spots this week. Sadly, this popularity, and the bloodshed causing it, is likely to continue. |
It took 396,081 views to make the Top 10 this week, down substantially from the 467,674 views needed last week. In the greater raw WP:5000 stats, 158 articles received over 100,000 views this week, with The Big Bang Theory (#158) the last to do so. William Shakespeare (#587) was the last to break 50,000 views; Los Angeles Lakers (#2239) last to hit 25,000; and United States Navy SEALs and Jazz tied for last (#4999) on the WP:5000, with 16,068 views.
Two featured articles were promoted this week.
Four featured lists were promoted this week.